Thursday, 26 May 2016

How to run Spark Job server and spark jobs

Spark Job Server provides a RESTful interface for submitting and managing Spark jobs, jars, and job contexts. It facilitates sharing of jobs and RDD data within a single context, and it can run standalone jobs as well. Job history and configuration are persisted.

A few of the features are listed here:
    ·   Simple REST interface
    ·   Separate JVM per SparkContext for isolation
    ·   Separate jar-uploading step for faster job execution
    ·   Support for low-latency jobs via long-running job contexts
    ·   Asynchronous and synchronous job APIs
    ·   Killing of running jobs via stop context and delete job
    ·   Named objects (RDDs/DataFrames) that can be cached and retrieved by name, improving object sharing and reuse among jobs
    ·   Preliminary support for Java

Setup Spark Job Server:


To set up the server, the prerequisites are:

    ·   64-bit operating system
    ·   Java 8
    ·   sbt
    ·   curl
    ·   git
    ·   Spark

Please make sure your sbt version is compatible with your Spark version. Here is the list of compatible versions.

You can install Java 8 from here.

For sbt, you can refer to the sbt official site.

For CentOS users:
yum install curl
yum install git
yum install sbt

   For Ubuntu users:
sudo apt-get install curl
sudo apt-get install git
sudo apt-get install sbt

Download the Spark package and set it up. Windows users can refer to the post How to setup Spark on windows. Once the Spark setup is done, run the Spark master and worker daemons.
[xuser@machine123 spark-1.6.1-bin-hadoop2.6]$ sbin/

Now clone the Spark Job Server repo locally.

[xuser@machine123 ~]$ git clone

Run the sbt command in the cloned repo. It will build the project and drop you into the sbt shell. If you are running sbt for the first time, it will take quite a while. Then type re-start in the sbt shell to start the server:

[xuser@machine123 spark-jobserver]$ sbt
[info] Loading project definition from /home/xuser/softwares/spark-jobserver/project
Missing bintray credentials /home/xuser/.bintray/.credentials. Some bintray features depend on this.
[info] Set current project to root (in build file:/home/xuser/spark-jobserver/)
> re-start

If you want to start the server with a specific configuration, you can pass a config file, and you can also specify JVM parameters after "---". Including all the options, it looks like this:
 > re-start config/application.conf --- -Xmx512m

Note: some users have reported that if re-start fails with an error like "Multiple main classes detected", running job-server/reStart instead (so that only the job-server project is started) fixes it.
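For reference, the config file passed to re-start is in HOCON format. Here is a rough sketch of what such a file can contain; the key names follow the project's config layout, but the values below are illustrative assumptions for a local setup, not recommended defaults:

```
# Illustrative Spark Job Server config sketch (HOCON) -- values are assumptions.
spark {
  # Where jobs run; use spark://host:7077 to point at a standalone master.
  master = "local[4]"

  jobserver {
    # REST port the server listens on (8090 is the port used in this post).
    port = 8090
  }

  # Defaults applied to contexts created via POST /contexts.
  context-settings {
    num-cpu-cores = 2
    memory-per-node = 512m
  }
}
```

Check the template config files shipped in the repo's config/ directory for the full set of supported keys.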

It will start the Spark Job Server at the URL http://localhost:8090. You can see all the daemons using jps.

Sample Spark Jobs Walkthrough:

Spark Job Server ships with some sample Spark jobs written in Scala. To package the test jar, run:
[xuser@machine123 spark-jobserver]$ sbt job-server-tests/package

It will produce a jar in the job-server-tests/target/scala-2.10 directory. Now upload the jar to the server:

[xuser@machine123 spark-jobserver]$ curl --data-binary @job-server-tests/target/scala-2.10/job-server-tests_2.10-0.7.0-SNAPSHOT.jar localhost:8090/jars/test

This jar is uploaded as the app 'test'. You can view the same information on the web UI.
We can run jobs in two modes: transient context mode and persistent context mode.

Unrelated jobs -with Transient Context:

In this mode, each job creates its own Spark context. Let's submit the WordCount job to the server:
[xuser@machine123 ~]$ curl -d "input.string = a b c a b see" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample'
  "status": "STARTED",
  "result": {
    "jobId": "5453779a-f004-45fc-a11d-a39dae0f9bf4",
    "context": "b7ea0eb5-spark.jobserver.WordCountExample"

Persistent Context mode- Related Jobs:

In this mode, jobs can share an existing, long-running Spark context. Create a Spark context named 'test-context':
[xuser@machine123 ~]$ curl -d "" 'localhost:8090/contexts/test-context?num-cpu-cores=4&memory-per-node=512m'

To see the existing contexts:
[xuser@machine123 ~]$ curl localhost:8090/contexts

To run the job in existing context:
[xuser@machine123 ~]$ curl -d "input.string = a b c a b see" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&context=test-context&sync=true'
  "result": {
    "a": 2,
    "b": 2,
    "c": 1,
    "see": 1

You can run a job without any input argument by passing -d "":

[xuser@machine123 ~]$ curl -d "" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.LongPiJob&context=test-context&sync=true'
  "result": 3.1403460207612457

You can check a job's status by passing its job ID to the following command:

[xuser@machine123 ~]$ curl localhost:8090/jobs/<jobID>

You can see all the running, completed, and failed jobs on the Job Server UI. Now you are ready to write your own jobs to run on Spark Job Server!
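If you want to sketch your own job, the sample jobs implement the SparkJob trait with a validate and a runJob method. Below is a minimal word-count sketch modeled on the WordCountExample from job-server-tests; it assumes the 0.7-era Scala API (SparkJob, SparkJobValid, SparkJobInvalid), so check the sample sources in your checkout for the exact signatures of your version:

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

import scala.util.Try

// Sketch of a word-count job for Spark Job Server (names are illustrative).
object MyWordCountJob extends SparkJob {

  // Reject the job early if the expected "input.string" config key is missing.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    Try(config.getString("input.string"))
      .map(_ => SparkJobValid)
      .getOrElse(SparkJobInvalid("No input.string config param"))

  // The return value is serialized into the "result" field of the REST response.
  override def runJob(sc: SparkContext, config: Config): Any = {
    val words = config.getString("input.string").split(" ").toSeq
    sc.parallelize(words).countByValue()
  }
}
```

runJob's return value is what comes back in the "result" field of the REST response, so returning countByValue() gives a word-to-count map like the one shown earlier.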






