Friday, 3 June 2016

How to write Spark jobs in Java for Spark Job Server

In the previous post, we learned how to set up Spark Job Server and run Spark jobs on it. So far we have used Scala programs on the job server. Now we'll see how to write Spark jobs in Java to run on the job server.

In Scala, a job must implement the SparkJob trait, so the job skeleton looks like this:

object SampleJob extends SparkJob {
    override def runJob(sc: SparkContext, jobConfig: Config): Any = ???
    override def validate(sc: SparkContext, config: Config): SparkJobValidation = ???
}

What these methods do:
  • runJob method contains the implementation of the Job. The SparkContext is managed by the JobServer and will be provided to the job through this method. This relieves the developer from the boiler-plate configuration management that comes with the creation of a Spark job and allows the Job Server to manage and re-use contexts.
  • validate method allows for an initial validation of the context and any provided configuration. If the context and configuration are OK to run the job, returning spark.jobserver.SparkJobValid will let the job execute, otherwise returning spark.jobserver.SparkJobInvalid(reason) prevents the job from running and provides means to convey the reason of failure. In this case, the call immediately returns an HTTP/1.1 400 Bad Request status code.
    validate helps prevent running jobs that would eventually fail due to missing or wrong configuration, saving both time and resources.

In Java, we need to extend the JavaSparkJob class instead. It has the following methods, which can be overridden in your program:
  • runJob(jsc: JavaSparkContext, jobConfig: Config)
  • validate(sc: SparkContext, config: Config)
  • invalidate(jsc: JavaSparkContext, config: Config)

The JavaSparkJob class is available in the job-server-api package. Build the job-server-api source and add the resulting jar to your project, then add Spark and the other required dependencies to your pom.xml.
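The exact dependency list depends on your Spark and job-server versions; a minimal sketch of the pom.xml section might look like this (group/artifact names are standard, but the version numbers here are illustrative and should be matched to your cluster):

```xml
<dependencies>
  <!-- Spark core; 'provided' because the job server supplies the Spark runtime -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
    <scope>provided</scope>
  </dependency>
  <!-- Typesafe Config, used for the Config parameter of runJob/validate -->
  <dependency>
    <groupId>com.typesafe</groupId>
    <artifactId>config</artifactId>
    <version>1.2.1</version>
  </dependency>
</dependencies>
```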

Let’s start with the basic WordCount example:

package spark.jobserver;

import java.io.Serializable;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

import com.typesafe.config.Config;

public class Wordcount extends JavaSparkJob implements Serializable {

    private static final long serialVersionUID = 1L;
    private static final Pattern SPACE = Pattern.compile(" ");

    public Object runJob(JavaSparkContext jsc, Config config) {
        JavaRDD<String> lines = jsc.textFile(
                config.getString("input.filename"), 1);

        // Split each line on spaces to get the individual words.
        JavaRDD<String> words = lines
                .flatMap(new FlatMapFunction<String, String>() {
                    public Iterable<String> call(String s) {
                        return Arrays.asList(SPACE.split(s));
                    }
                });

        // Pair every word with a count of 1, then sum the counts per word.
        JavaPairRDD<String, Integer> counts = words
                .mapToPair(new PairFunction<String, String, Integer>() {
                    public Tuple2<String, Integer> call(String s) {
                        return new Tuple2<String, Integer>(s, 1);
                    }
                })
                .reduceByKey(new Function2<Integer, Integer, Integer>() {
                    public Integer call(Integer i1, Integer i2) {
                        return i1 + i2;
                    }
                });

        List<Tuple2<String, Integer>> output = counts.collect();
        return output;
    }

    public SparkJobValidation validate(SparkContext sc, Config config) {
        // hasPath guards against a ConfigException when the key is absent.
        if (config.hasPath("input.filename")
                && !config.getString("input.filename").isEmpty()) {
            return SparkJobValid$.MODULE$;
        }
        return new SparkJobInvalid(
                "Input parameter is missing. Please mention the filename");
    }

    public String invalidate(JavaSparkContext jsc, Config config) {
        // Returning null signals that the configuration is valid.
        return null;
    }
}
The next step is to compile the code, build the jar, and upload it to the job server.
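Uploading and running typically goes through the job server's REST API with curl. A sketch, assuming the server listens on localhost:8090, the jar sits at target/wordcount.jar, and we register it under the app name "wordcount" (all three are placeholders; adjust them to your deployment):

```shell
JOBSERVER=http://localhost:8090

# Upload the jar under an app name the server will remember.
curl --data-binary @target/wordcount.jar "$JOBSERVER/jars/wordcount"

# Run the job; input.filename is passed as job config in the POST body.
curl -d 'input.filename = "/tmp/input.txt"' \
  "$JOBSERVER/jobs?appName=wordcount&classPath=spark.jobserver.Wordcount"
```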

Your Spark job is now ready to run on the job server!


  1. I am getting an error when I copy this example into my project:
    The return types are incompatible for the inherited methods SparkJobBase.validate(Object, JobEnvironment, Config), JavaSparkJob.validate(Object, JobEnvironment, Config)

    1. Hey Krish, I'm having the same issue. Were you able to solve it?

    2. Hi Krish, I'm also facing the same problem. Were you able to find a solution yet?


  2. @Nishu Tayal, thank you for this example! It's helped me wrap my head around getting this thing going. Would be nice to have a blog post on running StreamingContext as well!
