Saturday, 21 February 2015

Writing Java UDF in Apache Pig

Isn’t it good that Pig also lets us write user defined functions (UDFs) for custom processing? Here we’ll talk about writing UDFs in Java.

How to write the Java UDF:
First of all, add the Pig dependency to the Java project.
Next, define a UDF class (e.g. HexConversion). Each UDF extends the EvalFunc<T> class, where T denotes the return type, e.g. DataByteArray, DataBag, Tuple, String etc.
The UDF implements the exec(Tuple input) method, which is invoked on every input tuple. The tuple holds the input parameters in the order they are passed to the function in the Pig script.
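Pig's real classes live in the Pig jar, so the following is only a simplified plain-Java model of the contract described above, not Pig's API. The hypothetical `SimpleEvalFunc<T>` stands in for `EvalFunc<T>`, and a `List<Object>` stands in for a `Tuple`. It illustrates two points: the type parameter fixes the return type of `exec`, and arguments arrive by position in the order they were written in the script.

```java
import java.util.Arrays;
import java.util.List;

public class EvalFuncModel {

    // Hypothetical stand-in for org.apache.pig.EvalFunc<T>: T is the return type of exec
    abstract static class SimpleEvalFunc<T> {
        // Called once per input record; args hold the UDF's parameters in script order
        abstract T exec(List<Object> args);
    }

    // Hypothetical two-argument UDF, as if called in Pig as Concat(first, last)
    static class Concat extends SimpleEvalFunc<String> {
        @Override
        String exec(List<Object> args) {
            if (args == null || args.size() < 2) {
                return null; // missing input yields null, as UDFs usually do
            }
            // Position 0 is the first argument, position 1 the second
            return args.get(0) + " " + args.get(1);
        }
    }

    public static void main(String[] args) {
        SimpleEvalFunc<String> udf = new Concat();
        System.out.println(udf.exec(Arrays.<Object>asList("Ada", "Lovelace"))); // prints "Ada Lovelace"
    }
}
```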

In the following example, we write a UDF that converts the string passed to it (the first field of the input tuple) into hexadecimal.

package com.test.udf;

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.data.Tuple;

/**
 * UDF to convert ASCII to hexadecimal. It returns the input string in hex format as a DataByteArray.
 */
public class HexConversion extends EvalFunc<DataByteArray> {

    @Override
    public DataByteArray exec(final Tuple input) throws IOException {
        // No input (or a null field) gives a null result
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        try {
            final String str = input.get(0).toString();
            StringBuilder builder = new StringBuilder();
            for (int i = 0; i < str.length(); i++) {
                // Two upper-case hex digits per character, e.g. 'A' -> "41"
                builder.append(String.format("%02X", (int) str.charAt(i)));
            }
            return new DataByteArray(builder.toString());
        } catch (final Exception e) {
            // On any failure, return an empty bytearray
            return new DataByteArray(new byte[0]);
        }
    }
}
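To check the conversion logic without a Pig installation, the per-character loop of `exec` can be pulled out into a plain static method. This is a sketch with a hypothetical class name (`HexCheck`); it zero-pads every character to two hex digits so that characters below 0x10 stay unambiguous.

```java
public class HexCheck {

    // Per-character conversion: two upper-case hex digits per char
    static String toHex(String str) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            builder.append(String.format("%02X", (int) str.charAt(i)));
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex("Pig")); // prints "506967"
        System.out.println(toHex("\n"));  // prints "0A" (zero-padded)
    }
}
```

Without the padding, a newline would come out as the single digit "A" and the hex string could no longer be split back into characters.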

If the UDF returns a Tuple or DataBag, its schema must be declared explicitly by overriding the outputSchema method. Import the following two classes and implement the method in the UDF class:

import org.apache.pig.data.DataType;
import org.apache.pig.impl.logicalLayer.schema.Schema;

    @Override
    public Schema outputSchema(Schema input) {
        try {
            // Schema of the returned tuple; add its fields here
            Schema tupleSchema = new Schema();
            return new Schema(new Schema.FieldSchema(
                    getSchemaName(this.getClass().getName().toLowerCase(), input),
                    tupleSchema, DataType.TUPLE));
        } catch (Exception e) {
            return null;
        }
    }

Build the above UDF into a jar file, e.g. hexConvertor.jar.
Now let’s see how to call this UDF in a Pig script: register the jar file, then call the function by its fully qualified class name.

  REGISTER hexConvertor.jar;
  A = LOAD 'sample_data' AS (field1: bytearray, age: int);
  B = FOREACH A GENERATE com.test.udf.HexConversion(field1);
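Conceptually, `FOREACH A GENERATE com.test.udf.HexConversion(field1)` calls the UDF once per record of `A`, passing `field1` as position 0 of the input tuple, and `B` gets one output field per record. The plain-Java sketch below models only that mapping: rows are hypothetical `Object[]` arrays, the sample values are made up, and nothing here uses Pig classes or reflects how Pig actually distributes the work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class ForeachModel {

    // Models "FOREACH rel GENERATE udf(field)": one udf call per row
    static <R> List<R> foreachGenerate(List<Object[]> rel, int fieldIndex,
                                       Function<List<Object>, R> udf) {
        List<R> out = new ArrayList<>();
        for (Object[] row : rel) {
            // The UDF's input "tuple" holds only the referenced field
            out.add(udf.apply(Arrays.asList(row[fieldIndex])));
        }
        return out;
    }

    // Hex conversion of argument 0; null in, null out
    static String hex(List<Object> args) {
        Object v = args.get(0);
        if (v == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (char c : v.toString().toCharArray()) {
            sb.append(String.format("%02X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical contents of A: (field1, age)
        List<Object[]> a = Arrays.asList(
                new Object[] {"ab", 30},
                new Object[] {"Z", 41});
        System.out.println(foreachGenerate(a, 0, ForeachModel::hex)); // prints [6162, 5A]
    }
}
```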

Now you can write your own UDFs too. Cheers!

