Monday, 3 August 2015

Setting up Hadoop Cluster in Pseudo-distributed mode on Ubuntu

Here we’ll discuss setting up a Hadoop cluster in pseudo-distributed mode on a Linux environment. We are using Hadoop 2.x for this. The prerequisites are:
     -     Java 7
     -     Adding a dedicated user
     -     Configuring ssh

Step 1: Install Java

324532@ubuntu:~$ sudo apt-get install openjdk-7-jdk
324532@ubuntu:~$ java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK Client VM (build 24.79-b02, mixed mode, sharing)
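
One thing to keep in mind if you script this check: java -version prints to stderr, not stdout, so the output has to be redirected before it can be parsed. Here is a minimal sketch that pulls out just the version string. The function name java_version_string is ours and purely illustrative.

```shell
# java_version_string: hypothetical helper, not part of Java or Hadoop.
# Reads `java -version` output on stdin and prints the quoted version
# from the first line (for the output above, that is 1.7.0_79).
java_version_string() {
  sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# Usage (assumes java is on the PATH); 2>&1 is needed because
# `java -version` writes to stderr:
#   java -version 2>&1 | java_version_string
```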

Step 2: Add a dedicated hadoop user

Though it is not mandatory, we create a dedicated user to keep the Hadoop installation separate from other packages.
324532@ubuntu:~$ sudo addgroup hadoop
324532@ubuntu:~$ sudo adduser --ingroup hadoop hduser
This adds the user hduser to the hadoop group.
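
To confirm the user really landed in the group, you can look at the output of id -nG, which prints a user's groups on one line. A small sketch, assuming the hduser and hadoop names from this step; has_group is a made-up helper name.

```shell
# has_group: hypothetical helper. Reads a space-separated group list on
# stdin (the format printed by `id -nG`) and succeeds if $1 is in it.
has_group() {
  tr ' ' '\n' | grep -qxF -- "$1"
}

# Usage (assumes the hduser account from this step exists):
#   id -nG hduser | has_group hadoop && echo "hduser is in hadoop"
```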

Step 3: Install ssh

324532@ubuntu:~$ sudo apt-get install ssh
324532@ubuntu:~$ sudo apt-get install openssh-server
Once it is installed, make sure the ssh service is running.

Step 4: Configure ssh

Hadoop uses ssh to manage its nodes, so ssh must be running and configured for authentication.

First generate an SSH key for hduser.

324532@ubuntu:~$ su - hduser
hduser@ubuntu:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
The key's randomart image is:

Once the key is generated, append the public key to the authorized_keys file.

hduser@ubuntu:~$ cat $HOME/.ssh/ >> $HOME/.ssh/authorized_keys

Once the key is copied, you can ssh to localhost and continue the Hadoop setup.

hduser@ubuntu:~$ ssh localhost
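
If ssh localhost still asks for a password at this point, one common cause is file permissions: with the default StrictModes setting, sshd refuses keys when the .ssh directory or the authorized_keys file is writable by other users. A small sketch of tightening them follows; fix_ssh_perms is a hypothetical helper name, and the 700/600 modes are the usual conservative choice rather than something this setup strictly requires.

```shell
# fix_ssh_perms: hypothetical helper. Tightens permissions on an .ssh
# directory ($1) and its authorized_keys file so only the owner has access.
fix_ssh_perms() {
  chmod 700 "$1" && chmod 600 "$1/authorized_keys"
}

# Usage, as hduser:
#   fix_ssh_perms "$HOME/.ssh"
```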

Step 5: Set up Hadoop cluster

Download a release from the Apache download mirrors and extract it into a folder, e.g. ‘/usr/local/hadoop/’. Set JAVA_HOME and other Hadoop-related environment variables in the .bash_profile file of hduser.
# set to the root of your Java installation
 export JAVA_HOME=/usr/java/latest
 export HADOOP_INSTALL=/usr/local/hadoop
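
The /usr/java/latest path above is only an example. With the openjdk-7-jdk package from Step 1, the JDK usually lives somewhere under /usr/lib/jvm instead, so it can be easier to derive JAVA_HOME from the java binary itself. The sketch below does that and also adds the Hadoop bin and sbin folders to PATH, which is an optional convenience we are assuming here. The helper name java_home_from_bin is made up for illustration.

```shell
# java_home_from_bin: hypothetical helper. Given the resolved path of the
# java binary, strip the trailing /bin/java to get a JAVA_HOME candidate.
java_home_from_bin() {
  printf '%s\n' "${1%/bin/java}"
}

# HADOOP_INSTALL as set above; adding bin and sbin to PATH is an optional
# convenience so the hadoop commands and start scripts are found by name.
export HADOOP_INSTALL=/usr/local/hadoop
export PATH="$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin"

# Usage (assumes java is on the PATH and GNU readlink, as on Ubuntu):
#   export JAVA_HOME=$(java_home_from_bin "$(readlink -f "$(command -v java)")")
```

Log out and back in as hduser (or source the profile file) for the variables to take effect.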
Hadoop can run in 3 modes:
     1.    Standalone (local) mode
     2.    Pseudo-distributed mode
     3.    Fully distributed mode
Standalone Mode : Everything runs in a non-distributed manner as a single Java process. The local filesystem is used for data storage.

Pseudo-Distributed Mode : Hadoop can also be run on a single node in pseudo-distributed mode, where each daemon runs as a separate Java process.

Fully Distributed Mode : Hadoop runs on multiple nodes in a master-slave architecture, where each daemon runs as a separate Java process.

Configuration :

The following is the minimal configuration you need to add to the configuration files to start a cluster.

<!-- It is namenode filesystem path -->


For YARN daemons : etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value> <!-- other values are local, classic -->
    </property>
</configuration>
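
The XML for these files is incomplete in this post, so here is a sketch of what a minimal pseudo-distributed configuration typically looks like. Treat every property name and value below as an assumption based on the Apache Hadoop 2.x single-node setup guide, not as the exact configuration used here; in particular hdfs://localhost:9000 is just that guide's example URI. The helper names write_site and write_minimal_conf are made up for illustration.

```shell
# write_site DIR FILE NAME VALUE: hypothetical helper that writes a
# one-property Hadoop site file.
write_site() {
  cat > "$1/$2" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$3</name>
    <value>$4</value>
  </property>
</configuration>
EOF
}

# write_minimal_conf DIR: writes the four files for a pseudo-distributed
# setup. All values are assumptions taken from the Apache single-node guide.
write_minimal_conf() {
  # NameNode filesystem URI (port 9000 is only the guide's example)
  write_site "$1" core-site.xml fs.defaultFS hdfs://localhost:9000
  # Only one DataNode, so keep a single replica of each block
  write_site "$1" hdfs-site.xml dfs.replication 1
  # Run MapReduce jobs on YARN
  write_site "$1" mapred-site.xml mapreduce.framework.name yarn
  # Shuffle service that MapReduce needs inside the NodeManager
  write_site "$1" yarn-site.xml yarn.nodemanager.aux-services mapreduce_shuffle
}

# Usage (overwrites existing files, so back them up first):
#   write_minimal_conf "$HADOOP_INSTALL/etc/hadoop"
```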


Once the above configuration is done, the next step is to format the namenode.

To format the filesystem (run from the Hadoop installation directory):
  hduser@ubuntu:~$ bin/hdfs namenode -format

To start the NameNode and DataNode daemons:
  hduser@ubuntu:~$ sbin/

Browse the NameNode web interface. By default it is at http://localhost:50070

To start the YARN daemons (ResourceManager and NodeManager), run the following:
hduser@ubuntu:~$ sbin/

You can browse the ResourceManager web interface at http://localhost:8088

If you want to start all daemons together, you can run the following:
hduser@ubuntu:~$ sbin/

Now your cluster is successfully started. You can see all the Hadoop daemons running using the jps command.
Now start writing MapReduce jobs!
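
jps prints one line per running Java process in the form "<pid> <class name>". As a rough sketch, the function below reads that output and lists which of the usual five daemons are missing. We are assuming the HDFS and YARN start scripts were both run, and missing_daemons is a made-up helper name.

```shell
# missing_daemons: hypothetical helper. Reads `jps` output on stdin and
# prints the name of every expected Hadoop daemon that is not listed.
missing_daemons() {
  jps_output=$(cat)
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Match the class name as a whole field so that, for example,
    # SecondaryNameNode does not count as NameNode
    printf '%s\n' "$jps_output" |
      awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' ||
      echo "$d"
  done
}

# Usage (assumes jps from the JDK is on the PATH):
#   jps | missing_daemons
```

No output means all five daemons are up.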

