Hadoop 2: Configure Pseudo-distributed mode

In the last post, I showed how to get started with Apache Hadoop by installing the software and testing the pi program in the default local standalone mode. However, Hadoop is designed for writing data-intensive distributed applications and is intended to run in distributed mode. The following shows how to configure Hadoop to run in pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process on a single machine. Although this is not a fully distributed cluster, it demonstrates how Hadoop works with HDFS.

1. Set up SSH

1) Create a new OpenSSH key pair with an empty passphrase (press Enter at each prompt)

[hadoop@localhost ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
88:49:13:47:88:55:60:d9:c4:1f:54:f4:6c:f2:c9:f2 hadoop@localhost.localdomain

2) Copy the new public key to the list of authorized keys

[hadoop@localhost ~]$ ls .ssh
id_rsa id_rsa.pub
[hadoop@localhost ~]$ cp .ssh/id_rsa.pub .ssh/authorized_keys
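Note that cp overwrites any existing authorized_keys file. A minimal sketch of a safer alternative, assuming the key pair from step 1 is in ~/.ssh: append the public key only if it is not already listed, then tighten permissions (sshd's StrictModes check refuses keys when ~/.ssh or authorized_keys is group- or world-writable). The function name authorize_own_key is hypothetical; source the file and call it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add your own public key to authorized_keys
# without overwriting existing entries (unlike `cp`). Meant to be sourced.
authorize_own_key() {
  local ssh_dir="${1:-$HOME/.ssh}"
  local pub="$ssh_dir/id_rsa.pub"
  local auth="$ssh_dir/authorized_keys"

  [ -r "$pub" ] || { echo "no public key at $pub" >&2; return 1; }

  chmod 700 "$ssh_dir"
  touch "$auth"
  # -x: match the whole line, -F: fixed string. Append only if missing.
  if ! grep -qxF -- "$(cat "$pub")" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 600 "$auth"
}
```

Running authorize_own_key twice is harmless: the second call finds the key already present and only re-applies the permissions.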

3) Connect to localhost (answer yes to add its host key to the list of known hosts)

[hadoop@localhost ~]$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 07:15:bb:a2:a6:ba:60:3f:c3:31:a9:c9:4a:7c:51:6a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Last login: Wed Sep 25 11:49:51 2013
[hadoop@localhost ~]$ exit
logout

4) Confirm that password-less SSH is working (no password prompt should appear)

[hadoop@localhost ~]$ ssh localhost
Last login: Wed Sep 25 12:20:06 2013 from localhost.localdomain
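For scripts, this check can be made non-interactive with OpenSSH's BatchMode option, which disables password and passphrase prompts so ssh fails instead of waiting for input. A minimal sketch, assuming the host key was already accepted in step 3 (under BatchMode an unknown host key also makes the check fail); passwordless_ssh_ok is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if `ssh HOST` logs in without any prompt.
passwordless_ssh_ok() {
  local host="${1:-localhost}"
  # BatchMode=yes: never prompt, fail instead.
  # ConnectTimeout=5: give up if the TCP connection takes longer than 5 s.
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true
}
```

Usage: `passwordless_ssh_ok localhost && echo "password-less SSH works"`.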

2. Configure Pseudo-distributed mode

1) gedit $HADOOP_HOME/conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/datadir</value>
  </property>
</configuration>

2) gedit $HADOOP_HOME/conf/hdfs-site.xml

With only one DataNode, set the block replication factor to 1 (the default is 3). fs.default.name and hadoop.tmp.dir are already set in core-site.xml and do not need to be repeated here.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

3) gedit $HADOOP_HOME/conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
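Instead of editing the three files in gedit, they can be written non-interactively. A minimal sketch, assuming Hadoop 1.x's conf/ layout under $HADOOP_HOME and the values used in this post; it also sets dfs.replication to 1 in hdfs-site.xml, the usual choice when there is only one DataNode. The function name write_pseudo_conf is hypothetical, and it overwrites existing files, so back them up first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write minimal pseudo-distributed configs for Hadoop 1.x.
# Overwrites core-site.xml, hdfs-site.xml and mapred-site.xml in CONF_DIR.
write_pseudo_conf() {
  local conf_dir="${1:-$HADOOP_HOME/conf}"
  local data_dir="${2:-$HOME/datadir}"   # becomes hadoop.tmp.dir
  [ -d "$conf_dir" ] || { echo "no such directory: $conf_dir" >&2; return 1; }

  cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$data_dir</value>
  </property>
</configuration>
EOF

  # Quoted 'EOF': no variable expansion inside these two documents.
  cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

  cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF
}
```

Usage: `write_pseudo_conf "$HADOOP_HOME/conf" /home/hadoop/datadir`.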

4) Create the base directory for Hadoop files (the hadoop.tmp.dir set in core-site.xml)

[hadoop@localhost ~]$ pwd
/home/hadoop
[hadoop@localhost ~]$ mkdir datadir
[hadoop@localhost ~]$ ls -l /home/hadoop
total 37268
drwxrwxr-x 2 hadoop hadoop 4096 Sep 25 12:35 datadir
drwxr-xr-x 2 hadoop hadoop 4096 Sep 13 08:17 Desktop
drwxr-xr-x 14 hadoop hadoop 4096 Sep 13 10:23 hadoop-1.2.1

5) Format the HDFS filesystem

[hadoop@localhost ~]$ hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.
13/09/25 12:43:53 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost.localdomain/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.6.0_20
************************************************************/
13/09/25 12:43:53 INFO util.GSet: Computing capacity for map BlocksMap
13/09/25 12:43:53 INFO util.GSet: VM type = 32-bit
13/09/25 12:43:53 INFO util.GSet: 2.0% max memory = 1013645312
13/09/25 12:43:53 INFO util.GSet: capacity = 2^22 = 4194304 entries
13/09/25 12:43:53 INFO util.GSet: recommended=4194304, actual=4194304
13/09/25 12:43:54 INFO namenode.FSNamesystem: fsOwner=hadoop
13/09/25 12:43:54 INFO namenode.FSNamesystem: supergroup=supergroup
13/09/25 12:43:54 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/09/25 12:43:54 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/09/25 12:43:54 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/09/25 12:43:54 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
13/09/25 12:43:54 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/09/25 12:43:56 INFO common.Storage: Image file /home/hadoop/datadir/dfs/name/current/fsimage of size 112 bytes saved in 0 seconds.
13/09/25 12:43:56 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/hadoop/datadir/dfs/name/current/edits
13/09/25 12:43:56 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/hadoop/datadir/dfs/name/current/edits
13/09/25 12:43:57 INFO common.Storage: Storage directory /home/hadoop/datadir/dfs/name has been successfully formatted.
13/09/25 12:43:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
************************************************************/
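Formatting initializes a new, empty filesystem image, so running hadoop namenode -format again on a working setup discards the existing HDFS metadata. A minimal guard, assuming dfs.name.dir is left at its default of ${hadoop.tmp.dir}/dfs/name (the path the log above reports): format only when dfs/name/current does not exist yet. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: format HDFS only on first setup.
# Assumes dfs.name.dir is unset, so the image lives in ${hadoop.tmp.dir}/dfs/name.
namenode_is_formatted() {
  local tmp_dir="${1:-$HOME/datadir}"   # hadoop.tmp.dir from core-site.xml
  [ -d "$tmp_dir/dfs/name/current" ]
}

format_if_needed() {
  local tmp_dir="${1:-$HOME/datadir}"
  if namenode_is_formatted "$tmp_dir"; then
    echo "HDFS already formatted in $tmp_dir; skipping" >&2
  else
    hadoop namenode -format
  fi
}
```

Usage: `format_if_needed /home/hadoop/datadir`.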

6) Start HDFS, check the running daemons with jps, and create a test directory (the relative path test resolves to /user/hadoop/test)

[hadoop@localhost ~]$ start-dfs.sh
Warning: $HADOOP_HOME is deprecated.
starting namenode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-namenode-localhost.localdomain.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting datanode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-localhost.localdomain.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-secondarynamenode-localhost.localdomain.out
[hadoop@localhost ~]$ jps
4346 Jps
4018 NameNode
4258 SecondaryNameNode
4136 DataNode
[hadoop@localhost ~]$ hadoop dfs -mkdir test
Warning: $HADOOP_HOME is deprecated.
[hadoop@localhost ~]$ hadoop dfs -ls /
Warning: $HADOOP_HOME is deprecated.
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2013-09-25 12:54 /user
[hadoop@localhost ~]$ hadoop dfs -ls /user/hadoop
Warning: $HADOOP_HOME is deprecated.
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2013-09-25 12:54 /user/hadoop/test
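Reading the jps listing by eye works, but the check can be scripted. A minimal sketch: the hypothetical function missing_daemons reads jps output on standard input, prints each expected daemon that is absent, and returns non-zero if any is missing. It matches whole class names, so a running SecondaryNameNode is not mistaken for a NameNode.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which expected Hadoop daemons are absent
# from `jps` output read on stdin. Returns 1 if any are missing.
missing_daemons() {
  local running name missing=0
  # jps prints "<pid> <MainClassName>"; keep only the class names.
  running="$(awk '{print $2}')"
  for name in "$@"; do
    # -x: whole-line match, so NameNode does not match SecondaryNameNode.
    if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `jps | missing_daemons NameNode DataNode SecondaryNameNode` after starting HDFS; after starting MapReduce, add JobTracker and TaskTracker to the list.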

7) Start MapReduce (JobTracker and TaskTracker)

[hadoop@localhost ~]$ start-mapred.sh
Warning: $HADOOP_HOME is deprecated.
starting jobtracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-jobtracker-localhost.localdomain.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting tasktracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-localhost.localdomain.out
[hadoop@localhost ~]$ jps
4887 JobTracker
5102 Jps
4018 NameNode
4258 SecondaryNameNode
4136 DataNode
5010 TaskTracker
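With the JobTracker and TaskTracker running, the pi program from the previous post can be rerun, this time as a MapReduce job submitted to the JobTracker at localhost:9001. A minimal sketch, assuming the examples jar bundled with Hadoop 1.2.1 is at $HADOOP_HOME/hadoop-examples-1.2.1.jar (adjust the path if your layout differs); run_pi_example is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the bundled pi estimator against the running daemons.
# Assumes the examples jar ships at $HADOOP_HOME/hadoop-examples-1.2.1.jar.
run_pi_example() {
  local jar="${HADOOP_HOME:-}/hadoop-examples-1.2.1.jar"
  [ -f "$jar" ] || { echo "examples jar not found: $jar" >&2; return 1; }
  # Arguments: <number of maps> <samples per map>
  hadoop jar "$jar" pi "${1:-2}" "${2:-10}"
}
```

Usage: `run_pi_example 2 10`. While it runs, the job should appear in the JobTracker web interface.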

8) Stop all daemons

[hadoop@localhost ~]$ stop-all.sh
Warning: $HADOOP_HOME is deprecated.
stopping jobtracker
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: stopping secondarynamenode
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:

9) Start all daemons (HDFS and MapReduce) with a single script

[hadoop@localhost ~]$ start-all.sh
Warning: $HADOOP_HOME is deprecated.
starting namenode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-namenode-localhost.localdomain.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting datanode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-localhost.localdomain.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-secondarynamenode-localhost.localdomain.out
starting jobtracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-jobtracker-localhost.localdomain.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting tasktracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-localhost.localdomain.out
[hadoop@localhost ~]$ jps
5883 JobTracker
5530 NameNode
6028 TaskTracker
5654 DataNode
6123 Jps
5801 SecondaryNameNode

10) Web-based admin interfaces

NameNode: http://localhost:50070
JobTracker: http://localhost:50030
TaskTracker: http://localhost:50060
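To confirm from a terminal that the three web interfaces are answering, a minimal sketch using curl (assumed to be installed) can probe each URL listed above; ui_is_up is a hypothetical name. With -f, curl exits non-zero on HTTP error codes, and --max-time bounds the wait when a daemon is not running.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the URL answers with a non-error HTTP status.
ui_is_up() {
  # -f: fail on HTTP >= 400, -s: no progress bar, -S: still report errors,
  # -o /dev/null: discard the page body, --max-time 5: cap the wait at 5 s.
  curl -fsS -o /dev/null --max-time 5 "$1"
}

for url in http://localhost:50070 http://localhost:50030 http://localhost:50060; do
  if ui_is_up "$url"; then echo "UP   $url"; else echo "DOWN $url"; fi
done
```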

About henry416
I am a computer technology explorer and a university student based in Toronto. If you have any questions, please feel free to discuss them in the comments here.
