Hadoop | Henry Chen

Hadoop 3: Writing a MapReduce program

October 24, 2013

MapReduce is a programming framework that takes a specification of how data is input to and output from its two stages (map and reduce) and applies it across large amounts of data stored in distributed form on HDFS.
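
To make the two stages concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API: the class name MapReduceSketch, the method names and the sample lines are made up for illustration. The grouping step stands in for the shuffle that Hadoop performs between map and reduce.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map and reduce stages; no Hadoop classes are used.
public class MapReduceSketch {

    // Map stage: one input line in, one (word, 1) pair out per word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce stage: one key and all of its values in, one total out.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: maps every line, groups the values by key, then reduces each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, chosen only for illustration.
        System.out.println(run(Arrays.asList("to be or not to be", "to do")));
    }
}
```

In a real job, Hadoop supplies the driver part: it splits the input, runs the map function on each split, and routes all values for one key to the same reduce call.
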
A tutorial on MapReduce programming is available separately, so I will only provide the steps to compile the Java program and run it in Hadoop:

-- compile
mkdir wordcount_classes
javac -classpath ${HADOOP_HOME}/hadoop-core-1.2.1.jar -d wordcount_classes WordCount.java

-- create a jar from wordcount_classes
jar -cvf /home/hadoop/test/wordcount.jar -C wordcount_classes/ .

-- start hadoop dfs / mapred
start-dfs.sh
start-mapred.sh

[hadoop@localhost test]$ hadoop dfs -ls
Warning: $HADOOP_HOME is deprecated.

Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2013-09-25 12:54 /user/hadoop/test

[hadoop@localhost test]$ hadoop dfs -mkdir test/wordcount
Warning: $HADOOP_HOME is deprecated.

[hadoop@localhost test]$ hadoop dfs -mkdir test/wordcount/input
Warning: $HADOOP_HOME is deprecated.

[hadoop@localhost test]$ hadoop dfs -ls
Warning: $HADOOP_HOME is deprecated.

Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2013-09-26 12:08 /user/hadoop/test

[hadoop@localhost test]$ hadoop dfs -ls /user/hadoop/test/wordcount/input
Warning: $HADOOP_HOME is deprecated.

[hadoop@localhost test]$ hadoop dfs -copyFromLocal file01 /user/hadoop/test/wordcount/input
Warning: $HADOOP_HOME is deprecated.

[hadoop@localhost test]$ hadoop dfs -ls /user/hadoop/test/wordcount/input
Warning: $HADOOP_HOME is deprecated.

Found 1 items
-rw-r--r--   1 hadoop supergroup         22 2013-09-26 12:14 /user/hadoop/test/wordcount/input/file01
[hadoop@localhost test]$ hadoop dfs -copyFromLocal file02 /user/hadoop/test/wordcount/input
Warning: $HADOOP_HOME is deprecated.

[hadoop@localhost test]$ hadoop dfs -ls /user/hadoop/test/wordcount/input
Warning: $HADOOP_HOME is deprecated.

Found 2 items
-rw-r--r--   1 hadoop supergroup         22 2013-09-26 12:14 /user/hadoop/test/wordcount/input/file01
-rw-r--r--   1 hadoop supergroup         28 2013-09-26 12:14 /user/hadoop/test/wordcount/input/file02

[hadoop@localhost test]$ hadoop dfs -rmr /user/hadoop/test/wordcount/output
Warning: $HADOOP_HOME is deprecated.

Deleted hdfs://localhost:9000/user/hadoop/test/wordcount/output
[hadoop@localhost test]$ hadoop jar wordcount.jar WordCount  /user/hadoop/test/wordcount/input  /user/hadoop/test/wordcount/output
Warning: $HADOOP_HOME is deprecated.

13/09/26 12:19:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/09/26 12:19:30 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/09/26 12:19:30 WARN snappy.LoadSnappy: Snappy native library not loaded
13/09/26 12:19:30 INFO mapred.FileInputFormat: Total input paths to process : 2
13/09/26 12:19:30 INFO mapred.JobClient: Running job: job_201309261202_0002
13/09/26 12:19:31 INFO mapred.JobClient:  map 0% reduce 0%
13/09/26 12:19:45 INFO mapred.JobClient:  map 33% reduce 0%
13/09/26 12:19:46 INFO mapred.JobClient:  map 66% reduce 0%
13/09/26 12:19:50 INFO mapred.JobClient:  map 100% reduce 0%
13/09/26 12:19:56 INFO mapred.JobClient:  map 100% reduce 33%
13/09/26 12:19:57 INFO mapred.JobClient:  map 100% reduce 100%
13/09/26 12:19:58 INFO mapred.JobClient: Job complete: job_201309261202_0002
13/09/26 12:19:58 INFO mapred.JobClient: Counters: 30
13/09/26 12:19:58 INFO mapred.JobClient:   Job Counters
13/09/26 12:19:58 INFO mapred.JobClient:     Launched reduce tasks=1
13/09/26 12:19:58 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=27861
13/09/26 12:19:58 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/09/26 12:19:58 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/09/26 12:19:58 INFO mapred.JobClient:     Launched map tasks=3
13/09/26 12:19:58 INFO mapred.JobClient:     Data-local map tasks=3
13/09/26 12:19:58 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=12045
13/09/26 12:19:58 INFO mapred.JobClient:   File Input Format Counters
13/09/26 12:19:58 INFO mapred.JobClient:     Bytes Read=53
13/09/26 12:19:58 INFO mapred.JobClient:   File Output Format Counters
13/09/26 12:19:58 INFO mapred.JobClient:     Bytes Written=41
13/09/26 12:19:58 INFO mapred.JobClient:   FileSystemCounters
13/09/26 12:19:58 INFO mapred.JobClient:     FILE_BYTES_READ=79
13/09/26 12:19:58 INFO mapred.JobClient:     HDFS_BYTES_READ=395
13/09/26 12:19:58 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=224903
13/09/26 12:19:58 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41
13/09/26 12:19:58 INFO mapred.JobClient:   Map-Reduce Framework
13/09/26 12:19:58 INFO mapred.JobClient:     Map output materialized bytes=91
13/09/26 12:19:58 INFO mapred.JobClient:     Map input records=2
13/09/26 12:19:58 INFO mapred.JobClient:     Reduce shuffle bytes=91
13/09/26 12:19:58 INFO mapred.JobClient:     Spilled Records=12
13/09/26 12:19:58 INFO mapred.JobClient:     Map output bytes=82
13/09/26 12:19:58 INFO mapred.JobClient:     Total committed heap usage (bytes)=445067264
13/09/26 12:19:58 INFO mapred.JobClient:     CPU time spent (ms)=1600
13/09/26 12:19:58 INFO mapred.JobClient:     Map input bytes=50
13/09/26 12:19:58 INFO mapred.JobClient:     SPLIT_RAW_BYTES=342
13/09/26 12:19:58 INFO mapred.JobClient:     Combine input records=8
13/09/26 12:19:58 INFO mapred.JobClient:     Reduce input records=6
13/09/26 12:19:58 INFO mapred.JobClient:     Reduce input groups=5
13/09/26 12:19:58 INFO mapred.JobClient:     Combine output records=6
13/09/26 12:19:58 INFO mapred.JobClient:     Physical memory (bytes) snapshot=470908928
13/09/26 12:19:58 INFO mapred.JobClient:     Reduce output records=5
13/09/26 12:19:58 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1494261760
13/09/26 12:19:58 INFO mapred.JobClient:     Map output records=8
[hadoop@localhost test]$ hadoop dfs -ls /user/hadoop/test/wordcount/output
Warning: $HADOOP_HOME is deprecated.

Found 3 items
-rw-r--r--   1 hadoop supergroup          0 2013-09-26 12:19 /user/hadoop/test/wordcount/output/_SUCCESS
drwxr-xr-x   - hadoop supergroup          0 2013-09-26 12:19 /user/hadoop/test/wordcount/output/_logs
-rw-r--r--   1 hadoop supergroup         41 2013-09-26 12:19 /user/hadoop/test/wordcount/output/part-00000
[hadoop@localhost test]$ hadoop dfs -cat /user/hadoop/test/wordcount/output/part-00000
Warning: $HADOOP_HOME is deprecated.

Bye     1
Goodbye 1
Hadoop  2
Hello   2
World   2
[hadoop@localhost test]$
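
The counters in the job log above can be cross-checked by hand. The sketch below does so in plain Java, without the Hadoop API. It rests on two assumptions that the post does not state: the contents of file01 and file02 (chosen to match the 22 and 28 byte sizes in the listing), and one combiner pass per input file. The class name CounterCheck is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java cross-check of the job counters; the file contents are an assumption.
public class CounterCheck {

    // Counts the words in one piece of text; stands in for the combiner logic.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns {map output records, combine output records, reduce output records, output bytes}.
    static int[] counters(List<String> files) {
        int mapOut = 0;
        int combineOut = 0;
        Map<String, Integer> reduced = new TreeMap<>();
        for (String file : files) {
            Map<String, Integer> combined = count(file);   // assumed: one combiner pass per file
            for (int c : combined.values()) {
                mapOut += c;                               // each word occurrence was one map output record
            }
            combineOut += combined.size();                 // one record per distinct word in the file
            for (Map.Entry<String, Integer> e : combined.entrySet()) {
                reduced.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        int bytes = 0;
        for (Map.Entry<String, Integer> e : reduced.entrySet()) {
            bytes += (e.getKey() + "\t" + e.getValue() + "\n").length();   // word, tab, count, newline
        }
        return new int[] {mapOut, combineOut, reduced.size(), bytes};
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        files.add("Hello World Bye World\n");         // assumed content of file01 (22 bytes)
        files.add("Hello Hadoop Goodbye Hadoop\n");   // assumed content of file02 (28 bytes)
        int[] c = counters(files);
        System.out.println("Map output records=" + c[0]);
        System.out.println("Combine output records=" + c[1]);
        System.out.println("Reduce output records=" + c[2]);
        System.out.println("Bytes Written=" + c[3]);
    }
}
```

Under those assumptions the four values come out as 8, 6, 5 and 41, which agree with the Map output records, Combine output records, Reduce output records and Bytes Written counters in the log.
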

Filed under Hadoop Tagged with Hadoop, MapReduce

Hadoop 2: Configure Pseudo-distributed mode

October 23, 2013

In the last post, I showed how to get started with Apache Hadoop by installing the software and testing the pi program in the default local standalone mode. However, Hadoop is meant for data-intensive distributed applications and is intended to run in distributed mode. The following shows how to configure Hadoop to run in pseudo-distributed mode. Although this is not fully distributed mode, it demonstrates how Hadoop works with HDFS.

1. Setup SSH

1) create a new SSH key pair with an empty passphrase

[hadoop@localhost ~]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
88:49:13:47:88:55:60:d9:c4:1f:54:f4:6c:f2:c9:f2 hadoop@localhost.localdomain

2) copy the new public key to the list of authorized keys


[hadoop@localhost ~]$ ls .ssh
id_rsa id_rsa.pub
[hadoop@localhost ~]$ cp .ssh/id_rsa.pub .ssh/authorized_keys

3) connect to local host


[hadoop@localhost ~]$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 07:15:bb:a2:a6:ba:60:3f:c3:31:a9:c9:4a:7c:51:6a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Last login: Wed Sep 25 11:49:51 2013
[hadoop@localhost ~]$ exit
logout

4) Confirm that the password-less SSH is working


[hadoop@localhost ~]$ ssh localhost
Last login: Wed Sep 25 12:20:06 2013 from localhost.localdomain

2. Configure Pseudo-distributed mode

1) gedit $HADOOP_HOME/conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/datadir</value>
  </property>
</configuration>

2) gedit $HADOOP_HOME/conf/hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/datadir</value>
  </property>
</configuration>

3) gedit $HADOOP_HOME/conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
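
All three site files share the same shape: a configuration element holding property elements, each with a name and a value. As a rough illustration of that structure, the sketch below reads such a file with the XML parser that ships with the JDK. Hadoop has its own configuration loader, which this does not reproduce; the class name SiteConfigSketch and the inlined XML string are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Reads Hadoop-style site XML with the JDK parser; not Hadoop's own loader.
public class SiteConfigSketch {

    // Collects every <property><name>..</name><value>..</value></property> into a map.
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        // Same properties as the core-site.xml above, inlined so the example is self-contained.
        String coreSite =
              "<?xml version=\"1.0\"?>\n"
            + "<configuration>\n"
            + "  <property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property>\n"
            + "  <property><name>hadoop.tmp.dir</name><value>/home/hadoop/datadir</value></property>\n"
            + "</configuration>\n";
        Map<String, String> props = parse(coreSite);
        System.out.println(props);

        // fs.default.name is a URI: scheme, host and port of the NameNode.
        URI fs = URI.create(props.get("fs.default.name"));
        System.out.println(fs.getScheme() + " " + fs.getHost() + " " + fs.getPort());
    }
}
```

The second print statement splits fs.default.name into its parts, which is a quick way to see that HDFS clients will contact the NameNode on localhost, port 9000.
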

4) Create the base directory for Hadoop files


[hadoop@localhost ~]$ pwd
/home/hadoop
[hadoop@localhost ~]$ mkdir datadir
[hadoop@localhost ~]$ ls -l /home/hadoop
total 37268
drwxrwxr-x 2 hadoop hadoop 4096 Sep 25 12:35 datadir
drwxr-xr-x 2 hadoop hadoop 4096 Sep 13 08:17 Desktop
drwxr-xr-x 14 hadoop hadoop 4096 Sep 13 10:23 hadoop-1.2.1

5) Format the HDFS filesystem


[hadoop@localhost ~]$ hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.

13/09/25 12:43:53 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost.localdomain/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.6.0_20
************************************************************/
13/09/25 12:43:53 INFO util.GSet: Computing capacity for map BlocksMap
13/09/25 12:43:53 INFO util.GSet: VM type = 32-bit
13/09/25 12:43:53 INFO util.GSet: 2.0% max memory = 1013645312
13/09/25 12:43:53 INFO util.GSet: capacity = 2^22 = 4194304 entries
13/09/25 12:43:53 INFO util.GSet: recommended=4194304, actual=4194304
13/09/25 12:43:54 INFO namenode.FSNamesystem: fsOwner=hadoop
13/09/25 12:43:54 INFO namenode.FSNamesystem: supergroup=supergroup
13/09/25 12:43:54 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/09/25 12:43:54 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/09/25 12:43:54 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
13/09/25 12:43:54 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
13/09/25 12:43:54 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/09/25 12:43:56 INFO common.Storage: Image file /home/hadoop/datadir/dfs/name/current/fsimage of size 112 bytes saved in 0 seconds.
13/09/25 12:43:56 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/hadoop/datadir/dfs/name/current/edits
13/09/25 12:43:56 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/hadoop/datadir/dfs/name/current/edits
13/09/25 12:43:57 INFO common.Storage: Storage directory /home/hadoop/datadir/dfs/name has been successfully formatted.
13/09/25 12:43:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
************************************************************/

6) start DFS


[hadoop@localhost ~]$ start-dfs.sh
Warning: $HADOOP_HOME is deprecated.

starting namenode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-namenode-localhost.localdomain.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting datanode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-localhost.localdomain.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-secondarynamenode-localhost.localdomain.out
[hadoop@localhost ~]$ jps
4346 Jps
4018 NameNode
4258 SecondaryNameNode
4136 DataNode
[hadoop@localhost ~]$ hadoop dfs -mkdir test
Warning: $HADOOP_HOME is deprecated.

[hadoop@localhost ~]$ hadoop dfs -ls /
Warning: $HADOOP_HOME is deprecated.

Found 1 items
drwxr-xr-x - hadoop supergroup 0 2013-09-25 12:54 /user
[hadoop@localhost ~]$ hadoop dfs -ls /user/hadoop
Warning: $HADOOP_HOME is deprecated.

Found 1 items
drwxr-xr-x - hadoop supergroup 0 2013-09-25 12:54 /user/hadoop/test

7) start MAPRED


[hadoop@localhost ~]$ start-mapred.sh
Warning: $HADOOP_HOME is deprecated.

starting jobtracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-jobtracker-localhost.localdomain.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting tasktracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-localhost.localdomain.out
[hadoop@localhost ~]$ jps
4887 JobTracker
5102 Jps
4018 NameNode
4258 SecondaryNameNode
4136 DataNode
5010 TaskTracker

8) stop ALL


[hadoop@localhost ~]$ stop-all.sh
Warning: $HADOOP_HOME is deprecated.

stopping jobtracker
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: stopping secondarynamenode
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:

9) start ALL


[hadoop@localhost ~]$ start-all.sh
Warning: $HADOOP_HOME is deprecated.

starting namenode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-namenode-localhost.localdomain.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting datanode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-localhost.localdomain.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-secondarynamenode-localhost.localdomain.out
starting jobtracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-jobtracker-localhost.localdomain.out
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting tasktracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-localhost.localdomain.out
[hadoop@localhost ~]$ jps
5883 JobTracker
5530 NameNode
6028 TaskTracker
5654 DataNode
6123 Jps
5801 SecondaryNameNode

10) Web based admin interface

Web based Interface for NameNode
http://localhost:50070
Web based Interface for JobTracker
http://localhost:50030
Web based Interface for TaskTracker
http://localhost:50060

Filed under Hadoop Tagged with Hadoop, HDFS, Pseudo-distributed mode

Hadoop 1: Get Started

September 13, 2013

Hadoop is an open-source software framework that supports data-intensive distributed applications.

There are two architectural layers: HDFS and MapReduce.

HDFS is a filesystem that can store very large data sets by scaling out across a cluster of hosts.
MapReduce is a data processing framework that takes a specification of how the data will be input and output from its two stages (map and reduce) and applies it across large data sets on HDFS.

Apache Hadoop consists of four components: NameNode, DataNode, JobTracker and TaskTracker. These components can be deployed in three modes: local standalone mode, pseudo-distributed mode and fully distributed mode.

The following shows how to get Hadoop started in the default (local standalone) mode on Linux, which is simple and needs little configuration. I am using Oracle Enterprise Linux on VirtualBox, but the process can be repeated on other Linux / Unix systems.

1. Download

Download the tarball hadoop-1.2.1-bin.tar.gz from http://mirror.its.dal.ca/apache/hadoop/common/hadoop-1.2.1/.

2. Uncompress

tar -xf hadoop-1.2.1-bin.tar.gz

In my case, it was uncompressed to /home/hadoop/hadoop-1.2.1

3. Add Symbolic Link

ln -s /home/hadoop/hadoop-1.2.1 /opt/hadoop

4. Setup Local Profile

I am using the bash shell, so I added the following two lines to my .bashrc.

export HADOOP_HOME=/home/hadoop/hadoop-1.2.1
export PATH=$HADOOP_HOME/bin:$PATH

5. Setup JAVA_HOME

Find out where your Java SDK is located. In my case, it is /usr/java/latest.

Modify the following line in the $HADOOP_HOME/conf/hadoop-env.sh script:

export JAVA_HOME=/usr/java/latest

6. Test Drive

cd $HADOOP_HOME

hadoop jar ./hadoop-examples-1.2.1.jar pi 2 1000

This starts $HADOOP_HOME/bin/hadoop with $HADOOP_HOME/hadoop-examples-1.2.1.jar and runs the pi example with 2 map tasks, each using 1000 samples, to estimate pi. The example is implemented in the PiEstimator Java class.

Hadoop is running in the default (local standalone) mode. The whole job runs in a single Java process against the local filesystem, rather than on separate NameNode, DataNode, JobTracker and TaskTracker daemons.

The following is the output from this test.

[hadoop@localhost hadoop-1.2.1]$ hadoop jar ./hadoop-examples-1.2.1.jar pi 2 1000
Warning: $HADOOP_HOME is deprecated.

Number of Maps  = 2
Samples per Map = 1000
13/09/13 10:05:00 INFO util.NativeCodeLoader: Loaded the native-hadoop library
Wrote input for Map #0
Wrote input for Map #1
Starting Job
13/09/13 10:05:00 INFO mapred.FileInputFormat: Total input paths to process : 2
13/09/13 10:05:00 INFO mapred.JobClient: Running job: job_local691945750_0001
13/09/13 10:05:00 INFO mapred.LocalJobRunner: Waiting for map tasks
13/09/13 10:05:00 INFO mapred.LocalJobRunner: Starting task: attempt_local691945750_0001_m_000000_0
13/09/13 10:05:00 INFO util.ProcessTree: setsid exited with exit code 0
13/09/13 10:05:00 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@ec4a87
13/09/13 10:05:00 INFO mapred.MapTask: Processing split: file:/home/hadoop/hadoop-1.2.1/PiEstimator_TMP_3_141592654/in/part0:0+118
13/09/13 10:05:00 INFO mapred.MapTask: numReduceTasks: 1
13/09/13 10:05:00 INFO mapred.MapTask: io.sort.mb = 100
13/09/13 10:05:00 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/13 10:05:00 INFO mapred.MapTask: record buffer = 262144/327680
13/09/13 10:05:00 INFO mapred.MapTask: Starting flush of map output
13/09/13 10:05:00 INFO mapred.MapTask: Finished spill 0
13/09/13 10:05:00 INFO mapred.Task: Task:attempt_local691945750_0001_m_000000_0 is done. And is in the process of commiting
13/09/13 10:05:00 INFO mapred.LocalJobRunner: Generated 1000 samples.
13/09/13 10:05:00 INFO mapred.Task: Task 'attempt_local691945750_0001_m_000000_0' done.
13/09/13 10:05:00 INFO mapred.LocalJobRunner: Finishing task: attempt_local691945750_0001_m_000000_0
13/09/13 10:05:00 INFO mapred.LocalJobRunner: Starting task: attempt_local691945750_0001_m_000001_0
13/09/13 10:05:00 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2cb49d
13/09/13 10:05:00 INFO mapred.MapTask: Processing split: file:/home/hadoop/hadoop-1.2.1/PiEstimator_TMP_3_141592654/in/part1:0+118
13/09/13 10:05:00 INFO mapred.MapTask: numReduceTasks: 1
13/09/13 10:05:00 INFO mapred.MapTask: io.sort.mb = 100
13/09/13 10:05:01 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/13 10:05:01 INFO mapred.MapTask: record buffer = 262144/327680
13/09/13 10:05:01 INFO mapred.MapTask: Starting flush of map output
13/09/13 10:05:01 INFO mapred.MapTask: Finished spill 0
13/09/13 10:05:01 INFO mapred.Task: Task:attempt_local691945750_0001_m_000001_0 is done. And is in the process of commiting
13/09/13 10:05:01 INFO mapred.LocalJobRunner: Generated 1000 samples.
13/09/13 10:05:01 INFO mapred.Task: Task 'attempt_local691945750_0001_m_000001_0' done.
13/09/13 10:05:01 INFO mapred.LocalJobRunner: Finishing task: attempt_local691945750_0001_m_000001_0
13/09/13 10:05:01 INFO mapred.LocalJobRunner: Map task executor complete.
13/09/13 10:05:01 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1d225a7
13/09/13 10:05:01 INFO mapred.LocalJobRunner:
13/09/13 10:05:01 INFO mapred.Merger: Merging 2 sorted segments
13/09/13 10:05:01 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 48 bytes
13/09/13 10:05:01 INFO mapred.LocalJobRunner:
13/09/13 10:05:01 INFO mapred.Task: Task:attempt_local691945750_0001_r_000000_0 is done. And is in the process of commiting
13/09/13 10:05:01 INFO mapred.LocalJobRunner:
13/09/13 10:05:01 INFO mapred.Task: Task attempt_local691945750_0001_r_000000_0 is allowed to commit now
13/09/13 10:05:01 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local691945750_0001_r_000000_0' to file:/home/hadoop/hadoop-1.2.1/PiEstimator_TMP_3_141592654/out
13/09/13 10:05:01 INFO mapred.LocalJobRunner: reduce > reduce
13/09/13 10:05:01 INFO mapred.Task: Task 'attempt_local691945750_0001_r_000000_0' done.
13/09/13 10:05:01 INFO mapred.JobClient:  map 100% reduce 100%
13/09/13 10:05:01 INFO mapred.JobClient: Job complete: job_local691945750_0001
13/09/13 10:05:01 INFO mapred.JobClient: Counters: 21
13/09/13 10:05:01 INFO mapred.JobClient:   File Input Format Counters
13/09/13 10:05:01 INFO mapred.JobClient:     Bytes Read=260
13/09/13 10:05:01 INFO mapred.JobClient:   File Output Format Counters
13/09/13 10:05:01 INFO mapred.JobClient:     Bytes Written=109
13/09/13 10:05:01 INFO mapred.JobClient:   FileSystemCounters
13/09/13 10:05:01 INFO mapred.JobClient:     FILE_BYTES_READ=430323
13/09/13 10:05:01 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=587849
13/09/13 10:05:01 INFO mapred.JobClient:   Map-Reduce Framework
13/09/13 10:05:01 INFO mapred.JobClient:     Map output materialized bytes=56
13/09/13 10:05:01 INFO mapred.JobClient:     Map input records=2
13/09/13 10:05:01 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/09/13 10:05:01 INFO mapred.JobClient:     Spilled Records=8
13/09/13 10:05:01 INFO mapred.JobClient:     Map output bytes=36
13/09/13 10:05:01 INFO mapred.JobClient:     Total committed heap usage (bytes)=475803648
13/09/13 10:05:01 INFO mapred.JobClient:     CPU time spent (ms)=0
13/09/13 10:05:01 INFO mapred.JobClient:     Map input bytes=48
13/09/13 10:05:01 INFO mapred.JobClient:     SPLIT_RAW_BYTES=240
13/09/13 10:05:01 INFO mapred.JobClient:     Combine input records=0
13/09/13 10:05:01 INFO mapred.JobClient:     Reduce input records=4
13/09/13 10:05:01 INFO mapred.JobClient:     Reduce input groups=4
13/09/13 10:05:01 INFO mapred.JobClient:     Combine output records=0
13/09/13 10:05:01 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
13/09/13 10:05:01 INFO mapred.JobClient:     Reduce output records=0
13/09/13 10:05:01 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
13/09/13 10:05:01 INFO mapred.JobClient:     Map output records=4
Job Finished in 1.236 seconds
Estimated value of Pi is 3.14400000000000000000
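
The idea behind the estimate can be sketched in plain Java: scatter points over the unit square, count how many land inside the inscribed circle, and multiply the fraction by 4. This is not the PiEstimator source. It assumes a Halton sequence (bases 2 and 3) as the source of sample points, and the class name PiSketch and its method names are made up for illustration.

```java
// Plain-Java sketch of estimating pi by sampling; not Hadoop's PiEstimator code.
public class PiSketch {

    // Element number `index` of the Halton (radical inverse) sequence in the given base, in [0, 1).
    static double halton(int index, int base) {
        double result = 0.0;
        double fraction = 1.0 / base;
        for (int i = index; i > 0; i /= base) {
            result += fraction * (i % base);
            fraction /= base;
        }
        return result;
    }

    // "Map" work: count how many of `samples` points, starting at `offset`,
    // fall inside the circle of radius 0.5 centred in the unit square.
    static long countInside(int offset, int samples) {
        long inside = 0;
        for (int i = offset; i < offset + samples; i++) {
            double x = halton(i, 2) - 0.5;
            double y = halton(i, 3) - 0.5;
            if (x * x + y * y <= 0.25) {
                inside++;
            }
        }
        return inside;
    }

    // "Reduce" work: add the per-map counts and scale by 4 / total samples.
    static double estimate(int maps, int samplesPerMap) {
        long inside = 0;
        for (int m = 0; m < maps; m++) {
            inside += countInside(1 + m * samplesPerMap, samplesPerMap);
        }
        return 4.0 * inside / ((long) maps * samplesPerMap);
    }

    public static void main(String[] args) {
        // Same parameters as the test drive: 2 maps, 1000 samples per map.
        System.out.println(estimate(2, 1000));
    }
}
```

Each call to countInside is independent of the others, which is why the work splits cleanly across map tasks; only the final sum needs all the counts in one place.
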

Filed under Hadoop Tagged with Hadoop, HDFS, MapReduce
