Hadoop 3: Writing a MapReduce program

MapReduce is a programming framework that takes a specification of how data flows into and out of its two stages (map and reduce) and applies it across large amounts of data stored in a distributed fashion on HDFS.
The tutorial on MapReduce programming is here. I will only provide the steps to compile the Java program and run it in Hadoop, as follows:
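
To make the two stages concrete, here is a minimal sketch of word counting in plain Java. It deliberately does not use the Hadoop API: the class name MapReduceSketch and the methods map, reduce and run are hypothetical names chosen for illustration, and the grouping step inside run only stands in for the shuffle that Hadoop performs between the stages on a real cluster.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the two stages; this is NOT the Hadoop API.
class MapReduceSketch {

    // Map stage: one input line in, a list of (word, 1) pairs out.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce stage: one key and all of its values in, a single total out.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: map every line, group values by key (a stand-in for the shuffle),
    // then reduce each group. A TreeMap keeps the keys sorted.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {a=2, b=2, c=1}
        System.out.println(run(Arrays.asList("a b a", "b c")));
    }
}
```

In a real job the framework, not the driver, calls the map and reduce functions, and the input lines come from files on HDFS rather than from an in-memory list.
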

-- compile
mkdir wordcount_classes
javac -classpath ${HADOOP_HOME}/hadoop-core-1.2.1.jar -d wordcount_classes WordCount.java

-- create a jar from wordcount_classes
jar -cvf /home/hadoop/test/wordcount.jar -C wordcount_classes/ .

-- start hadoop dfs / mapred
start-dfs.sh
start-mapred.sh

[hadoop@localhost test]$ hadoop dfs -ls
Warning: $HADOOP_HOME is deprecated.

Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2013-09-25 12:54 /user/hadoop/test

[hadoop@localhost test]$ hadoop dfs -mkdir test/wordcount
Warning: $HADOOP_HOME is deprecated.

[hadoop@localhost test]$ hadoop dfs -mkdir test/wordcount/input
Warning: $HADOOP_HOME is deprecated.

[hadoop@localhost test]$ hadoop dfs -ls
Warning: $HADOOP_HOME is deprecated.

Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2013-09-26 12:08 /user/hadoop/test

[hadoop@localhost test]$ hadoop dfs -ls /user/hadoop/test/wordcount/input
Warning: $HADOOP_HOME is deprecated.

[hadoop@localhost test]$ hadoop dfs -copyFromLocal file01 /user/hadoop/test/wordcount/input
Warning: $HADOOP_HOME is deprecated.

[hadoop@localhost test]$ hadoop dfs -ls /user/hadoop/test/wordcount/input
Warning: $HADOOP_HOME is deprecated.

Found 1 items
-rw-r--r--   1 hadoop supergroup         22 2013-09-26 12:14 /user/hadoop/test/wordcount/input/file01
[hadoop@localhost test]$ hadoop dfs -copyFromLocal file02 /user/hadoop/test/wordcount/input
Warning: $HADOOP_HOME is deprecated.

[hadoop@localhost test]$ hadoop dfs -ls /user/hadoop/test/wordcount/input
Warning: $HADOOP_HOME is deprecated.

Found 2 items
-rw-r--r--   1 hadoop supergroup         22 2013-09-26 12:14 /user/hadoop/test/wordcount/input/file01
-rw-r--r--   1 hadoop supergroup         28 2013-09-26 12:14 /user/hadoop/test/wordcount/input/file02

[hadoop@localhost test]$ hadoop dfs -rmr /user/hadoop/test/wordcount/output
Warning: $HADOOP_HOME is deprecated.

Deleted hdfs://localhost:9000/user/hadoop/test/wordcount/output
[hadoop@localhost test]$ hadoop jar wordcount.jar WordCount  /user/hadoop/test/wordcount/input  /user/hadoop/test/wordcount/output
Warning: $HADOOP_HOME is deprecated.

13/09/26 12:19:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/09/26 12:19:30 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/09/26 12:19:30 WARN snappy.LoadSnappy: Snappy native library not loaded
13/09/26 12:19:30 INFO mapred.FileInputFormat: Total input paths to process : 2
13/09/26 12:19:30 INFO mapred.JobClient: Running job: job_201309261202_0002
13/09/26 12:19:31 INFO mapred.JobClient:  map 0% reduce 0%
13/09/26 12:19:45 INFO mapred.JobClient:  map 33% reduce 0%
13/09/26 12:19:46 INFO mapred.JobClient:  map 66% reduce 0%
13/09/26 12:19:50 INFO mapred.JobClient:  map 100% reduce 0%
13/09/26 12:19:56 INFO mapred.JobClient:  map 100% reduce 33%
13/09/26 12:19:57 INFO mapred.JobClient:  map 100% reduce 100%
13/09/26 12:19:58 INFO mapred.JobClient: Job complete: job_201309261202_0002
13/09/26 12:19:58 INFO mapred.JobClient: Counters: 30
13/09/26 12:19:58 INFO mapred.JobClient:   Job Counters
13/09/26 12:19:58 INFO mapred.JobClient:     Launched reduce tasks=1
13/09/26 12:19:58 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=27861
13/09/26 12:19:58 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/09/26 12:19:58 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/09/26 12:19:58 INFO mapred.JobClient:     Launched map tasks=3
13/09/26 12:19:58 INFO mapred.JobClient:     Data-local map tasks=3
13/09/26 12:19:58 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=12045
13/09/26 12:19:58 INFO mapred.JobClient:   File Input Format Counters
13/09/26 12:19:58 INFO mapred.JobClient:     Bytes Read=53
13/09/26 12:19:58 INFO mapred.JobClient:   File Output Format Counters
13/09/26 12:19:58 INFO mapred.JobClient:     Bytes Written=41
13/09/26 12:19:58 INFO mapred.JobClient:   FileSystemCounters
13/09/26 12:19:58 INFO mapred.JobClient:     FILE_BYTES_READ=79
13/09/26 12:19:58 INFO mapred.JobClient:     HDFS_BYTES_READ=395
13/09/26 12:19:58 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=224903
13/09/26 12:19:58 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41
13/09/26 12:19:58 INFO mapred.JobClient:   Map-Reduce Framework
13/09/26 12:19:58 INFO mapred.JobClient:     Map output materialized bytes=91
13/09/26 12:19:58 INFO mapred.JobClient:     Map input records=2
13/09/26 12:19:58 INFO mapred.JobClient:     Reduce shuffle bytes=91
13/09/26 12:19:58 INFO mapred.JobClient:     Spilled Records=12
13/09/26 12:19:58 INFO mapred.JobClient:     Map output bytes=82
13/09/26 12:19:58 INFO mapred.JobClient:     Total committed heap usage (bytes)=445067264
13/09/26 12:19:58 INFO mapred.JobClient:     CPU time spent (ms)=1600
13/09/26 12:19:58 INFO mapred.JobClient:     Map input bytes=50
13/09/26 12:19:58 INFO mapred.JobClient:     SPLIT_RAW_BYTES=342
13/09/26 12:19:58 INFO mapred.JobClient:     Combine input records=8
13/09/26 12:19:58 INFO mapred.JobClient:     Reduce input records=6
13/09/26 12:19:58 INFO mapred.JobClient:     Reduce input groups=5
13/09/26 12:19:58 INFO mapred.JobClient:     Combine output records=6
13/09/26 12:19:58 INFO mapred.JobClient:     Physical memory (bytes) snapshot=470908928
13/09/26 12:19:58 INFO mapred.JobClient:     Reduce output records=5
13/09/26 12:19:58 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1494261760
13/09/26 12:19:58 INFO mapred.JobClient:     Map output records=8
[hadoop@localhost test]$ hadoop dfs -ls /user/hadoop/test/wordcount/output
Warning: $HADOOP_HOME is deprecated.

Found 3 items
-rw-r--r--   1 hadoop supergroup          0 2013-09-26 12:19 /user/hadoop/test/wordcount/output/_SUCCESS
drwxr-xr-x   - hadoop supergroup          0 2013-09-26 12:19 /user/hadoop/test/wordcount/output/_logs
-rw-r--r--   1 hadoop supergroup         41 2013-09-26 12:19 /user/hadoop/test/wordcount/output/part-00000
[hadoop@localhost test]$ hadoop dfs -cat /user/hadoop/test/wordcount/output/part-00000
Warning: $HADOOP_HOME is deprecated.

Bye     1
Goodbye 1
Hadoop  2
Hello   2
World   2
[hadoop@localhost test]$
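
The counters in the job log can be checked by hand. The sketch below is a plain-Java approximation, not Hadoop code, and it rests on two assumptions that the transcript does not confirm: that file01 contains the line "Hello World Bye World" and file02 the line "Hello Hadoop Goodbye Hadoop" (both are consistent with the 22 and 28 byte sizes and with the final output), and that each file is combined on its own. The class name CounterSketch and the method counters are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local approximation of three job counters; this is NOT the Hadoop API.
class CounterSketch {

    // Returns {map output records, combine output records, reduce input groups}.
    // Each element of 'splits' is treated as the input of one map task (an assumption).
    static int[] counters(List<String> splits) {
        int mapOutputRecords = 0;
        int combineOutputRecords = 0;
        Map<String, Integer> totals = new TreeMap<>();

        for (String split : splits) {
            // The combiner sums counts locally, per map task, before the shuffle.
            Map<String, Integer> local = new TreeMap<>();
            for (String word : split.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                mapOutputRecords++;              // one (word, 1) pair per token
                local.merge(word, 1, Integer::sum);
            }
            combineOutputRecords += local.size(); // one record per distinct word in this split

            // The reducer then merges the combined records from all map tasks.
            for (Map.Entry<String, Integer> e : local.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return new int[] {mapOutputRecords, combineOutputRecords, totals.size()};
    }

    public static void main(String[] args) {
        // Assumed file contents; see the note above.
        int[] c = counters(Arrays.asList(
                "Hello World Bye World",
                "Hello Hadoop Goodbye Hadoop"));
        // Prints 8 6 5
        System.out.println(c[0] + " " + c[1] + " " + c[2]);
    }
}
```

Under those assumptions the three numbers agree with "Map output records=8", "Combine output records=6" and "Reduce input groups=5" in the log: the combiner collapses the repeated word within each file, so only 6 of the 8 map output records reach the reducer. The log also reports 3 launched map tasks for 2 input files, which this simplification does not model.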

About henry416
I am a computer technology explorer and a university student based in Toronto. If you have any questions, please feel free to discuss and comment here.