Hadoop 3: Writing a MapReduce program
October 24, 2013
MapReduce is a programming framework that takes a specification of how data is read and written by its two stages (map and reduce) and applies it across large amounts of data stored in a distributed fashion on HDFS.
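To make the two stages concrete, here is a minimal plain-Java sketch of the map, shuffle, and reduce flow for a word count. It does not use any Hadoop classes; the class name `MapReduceSketch`, its method names, and the two sample lines are assumptions made for illustration (this post does not show the contents of the input files), so treat it as a mental model rather than the WordCount program that the job actually runs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java illustration of map -> shuffle -> reduce; no Hadoop classes are involved.
class MapReduceSketch {

    // Map stage: emit a (word, 1) pair for every whitespace-separated token in one line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle stage: group the emitted values by key, keeping keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce stage: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run all three stages over the input lines and return word -> count.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample input: one line per input file.
        List<String> lines = Arrays.asList(
                "Hello World Bye World",
                "Hello Hadoop Goodbye Hadoop");
        wordCount(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

In a real job the framework performs the shuffle itself and runs many map and reduce calls in parallel across the cluster; the programmer supplies only the map and reduce logic plus the input and output formats.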
The MapReduce programming tutorial is here. I will only provide the steps to compile the Java program and run it in Hadoop, as follows:
# compile
mkdir wordcount_classes
javac -classpath ${HADOOP_HOME}/hadoop-core-1.2.1.jar -d wordcount_classes WordCount.java

# create a jar from wordcount_classes
jar -cvf /home/hadoop/test/wordcount.jar -C wordcount_classes/ .

# start hadoop dfs / mapred
start-dfs.sh
start-mapred.sh

[hadoop@localhost test]$ hadoop dfs -ls
Warning: $HADOOP_HOME is deprecated.
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2013-09-25 12:54 /user/hadoop/test
[hadoop@localhost test]$ hadoop dfs -mkdir test/wordcount
Warning: $HADOOP_HOME is deprecated.
[hadoop@localhost test]$ hadoop dfs -mkdir test/wordcount/input
Warning: $HADOOP_HOME is deprecated.
[hadoop@localhost test]$ hadoop dfs -ls
Warning: $HADOOP_HOME is deprecated.
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2013-09-26 12:08 /user/hadoop/test
[hadoop@localhost test]$ hadoop dfs -ls /user/hadoop/test/wordcount/input
Warning: $HADOOP_HOME is deprecated.
[hadoop@localhost test]$ hadoop dfs -copyFromLocal file01 /user/hadoop/test/wordcount/input
Warning: $HADOOP_HOME is deprecated.
[hadoop@localhost test]$ hadoop dfs -ls /user/hadoop/test/wordcount/input
Warning: $HADOOP_HOME is deprecated.
Found 1 items
-rw-r--r-- 1 hadoop supergroup 22 2013-09-26 12:14 /user/hadoop/test/wordcount/input/file01
[hadoop@localhost test]$ hadoop dfs -copyFromLocal file02 /user/hadoop/test/wordcount/input
Warning: $HADOOP_HOME is deprecated.
[hadoop@localhost test]$ hadoop dfs -ls /user/hadoop/test/wordcount/input
Warning: $HADOOP_HOME is deprecated.
Found 2 items
-rw-r--r-- 1 hadoop supergroup 22 2013-09-26 12:14 /user/hadoop/test/wordcount/input/file01
-rw-r--r-- 1 hadoop supergroup 28 2013-09-26 12:14 /user/hadoop/test/wordcount/input/file02
[hadoop@localhost test]$ hadoop dfs -rmr /user/hadoop/test/wordcount/output
Warning: $HADOOP_HOME is deprecated.
Deleted hdfs://localhost:9000/user/hadoop/test/wordcount/output

[hadoop@localhost test]$ hadoop jar wordcount.jar WordCount /user/hadoop/test/wordcount/input /user/hadoop/test/wordcount/output
Warning: $HADOOP_HOME is deprecated.
13/09/26 12:19:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/09/26 12:19:30 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/09/26 12:19:30 WARN snappy.LoadSnappy: Snappy native library not loaded
13/09/26 12:19:30 INFO mapred.FileInputFormat: Total input paths to process : 2
13/09/26 12:19:30 INFO mapred.JobClient: Running job: job_201309261202_0002
13/09/26 12:19:31 INFO mapred.JobClient: map 0% reduce 0%
13/09/26 12:19:45 INFO mapred.JobClient: map 33% reduce 0%
13/09/26 12:19:46 INFO mapred.JobClient: map 66% reduce 0%
13/09/26 12:19:50 INFO mapred.JobClient: map 100% reduce 0%
13/09/26 12:19:56 INFO mapred.JobClient: map 100% reduce 33%
13/09/26 12:19:57 INFO mapred.JobClient: map 100% reduce 100%
13/09/26 12:19:58 INFO mapred.JobClient: Job complete: job_201309261202_0002
13/09/26 12:19:58 INFO mapred.JobClient: Counters: 30
13/09/26 12:19:58 INFO mapred.JobClient:   Job Counters
13/09/26 12:19:58 INFO mapred.JobClient:     Launched reduce tasks=1
13/09/26 12:19:58 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=27861
13/09/26 12:19:58 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/09/26 12:19:58 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/09/26 12:19:58 INFO mapred.JobClient:     Launched map tasks=3
13/09/26 12:19:58 INFO mapred.JobClient:     Data-local map tasks=3
13/09/26 12:19:58 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=12045
13/09/26 12:19:58 INFO mapred.JobClient:   File Input Format Counters
13/09/26 12:19:58 INFO mapred.JobClient:     Bytes Read=53
13/09/26 12:19:58 INFO mapred.JobClient:   File Output Format Counters
13/09/26 12:19:58 INFO mapred.JobClient:     Bytes Written=41
13/09/26 12:19:58 INFO mapred.JobClient:   FileSystemCounters
13/09/26 12:19:58 INFO mapred.JobClient:     FILE_BYTES_READ=79
13/09/26 12:19:58 INFO mapred.JobClient:     HDFS_BYTES_READ=395
13/09/26 12:19:58 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=224903
13/09/26 12:19:58 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41
13/09/26 12:19:58 INFO mapred.JobClient:   Map-Reduce Framework
13/09/26 12:19:58 INFO mapred.JobClient:     Map output materialized bytes=91
13/09/26 12:19:58 INFO mapred.JobClient:     Map input records=2
13/09/26 12:19:58 INFO mapred.JobClient:     Reduce shuffle bytes=91
13/09/26 12:19:58 INFO mapred.JobClient:     Spilled Records=12
13/09/26 12:19:58 INFO mapred.JobClient:     Map output bytes=82
13/09/26 12:19:58 INFO mapred.JobClient:     Total committed heap usage (bytes)=445067264
13/09/26 12:19:58 INFO mapred.JobClient:     CPU time spent (ms)=1600
13/09/26 12:19:58 INFO mapred.JobClient:     Map input bytes=50
13/09/26 12:19:58 INFO mapred.JobClient:     SPLIT_RAW_BYTES=342
13/09/26 12:19:58 INFO mapred.JobClient:     Combine input records=8
13/09/26 12:19:58 INFO mapred.JobClient:     Reduce input records=6
13/09/26 12:19:58 INFO mapred.JobClient:     Reduce input groups=5
13/09/26 12:19:58 INFO mapred.JobClient:     Combine output records=6
13/09/26 12:19:58 INFO mapred.JobClient:     Physical memory (bytes) snapshot=470908928
13/09/26 12:19:58 INFO mapred.JobClient:     Reduce output records=5
13/09/26 12:19:58 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1494261760
13/09/26 12:19:58 INFO mapred.JobClient:     Map output records=8

[hadoop@localhost test]$ hadoop dfs -ls /user/hadoop/test/wordcount/output
Warning: $HADOOP_HOME is deprecated.
Found 3 items
-rw-r--r-- 1 hadoop supergroup 0 2013-09-26 12:19 /user/hadoop/test/wordcount/output/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2013-09-26 12:19 /user/hadoop/test/wordcount/output/_logs
-rw-r--r-- 1 hadoop supergroup 41 2013-09-26 12:19 /user/hadoop/test/wordcount/output/part-00000
[hadoop@localhost test]$ hadoop dfs -cat /user/hadoop/test/wordcount/output/part-00000
Warning: $HADOOP_HOME is deprecated.
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2
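The record counters in the job log (Map output records=8, Combine output records=6, Reduce input groups=5) can be reproduced by hand. The sketch below is a plain-Java approximation under two assumptions that this post does not confirm: that the job uses a summing combiner applied once per input file, and that file01 and file02 hold the two sample lines shown in the code. The class name `CombinerCounterSketch` and the method `counters` are made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java approximation of how the map, combine, and reduce record counts arise.
class CombinerCounterSketch {

    // Returns {map output records, combine output records, reduce input groups}.
    static int[] counters(List<String> fileContents) {
        int mapOutput = 0;      // one (word, 1) record per token
        int combineOutput = 0;  // one record per distinct word within each file
        Map<String, Integer> totals = new TreeMap<>();  // one group per distinct word overall

        for (String content : fileContents) {
            // Assumed: the combiner pre-sums counts locally, once per input file.
            Map<String, Integer> local = new TreeMap<>();
            StringTokenizer tokens = new StringTokenizer(content);
            while (tokens.hasMoreTokens()) {
                local.merge(tokens.nextToken(), 1, Integer::sum);
                mapOutput++;
            }
            combineOutput += local.size();
            local.forEach((word, count) -> totals.merge(word, count, Integer::sum));
        }
        return new int[] {mapOutput, combineOutput, totals.size()};
    }

    public static void main(String[] args) {
        // Assumed contents of file01 and file02 (hypothetical; not shown in the post).
        int[] c = counters(Arrays.asList(
                "Hello World Bye World",
                "Hello Hadoop Goodbye Hadoop"));
        System.out.println("Map output records=" + c[0]);      // 8
        System.out.println("Combine output records=" + c[1]);  // 6
        System.out.println("Reduce input groups=" + c[2]);     // 5
    }
}
```

Under those assumptions, each file yields four map records that the combiner collapses to three (for example, the two "World" records become one with a count of 2), so the reducer receives 6 records in 5 groups and writes the 5 lines seen in part-00000.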