Run the WordCount program Instructions | Coursera

10/20/18, 4:14 PM

This Course: Introduction to Big Data

Learning Goals
By the end of this activity, you will be able to:
Execute the WordCount application.
Copy the results from WordCount out of HDFS.

1. Open a terminal shell. Start the Cloudera VM in VirtualBox, if not already running,
and open a terminal shell. Detailed instructions for these steps can be found in the
previous Readings.

2. See example MapReduce programs. Hadoop comes with several example
MapReduce applications. You can see a list of them by running hadoop jar
/usr/jars/hadoop-examples.jar. We are interested in running WordCount.

https://www.coursera.org/learn/big-data-introduction/supplement/2myPr/run-the-wordcount-program-instructions

3. Verify input file exists. In the previous Reading, we downloaded the complete
works of Shakespeare and copied them into HDFS. Let's make sure this file is still in
HDFS so we can run WordCount on it. Run hadoop fs -ls

4. See WordCount command line arguments. We can learn how to run WordCount
by examining its command-line arguments. Run hadoop jar
/usr/jars/hadoop-examples.jar wordcount.

The output says that WordCount takes the name of one or more input files and the
name of the output directory. Note that these files are in HDFS, not the local file
system.

5. Run WordCount. Run WordCount for words.txt: hadoop jar
/usr/jars/hadoop-examples.jar wordcount words.txt out

As WordCount executes, Hadoop prints the progress of the Map and Reduce phases.
When WordCount is complete, both will say 100%.
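
If step 5 is repeated, the job fails when the output directory already exists in HDFS, because Hadoop will not overwrite it. The sketch below wraps the command from step 5 with two checks. The function name run_wordcount is hypothetical and chosen only for illustration; the jar path, words.txt, and out are the ones used in this Reading.

```shell
# Sketch only: run_wordcount is a hypothetical helper, not part of Hadoop.
run_wordcount() {
  local input="$1" output="$2"
  local jar="/usr/jars/hadoop-examples.jar"

  # Stop early if the input file is missing from HDFS.
  if ! hadoop fs -test -e "$input"; then
    echo "input not found in HDFS: $input" >&2
    return 1
  fi

  # The job will not start if the output directory already exists,
  # so remove a directory left over from an earlier run.
  if hadoop fs -test -d "$output"; then
    hadoop fs -rm -r "$output"
  fi

  hadoop jar "$jar" wordcount "$input" "$output"
}

# Usage in the Cloudera VM terminal:
#   run_wordcount words.txt out
```

Note that this deletes the previous results, so copy anything you want to keep out of HDFS first.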

6. See WordCount output directory. Once WordCount is finished, let's verify the
output was created. First, let's see that the output directory, out, was created in HDFS
by running hadoop fs -ls

We can see there are now two items in HDFS: words.txt is the text file that we
previously created, and out is the directory created by WordCount.

7. Look inside output directory. The directory created by WordCount contains
several files. Look inside the directory by running hadoop fs -ls out

The file part-r-00000 contains the results from WordCount. The file _SUCCESS means
WordCount executed successfully.

8. Copy WordCount results to local file system. Copy part-r-00000 to the local file
system by running hadoop fs -copyToLocal out/part-r-00000 local.txt
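
A job can write more than one part-r-* file, in which case copying a single file misses part of the results. As an alternative sketch, the helper below first checks for the _SUCCESS marker and then uses hadoop fs -getmerge to concatenate the files in the output directory into one local file. The function name fetch_output is hypothetical; treating a missing _SUCCESS file as "job not finished" is an assumption based on step 7.

```shell
# Sketch only: fetch_output is a hypothetical helper, not part of Hadoop.
fetch_output() {
  local outdir="$1" localfile="$2"

  # _SUCCESS is the marker described in step 7; treat its absence as
  # a sign that the job did not finish.
  if ! hadoop fs -test -e "$outdir/_SUCCESS"; then
    echo "no _SUCCESS marker in $outdir" >&2
    return 1
  fi

  # -getmerge concatenates the files in the HDFS directory into a
  # single file on the local file system.
  hadoop fs -getmerge "$outdir" "$localfile"
}

# Usage in the Cloudera VM terminal:
#   fetch_output out local.txt
```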

9. View the WordCount results. View the contents of the results: more local.txt

Each line of the results file shows the number of occurrences for a word in the input
file. For example, Accuse appears four times in the input, but Accusing appears only
once.
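
Paging through the whole file with more is slow when you only want a few words. The sketch below assumes the default output layout, one word and its count per line separated by a tab, and defines two hypothetical helpers: count_for looks up one exact, case-sensitive word, and top_words lists the most frequent words.

```shell
# Sketch only: both helpers are hypothetical and assume "word<TAB>count" lines.

# Print the count recorded for one exact (case-sensitive) word.
count_for() {
  # Appending "" forces a string comparison, even for numeric-looking words.
  awk -F '\t' -v w="$1" '($1 "") == (w "") { print $2 }' "$2"
}

# Print the n most frequent words, highest count first.
top_words() {
  sort -t "$(printf '\t')" -k2,2nr "$2" | head -n "$1"
}

# Usage after step 8:
#   count_for Accuse local.txt
#   top_words 10 local.txt
```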

Source Exif Data:
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
Linearized                      : No
Page Count                      : 6
PDF Version                     : 1.4
Title                           : Run the WordCount program Instructions | Coursera
Author                          : John Leonard
Subject                         : 
Producer                        : macOS Version 10.14 (Build 18A391) Quartz PDFContext
Creator                         : Safari
Create Date                     : 2018:10:20 23:14:10Z
Modify Date                     : 2018:10:20 23:14:10Z
Apple Keywords                  : 