Spark programming assignment - download the input files


Spark Programming Assignment

This is a Hadoop Spark assignment. Please read the Assignment 3 SPARK text file carefully. We use the Hortonworks Sandbox 2.4 on Oracle VM to simulate a Hadoop environment.

Download the input files from the Resources section of the course page and upload them to your VM.

Copy the files to /user/lab/ in HDFS.

If you decide to read the files from your local file system instead of HDFS, please state this in your submission file.
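The choice between HDFS and the local file system comes down to the URI passed to sc.textFile(). The sketch below is only an illustration: input_uri is a hypothetical helper, and the local directory in the usage note is an assumed example path, not something the assignment specifies.

```python
def input_uri(filename, local_dir=None, hdfs_dir="/user/lab"):
    """Build a URI that sc.textFile() can read.

    With local_dir=None the file is read from HDFS under hdfs_dir
    (the assignment's /user/lab/). Otherwise it is read from the local
    file system; local_dir must be an absolute path (hypothetical here).
    """
    if local_dir is not None:
        # file:// scheme forces the local file system
        return "file://" + local_dir.rstrip("/") + "/" + filename
    # hdfs:/// (empty authority) uses the cluster's default NameNode
    return "hdfs://" + hdfs_dir.rstrip("/") + "/" + filename
```

In the pyspark shell, where sc is already defined, this could be used as sc.textFile(input_uri("number_list.txt")) for HDFS, or sc.textFile(input_uri("number_list.txt", local_dir="/home/lab/data")) for a local copy.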

1. Odd/Even Numbers
(Hint: The file is read as text, so each number must be converted with int() before testing it.)

Input: number_list.txt (a list of 1000 integers)
Output: Count the number of odd numbers and even numbers in the file
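One possible approach to this task is sketched below. It assumes that number_list.txt holds one integer per line, which the assignment does not state. The function names parity and count_parity are illustrative, not part of any provided code.

```python
def parity(line):
    """Map one line of text to the key 'even' or 'odd'.

    Assumes the line holds exactly one integer (an assumption about
    the file layout).
    """
    return "even" if int(line.strip()) % 2 == 0 else "odd"


def count_parity(lines):
    """Count odd and even integers in an RDD of text lines.

    `lines` is expected to be an RDD of strings, for example the
    result of sc.textFile(). Returns a dict-like {key: count}.
    """
    return (lines
            .filter(lambda line: line.strip())  # skip blank lines
            .map(parity)                        # 'even' or 'odd' per line
            .countByValue())                    # counts collected on the driver
```

In the pyspark shell, counts = count_parity(sc.textFile("/user/lab/number_list.txt")) would return the two totals, readable as counts["odd"] and counts["even"].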

2. Top 10 and Bottom 10 Words
(Hint: Look up and try the takeOrdered() method.)

Input: shakespeare.txt
Output: The 10 words with the highest counts and the 10 words with the lowest counts
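A minimal sketch of a word count followed by takeOrdered() is given below. The tokenization rule (lowercase, letters and internal apostrophes only) is an assumption, since the assignment does not define what counts as a word. The helper names are illustrative.

```python
import re

# Assumed definition of a word: letters, optionally with internal apostrophes.
WORD = re.compile(r"[a-z]+(?:'[a-z]+)*")


def tokenize(line):
    """Split a line into lowercase words, dropping punctuation and digits."""
    return WORD.findall(line.lower())


def word_counts(lines):
    """Turn an RDD of text lines into an RDD of (word, count) pairs."""
    return (lines
            .flatMap(tokenize)
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))


def top_and_bottom(counts, n=10):
    """Return (top n, bottom n) lists of (word, count) pairs.

    Ties are broken alphabetically so the result is repeatable.
    """
    top = counts.takeOrdered(n, key=lambda kv: (-kv[1], kv[0]))
    bottom = counts.takeOrdered(n, key=lambda kv: (kv[1], kv[0]))
    return top, bottom
```

In the pyspark shell this could be run as top, bottom = top_and_bottom(word_counts(sc.textFile("/user/lab/shakespeare.txt"))). Many words are likely to share the lowest count, so the bottom 10 depends on how ties are broken. The alphabetical tie-break here is one arbitrary choice.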

3. Group and Count

Input: fulltext_txt
Output: Count the number of tweets for each user_id and save the results in a text file.
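The grouping step could look roughly like the sketch below. The layout of the input is not described in the assignment, so the code assumes one tweet per line with tab-separated fields and user_id in the first field. The separator, the field index, and the output directory in the usage note are all assumptions to adjust after inspecting the file.

```python
def user_id(line, sep="\t", field=0):
    """Extract the user_id from one tweet line (assumed layout)."""
    return line.split(sep)[field]


def format_count(kv):
    """Render a (user_id, count) pair as one tab-separated output line."""
    return "%s\t%d" % kv


def tweets_per_user(lines, sep="\t", field=0):
    """Turn an RDD of tweet lines into an RDD of (user_id, count) pairs."""
    return (lines
            .filter(lambda line: line.strip())  # skip blank lines
            .map(lambda line: (user_id(line, sep, field), 1))
            .reduceByKey(lambda a, b: a + b))


def save_counts(counts, out_dir):
    """Write the counts as text; out_dir must not already exist."""
    counts.map(format_count).saveAsTextFile(out_dir)
```

In the pyspark shell, save_counts(tweets_per_user(sc.textFile("/user/lab/fulltext_txt")), "/user/lab/tweet_counts") would write the result, where tweet_counts is a hypothetical output name. saveAsTextFile() produces a directory of part files, not a single file.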

Attachment: Assignment.rar
