Build a table to be used as the basis for a directed


QUESTION 1. HIVE OR PIG PROGRAMMING

Build a table to be used as the basis for a directed weighted graph (network) that shows which stations are connected. Use the trip count between those stations as the weights. Save this table in '/user/lab/q1' in HDFS.

Include the directory listing of the output directory and the first five lines of the output file in your submission.
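To make the expected table shape concrete, here is a minimal plain-Python sketch of the aggregation (not the Hive or Pig script the question asks for). It assumes each trip can be reduced to a (start station, end station) pair; the function name `build_edge_table`, the field order, and the sample trips are all hypothetical and do not come from the attached data.

```python
from collections import Counter

def build_edge_table(trips):
    """Count trips per directed (start, end) station pair.

    `trips` is an iterable of (start_station, end_station) tuples.
    These field names are assumptions, not taken from the data set.
    """
    counts = Counter((start, end) for start, end in trips)
    # One row per directed edge: (source, target, weight), sorted for stable output
    return sorted((src, dst, n) for (src, dst), n in counts.items())

# Made-up sample trips, for illustration only
sample_trips = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C")]
edges = build_edge_table(sample_trips)
for src, dst, weight in edges:
    print(f"{src},{dst},{weight}")
```

In Hive the same idea would typically be a GROUP BY over the start and end station columns with a COUNT, and in Pig a GROUP followed by COUNT; the exact column names depend on the trip file's schema.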

QUESTION 2. HIVE OR PIG PROGRAMMING

Load the data from '/user/lab/q1' that you created in the previous question. For each route, calculate a 'traffic index' where

Save the results in '/user/lab/q2' in HDFS. Include the directory listing of the output directory and the first five lines of the output file in your submission.

QUESTION 3. SPARK PROGRAMMING

Load the data from '/user/lab/q2' that you created in the previous question.

a) Find the 3 stations with the highest in-degree. What does that mean in real life?

b) Find the 3 stations with the highest out-degree. What does that mean in real life?
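As a rough illustration of what parts a) and b) compute, the following plain-Python sketch counts incoming and outgoing edges per station. It is not the Spark program the question asks for, and it rests on assumptions: each row is taken to be a (source, target, value) triple with one row per directed route, degree is taken to mean the number of distinct routes (a weighted variant would sum the third field instead), and the name `top_degrees` and the sample rows are hypothetical.

```python
from collections import Counter

def top_degrees(edges, k=3):
    """Return (top-k in-degree, top-k out-degree) from (src, dst, value) rows.

    Assumes one row per directed route, so counting rows counts edges.
    """
    in_deg = Counter()
    out_deg = Counter()
    for src, dst, _value in edges:
        out_deg[src] += 1  # one more route leaving src
        in_deg[dst] += 1   # one more route arriving at dst
    return in_deg.most_common(k), out_deg.most_common(k)

# Made-up sample rows, for illustration only
sample_edges = [
    ("A", "B", 2),
    ("A", "C", 1),
    ("B", "C", 4),
    ("D", "C", 1),
    ("C", "A", 3),
]
top_in, top_out = top_degrees(sample_edges)
print("highest in-degree:", top_in)
print("highest out-degree:", top_out)
```

Under this reading, a station with high in-degree is a destination reached from many different stations, and a station with high out-degree is an origin from which riders travel to many different stations. In Spark, the equivalent would usually be a map to (station, 1) pairs followed by a reduce by key and a sort.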

Submit your scripts and queries (Pig, Hive, Spark, Hadoop) and the final output (a screen copy or screenshot, as appropriate).

Attachment: Trip and station data.rar
