Calculate the average and the standard deviation


Problem

To understand how parallel computing works for data mining, we are going to imitate the work of computers. Working in small groups, we will calculate simple statistics (mean and standard deviation), with each student acting as one node of a distributed computing cluster (1 student = 1 node).

Directions

Download the dataset.csv file. It contains the recorded data values of a variable. The goal is to calculate the average and the standard deviation of that variable as a group.

In your initial post, describe the algorithm that the central node and the computing nodes will need to carry out to compute the average and the standard deviation of the dataset, given that each computing node can work only with its assigned fraction of the dataset. Explain which parallelization technique you will use, and why.

The first student to submit the initial post will serve as the central node, which should split the dataset and assign one portion to each of the other students in the group (the central node should not assign any data to itself).
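
One possible way for the central node to split the data is sketched below. This is only an illustration: the function name split_dataset, the contiguous near-equal chunking, and the sample values are assumptions, not part of the assignment, and the real values would come from dataset.csv.

```python
def split_dataset(values, n_workers):
    """Split values into n_workers contiguous chunks whose sizes differ by at most one."""
    if n_workers < 1:
        raise ValueError("need at least one computing node")
    base, extra = divmod(len(values), n_workers)
    chunks = []
    start = 0
    for i in range(n_workers):
        # The first `extra` nodes receive one additional value each.
        size = base + (1 if i < extra else 0)
        chunks.append(values[start:start + size])
        start += size
    return chunks


# Made-up stand-in for the values in dataset.csv (assumption).
values = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0, 7.0]

# Three computing nodes; the central node keeps no data for itself.
chunks = split_dataset(values, 3)
for node_id, chunk in enumerate(chunks, start=1):
    print(f"node {node_id}: {chunk}")
```

Chunks of nearly equal size keep the workload balanced, but the aggregation does not depend on that, as long as every value is assigned to exactly one node.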

Each of you should then carry out the calculation on your fraction of the dataset and post the required aggregated information on the discussion board. When all partial results are in, the student playing the central node should aggregate them and post the results for the whole dataset.
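
As a rough sketch of what the "needed aggregated information" could look like for the average, each node might post the count and the sum of its values, and the central node might combine them. The function names and the sample chunks below are hypothetical.

```python
def partial_count_and_sum(chunk):
    """Computing node: summarize one chunk as (count, sum)."""
    return len(chunk), sum(chunk)


def combine_mean(partials):
    """Central node: combine the (count, sum) pairs into the overall mean."""
    total_n = sum(n for n, _ in partials)
    total_sum = sum(s for _, s in partials)
    return total_sum / total_n


# Made-up chunks, one per computing node (assumption).
chunks = [[4.0, 8.0, 15.0], [16.0, 23.0], [42.0, 7.0]]

partials = [partial_count_and_sum(c) for c in chunks]  # posted by the nodes
mean = combine_mean(partials)                          # computed by the central node
print(partials, mean)
```

Posting the count together with the sum matters: averaging the per-node means directly gives the wrong answer whenever the chunks have different sizes.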

To calculate the standard deviation, you may need to conduct two iterations. Make sure to complete both of them by Sunday.
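
One reading of the two iterations is this: in the first, the group obtains the overall mean; in the second, the central node sends that mean back, and each node returns the sum of squared deviations of its own values from it. The sketch below follows that reading. The names are hypothetical, the data is made up, and whether to divide by n (population) or n - 1 (sample) is an assumption the group would need to settle.

```python
import math


def partial_squared_deviations(chunk, global_mean):
    """Computing node, iteration 2: sum of squared deviations from the global mean."""
    return sum((x - global_mean) ** 2 for x in chunk)


def combine_std(partial_ss, total_n, sample=False):
    """Central node: combine the partial sums of squares into a standard deviation."""
    denom = total_n - 1 if sample else total_n
    return math.sqrt(sum(partial_ss) / denom)


# Made-up chunks, one per computing node (assumption).
chunks = [[4.0, 8.0, 15.0], [16.0, 23.0], [42.0, 7.0]]

# Iteration 1: the overall mean from per-node counts and sums.
total_n = sum(len(c) for c in chunks)
global_mean = sum(sum(c) for c in chunks) / total_n

# Iteration 2: each node uses the broadcast mean on its own chunk only.
partial_ss = [partial_squared_deviations(c, global_mean) for c in chunks]
population_std = combine_std(partial_ss, total_n)
sample_std = combine_std(partial_ss, total_n, sample=True)
print(population_std, sample_std)
```

A single iteration is also possible if each node posts its sum of squares along with the count and the sum, but the two-pass form tends to be less sensitive to rounding error.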

If one of the students does not submit partial results, the student playing the central node may decide how to distribute the missing fraction of the dataset among the other students (nodes).
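
The assignment leaves the redistribution policy to the central node. A round-robin split of the missing chunk among the remaining nodes is one simple option, sketched here with hypothetical node names and made-up data.

```python
def redistribute(assignments, missing_node):
    """Return the extra values each remaining node takes over from missing_node."""
    remaining = [node for node in assignments if node != missing_node]
    if not remaining:
        raise ValueError("no remaining nodes to take over the data")
    extra = {node: [] for node in remaining}
    # Deal the orphaned values out one at a time, round-robin.
    for i, value in enumerate(assignments[missing_node]):
        extra[remaining[i % len(remaining)]].append(value)
    return extra


# Made-up original assignments (assumption).
assignments = {
    "node1": [4.0, 8.0, 15.0],
    "node2": [16.0, 23.0],
    "node3": [42.0, 7.0],
}

# Suppose node2 never posts its partial results.
extra = redistribute(assignments, "node2")
print(extra)
```

Because counts and sums are additive, a node that has already posted its partial results only needs to post a second partial result for the extra values it receives. Nothing already submitted has to be recomputed.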

All communication should be conducted within the discussion board.
