Math2349 data preprocessing - read the species and surveys, Dissertation

Assignment Tasks:

You will use the WHO data set for Tasks 1-5. Read the WHO data using an appropriate function, then complete Tasks 1-5.

1- Tidy Task 1:

Use appropriate "tidyr" functions to reshape the WHO data set into the form given below:
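The target layout is not reproduced in this text, so the following is only a minimal sketch of one common reshaping step: gathering wide count columns into a key/value pair. The toy data frame `who_toy`, its column names, and the output names `code` and `value` are assumptions for illustration, not the actual WHO file.

```r
library(tidyr)

# Hypothetical miniature of a wide WHO-style table (real columns may differ).
who_toy <- data.frame(
  country = c("A", "B"),
  iso2 = c("AA", "BB"),
  year = c(2000, 2000),
  new_sp_m014 = c(5, 7),
  new_sp_f014 = c(3, NA)
)

# Gather every non-identifier column into a "code" / "value" pair.
who_long <- pivot_longer(
  who_toy,
  cols = -c(country, iso2, year),
  names_to = "code",
  values_to = "value"
)
```

The older `gather()` function performs the same kind of reshaping if the course notes use it instead.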

2- Tidy Task 2:

The WHO data set is not in a tidy format yet. The "code" column still contains four different variables' information (see variable description section for the details). Separate the "code" column and form four new variables using appropriate "tidyr" functions. The final format of the WHO data set for this task should be in the form given below:
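As a rough illustration of the splitting step, the sketch below assumes the "code" values look like `new_sp_m014` (prefix, type, then sex and age run together). That format, the toy data, and the intermediate column names `var` and `sex_age` are assumptions; check the variable description supplied with the assignment.

```r
library(tidyr)

# Hypothetical long-format rows; the real "code" values may be laid out differently.
who_long <- data.frame(
  country = c("A", "A"),
  code = c("new_sp_m014", "new_rel_f65"),
  value = c(5, 3)
)

# Step 1: split on the underscore into three pieces.
who_sep <- separate(who_long, code, into = c("new", "var", "sex_age"), sep = "_")

# Step 2: split the last piece after its first character (sex, then age group).
who_sep <- separate(who_sep, sex_age, into = c("sex", "age"), sep = 1)
```

Passing a number to `sep` splits by character position rather than by a delimiter, which is what lets the one-letter sex code be peeled off the age group.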

3- Tidy Task 3:

The WHO data set is not in a tidy format yet. The "rel", "ep", "sn", and "sp" keys need to be in their own columns as we will treat each of these as a separate variable. In this step, move the "rel", "ep", "sn", and "sp" keys into their own columns. The final format of the WHO data set for this task should be in the form given below:
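Moving keys into their own columns is a long-to-wide reshape. The sketch below shows the idea on made-up rows; the column names `var` and `value` are placeholders for whatever the previous steps produced.

```r
library(tidyr)

# Hypothetical rows: one observation unit with four keys stacked in "var".
who_sep <- data.frame(
  country = "A",
  year = 2000,
  sex = "m",
  age = "014",
  var = c("rel", "ep", "sn", "sp"),
  value = c(1, 2, 3, 4)
)

# Spread the keys so that "rel", "ep", "sn" and "sp" each become a column.
who_wide <- pivot_wider(who_sep, names_from = var, values_from = value)
```

`spread()` is the older tidyr equivalent and may be the function the course materials refer to.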

4- Tidy Task 4:

There is one more step to tidy the WHO data set. We have two categorical variables, "sex" and "age". Use "mutate()" to factorise sex and age. For the "age" variable, you need to create labels and also order the variable. The labels are: <15, 15-24, 25-34, 35-44, 45-54, 55-64, 65>=. The final tidy version of the WHO data set should look like this:
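A possible shape for the factor conversion is sketched below. The raw age codes (`014`, `1524`, and so on) and the sex codes (`m`, `f`) are assumptions about how the values appear before labelling; only the labels themselves come from the task text.

```r
library(dplyr)

# Hypothetical raw values; the real data may code age and sex differently.
who_toy <- data.frame(
  sex = c("m", "f", "m"),
  age = c("65", "014", "1524")
)

who_fct <- mutate(
  who_toy,
  sex = factor(sex, levels = c("m", "f")),
  # Levels fix the order; labels replace the raw codes; ordered = TRUE
  # makes comparisons such as age < "25-34" meaningful.
  age = factor(
    age,
    levels = c("014", "1524", "2534", "3544", "4554", "5564", "65"),
    labels = c("<15", "15-24", "25-34", "35-44", "45-54", "55-64", "65>="),
    ordered = TRUE
  )
)
```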

5- Task 5: Filter & Select

Drop the redundant columns "iso2" and "new", and filter any three countries from the tidy version of the WHO data set. Name this subset "WHO_subset".
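Dropping columns and keeping a few countries can be chained in one pipeline. In this sketch the data frame `who_tidy` and the country values are placeholders; substitute three real country names from the data.

```r
library(dplyr)

# Hypothetical tidy table with the two redundant columns still present.
who_tidy <- data.frame(
  country = c("A", "B", "C", "D"),
  iso2 = c("AA", "BB", "CC", "DD"),
  new = "new",
  year = 2000,
  sp = c(1, 2, 3, 4)
)

WHO_subset <- who_tidy %>%
  select(-iso2, -new) %>%                  # drop the redundant columns
  filter(country %in% c("A", "B", "C"))    # keep any three countries
```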

You will use the surveys and species data sets for Tasks 6-10. Read the species and surveys data sets using an appropriate function. Name these data frames "species" and "surveys", respectively.

6- Task 6: Join

Combine the "surveys" and "species" data frames using the key variable "species_id". For this task, you need to add the species information ("genus", "species", "taxa") to the "surveys" data. Name the combined data frame "surveys_combined".
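Because the species details are being added to the survey records, a left join keyed on "species_id" is one natural fit. The rows below are invented for illustration, and `record_id` is an assumed column name.

```r
library(dplyr)

# Hypothetical extracts of the two data frames.
surveys <- data.frame(
  record_id = 1:3,
  species_id = c("DM", "NL", "DM"),
  weight = c(40, NA, 42)
)
species <- data.frame(
  species_id = c("DM", "NL"),
  genus = c("Dipodomys", "Neotoma"),
  species = c("merriami", "albigula"),
  taxa = "Rodent"
)

# Keep every survey row and attach the matching species information.
surveys_combined <- left_join(surveys, species, by = "species_id")
```

A left join keeps survey records even when their "species_id" has no match in "species"; those rows simply get NA in the added columns.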

7- Task 7: Calculate

Using the "surveys_combined" data frame, calculate the average weight and hindfoot length of one of the species observed in each month (irrespective of the year). Make sure to exclude missing values while calculating the average.
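One way to read this task is: pick a species, group its records by month, and average within each month. The sketch below follows that reading on made-up numbers; the column name `hindfoot_length` and the species code `DM` are assumptions.

```r
library(dplyr)

# Hypothetical combined records.
surveys_combined <- data.frame(
  species_id = c("DM", "DM", "DM", "NL"),
  month = c(1, 1, 2, 1),
  weight = c(40, NA, 44, 200),
  hindfoot_length = c(35, 37, 36, 32)
)

monthly_means <- surveys_combined %>%
  filter(species_id == "DM") %>%   # one chosen species
  group_by(month) %>%              # month only, so years are pooled
  summarise(
    mean_weight = mean(weight, na.rm = TRUE),            # na.rm drops missing values
    mean_hindfoot = mean(hindfoot_length, na.rm = TRUE)
  )
```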

8- Task 8: Missing Values

Select one of the years in the "surveys_combined" data frame and name this subset "surveys_combined_year". Using the "surveys_combined_year" data frame, find the total number of missing values in the "weight" column, grouped by species. Replace the missing values in the "weight" column with the mean value for each species. Save the imputed data as "surveys_weight_imputed".
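The three steps (subset a year, count missing weights per species, impute with the species mean) could look roughly like this. The year 1990 and all values are invented for the example.

```r
library(dplyr)

# Hypothetical combined records spanning two years.
surveys_combined <- data.frame(
  year = c(1990, 1990, 1990, 1990, 1991),
  species = c("merriami", "merriami", "albigula", "albigula", "merriami"),
  weight = c(40, NA, 200, NA, 45)
)

# Step 1: keep a single year.
surveys_combined_year <- filter(surveys_combined, year == 1990)

# Step 2: count missing weights within each species.
missing_by_species <- surveys_combined_year %>%
  group_by(species) %>%
  summarise(n_missing = sum(is.na(weight)))

# Step 3: replace each missing weight with the mean of its own species.
surveys_weight_imputed <- surveys_combined_year %>%
  group_by(species) %>%
  mutate(weight = ifelse(is.na(weight), mean(weight, na.rm = TRUE), weight)) %>%
  ungroup()
```

Grouping before `mutate()` is what makes `mean()` a per-species mean rather than the overall mean.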

9- Task 9: Inconsistencies or Special Values

Inspect the "weight" column in the "surveys_weight_imputed" data frame for any further inconsistencies or special values (i.e., NaN, Inf, -Inf). Trace back and explain briefly why you got such a value.
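One plausible source of a special value after mean imputation is a species whose weights are all missing: with the NAs removed there is nothing left to average, so the mean is 0/0. Whether that is what happens in the actual data has to be checked; the base-R sketch below only demonstrates the mechanism and a way to flag special values, using invented vectors.

```r
# A species with no recorded weights at all (hypothetical values).
weights_one_species <- c(NA_real_, NA_real_)

# Removing the NAs leaves an empty vector, and its mean is NaN.
group_mean <- mean(weights_one_species, na.rm = TRUE)

# Flagging special values in a column: NaN, Inf and -Inf, but not plain NA.
weight <- c(40, NaN, Inf, -Inf, NA)
is_special <- is.nan(weight) | is.infinite(weight)
n_special <- sum(is_special)
```

Note that `is.na()` returns TRUE for both NA and NaN, so `is.nan()` is needed to tell them apart.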

10- Task 10: Outliers

Using the "surveys_combined" data frame, inspect the variable hindfoot length for possible univariate outliers. If you detect any outliers, use any of the methods outlined in the Module 6 notes to deal with them. Explain briefly the actions that you take to handle outliers.
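The Module 6 notes are not part of this text, so the sketch below uses a common stand-in: the 1.5 x IQR boxplot fences to detect outliers, followed by capping values at the fences. Both choices, and the sample values, are assumptions; use whichever method the notes actually describe.

```r
# Hypothetical hindfoot lengths with one extreme value and one missing value.
hindfoot_length <- c(30, 31, 32, 33, 34, 35, 36, 70, NA)

# Quartiles and fences, ignoring missing values.
q <- quantile(hindfoot_length, probs = c(0.25, 0.75), na.rm = TRUE)
iqr <- q[[2]] - q[[1]]
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr

# Flag values outside the fences (missing values are never flagged).
is_outlier <- !is.na(hindfoot_length) &
  (hindfoot_length < lower | hindfoot_length > upper)

# Capping: pull flagged values back to the nearest fence; NA stays NA.
capped <- pmin(pmax(hindfoot_length, lower), upper)
```

Capping keeps every record while limiting the influence of extreme values; deleting or imputing the flagged values are alternatives with different trade-offs.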

Attachment:- Assignment.zip

Reference No:- TGS02766288