Analyze datasets by interpreting summary statistics


Course Learning Outcomes:

A) Assessed through student ability to apply knowledge of multivariate functions, data transformations and data distributions to summarize data sets.

B) Assessed through student ability to analyze datasets by interpreting summary statistics and model and function parameters.

C) Assessed through student ability to develop software code to solve computational problems for real world analytics.

Assignment Aim: This assignment will test your knowledge and understanding of the aggregation functions and their applications for data summarization and prediction. This assignment will also test your ability in R programming, in using specific R commands as well as R packages.

Assignment Task - Problem solving

Using aggregation functions for data analysis.

Assignment Tasks:

1. Understand the data

(i) Download the text file Energy20.txt from FutureLearn and add it to your R working directory.

(ii) Assign the data to a matrix, e.g. using: the.data <- as.matrix(read.table("Energy20.txt"))

(iii) The variable of interest is Energy use of appliances (Y). To investigate Y, generate a random subset of 300 rows, e.g. using: my.data <- the.data[sample(1:671,300),c(1:6)]

(iv) Using scatter plots and histograms, report on the general relationship between each of the variables X1, X2, X3, X4, X5 and the variable of interest Y. Include 5 scatter plots, 6 histograms, and 1 or 2 sentences for each of the variables, including the variable of interest Y.
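
A minimal sketch of steps (ii) to (iv) follows. Because Energy20.txt is not reproduced here, a simulated 671-by-6 matrix stands in for it. The simulated values, the seed, and the column order (X1 to X5 followed by Y) are assumptions made only for illustration.

```r
set.seed(1)  # assumption: fixed seed so the 300-row sample is reproducible

# Stand-in for as.matrix(read.table("Energy20.txt")); the values are simulated
the.data <- matrix(runif(671 * 6, min = 0, max = 100), nrow = 671, ncol = 6)
colnames(the.data) <- c("X1", "X2", "X3", "X4", "X5", "Y")  # assumed column order

# Random subset of 300 rows, keeping all six columns
my.data <- the.data[sample(1:671, 300), c(1:6)]

# Five scatter plots: each of X1..X5 against Y
par(mfrow = c(2, 3))
for (j in 1:5) {
  plot(my.data[, j], my.data[, 6],
       xlab = colnames(my.data)[j], ylab = "Y",
       main = paste(colnames(my.data)[j], "vs Y"))
}

# Six histograms: X1..X5 and Y
par(mfrow = c(2, 3))
for (j in 1:6) {
  hist(my.data[, j], xlab = colnames(my.data)[j],
       main = paste("Histogram of", colnames(my.data)[j]))
}
```

With the real file, only the line that builds the.data would change. The plots are what the 1 or 2 sentences per variable should describe, for example whether Y tends to rise or fall with each X and whether a histogram is skewed.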

2. Transform the data

(i) Choose any four of the five variables (X1, X2, ..., X5). Make appropriate transformations to the chosen four variables and the variable of interest Y so that the values can be aggregated in order to predict the variable of interest. Assign your transformed data along with your transformed variable of interest to an array (it should have 300 rows and 5 columns). Save it to a txt file titled "name-transformed.txt" using: write.table(your.data,"name-transformed.txt") where "name" is replaced with your name - you can use your surname or first name.

(ii) Briefly explain the transformations applied for the selected four variables and the variable of interest. (1-2 sentences each).
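
One possible shape for this step is sketched below. It assumes min-max scaling to the unit interval, a negation for a variable that decreases as Y increases, and a log transform for a right-skewed variable. Which columns are chosen and which transformation suits each one are hypothetical here and must be decided from your own plots. Simulated data stands in for my.data, and the file is written to a temporary folder only so that the sketch runs anywhere.

```r
# Min-max scaling to the unit interval [0, 1]
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

set.seed(2)
# Simulated stand-in for my.data (300 rows; columns X1..X5, Y), strictly positive
my.data <- matrix(rexp(300 * 6, rate = 0.1) + 1, nrow = 300, ncol = 6)

chosen <- c(1, 2, 3, 5)  # hypothetical choice of four X columns

your.data <- cbind(
  minmax(my.data[, chosen[1]]),        # plain scaling
  minmax(my.data[, chosen[2]]),        # plain scaling
  1 - minmax(my.data[, chosen[3]]),    # negation, for a decreasing relationship with Y
  minmax(log(my.data[, chosen[4]])),   # log first to reduce right skew, then scale
  minmax(my.data[, 6])                 # variable of interest Y in the last column
)

# In the assignment this would be write.table(your.data, "name-transformed.txt")
out.file <- file.path(tempdir(), "name-transformed.txt")
write.table(your.data, out.file)
```

Keeping the original minimum and maximum of each column is useful later, because the same scaling has to be applied to any new input before prediction.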

3. Build models and investigate the importance of each variable

(i) Download the AggWaFit718.R file from FutureLearn, add it to your working directory, and load it into the R workspace using source("AggWaFit718.R")

(ii) Use the fitting functions to learn the parameters for

A weighted arithmetic mean (WAM)

Weighted power means (WPM) with p = 0.5, and p = 2,

An ordered weighted averaging function (OWA), and

A Choquet integral.
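
To make clear what is being fitted, the sketch below evaluates three of these functions for one input vector. These are hand-written illustrations, not the functions provided in AggWaFit718.R, and the input and weights are hypothetical. The OWA here applies its weights to the inputs sorted in increasing order, which is an assumed convention. The Choquet integral is left out for brevity.

```r
# Weighted arithmetic mean: weights are non-negative and sum to 1
wam <- function(x, w) sum(w * x)

# Weighted power mean with exponent p (p = 1 recovers the WAM)
wpm <- function(x, w, p) sum(w * x^p)^(1 / p)

# Ordered weighted averaging: weights attach to positions after sorting,
# not to particular variables (assumed convention: increasing order)
owa <- function(x, w) sum(w * sort(x))

x <- c(0.2, 0.8, 0.5, 0.4)  # hypothetical transformed inputs in [0, 1]
w <- c(0.1, 0.4, 0.3, 0.2)  # hypothetical weights

results <- c(WAM    = wam(x, w),
             WPM.05 = wpm(x, w, 0.5),
             WPM.2  = wpm(x, w, 2),
             OWA    = owa(x, w))
print(results)
```

For the same weights, the power mean with p = 0.5 is never larger than the arithmetic mean and the one with p = 2 is never smaller, which is one way to read whether a model leans toward lower or higher inputs.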

(iii) Include two tables in your report - one with the error measures and correlation coefficients, and one summarizing the weights/parameters and any other useful information learned for your data.

(iv) Compare and interpret the data in your tables. Comment on

a. How good each model is,

b. The importance of each of the variables (the four variables that you have selected),

c. Any interaction between any of those variables (are they complementary or redundant?) and

d. Whether the better models favor higher or lower inputs. (1-3 paragraphs for part 3(iv)).
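
The error measures and correlation coefficients for the first table can also be checked by hand. The helper below is a hypothetical convenience function (fit.summary is not part of AggWaFit718.R) that computes the root mean squared error, the average absolute error, and the Pearson and Spearman correlations from observed and predicted values.

```r
# Summarize how well predictions match observations
fit.summary <- function(observed, predicted) {
  c(RMSE         = sqrt(mean((predicted - observed)^2)),
    Av.abs.error = mean(abs(predicted - observed)),
    Pearson      = cor(observed, predicted, method = "pearson"),
    Spearman     = cor(observed, predicted, method = "spearman"))
}

# Tiny made-up example: four observed values and four predictions
observed  <- c(1, 2, 3, 4)
predicted <- c(1.5, 2, 2.5, 4)

s <- fit.summary(observed, predicted)
print(round(s, 4))
```

Lower error values and correlations closer to 1 indicate a better fit. Calling the helper once per model and combining the results with rbind() gives one row per model.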

4. Use your model for prediction

(i) Choose your best fitting model. Using your best fitting model, predict the Energy use of appliances for the following input X1=18; X2=44; X3=4; X4=74.8; X5=31.4.

(ii) Give your result and comment on whether you think it is reasonable. (1-2 sentences).

(iii) Comment on the best conditions (in terms of your chosen four variables) under which a low Energy use of appliances will occur. (1-2 sentences).
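
A new input has to go through the same transformations as the training data before the model is applied, and the model output has to be transformed back to the original scale of Y. The sketch below assumes min-max scaling, a weighted arithmetic mean as the best model, and X5 as the variable that was left out. The ranges and weights are hypothetical placeholders, so the resulting number is not an answer to the question.

```r
# Hypothetical training ranges for the four chosen inputs and for Y
x.min <- c(X1 = 15, X2 = 30, X3 = 0,  X4 = 60)
x.max <- c(X1 = 25, X2 = 50, X3 = 10, X4 = 90)
y.min <- 20
y.max <- 120

# New input from the task; X5 is dropped in this sketch (assumed choice)
new.input <- c(X1 = 18, X2 = 44, X3 = 4, X4 = 74.8)

# Same min-max scaling as used on the training data
scaled <- (new.input - x.min) / (x.max - x.min)

# Hypothetical fitted WAM weights (non-negative, summing to 1)
w <- c(0.4, 0.3, 0.2, 0.1)

# Aggregate on the [0, 1] scale, then invert the scaling of Y
y.scaled <- sum(w * scaled)
y.pred   <- y.scaled * (y.max - y.min) + y.min
print(y.pred)
```

If a log or negation was applied to a variable during training, the same step belongs before the aggregation here, and the inverse of any transformation of Y belongs after it.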

5. Comparing with a linear regression model

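Linear regression is used to predict the value of an outcome variable Y based on one or more input predictor variables X. The equation is Y = β0 + β1X1 + β2X2 + ... + βnXn + ε, where β0 is the intercept, β1, ..., βn are the coefficients of the predictors, and ε is the error term.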

The built-in function lm() is used to fit linear models in R.

(i) Build your linear model using the same dataset as in Question 3 and describe the summary statistics for your model using the function summary().

(ii) Compare the performance of your linear model with that of your best fitting model from Question 4. Visualise the predicted Y values of both models for the 300 data points and compare them with the true Y values.

(iii) Comment on the differences between the linear model and your best fitting model. (2-4 sentences).
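
The steps in this question can be sketched as follows, using simulated data in place of the transformed dataset. The column names, the simulated relationship, and the stand-in predictions for the aggregation model are all assumptions for illustration.

```r
set.seed(3)
n <- 300

# Simulated stand-in for the four transformed inputs and Y
X <- matrix(runif(n * 4), nrow = n,
            dimnames = list(NULL, c("X1", "X2", "X3", "X4")))
Y <- 0.5 * X[, "X1"] + 0.3 * X[, "X2"] + 0.2 * X[, "X4"] + rnorm(n, sd = 0.05)
train <- data.frame(X, Y = Y)

# Fit the linear model and inspect coefficients, p-values and R-squared
lin.model <- lm(Y ~ X1 + X2 + X3 + X4, data = train)
print(summary(lin.model))

# Predictions from the linear model on the same 300 rows
lin.pred <- predict(lin.model, newdata = train)

# Stand-in for the predictions of the best aggregation model (hypothetical weights)
agg.pred <- as.vector(X %*% c(0.5, 0.3, 0, 0.2))

# Predicted against true Y for both models; the dashed line marks perfect prediction
plot(train$Y, lin.pred, pch = 1, col = "blue",
     xlab = "True Y", ylab = "Predicted Y", main = "Predicted vs true Y")
points(train$Y, agg.pred, pch = 4, col = "red")
abline(0, 1, lty = 2)
legend("topleft", legend = c("Linear model", "Aggregation model"),
       pch = c(1, 4), col = c("blue", "red"))
```

Unlike the weights of an averaging function, the coefficients from lm() are not constrained to be non-negative or to sum to 1, and the model includes an intercept. This is worth keeping in mind when comparing the two sets of parameters.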
