What is the minimal adequate model and why do we build it


Assignment: Multiple Linear Regression with the imports-85 Data Set

a. Introduction:

i. Identify the dependent variable and independent variables in the imports-85 data set.

ii. Based on what you have learned this week about multiple linear regression, provide a one-paragraph, master's-level response describing what you anticipate the lm() function will accomplish for the imports-85 data. Be specific about the behavior and structure of the multiple linear regression model. (80-100 words)

b. Data Pre-Processing: Load the imports-85 data into RStudio using the read.csv() command. Do not use File > Import Dataset > From CSV in the RStudio GUI, because it uses read_csv(), which results in significantly different variable types.

i. Run the commands to remove the following variables: engine_type, make, num_of_cylinders, fuel_system. Include the commands and output screenshot.

ii. What additional data pre-processing (if any) does the lm() method require for the imports-85 data? Include the commands you ran and the output screenshot.
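The loading and column-removal steps in this section can be sketched as follows. This is a minimal illustration, not the expected solution: the three-row CSV text is a hypothetical extract (the real file has more columns and rows), and na.strings = "?" assumes that missing values are coded as "?" in the file. Only base R is used.

```r
# Hypothetical three-row extract; the real imports-85 file has more columns.
csv_text <- "make,fuel_type,engine_type,num_of_cylinders,fuel_system,horsepower,price
audi,gas,ohc,four,mpfi,102,13950
bmw,gas,ohc,six,mpfi,?,16430
mazda,diesel,ohc,four,idi,64,10795"

# read.csv() is base R. na.strings = "?" is an assumption about how
# missing values are coded; stringsAsFactors = TRUE turns text columns
# into factors, which is how lm() expects categorical predictors.
cars <- read.csv(text = csv_text, na.strings = "?", stringsAsFactors = TRUE)

# Drop the four variables by name.
drop_cols <- c("engine_type", "make", "num_of_cylinders", "fuel_system")
cars <- cars[, !(names(cars) %in% drop_cols)]

# Inspect the variable types and count missing values per column.
str(cars)
colSums(is.na(cars))

# Removing incomplete rows explicitly keeps a later train/test split
# consistent with the rows the model will actually use.
cars_complete <- na.omit(cars)
```

With a real file, the text = csv_text argument would be replaced by the path to the CSV file.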

c. Multiple Linear Regression - Running the Method with Training Data:

i. Run ‘set.seed(12345)’ and then split the data into a training set consisting of 70% of the instances and a test set containing the remaining 30% of the instances. Include the commands below.

ii. Run the lm() function to build the multiple linear regression model, storing the results in a variable called ‘mlr_model’. Include the command you ran and a brief discussion about the default input parameters used.

iii. Run the command ‘summary(mlr_model)’. Include the output screenshot and answer the following questions:

How does the model represent the relationship between dependent and independent variables in the imports-85 dataset? (80 words)

How does the method handle categorical variables?

What does the residuals section of the output mean?

What are the coefficients and what do they mean?

What is an intercept and what does it mean?

What do the p-values tell us about the significance of each variable?

What is the overall accuracy of the model?
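The split-and-fit workflow in this section might look like the sketch below. It uses the mtcars data set that ships with R as a stand-in for imports-85, with mpg as a placeholder dependent variable, so the variable names and the chosen predictors are assumptions made for illustration only.

```r
set.seed(12345)

# mtcars stands in for imports-85 here (assumption for illustration).
cars <- mtcars
cars$cyl <- factor(cars$cyl)  # treat cylinder count as a categorical variable

# Sample 70% of the row indices for training; the rest form the test set.
n <- nrow(cars)
train_idx <- sample(n, size = floor(0.7 * n))
train <- cars[train_idx, ]
test <- cars[-train_idx, ]

# Fit by ordinary least squares. The factor cyl is expanded into
# dummy (indicator) variables automatically, one per non-reference level.
mlr_model <- lm(mpg ~ wt + hp + cyl, data = train)

# The summary lists the residual quantiles, one coefficient per term
# (including the intercept), their p-values, and R-squared.
summary(mlr_model)
```

In the summary output, the rows named after factor levels (for example cyl6) are the dummy-variable coefficients, each measured relative to the reference level.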

d. Multiple Linear Regression-Evaluate the Model with Test Data:

i. Run the command to evaluate the ‘mlr_model’ on the imports-85 test data. Include the command below.

ii. Run the command to build the predicted vs. actual (observed) value scatter plot. Add a diagonal line to this plot. Include the commands and the final plot with the diagonal line below.

iii. What does the distance between points and the diagonal line tell us about the accuracy of the prediction?
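A hedged sketch of the evaluation steps in this section follows, again using mtcars as a stand-in for imports-85 and mpg as a placeholder dependent variable. The root mean squared error at the end is an addition for illustration; the assignment itself only asks for the plot.

```r
set.seed(12345)

# mtcars stands in for imports-85 here (assumption for illustration).
cars <- mtcars
idx <- sample(nrow(cars), size = floor(0.7 * nrow(cars)))
train <- cars[idx, ]
test <- cars[-idx, ]
mlr_model <- lm(mpg ~ wt + hp, data = train)

# Predict the dependent variable for the held-out test rows.
predicted <- predict(mlr_model, newdata = test)

# Actual values on the x-axis, predictions on the y-axis.
plot(test$mpg, predicted,
     xlab = "Actual mpg", ylab = "Predicted mpg",
     main = "Predicted vs. actual (test set)")

# The line y = x: a perfect prediction would fall exactly on it.
abline(a = 0, b = 1, col = "red")

# Root mean squared error summarizes the typical size of the prediction error.
rmse <- sqrt(mean((test$mpg - predicted)^2))
rmse
```

The vertical distance of a point from the diagonal is that instance's prediction error, so a tighter cloud around the line indicates more accurate predictions.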

e. Multiple Linear Regression - Residual Plots:

i. Run the ‘plot(mlr_model)’ command to build the residuals plots. Interpret at least one of the plots. Include the command, the plot, and the interpretation of that plot below.

Interpretation:

f. Multiple Linear Regression - Minimal Adequate Model:

i. What is the minimal adequate model? Why do we build it? Provide a one-paragraph, master's-level response. (80-120 words)

ii. Run the command to build the minimal adequate model and store the model in a variable named ‘mlr_model_min’. Include the command and output screenshot.

iii. Run the ‘summary(mlr_model_min)’ command. Include the command, output screenshot, and answers to the following questions:

Which variables were eliminated and which variables remain?

What are the coefficients and the intercept? What do the coefficients and intercept mean?

Compare the prediction accuracy of the minimal adequate model with the prediction accuracy of the original model. Provide a one-paragraph, master's-level response.
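One common way to arrive at a reduced model in base R is backward elimination with step(), sketched below. This is an assumption about the intended approach: step() removes terms by comparing AIC, whereas a minimal adequate model can also be built by deleting non-significant terms one at a time based on their p-values. The mtcars data set stands in for imports-85.

```r
# Full model on the stand-in data: mpg against every other column.
full_model <- lm(mpg ~ ., data = mtcars)

# Backward elimination: repeatedly drop the term whose removal lowers
# AIC the most, stopping when no removal improves AIC.
# trace = 0 suppresses the step-by-step log.
mlr_model_min <- step(full_model, direction = "backward", trace = 0)

summary(mlr_model_min)

# Terms that were eliminated from the full model.
setdiff(names(coef(full_model)), names(coef(mlr_model_min)))

# Adjusted R-squared penalizes extra terms, so it is a fairer basis
# for comparing the two models than plain R-squared.
c(full = summary(full_model)$adj.r.squared,
  minimal = summary(mlr_model_min)$adj.r.squared)
```

Comparing the two models on held-out test data, rather than on the data used for fitting, gives a more honest view of whether the simpler model predicts as well as the full one.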

g. New Instance:

i. Suppose that we have a new car added to the imports-85 data set. We know the values of the independent variables. How would you use the model to predict the value of the dependent variable for the new car? (Hint: Use the lessons learned and hints from the prior week to complete this exercise). Include the command you would run below:
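As a rough sketch of predicting for a new instance, the pattern is to place the new values in a one-row data frame whose column names match the model's independent variables and pass it to predict(). The model, the variable names, and the values below are hypothetical and use mtcars as a stand-in for imports-85.

```r
# Stand-in model: mpg as a function of weight and horsepower.
model <- lm(mpg ~ wt + hp, data = mtcars)

# One-row data frame for the new car; the values are hypothetical.
# Column names must match the predictors used in the model formula.
new_car <- data.frame(wt = 2.8, hp = 120)

# Point prediction of the dependent variable.
point_estimate <- predict(model, newdata = new_car)
point_estimate

# The same prediction with a 95% prediction interval (fit, lwr, upr).
with_interval <- predict(model, newdata = new_car, interval = "prediction")
with_interval
```

Any categorical predictor in the new data frame needs to be a factor with the same levels that were used when the model was fitted.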

h. Summary:

i. Is the multiple linear regression method appropriate for predicting the values of dependent variables in the imports-85 dataset? Explain why or why not. Provide a one-paragraph, master's-level response. (80-120 words)

ii. Which part of this exercise did you find the most challenging and what steps did you take to resolve the challenge?

Format your assignment according to the following formatting requirements:

1. The answer should be typed, double spaced, using Times New Roman font (size 12), with one-inch margins on all sides.

2. The response should also include a cover page containing the title of the assignment, the student's name, the course title, and the date. The cover page is not included in the required page length.

3. Also include a reference page. Citations and references should follow APA format. The reference page is not included in the required page length.

Attachment: Imports.rar
