Create scatter plots and pairwise correlations between four


Assignment

Problem 1 Download the data set on student achievement in secondary-school math education at two Portuguese schools (use the data set Students Math). Using any packages you wish, complete the following tasks:

• Create scatter plots and pairwise correlations between four continuous variables and the final grade (G3) using the pairs.panels() function in R. Pick the variables you believe are most useful.

• Build a multiple regression model predicting final math grade (G3) using as many features as you like, but you must use at least four. Include at least one categorical variable and be sure to convert it properly to dummy codes. Select the features that you believe are useful -- you do not have to include all features.

• Use stepwise backward elimination to remove all non-significant variables and then state the final model as an equation. State the backward elimination measure you applied (p-value, AIC, adjusted R²). This tutorial shows how to use various feature elimination techniques.

• Calculate the 95% confidence interval for a prediction -- you may choose any data you wish for some new student.

• What is the RMSE for this model -- use the entire data set for both training and validation. You may find the residuals() function useful. Alternatively, you can inspect the model object, e.g., if your model is in the variable m, then the residuals (errors) are in m$residuals and your predicted values (fitted values) are in m$fitted.values.
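The tasks above can be sketched in R roughly as follows. This is a minimal illustration, not a solution: the data frame is simulated, the column names (G1, absences, studytime, age, sex) are assumed to resemble those in the real file, and AIC is used as the elimination measure only because it is what base R's step() applies by default. The assignment's "confidence interval for a prediction" is read here as a prediction interval for one new student; interval = "confidence" would instead give the interval for the mean response.

```r
# Simulated stand-in for the Students Math data (values are made up;
# column names are assumptions, swap in the real data frame).
set.seed(42)
n <- 200
students <- data.frame(
  G1        = round(runif(n, 4, 18)),
  absences  = rpois(n, 4),
  studytime = sample(1:4, n, replace = TRUE),
  age       = sample(15:19, n, replace = TRUE),
  sex       = factor(sample(c("F", "M"), n, replace = TRUE))
)
students$G3 <- round(1 + 0.9 * students$G1 - 0.1 * students$absences +
                       rnorm(n, sd = 1.5))

# Pairwise correlations between four continuous variables and G3.
num_cols <- c("G1", "absences", "studytime", "age", "G3")
cor_mat <- cor(students[, num_cols])

# pairs.panels() lives in the psych package; call it only if installed.
if (requireNamespace("psych", quietly = TRUE)) {
  psych::pairs.panels(students[, num_cols])
}

# lm() converts factor columns to 0/1 dummy codes on its own;
# model.matrix() shows the encoding it will use.
head(model.matrix(G3 ~ sex, data = students))

# Full model with four numeric features and one categorical feature.
full <- lm(G3 ~ G1 + absences + studytime + age + sex, data = students)

# Backward elimination driven by AIC.
final <- step(full, direction = "backward", trace = 0)
coef(final)  # coefficients for writing the final model as an equation

# 95% interval for one hypothetical new student.
new_student <- data.frame(
  G1 = 12, absences = 2, studytime = 2, age = 17,
  sex = factor("F", levels = levels(students$sex))
)
pred <- predict(final, newdata = new_student,
                interval = "prediction", level = 0.95)

# RMSE on the same data used for fitting.
rmse <- sqrt(mean(residuals(final)^2))
```

If p-values are the chosen measure instead, the same loop can be done by hand: inspect summary(full), drop the least significant term with update(), and refit until every remaining term is significant.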

Problem 2 For this problem, the following short tutorial might be helpful in interpreting the logistic regression output.

1. Using the same data set as in Problem 1, add another column, PF -- pass-fail. Mark any student whose final grade is less than 10 as F, otherwise as P, and then build a dummy code variable for that new column. Use the new dummy variable column as the response variable.

2. Build a binomial logistic regression model classifying a student as passing or failing. Eliminate any non-significant variable using an elimination approach of your choice. Use as many features as you like but you must use at least four -- choose the ones you believe are most useful.

3. State the regression equation.

4. What is the accuracy of your model? Use the entire data set for both training and validation.
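The four steps above might look like this in R. Again this is only a sketch under stated assumptions: the data are simulated, the feature names (G1, absences, failures, studytime) are placeholders for whichever features are chosen, AIC-based step() is one possible elimination approach, and the 0.5 classification cutoff is a common convention rather than something the assignment specifies.

```r
# Simulated stand-in for the student data (values are made up).
set.seed(7)
n <- 300
students <- data.frame(
  G1        = round(runif(n, 3, 18)),
  absences  = rpois(n, 5),
  failures  = sample(0:3, n, replace = TRUE, prob = c(0.7, 0.15, 0.1, 0.05)),
  studytime = sample(1:4, n, replace = TRUE)
)
students$G3 <- round(pmin(20, pmax(0,
  2 + 0.8 * students$G1 - 1.0 * students$failures + rnorm(n, sd = 3))))

# Step 1: pass-fail column, then a 0/1 dummy code (1 = pass).
students$PF   <- factor(ifelse(students$G3 < 10, "F", "P"), levels = c("F", "P"))
students$pass <- as.integer(students$PF == "P")

# Step 2: binomial logistic regression with four features,
# followed by backward elimination on AIC.
full  <- glm(pass ~ G1 + absences + failures + studytime,
             data = students, family = binomial)
final <- step(full, direction = "backward", trace = 0)

# Step 3: the coefficients give the equation on the log-odds scale:
#   log(p / (1 - p)) = b0 + b1 * x1 + ... + bk * xk
coef(final)

# Step 4: accuracy on the same data used for fitting.
p_hat      <- predict(final, type = "response")  # fitted P(pass)
pred_class <- as.integer(p_hat >= 0.5)           # assumed 0.5 cutoff
accuracy   <- mean(pred_class == students$pass)
conf_mat   <- table(predicted = pred_class, actual = students$pass)
```

The confusion table in conf_mat is a useful companion to the single accuracy figure, since it shows whether the errors are mostly missed failures or missed passes.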
