Now create a multiple linear regression using numerical predictors only


Prostate Cancer Data:

Hastie, Tibshirani and Friedman (2001) analyze data taken from Stamey et al. (1989). According to Hastie, Tibshirani and Friedman, the goal is to predict the log cancer volume (lcavol) from a number of measurements, including log prostate weight (lweight), age, log of benign prostatic hyperplasia (lbph), seminal vesicle invasion (svi, a categorical variable), log of capsular penetration (lcp), Gleason score (gleason), percent of Gleason scores 4 or 5 (pgg45), and train (a categorical variable).

Guide: Please try to answer the following questions:

1) Start by identifying your response variable and your potential predictors.

2) Create summary statistics for each of the variables in your data. What can you tell about each variable?
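A minimal R sketch for parts 1 and 2, assuming the attached Data.rar has been extracted to a CSV file (the file name "prostate.csv" and the data-frame name "prostate" are assumptions) with the variable names listed above:

# Assumption: Data.rar has been extracted to "prostate.csv" with the columns
# named as in the description (lcavol, lweight, age, lbph, svi, lcp,
# gleason, pgg45, train).
prostate <- read.csv("prostate.csv")

str(prostate)        # variable types and a quick look at the coding
summary(prostate)    # five-number summary and mean for every column

# Response: lcavol.  Numerical predictors: lweight, age, lbph, lcp, gleason, pgg45.
# Categorical predictors: svi and train, so store them as factors.
prostate$svi   <- factor(prostate$svi)
prostate$train <- factor(prostate$train)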

3) Find the best predictor (just one numerical variable) of your response variable, then find the least squares regression line. Check the assumptions (diagnostics). Create a correlation matrix (you may graph it).
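One way to approach part 3 in R, assuming the prostate data frame from above; lcp and the object name fit1 are used purely for illustration, your own correlation matrix determines the actual best predictor:

# Correlation matrix of the response and the numerical predictors
num_vars <- prostate[, c("lcavol", "lweight", "age", "lbph", "lcp", "gleason", "pgg45")]
round(cor(num_vars), 2)      # the predictor most correlated with lcavol is the "best" one
# pairs(num_vars)            # optional graphical version of the matrix

# Simple linear regression on the chosen predictor (lcp is only an example)
fit1 <- lm(lcavol ~ lcp, data = prostate)
summary(fit1)                       # least squares line: intercept and slope
par(mfrow = c(2, 2)); plot(fit1)    # residuals, QQ plot, scale-location, leverage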

4) Find X'X, then use matrix notation and calculations to get the best linear model of your best predictor. Show how you can get all the elements of the ANOVA table for your simple linear regression. Find an estimate for σ².
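A sketch of the matrix calculations for part 4, built on the same illustrative fit1 (the predictor lcp is an assumption carried over from part 3):

# beta-hat = (X'X)^(-1) X'y reproduces coef(fit1)
X <- model.matrix(~ lcp, data = prostate)   # n x 2 design matrix (intercept + predictor)
y <- prostate$lcavol
XtX      <- t(X) %*% X
beta_hat <- solve(XtX) %*% t(X) %*% y

# Elements of the ANOVA table for the simple regression
y_hat <- X %*% beta_hat
SSE   <- sum((y - y_hat)^2)                 # residual (error) sum of squares
SSR   <- sum((y_hat - mean(y))^2)           # regression sum of squares
SST   <- SSR + SSE                          # total sum of squares
n <- length(y); p <- ncol(X)
MSE    <- SSE / (n - p)                     # estimate of sigma^2
F_stat <- (SSR / (p - 1)) / MSE             # overall F statistic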

5) Use both the inverse response plot and Box-Cox to find the best power transformation of the response variable and perform the transformation. Check your model: is there any improvement?
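A possible sketch for part 5 using the car and MASS packages; the power 0.5 is purely illustrative, and both tools assume a strictly positive response:

library(car)    # invResPlot()
library(MASS)   # boxcox()

# Both tools assume a strictly positive response; if lcavol has non-positive
# values, shift it by a constant before transforming.
invResPlot(fit1)                         # suggests a power (lambda) for the response
boxcox(fit1, lambda = seq(-2, 2, 0.1))   # log-likelihood profile for lambda

# Illustration only: refit with lambda = 0.5 (use the value the plots actually suggest)
fit1_t <- lm(I(lcavol^0.5) ~ lcp, data = prostate)
summary(fit1_t)
par(mfrow = c(2, 2)); plot(fit1_t)       # compare diagnostics with the untransformed fit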

6) Find the best predictor (just one categorical variable) of your response variable, then find the least squares regression line. Check the assumptions (diagnostics).
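A sketch for part 6, assuming svi and train (already coded as factors) are the two categorical candidates:

# Fit one-way models for each categorical candidate and compare R^2 / F
fit_svi   <- lm(lcavol ~ svi,   data = prostate)
fit_train <- lm(lcavol ~ train, data = prostate)
summary(fit_svi)
summary(fit_train)

par(mfrow = c(2, 2)); plot(fit_svi)   # diagnostics for the chosen categorical model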

7) Use t.test analysis and calculations to recover the best linear model for your best categorical predictor.
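For part 7, a pooled (equal-variance) two-sample t test is equivalent to the one-dummy regression; the sketch below assumes svi turned out to be the best categorical predictor:

# The estimated mean difference equals the dummy-variable slope, and the
# t statistic and p-value match the regression output.
t.test(lcavol ~ svi, data = prostate, var.equal = TRUE)
coef(summary(fit_svi))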

8) Now create a multiple linear regression using numerical predictors only. Summarize your results and check the assumptions. Show how you can get all the elements of the ANOVA table for your multiple linear regression. Find an estimate for σ².
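A sketch for part 8 using all the numerical predictors from the description (this is the full numerical model, not a claim about which predictors actually matter):

# Full model with numerical predictors only
fit_num <- lm(lcavol ~ lweight + age + lbph + lcp + gleason + pgg45, data = prostate)
summary(fit_num)                      # coefficients, R^2, overall F
par(mfrow = c(2, 2)); plot(fit_num)   # assumption checks

# ANOVA table elements and the sigma^2 estimate, as in the simple case
n   <- nrow(prostate); p <- length(coef(fit_num))
SSE <- sum(resid(fit_num)^2)
SST <- sum((prostate$lcavol - mean(prostate$lcavol))^2)
SSR <- SST - SSE
MSE <- SSE / (n - p)                  # estimate of sigma^2; equals summary(fit_num)$sigma^2
F_stat <- (SSR / (p - 1)) / MSE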

9) Do you have a multicollinearity problem? Do you have a heteroscedasticity problem? What can you do to fix them? Summarize your final model. Show how you would calculate the VIF for one predictor (the best one).
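A sketch for part 9 using the car package; lcp is again only an illustrative choice of "best" predictor for the hand-computed VIF:

library(car)   # vif(), ncvTest()

vif(fit_num)      # values much above 5-10 flag multicollinearity
ncvTest(fit_num)  # score test for non-constant (heteroscedastic) error variance

# VIF by hand for one predictor, e.g. lcp: regress it on the remaining predictors
r2_lcp  <- summary(lm(lcp ~ lweight + age + lbph + gleason + pgg45,
                      data = prostate))$r.squared
vif_lcp <- 1 / (1 - r2_lcp)   # should match vif(fit_num)["lcp"]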

10) Now take the fixed model from part 9 that has the fewer predictors (the reduced model). Show that the difference in R² is not significant, using an F test as well as the anova command in R.
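A sketch for part 10; the reduced model shown here (lweight, lcp, pgg45) is purely hypothetical, so keep whichever predictors actually survived part 9:

# Hypothetical reduced model -- replace with the predictors kept in part 9
fit_red <- lm(lcavol ~ lweight + lcp + pgg45, data = prostate)

anova(fit_red, fit_num)       # partial F test: reduced vs. full model

# The same F statistic computed from the two R^2 values
r2_f <- summary(fit_num)$r.squared
r2_r <- summary(fit_red)$r.squared
q    <- fit_num$rank - fit_red$rank       # number of predictors dropped
df_f <- fit_num$df.residual
F_hand <- ((r2_f - r2_r) / q) / ((1 - r2_f) / df_f)
pf(F_hand, q, df_f, lower.tail = FALSE)   # p-value, same as the anova() line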

11) Use matrix notation and calculations to get the reduced model's coefficients. Show that both approaches give the same results. Find the variance-covariance matrix of the betas. Find the hat matrix (H), then calculate the sum of its diagonal.
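A sketch of the matrix calculations for part 11, reusing the hypothetical reduced model fit_red:

X <- model.matrix(fit_red)                     # design matrix of the reduced model
y <- prostate$lcavol

beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y   # same values as coef(fit_red)

sigma2_hat <- sum((y - X %*% beta_hat)^2) / (nrow(X) - ncol(X))
var_beta   <- sigma2_hat * solve(t(X) %*% X)   # variance-covariance matrix of the betas
# compare with vcov(fit_red)

H <- X %*% solve(t(X) %*% X) %*% t(X)          # hat matrix
sum(diag(H))                                   # trace of H = number of estimated coefficients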

12) Add the two categorical variables to your reduced model. Is your R² better? Are those variables significant? Explicitly write all the possible models derived from the main model for each category, then interpret the coefficients of each model.
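A sketch for part 12; the four implied sub-models below assume svi and train are 0/1 dummies and that no interaction terms are added:

fit_cat <- lm(lcavol ~ lweight + lcp + pgg45 + svi + train, data = prostate)
summary(fit_cat)   # compare R^2 / adjusted R^2 with fit_red; check the dummies' t tests

# With two 0/1 dummies and no interactions there are four implied models, differing
# only in the intercept (the numerical slopes are shared by all of them):
#   svi = 0, train = 0: intercept b0
#   svi = 1, train = 0: intercept b0 + b_svi
#   svi = 0, train = 1: intercept b0 + b_train
#   svi = 1, train = 1: intercept b0 + b_svi + b_train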

13) State your final model and check its assumptions. Is there any room to enhance it (marginal model plots, added-variable plots, Box-Cox, and the inverse response plot)? Create the ANOVA table for your model. Find an estimate for σ².
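A sketch for part 13 using the car and MASS packages; which model counts as "final" is your call, so fit_cat is used only as a placeholder:

library(car)    # mmps(), avPlots(), invResPlot()
library(MASS)   # boxcox()

final <- fit_cat                          # placeholder for whatever model you settle on
mmps(final)                               # marginal model plots
avPlots(final)                            # added-variable plots
invResPlot(final)                         # inverse response plot (positive response needed)
boxcox(final, lambda = seq(-2, 2, 0.1))   # Box-Cox profile (positive response needed)

anova(final)                              # sequential ANOVA table
summary(final)$sigma^2                    # estimate of sigma^2 (the MSE)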

14) Make a table of the AIC, AICc, and BIC for your simple linear model, the full model, and your final reduced model.
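A sketch for part 14; base R provides AIC() and BIC(), while AICc is computed by hand here using the usual small-sample correction:

# AICc is not in base R; compute it from AIC, counting sigma^2 among the
# estimated parameters.
aicc <- function(m) {
  k <- length(coef(m)) + 1
  n <- nobs(m)
  AIC(m) + 2 * k * (k + 1) / (n - k - 1)
}

models <- list(simple = fit1, full = fit_num, reduced = fit_red)
data.frame(AIC  = sapply(models, AIC),
           AICc = sapply(models, aicc),
           BIC  = sapply(models, BIC))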

15) Write a paragraph explaining your rationale for ending up with this model.

Attachment: Data.rar
