Do the residuals appear to have a normal distribution?


Assignment

Data on air pollution were collected from 41 U.S. cities. The type of air pollution under study was the annual mean concentration of sulfur dioxide. The values of six explanatory variables were also recorded. The variables in the data are as follows:

y : the annual mean concentration of sulfur dioxide (micrograms per cubic meter)
x1 : average annual temperature in °F
x2 : number of manufacturing enterprises employing 20 or more workers
x3 : population size (thousands)
x4 : average annual wind speed (mph)
x5 : average annual precipitation (inches)
x6 : average number of days with precipitation per year

A model relating y to the six explanatory variables is of interest in order to determine which of the six explanatory variables are related to sulfur dioxide pollution and to be able to predict air pollution for given values of the explanatory variables.

1. Relationship between y and x, and collinearity.

(a) Plot y versus each of the explanatory variables. From your plots determine if higher order terms are needed in any of the explanatory variables.

(b) Using correlation coefficients, determine whether there is any evidence of collinearity in the data.

(c) Obtain the VIF for each of the explanatory variables by fitting a regression model with y as the response and all six explanatory variables, x1 through x6, as predictors. Do there appear to be any collinearity problems based on the VIF values?
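
As an illustration of parts (b) and (c), the following is a minimal sketch of how correlation coefficients and VIF values could be computed. It assumes the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing the j-th explanatory variable on the others. The function name `vif` and the array `X_demo` are hypothetical, and the numbers are random placeholder data, not the 41-city data set.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2) for each column of X (n x k, no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress column j on an intercept plus all the other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Placeholder data only (NOT the air pollution data): three predictors,
# with the third built to be nearly a copy of the first.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(41, 3))
X_demo[:, 2] = X_demo[:, 0] + 0.1 * rng.normal(size=41)

print(np.corrcoef(X_demo, rowvar=False).round(2))  # pairwise correlations, part (b)
print(vif(X_demo).round(1))                        # VIF values, part (c)
```

A common rule of thumb treats VIF values above about 10 as a sign of serious collinearity, although the cutoff used in a given course may differ.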

2. Model selection.

(a) Use best subsets regression to obtain the two best models of each possible size p. Obtain values for R², adjusted R² (R²adj), Cp, and s (i.e., the estimated error standard deviation) for each of the models.

(b) Based on the information from part (a) and using R²adj as your model selection criterion, select the model that you think is best.

(c) Using the information from part (b), which variables were most highly related to sulfur dioxide air pollution?
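
The quantities requested in part (a) can be sketched as follows. This is only an illustration under stated assumptions: p is taken to count the parameters including the intercept, Mallows' Cp is computed as SSE_p / MSE_full - (n - 2p), and the names `fit_sse`, `best_subsets`, `X_demo`, and `y_demo` are hypothetical. The data are simulated placeholders, not the assignment data.

```python
from itertools import combinations
import numpy as np

def fit_sse(X, y, cols):
    """Residual sum of squares for an intercept plus the chosen columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)

def best_subsets(X, y, keep=2):
    """Return the `keep` best models (by R^2) of every size, with summary statistics."""
    n, k = X.shape
    sst = float(np.sum((y - y.mean()) ** 2))
    mse_full = fit_sse(X, y, range(k)) / (n - k - 1)
    rows = []
    for size in range(1, k + 1):
        fits = []
        for cols in combinations(range(k), size):
            sse = fit_sse(X, y, cols)
            p = size + 1  # assumption: p counts the intercept
            fits.append({
                "vars": cols,
                "R2": 1 - sse / sst,
                "R2adj": 1 - (sse / (n - p)) / (sst / (n - 1)),
                "Cp": sse / mse_full - (n - 2 * p),
                "s": (sse / (n - p)) ** 0.5,
            })
        fits.sort(key=lambda f: f["R2"], reverse=True)
        rows.extend(fits[:keep])
    return rows

# Placeholder data only: y depends on columns 0 and 2 of four predictors.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(41, 4))
y_demo = 2 + 3 * X_demo[:, 0] - 2 * X_demo[:, 2] + rng.normal(size=41)

table = best_subsets(X_demo, y_demo)
for row in table:
    print(row["vars"], {key: round(val, 3) for key, val in row.items() if key != "vars"})
best = max(table, key=lambda row: row["R2adj"])  # selection rule from part (b)
print("largest adjusted R^2:", best["vars"])
```

Within a given size, ranking by R² and ranking by adjusted R² give the same order, so the sort key only matters when models of different sizes are compared, as in the last two lines.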

3. Checking model assumptions. Using your model selected from 2(b), do the following:

(a) Do the residuals appear to have a normal distribution? Justify your answer.

(b) Does the condition of constant variance appear to be satisfied? Justify your answer.

(c) Find an appropriate transformation of Y so that the assumptions for regression will be satisfied. Find the "best" model using the transformed Y and the backward variable selection method.
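
The checks in parts (a) through (c) are usually made with a normal probability plot and a plot of residuals against fitted values. As a rough numerical companion, the sketch below computes the correlation between the ordered residuals and normal scores (values near 1 are consistent with normality) and the correlation between absolute residuals and fitted values (values far from 0 hint at non-constant variance). The log transformation shown is only one candidate, chosen here for illustration. All names (`ols_residuals`, `normal_plot_correlation`, `x_demo`, `y_demo`) are hypothetical, and the data are simulated placeholders.

```python
from statistics import NormalDist
import numpy as np

def ols_residuals(X, y):
    """Fit y on an intercept plus X by least squares; return (residuals, fitted values)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ coef
    return y - fitted, fitted

def normal_plot_correlation(resid):
    """Correlation between the sorted residuals and normal scores."""
    n = len(resid)
    probs = (np.arange(1, n + 1) - 0.375) / (n + 0.25)  # Blom plotting positions
    scores = np.array([NormalDist().inv_cdf(p) for p in probs])
    return float(np.corrcoef(np.sort(resid), scores)[0, 1])

# Placeholder data only: errors are multiplicative, so log(y) is linear in x.
rng = np.random.default_rng(2)
x_demo = rng.uniform(0, 3, size=41)
y_demo = np.exp(1 + 0.8 * x_demo + 0.6 * rng.normal(size=41))

for label, response in [("raw y", y_demo), ("log y", np.log(y_demo))]:
    resid, fitted = ols_residuals(x_demo, response)
    r_normal = normal_plot_correlation(resid)
    r_spread = float(np.corrcoef(np.abs(resid), fitted)[0, 1])
    print(label, round(r_normal, 3), round(r_spread, 3))
```

These two numbers summarize the plots but do not replace them, and the backward selection step in part (c) is not covered by this sketch.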

4. Outlying and influential observations: based on the model you selected in problem 3(c) using transformed Y, do the following:

(a) Do any of the data points appear to have high influence? Leverage? Justify your answer.

(b) If you identified any high leverage or high influence points in part (a), compare the estimated models with and without these points.
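
A minimal sketch of the two usual diagnostics for part (a) follows: leverage values h_ii (the diagonal of the hat matrix) and Cook's distance D_i = e_i^2 h_ii / (p MSE (1 - h_ii)^2), with p counting the intercept. The cutoff 2p/n for leverage is a common rule of thumb and is an assumption here, as are the names `influence_measures`, `X_demo`, and `y_demo`. The data are simulated placeholders with one unusual point planted in row 0.

```python
import numpy as np

def influence_measures(X, y):
    """Return (leverage values, Cook's distances) for an OLS fit with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]
    H = A @ np.linalg.inv(A.T @ A) @ A.T       # hat matrix
    h = np.diag(H)
    resid = y - H @ y
    mse = (resid @ resid) / (n - p)
    cooks = resid ** 2 * h / (p * mse * (1 - h) ** 2)
    return h, cooks

# Placeholder data only: row 0 is moved far from the other x values
# and its response is shifted off the regression surface.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(41, 2))
y_demo = 1 + 2 * X_demo[:, 0] - X_demo[:, 1] + rng.normal(size=41)
X_demo[0] = [6.0, 6.0]
y_demo[0] += 15.0

h, cooks = influence_measures(X_demo, y_demo)
cutoff = 2 * 3 / 41                             # 2p/n with p = 3 parameters, n = 41
print("high leverage rows:", np.where(h > cutoff)[0])
print("largest Cook's distance at row", int(np.argmax(cooks)))
```

For part (b), the same model can be refit after deleting the flagged rows (for example with `np.delete(X_demo, rows, axis=0)`) and the two sets of coefficient estimates compared.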

5. Prediction for new observations: based on the model you selected in problem 3(c) using transformed Y, do the following:

(a) Estimate the average level of sulfur dioxide content of the air in a city having the following values for the six explanatory variables: x1 = 60, x2 = 150, x3 = 600, x4 = 10, x5 = 40, and x6 = 100.

(b) Place a 95% confidence interval on your estimated sulfur dioxide level and interpret this interval.

(c) Place a 95% prediction interval on your estimated sulfur dioxide level and interpret this interval.
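
The difference between parts (b) and (c) can be sketched with the standard formulas: for a new point x0 (with a leading 1 for the intercept), the confidence interval for the mean response uses the standard error s * sqrt(x0' (X'X)^-1 x0), while the prediction interval uses s * sqrt(1 + x0' (X'X)^-1 x0). In the sketch below the t critical value is passed in by hand, the names `intervals_at`, `X_demo`, and `y_demo` are hypothetical, and the data are simulated placeholders rather than the assignment data.

```python
import numpy as np

def intervals_at(X, y, x_new, t_crit):
    """Return (fit, confidence interval, prediction interval) at x_new for an OLS fit."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]
    XtX_inv = np.linalg.inv(A.T @ A)
    beta = XtX_inv @ A.T @ y
    resid = y - A @ beta
    s = np.sqrt((resid @ resid) / (n - p))      # estimated error standard deviation
    x0 = np.concatenate([[1.0], np.asarray(x_new, dtype=float)])
    fit = float(x0 @ beta)
    q = float(x0 @ XtX_inv @ x0)
    se_mean = s * np.sqrt(q)                    # for the mean response
    se_pred = s * np.sqrt(1.0 + q)              # for a single new observation
    ci = (fit - t_crit * se_mean, fit + t_crit * se_mean)
    pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)
    return fit, ci, pi

# Placeholder data only: n = 41 and two predictors, so df = 41 - 3 = 38.
rng = np.random.default_rng(4)
X_demo = rng.normal(size=(41, 2))
y_demo = 3 + X_demo[:, 0] + 0.5 * X_demo[:, 1] + 0.4 * rng.normal(size=41)

# 2.024 is the 97.5th percentile of the t distribution with 38 df.
fit, ci, pi = intervals_at(X_demo, y_demo, x_new=[0.5, -0.2], t_crit=2.024)
print(fit, ci, pi)
```

When the model is fit to a transformed response, these intervals are on the transformed scale. For a log transformation, for instance, the endpoints would be back-transformed with `np.exp` before being interpreted in micrograms per cubic meter.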
