Define a sum of squares to measure the total contribution


Question 1 -

Here we will repeat parts 4 through 8 from the Midterm. These have been slightly reworded and re-ordered to make the intention clearer.

We will now also consider x2. Using both a categorical (x1) and a continuous (x2) covariate is often referred to as Analysis of Covariance (ANCOVA), even if Giles thinks it's all just part of linear regression.

For this, we will write the average value of x2 among subjects with x1 = 0 as x̄2,0 and among subjects with x1 = 1 as x̄2,1, and write x~2 for x2 with the group mean subtracted:

x~2 = x2 - x̄2,0 for subjects with x1 = 0,    x~2 = x2 - x̄2,1 for subjects with x1 = 1,

and we will set X2 = [1, x1, x~2].
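
As a concrete illustration, here is a minimal R sketch of the group-mean centring with simulated data (the simulation and all variable names are placeholders, not part of the assignment):

    # Simulated two-group design: x1 is the 0/1 group indicator, x2 is continuous
    set.seed(1)
    n  <- 20
    x1 <- rep(c(0, 1), each = n / 2)
    x2 <- rnorm(n, mean = 5 + 2 * x1)

    # Subtract the group mean: x-bar_{2,0} within the x1 = 0 group,
    # x-bar_{2,1} within the x1 = 1 group
    x2tilde <- x2 - ave(x2, x1)

    # Design matrix X2 = [1, x1, x2tilde]
    X2 <- cbind(1, x1, x2tilde)

    # x2tilde is orthogonal to both the intercept and x1 by construction
    round(crossprod(cbind(1, x1), x2tilde), 10)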

a. Show that x~2 can be written as x2 - α1·1 - α2·x1, where 1 is the vector of ones. What are α1 and α2? You may find earlier questions useful.

b. Write out X2^T X2 for this new model. Show that your estimates β^0 and β^1 are unchanged from Question 2 in the midterm.

If we are interested in β1, was there any point to adding x2?
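
A quick numerical check of this claim, continuing the simulated example above (a sketch only, not the requested algebra):

    # Generate a response and compare the two fits
    y <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n)

    fit_x1     <- lm(y ~ x1)             # x1 alone
    fit_ancova <- lm(y ~ x1 + x2tilde)   # adding the group-centred covariate

    # beta^0 and beta^1 agree because x2tilde is orthogonal to [1, x1]
    coef(fit_x1)
    coef(fit_ancova)[1:2]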

c. By writing out the prediction equation β^0 + β^1x1 + β^2x~2 in terms of x2 = x~2 + α1·1 + α2·x1, find β^*1, the estimate of β1 in a model where we use X2* = [1, x1, x2] instead of X2.

Why has β^2 not changed? What is the variance of β^*1?
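
Continuing the same simulated example, the model with the raw x2 can be fitted directly and compared (a numerical sketch of what the algebra in part (c) should show):

    # Model using the raw covariate x2 in place of x2tilde
    fit_raw <- lm(y ~ x1 + x2)

    # The slope on x2 equals the slope on x2tilde, while the x1 coefficient
    # changes from beta^1 to beta^*1
    coef(fit_ancova)
    coef(fit_raw)

    # The standard error of the x1 coefficient is inflated in the raw model
    summary(fit_ancova)$coefficients["x1", "Std. Error"]
    summary(fit_raw)$coefficients["x1", "Std. Error"]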

d. Show that the variance of β^*1 obtained above is equal to the variance of β^1 times the VIF for x2. The following will be helpful:

x~2^T H1 C H1 x~2 = n0(x̄2,0 - x̄2)^2 + n1(x̄2,1 - x̄2)^2 = (n0n1/n)(x̄2,1 - x̄2,0)^2,

where x̄2 is the overall mean of x2.
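
A numerical version of the variance-inflation claim, again continuing the simulation above (illustrative only; the VIF here is computed as 1/(1 - R^2) from regressing x2 on x1):

    # R^2 from regressing x2 on x1, and the corresponding VIF
    r2  <- summary(lm(x2 ~ x1))$r.squared
    vif <- 1 / (1 - r2)

    # The ratio of the two variances of the x1 coefficient matches the VIF
    v_raw  <- vcov(fit_raw)["x1", "x1"]
    v_orth <- vcov(fit_ancova)["x1", "x1"]
    c(ratio = v_raw / v_orth, vif = vif)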

e. There is a concern that the slope on x2 might be different between the x1 = 1 group and the x1 = 0 group. For this reason, the researcher considers adding an interaction term to produce a design matrix X = [1, x1, x~2, x1x~2] where the last column is the element-wise product of x1 and x~2.

Define a sum of squares to measure the total contribution of x~2 to the model in this case.
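
One way such a quantity could be computed numerically is as an extra sum of squares from nested fits; the sketch below (continuing the simulation above) compares the interaction model with the model that drops every column involving x~2, which is only one possible answer to the question:

    # Interaction model [1, x1, x2tilde, x1 * x2tilde] versus the model with x1 only
    fit_inter <- lm(y ~ x1 * x2tilde)   # expands to x1 + x2tilde + x1:x2tilde
    fit_x1    <- lm(y ~ x1)

    # "Sum of Sq" is the joint contribution of the two columns involving x2tilde
    anova(fit_x1, fit_inter)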

Bonus - The paper "A Two-Stage Algorithm for Fairness-aware Machine Learning" by Junpei Komiyama and Hajime Shimao appeared on arXiv on October 15 this year - Giles read it during the midterm.

Recently, there has been considerable interest in the possibility that machine learning can exacerbate social biases, with examples including face recognition that performs much worse on African Americans and evidence that tools used to predict re-offending in parole hearings give worse scores to disadvantaged groups.

A particular problem is that a tool does not have to explicitly use a protected variable like sex, race or age in order to discriminate. It could use something correlated with it, like zip code. One notion of fairness in these circumstances is that the average prediction within each class of a protected variable should be the same - men and women should, on average, be treated as having the same probability of committing a crime.

(There are many notions of fairness and this is a topic of very current debate in machine learning.)

Komiyama and Shimao consider using linear regression as a prediction tool in a situation where you have protected variables Z, useful covariates X and a response y to predict. They suggest the following: 1. Regress each column of X on Z and take the residuals to get X~. 2. Predict y using X~.
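
A generic R sketch of this two-stage idea (the function name fair_two_stage and its arguments are made up for illustration; y is a vector and X, Z are matrices):

    # Stage 1: regress each column of X on Z and keep the residuals.
    # Stage 2: predict y from the residualised covariates.
    fair_two_stage <- function(y, X, Z) {
      Xtilde <- resid(lm(X ~ Z))   # matrix response: residualises every column of X
      fit    <- lm(y ~ Xtilde)
      list(fit = fit, Xtilde = Xtilde)
    }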

We will look at this in the context of the questions above. Here we think of x1 as a protected variable and x2 as something we want to use to predict; x~2 is the residual after regressing x2 on x1.

Consider using the linear model

y = β0·1 + β1x~2 + ε

and show that the average of the fitted values when x1 = 0 is the same as the average of the fitted values when x1 = 1.

Can you generalize this to using a matrix of protected values Z and a matrix of covariates X?
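
A self-contained numerical check of the claim for a single protected indicator (simulated data; not the requested proof):

    # Simulate a protected 0/1 variable x1 and a covariate x2 correlated with it
    set.seed(3)
    x1 <- rep(c(0, 1), each = 15)
    x2 <- rnorm(30, mean = 10 + 3 * x1)
    y  <- 2 + x1 + 0.5 * x2 + rnorm(30)

    # Residual of x2 after regressing on x1 (equivalently, group-mean centring)
    x2tilde <- x2 - ave(x2, x1)

    # Fit y on the residualised covariate and compare average fitted values by group
    fit_fair <- lm(y ~ x2tilde)
    tapply(fitted(fit_fair), x1, mean)   # the two group averages coincide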

Question 2 -

Here we will repeat the analysis above but more generally, with the idea of getting specific about the interpretation of a sequential ANOVA test.

We know that the sum of squares for each covariate is unchanged when the covariates are orthogonal. When they aren't, we need to ask "What is the null hypothesis for this test?"

We describe the test as being "the additional effect of xj after controlling for Xj-1", but what does that mean, mathematically?

To do this, we'll break up the covariate matrix X = [Xj-1, X-j] where X-j = [xj, . . . , xp] and similarly, the coefficient vector will be broken into β = (βj-1T, β-jT)T so that we can write the linear regression model as y = Xj-1βj-1 + X-jβ-j + e.

We will not assume that Xj-1 is orthogonal to X-j. Note that this can be done for any choice of j ∈ {1, . . . , p}.

a. Consider regressing y on only Xj-1. Give an expression for the estimated β^j-1.

b. Show that the fitted values (written in terms of true coefficients and errors) from the full regression can be re-written using the fitted values from part (a) and the matrix of residuals R-j obtained by regressing each column of X-j on Xj-1.

c. Show that the sum of squares yT(H - Hj-1)y for X-j|Xj-1 is the same if you replace X-j with R-j. (Why must Hy be the same in both cases?)
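
The invariance in part (c) can also be checked numerically; in the sketch below all object names are placeholders and the dimensions are arbitrary:

    # Simulated illustration: X_{j-1} and X_{-j} deliberately not orthogonal
    set.seed(2)
    nn     <- 50
    Xjm1   <- cbind(1, rnorm(nn), rnorm(nn))            # plays the role of Xj-1
    Xminus <- cbind(rnorm(nn) + Xjm1[, 2], rnorm(nn))   # plays the role of X-j
    yy     <- rnorm(nn)

    Hjm1 <- Xjm1 %*% solve(crossprod(Xjm1), t(Xjm1))    # hat matrix for Xj-1
    Rmj  <- (diag(nn) - Hjm1) %*% Xminus                # residuals of X-j on Xj-1

    # Sum of squares yT(H - Hj-1)y for the design [Xj-1, A]
    extra_ss <- function(A) {
      XX <- cbind(Xjm1, A)
      H  <- XX %*% solve(crossprod(XX), t(XX))
      drop(t(yy) %*% (H - Hjm1) %*% yy)
    }
    c(original = extra_ss(Xminus), residualised = extra_ss(Rmj))   # equal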

d. Within the sequential test for xj|Xj-1, show that the sum of squares yT(Hj - Hj-1)y is the same whether you use the original X or X* = [Xj-1, (I - Hj-1)xj, (I - H)X-(j+1)].

e. Show that, when using X* (with corresponding coefficients β*), the sum of squares yT(Hj - Hj-1)y is only affected by the true value of βj.

f. Hence give a detailed interpretation of the meaning of rejecting the jth sequential test.

Question 3 -

Here we will illustrate the results from Question 1 with a real world data set. We will use the study of mortality in 55 US cities as it is influenced by the pollutants NOX (nitrogen oxides) and SO2 (sulfur dioxide), while controlling for weather (PRECIP) and sociological variables (EDUC and NONWHITE), which appeared on the midterm. In this case we will be interested in the sequential test for EDUC with the covariates taken in the order they appear in the data set.

You can find the data in  airpollution.csv on CMS.

a. Create a new data set (referred to as X* below) in which NONWHITE, NOX and SO2 are replaced with the residuals after regressing each of them on PRECIP and EDUC.
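
One way to build this data set in R (the object names air and air_star are arbitrary; the file and column names are those given in the assignment):

    # Read the data and replace NONWHITE, NOX and SO2 by their residuals
    # after regressing each of them on PRECIP and EDUC
    air      <- read.csv("airpollution.csv")
    air_star <- air

    for (v in c("NONWHITE", "NOX", "SO2")) {
      air_star[[v]] <- resid(lm(air[[v]] ~ PRECIP + EDUC, data = air))
    }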

b. Show that when producing a model to predict MORT with either the original covariates or the new covariates, you get the same predicted values (use the maximum absolute difference in predictions to show this).

c. Add SO2 to MORT (this increases the coefficient of SO2 in the model by 1) and obtain a sequential ANOVA table (using the function anova) for the new response. Show that this changes the sum of squares for EDUC when using the original data.
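
A sketch of how parts (c) and (d) might be carried out, reusing the air and air_star objects from part (a) above (adding the residualised SO2 column in part (d) is one interpretation of "do the same thing"):

    # Baseline sequential ANOVA with the original response
    anova(lm(MORT ~ PRECIP + EDUC + NONWHITE + NOX + SO2, data = air))

    # Part (c): add the original SO2 column to MORT and refit with the original data
    mort_c <- air$MORT + air$SO2
    anova(lm(mort_c ~ PRECIP + EDUC + NONWHITE + NOX + SO2, data = air))

    # Part (d): add the residualised SO2 column and refit with the new data set
    mort_d <- air$MORT + air_star$SO2
    anova(lm(mort_d ~ PRECIP + EDUC + NONWHITE + NOX + SO2, data = air_star))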

d. Do the same thing using the new data set X* and observe that the sum of squares for EDUC does not change.

e. What happens if you add EDUC to MORT (i.e., make its coefficient larger) instead? Are there differences between the two data sets? Why?

