Construct and plot a 95% confidence ellipse for the pair and


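1. The vector of random variables (X1, X2, X3)^T follows a trivariate normal distribution with mean vector and covariance matrix given by

\[
\mu = \begin{pmatrix} 1 \\ -2 \\ 0 \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} 3 & 1 & -2 \\ 1 & 2 & -1 \\ -2 & -1 & 1.5 \end{pmatrix}
\]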

(a) Find the joint distribution of (X1, X3).
(b) Find the joint conditional distribution of (X1, X3)|X2 = 1.

(c) Find the joint distribution of

\[
\begin{pmatrix} 2X_1 - X_3 \\ X_2 + 4X_3 + 1 \end{pmatrix}
\]
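The three parts rely on standard multivariate normal results: sub-vectors keep the matching blocks of the mean and covariance, conditioning uses the partitioned-covariance formulas, and an affine map AX + b has mean Aμ + b and covariance AΣA^T. A minimal R sketch for checking hand calculations is given here; the names `keep`, `given`, `A`, and `b` are illustrative and not part of the problem statement.

```r
# Parameters from the problem statement
mu    <- c(1, -2, 0)
Sigma <- matrix(c( 3,  1, -2,
                   1,  2, -1,
                  -2, -1, 1.5), nrow = 3, byrow = TRUE)

# (a) Marginal of (X1, X3): keep the matching entries of mu and Sigma
keep    <- c(1, 3)
mu_a    <- mu[keep]
Sigma_a <- Sigma[keep, keep]

# (b) Conditional of (X1, X3) given X2 = 1, via the partitioned formulas
given   <- 2
x2      <- 1
S12     <- Sigma[keep, given, drop = FALSE]
S22_inv <- solve(Sigma[given, given, drop = FALSE])
mu_b    <- as.vector(mu[keep] + S12 %*% S22_inv %*% (x2 - mu[given]))
Sigma_b <- Sigma_a - S12 %*% S22_inv %*% t(S12)

# (c) Affine map A X + b with rows (2, 0, -1) and (0, 1, 4), shift (0, 1)
A <- matrix(c(2, 0, -1,
              0, 1,  4), nrow = 2, byrow = TRUE)
b <- c(0, 1)
mu_c    <- as.vector(A %*% mu + b)
Sigma_c <- A %*% Sigma %*% t(A)
```

Each result is again bivariate normal, so the pairs (`mu_a`, `Sigma_a`), (`mu_b`, `Sigma_b`), and (`mu_c`, `Sigma_c`) fully describe the three requested distributions.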

2. Let X ~ N_p(µ, Σ). Show via the moment generating function that the quadratic form shown below follows a central Chi-Square distribution with p degrees of freedom.

\[
(X - \mu)^T \Sigma^{-1} (X - \mu) \sim \chi^2_p
\]

Recall that the moment generating function of a Chi-Square distribution with p degrees of freedom is given by M(t) = (1 - 2t)^{-p/2}. A helpful property here is that for generic independent random variables Y_1, ..., Y_n:

\[
M_{Y_1 + \dots + Y_n}(t) = E\left(e^{t \sum_{i=1}^{n} Y_i}\right) = \prod_{i=1}^{n} E\left(e^{t Y_i}\right)
\]
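One way to organise the argument, offered as an outline rather than the only route, is to standardise X first. The symbols Z and Z_i are introduced here for illustration and do not appear in the problem statement; the outline assumes Σ is positive definite, so that Σ^{-1/2} exists.

```latex
\begin{align*}
Z &= \Sigma^{-1/2}(X - \mu) \sim N_p(0, I_p), \\
(X - \mu)^T \Sigma^{-1} (X - \mu) &= Z^T Z = \sum_{i=1}^{p} Z_i^2, \\
E\left(e^{t Z_i^2}\right)
  &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(1 - 2t) z^2 / 2}\, dz
   = (1 - 2t)^{-1/2}, \qquad t < \tfrac{1}{2}, \\
M_{Z^T Z}(t) &= \prod_{i=1}^{p} E\left(e^{t Z_i^2}\right) = (1 - 2t)^{-p/2}.
\end{align*}
```

The product step uses the independence of Z_1, ..., Z_p, which holds because they are jointly normal with identity covariance.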

3. Consider the regression problem Y | X = Xβ + R, in which R ~ N(0, σ^2 I), X is an n × p matrix, β (p × 1) is the parameter vector, and Y (n × 1) is the vector of responses.

Show that

(a) \hat{\beta}_{MLE} = (X^T X)^{-1} X^T Y

(b) \hat{\sigma}^2_{MLE} = \frac{1}{n} (Y - X\hat{\beta}_{MLE})^T (Y - X\hat{\beta}_{MLE})
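The closed forms in (a) and (b) can be sanity-checked numerically. The R sketch below uses simulated data (the design matrix, `beta_true`, and the noise level are made-up values for illustration) and compares the formulas with `lm()`. It is a check on the algebra, not a substitute for the derivation.

```r
set.seed(1)
n <- 200
p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), nrow = n))  # design matrix with intercept column
beta_true <- c(2, -1, 0.5)                            # hypothetical true coefficients
Y <- as.vector(X %*% beta_true + rnorm(n, sd = 1.5))

# Closed-form MLEs from parts (a) and (b)
beta_hat   <- solve(crossprod(X), crossprod(X, Y))    # (X'X)^{-1} X'Y
resid      <- Y - X %*% beta_hat
sigma2_hat <- as.numeric(crossprod(resid)) / n        # divisor n, not n - p

# Reference fit: same coefficients, but lm() reports the unbiased variance
fit <- lm(Y ~ X - 1)
sigma2_unbiased <- summary(fit)$sigma^2               # divisor n - p
```

The coefficient estimates agree with `coef(fit)`, while `sigma2_hat` equals `sigma2_unbiased * (n - p) / n`, which shows that the MLE of σ^2 uses the divisor n.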

4. We often mention that n (sample size) must be much larger than p (the dimension of each observation) in order for the Central Limit Theorem to be an accurate approximation particularly when the data do not come from a normal distribution.

Recall that for the univariate t-distribution, the smaller the degrees of freedom, the larger the kurtosis. Similarly, in the multivariate case, the lower the degrees of freedom, the further the distribution deviates from normality (particularly via kurtosis). The following code simulates data from a p-variate t distribution with 6 degrees of freedom, and a covariance matrix that was simulated from a Wishart distribution with p degrees of freedom:

\[
\Sigma \sim \text{Wishart}(p, I_p)
\]

\[
X_1, \dots, X_n \overset{iid}{\sim} t_p(\Sigma, \text{df} = 6)
\]

Use the code below with at least three values of p that include one low, one medium, and one high value (e.g. 2, 5, 20), and assess the normality of the sample means for each value of p using n = 10, 100, 1000. Report the QQ-plots and formal test results for the normality of the sample means. Feel free to test more p's and n's, but you do not need to show QQ-plots and normality tests for extra results. Provide a written summary of your findings.

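library(mvtnorm)  # provides rmvt()

p <- p0    # replace p0 with the chosen dimension, e.g. 2, 5, or 20
N <- 5000  # number of simulated sample means
means <- matrix(0, ncol = p, nrow = N)

# Draw one Wishart matrix and reshape the p x p x 1 array into a p x p matrix
Sigma <- matrix(rWishart(1, df = p, Sigma = diag(p)), byrow = TRUE, ncol = p)

## Keep the same Sigma for fixed p and varying n

n <- n0    # replace n0 with the sample size, e.g. 10, 100, or 1000

for (i in 1:N) {
  # One sample of size n from the p-variate t distribution
  x <- rmvt(n, sigma = Sigma, df = 6)
  # Store the vector of column means as row i
  means[i, ] <- apply(x, 2, mean)
}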
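One possible way to assess the normality of the simulated means is a chi-square QQ-plot of squared Mahalanobis distances together with a formal test. The R sketch below uses only base R and a stand-in `means` matrix so that it runs on its own. The per-coordinate Shapiro-Wilk test is an illustrative choice, not something the assignment prescribes, and it checks marginal rather than joint normality, so a dedicated multivariate test may be preferable.

```r
set.seed(1)
# Stand-in for the `means` matrix produced by the simulation above (assumption:
# N rows of simulated sample means, one column per coordinate).
N <- 500
p <- 3
means <- matrix(rnorm(N * p), nrow = N, ncol = p)

# Squared Mahalanobis distance of each simulated mean from the overall centre
d2 <- mahalanobis(means, colMeans(means), cov(means))

# Chi-square QQ-plot: points close to the line suggest approximate normality
qqplot(qchisq(ppoints(N), df = p), d2,
       xlab = "Chi-square quantiles",
       ylab = "Squared Mahalanobis distance")
abline(0, 1)

# Marginal Shapiro-Wilk test for each coordinate (requires 3 <= N <= 5000)
shapiro_p <- apply(means, 2, function(col) shapiro.test(col)$p.value)
```

With the real simulation output, the same lines can be run once per (p, n) combination, keeping `Sigma` fixed for each p.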

5. Stiffness and bending strength are two variables of interest in the quality of lumber. A sample of 30 pieces of a particular type of wood is provided in the file lumber.txt.

(a) Construct and plot a 95% confidence ellipse for the pair µ = (µ1, µ2), where µ1 = E(Stiffness) and µ2 = E(Bending Strength).

(b) Suppose high quality lumber has µ = (2000, 10000)^T. Given the result in part (a), do the data in lumber.txt represent a sample of high quality lumber? Explain.

(c) Given the data, do you think a bivariate normal distribution is a good model for the data? Use a QQ-plot, as well as a formal test, to answer this question.
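A 95% confidence ellipse for a bivariate mean is usually built from Hotelling's T^2: it contains all µ with n (x̄ - µ)^T S^{-1} (x̄ - µ) ≤ p(n - 1)/(n - p) F_{p, n-p}(0.05). The R sketch below traces that boundary. Because the layout of lumber.txt is not shown, it runs on simulated stand-in data, and the helper name `conf_ellipse` is hypothetical.

```r
set.seed(1)
# Stand-in data; with the real file one might use something like
# X <- as.matrix(read.table("lumber.txt")), but the file layout is an assumption.
n <- 30
X <- cbind(rnorm(n, 1900, 300), rnorm(n, 8500, 1800))

conf_ellipse <- function(X, alpha = 0.05, n_points = 200) {
  n <- nrow(X)
  p <- ncol(X)
  xbar <- colMeans(X)
  S <- cov(X)
  # Hotelling T^2 cutoff: n (xbar - mu)' S^{-1} (xbar - mu) <= c2
  c2 <- p * (n - 1) / (n - p) * qf(1 - alpha, p, n - p)
  e <- eigen(S)
  theta <- seq(0, 2 * pi, length.out = n_points)
  unit <- rbind(cos(theta), sin(theta))
  # Half-axis lengths are sqrt(lambda_i * c2 / n) along the eigenvectors of S
  pts <- e$vectors %*% (sqrt(e$values * c2 / n) * unit)
  list(center = xbar, S = S, c2 = c2, boundary = t(pts + xbar))
}

ell <- conf_ellipse(X)
plot(ell$boundary, type = "l", xlab = "Stiffness", ylab = "Bending strength")
points(ell$center[1], ell$center[2], pch = 19)

# Part (b): is the hypothesised mean inside the ellipse?
mu0 <- c(2000, 10000)
inside <- n * mahalanobis(mu0, ell$center, ell$S) <= ell$c2
```

Checking `inside` is equivalent to reading the plot: µ0 lies in the ellipse exactly when the T^2 statistic does not exceed the cutoff.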

6. Consider the random vector X where

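\[
X \sim N_3\left(
\begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix},
\begin{pmatrix} 10 & 5 & 4 \\ 5 & 18 & 7 \\ 4 & 7 & 9 \end{pmatrix}
\right)
\]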

Below, you see 5 simulated samples from this distribution.

6.171516  4.605047  5.8303953
7.595643  1.754275  1.8826819
4.047683  1.791576  0.7613451
1.672295  3.434457  2.1768536
2.904052  3.906055  4.6161726

Of course, the choice of data is arbitrary. Here is how I generated the 5 observations above. Feel free to generate more observations, change the mean, covariance, etc.

library(mvtnorm)
mu <- c(3, -2, 1)
Sigma <- matrix(c(10, 5, 4, 5, 18, 7, 4, 7, 9), nrow = 3)
X <- rmvnorm(5, mu, Sigma)

Now, suppose two of the observations in the data set above are missing at random: the one in the first row and first column, and the one in the third row and third column. The data set with the missing components is shown below.

NA        4.605047  5.8303953
7.595643  1.754275  1.8826819
4.047683  1.791576  NA
1.672295  3.434457  2.1768536
2.904052  3.906055  4.6161726

Use the EM algorithm described in your textbook to estimate the missing data, the MLE of the mean vector, and the MLE of the covariance matrix. Be sure to run the algorithm long enough to reach convergence, say to within 1e-5. Also, consider the algorithm in which we only update the missing components $\tilde{x}_j^{(1)}$ for each subject/observation j = 1, ..., n and then recompute the MLEs directly from the updated dataset. In other words, we skip (5-39) and update $\tilde{\Sigma}$ from the entire dataset, as opposed to separately estimating each $\widetilde{x_j^{(1)} x_j^{(1)T}}$ (note that the estimates of $\widetilde{x_j^{(1)} x_j^{(2)T}}$ are the same under both algorithms). Discuss your thoughts on the implications of both EM methods. Do you prefer one over the other? Discuss any theoretical benefits/downfalls that you see.
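Both variants can be sketched in a single R function. This is a minimal illustration under stated assumptions, not the textbook's code: the function name `em_mvn`, the `use_correction` switch, the mean-imputation starting values, and the max-absolute-change stopping rule are all choices made here. With `use_correction = TRUE`, the E-step adds the conditional covariance of the missing block to the second-moment sum, which is the role of (5-39); with `FALSE` it only plugs in the conditional means.

```r
X <- matrix(c(
  NA,       4.605047, 5.8303953,
  7.595643, 1.754275, 1.8826819,
  4.047683, 1.791576, NA,
  1.672295, 3.434457, 2.1768536,
  2.904052, 3.906055, 4.6161726
), ncol = 3, byrow = TRUE)

em_mvn <- function(X, use_correction = TRUE, tol = 1e-5, max_iter = 1000) {
  n <- nrow(X)
  p <- ncol(X)
  # Starting values (an assumption): observed column means, mean-imputed data
  mu <- colMeans(X, na.rm = TRUE)
  X_fill <- X
  for (k in seq_len(p)) X_fill[is.na(X[, k]), k] <- mu[k]
  Sigma <- crossprod(sweep(X_fill, 2, mu)) / n
  for (iter in seq_len(max_iter)) {
    T1 <- numeric(p)        # running sum of completed observations
    T2 <- matrix(0, p, p)   # running sum of completed outer products
    for (j in seq_len(n)) {
      m <- is.na(X[j, ])
      x <- X[j, ]
      C <- matrix(0, p, p)
      if (any(m)) {
        # E-step: conditional mean of the missing block given the observed block
        B <- Sigma[m, !m, drop = FALSE] %*% solve(Sigma[!m, !m, drop = FALSE])
        x[m] <- mu[m] + B %*% (x[!m] - mu[!m])
        # Conditional covariance of the missing block (the extra term)
        if (use_correction) {
          C[m, m] <- Sigma[m, m, drop = FALSE] - B %*% Sigma[!m, m, drop = FALSE]
        }
      }
      X_fill[j, ] <- x
      T1 <- T1 + x
      T2 <- T2 + tcrossprod(x) + C
    }
    # M-step: MLEs from the completed sufficient statistics
    mu_new <- T1 / n
    Sigma_new <- T2 / n - tcrossprod(mu_new)
    change <- max(abs(mu_new - mu), abs(Sigma_new - Sigma))
    mu <- mu_new
    Sigma <- Sigma_new
    if (change < tol) break
  }
  list(mu = mu, Sigma = Sigma, X_filled = X_fill, iterations = iter)
}

fit_full   <- em_mvn(X)                          # with the covariance correction
fit_simple <- em_mvn(X, use_correction = FALSE)  # plug-in imputation only
```

Comparing `fit_full$Sigma` with `fit_simple$Sigma` is a natural starting point for the discussion, since the plug-in variant treats imputed values as if they had been observed exactly.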

7. The bootstrap is an efficient method for calculating the p-value of a test when the theoretical distribution of the test statistic is not available, and/or when the sample size is too small for the asymptotic approximations to be reliable. The data file Test.txt includes 30 observations of 3 variables. Interest lies in testing the null hypothesis


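\[
H_0: \mu = \begin{pmatrix} 4 \\ 8 \\ -2 \end{pmatrix}
\quad \text{vs.} \quad
H_a: \mu \neq \begin{pmatrix} 4 \\ 8 \\ -2 \end{pmatrix}
\]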

To calculate the bootstrap p-value, generate 10,000 samples, each of size 30 (with replacement), from the original sample. For each sample, compute the test statistic:

\[
W = -2 \log\left( \frac{\max_{\Sigma} L(\mu_0, \Sigma)}{\max_{\mu, \Sigma} L(\mu, \Sigma)} \right)
\]

Let Wobs be the above computation for the originally observed dataset. Estimate the p-value Pr(W > Wobs) using the bootstrap samples. Compare your answer to the p-value calculated from the asymptotic distribution of the test statistic (Result 5.2 in the book). Provide a plot of your choice to compare the asymptotic distribution of W to its empirical distribution estimated based on bootstrap samples.
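For normal data the statistic reduces to W = n log(|Σ̂_0| / |Σ̂|), where Σ̂_0 and Σ̂ are the MLEs of Σ with the mean fixed at µ_0 and with the mean unrestricted. The R sketch below computes W, a bootstrap p-value, and the asymptotic chi-square p-value with p = 3 degrees of freedom. The layout of Test.txt is not shown, so the sketch runs on simulated stand-in data. It also evaluates each bootstrap statistic against the original sample mean so that resampling mimics the null hypothesis; this recentring is an assumption made here, not something the assignment states.

```r
set.seed(1)
# Stand-in data; reading the real file, e.g. with read.table("Test.txt"),
# would depend on a file layout that is not shown (assumption).
n   <- 30
mu0 <- c(4, 8, -2)
Z   <- matrix(rnorm(n * 3), nrow = n) %*% chol(diag(3) + 0.5)
X   <- sweep(Z, 2, mu0, "+")

lrt_stat <- function(X, mu0) {
  n <- nrow(X)
  S_hat  <- crossprod(sweep(X, 2, colMeans(X))) / n  # unrestricted MLE of Sigma
  S0_hat <- crossprod(sweep(X, 2, mu0)) / n          # MLE of Sigma under H0
  n * (log(det(S0_hat)) - log(det(S_hat)))           # W = -2 log(likelihood ratio)
}

W_obs <- lrt_stat(X, mu0)

B    <- 10000
xbar <- colMeans(X)
W_boot <- replicate(B, {
  Xb <- X[sample(n, replace = TRUE), , drop = FALSE]
  lrt_stat(Xb, xbar)  # recentred at xbar so the resamples reflect H0 (assumption)
})

p_boot <- mean(W_boot > W_obs)                             # bootstrap p-value
p_asym <- pchisq(W_obs, df = ncol(X), lower.tail = FALSE)  # asymptotic p-value

# Empirical distribution of W against the chi-square density
hist(W_boot, breaks = 50, freq = FALSE, main = "Bootstrap W", xlab = "W")
curve(dchisq(x, df = ncol(X)), add = TRUE, lwd = 2)
```

A histogram with the chi-square density overlaid is one reasonable choice of plot; a QQ-plot of `W_boot` against chi-square quantiles would serve the same purpose.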
