Construct and plot a 95 confidence ellipse for the pair and, Advanced Statistics

1. The vector of random variables (X₁, X₂, X₃)^T follows a trivariate normal distribution with mean and covariance matrix given by

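$$\mu = \begin{pmatrix} 1 \\ -2 \\ 0 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 3 & 1 & -2 \\ 1 & 2 & -1 \\ -2 & -1 & 1.5 \end{pmatrix}$$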

(a) Find the joint distribution of (X₁, X₃).
(b) Find the joint conditional distribution of (X₁, X₃)|X₂ = 1.

(2X₁ - X₃, X₂ + 4X₃ + 1)^T

2. Let X ~ N_p(µ, Σ). Show via the moment generating function that the quadratic form shown below follows a central Chi-Square distribution with p degrees of freedom.

$$(X - \mu)^T \Sigma^{-1} (X - \mu) \sim \chi^2_p$$

Recall that the moment generating function of a Chi-Square distribution with p degrees of freedom is given by $M(t) = (1 - 2t)^{-p/2}$. A helpful property here is that for generic independent random variables Y₁, ..., Y_n:

$$M_{Y_1 + \dots + Y_n}(t) = E\left(e^{t \sum_{i=1}^{n} Y_i}\right) = \prod_{i=1}^{n} E\left(e^{t Y_i}\right)$$
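One possible route that uses this property is sketched below. It assumes Σ is positive definite, so that the symmetric inverse square root Σ^{-1/2} exists; the standardized vector Z is a symbol introduced here for illustration and does not appear in the problem statement.

```latex
\begin{aligned}
Z &= \Sigma^{-1/2}(X-\mu) \sim N_p(0, I_p),
\qquad (X-\mu)^T\Sigma^{-1}(X-\mu) = Z^T Z = \sum_{i=1}^{p} Z_i^2, \\
M_{Z_i^2}(t) &= E\left(e^{tZ_i^2}\right)
 = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(1-2t)z^2/2}\,dz
 = (1-2t)^{-1/2}, \qquad t < \tfrac{1}{2}, \\
M_{\sum_i Z_i^2}(t) &= \prod_{i=1}^{p} (1-2t)^{-1/2} = (1-2t)^{-p/2}.
\end{aligned}
```

The components Z₁, ..., Z_p are independent because they are jointly normal with identity covariance, which is what allows the product step.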

3. Consider the regression problem Y|X = Xβ + R, in which R ~ N(0, σ²I), X is an n × p matrix, β_{p×1} is the parameter vector, and Y_{n×1} is the vector of responses.

Show that

(a) $\hat{\beta}_{MLE} = (X^T X)^{-1} X^T Y$

(b) $\hat{\sigma}^2_{MLE} = \frac{1}{n} (Y - X\hat{\beta}_{MLE})^T (Y - X\hat{\beta}_{MLE})$
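As a numerical sanity check (not a proof), the closed forms can be compared against R's built-in least-squares fit. This is a minimal sketch on simulated data; `n`, `p`, `beta_true`, and `sigma_true` are illustrative values chosen here, not part of the assignment.

```r
# Simulated regression data; all constants below are illustrative assumptions.
set.seed(1)
n <- 200
p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), nrow = n))  # n x p design matrix
beta_true <- c(1, -2, 0.5)
sigma_true <- 1.5
Y <- drop(X %*% beta_true + rnorm(n, sd = sigma_true))

# Closed-form estimate of beta: solve (X'X) b = X'Y
beta_hat <- drop(solve(crossprod(X), crossprod(X, Y)))

# Maximum likelihood estimate of sigma^2: residual sum of squares over n
resid_hat <- Y - drop(X %*% beta_hat)
sigma2_hat <- sum(resid_hat^2) / n

# Reference fit without an extra intercept (X already has a column of ones)
fit <- lm(Y ~ X - 1)
print(cbind(closed_form = beta_hat, lm = unname(coef(fit))))
print(c(mle = sigma2_hat, unbiased = summary(fit)$sigma^2))
```

The maximum likelihood estimate of σ² divides the residual sum of squares by n, whereas `summary(fit)$sigma^2` divides by n - p, so the two printed variance estimates differ by the factor (n - p)/n.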

4. We often mention that n (sample size) must be much larger than p (the dimension of each observation) in order for the Central Limit Theorem to be an accurate approximation, particularly when the data do not come from a normal distribution.

Recall that for the univariate t-distribution, the smaller the degrees of freedom, the larger the kurtosis. Similarly, in the multivariate case, the lower the degrees of freedom, the further the distribution deviates from normality (particularly via kurtosis). The following code simulates data from a p-variate t distribution with 6 degrees of freedom and a covariance matrix that was simulated from a Wishart distribution with p degrees of freedom:

$$\Sigma \sim \text{Wishart}(p, I_p)$$

$$X_1, \dots, X_n \overset{iid}{\sim} t_p(\Sigma, df = 6)$$

Use the code below with at least three values of p that include one low, one medium, and one high value (e.g. 2, 5, 20), and assess the normality of the sample means for each value of p using n = (10, 100, 1000). Report the QQ-plots and formal test results for the normality of the sample means. Feel free to test more values of p and n, but you do not need to show QQ-plots and normality tests for the extra results. Provide a written summary of your findings.

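library(mvtnorm)  # provides rmvt()

p <- p0     # dimension: replace p0 with the value under study (e.g. 2, 5, 20)
N <- 5000   # number of simulated sample means
means <- matrix(0, nrow = N, ncol = p)

# rWishart() returns a p x p x 1 array; reshape the single draw to p x p
Sigma <- matrix(rWishart(1, df = p, Sigma = diag(p)), ncol = p)

## Keep the same Sigma for fixed p and varying n

n <- n0     # sample size: replace n0 with 10, 100 or 1000

for (i in 1:N) {
  x <- rmvt(n, sigma = Sigma, df = 6)  # n draws from the p-variate t
  means[i, ] <- colMeans(x)            # store the i-th sample mean vector
}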
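One way the resulting `means` matrix could be assessed is sketched below. The helper `assess_normality` is a hypothetical name introduced here. It uses a chi-square QQ-plot of squared Mahalanobis distances plus a Shapiro-Wilk test on each coordinate, which is only one of several reasonable choices; note that marginal tests check the margins, not joint normality.

```r
# Hypothetical helper: `means` is an N x p matrix of simulated sample means.
assess_normality <- function(means, make_plot = TRUE) {
  N <- nrow(means)
  p <- ncol(means)
  # Squared Mahalanobis distances; roughly chi-square(p) if rows are normal
  d2 <- mahalanobis(means, colMeans(means), cov(means))
  if (make_plot) {
    qqplot(qchisq(ppoints(N), df = p), d2,
           xlab = "Chi-square quantiles",
           ylab = "Squared Mahalanobis distance")
    abline(0, 1, col = "red")  # reference line for a perfect match
  }
  # Shapiro-Wilk per coordinate (shapiro.test accepts 3 to 5000 values)
  shapiro_p <- apply(means, 2, function(v) shapiro.test(v)$p.value)
  list(d2 = d2, shapiro_p = shapiro_p)
}

# Illustrative call on stand-in normal data rather than the t-based means
set.seed(1)
demo_means <- matrix(rnorm(500 * 3), ncol = 3)
res <- assess_normality(demo_means, make_plot = FALSE)
print(res$shapiro_p)
```

With N = 5000 rows the Shapiro-Wilk test is at its upper size limit and will flag even small departures, so the QQ-plot is worth reading alongside the p-values.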

5. Stiffness and bending strength are two variables of interest in the quality of lumber. A sample of 30 pieces of a particular type of wood is provided in the file lumber.txt.

(a) Construct and plot a 95% confidence ellipse for the pair µ = (µ₁, µ₂), where µ₁ = E(Stiffness) and µ₂= E(Bending Strength).

(b) Suppose high quality lumber has µ = (2000, 10000)^T. Given the result in part (a), do the data in lumber.txt represent a sample of high quality lumber? Explain.

(c) Given the data, do you think bivariate normal distribution is a good model for the data? Use a QQ-plot, as well as a formal test, to answer this question.
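A minimal sketch of parts (a) and (b) follows, assuming the usual Hotelling T² region n (x̄ - µ)^T S⁻¹ (x̄ - µ) ≤ p(n - 1)/(n - p) · F_{p, n-p}(0.05). The layout of lumber.txt is not shown here, so the example runs on simulated stand-in data; `conf_ellipse`, `inside_ellipse`, and the column names are hypothetical.

```r
# Boundary of the confidence ellipse for a bivariate mean (hypothetical helper)
conf_ellipse <- function(X, level = 0.95, n_points = 200) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  stopifnot(p == 2)
  xbar <- colMeans(X)
  S <- cov(X)
  c2 <- p * (n - 1) / (n - p) * qf(level, p, n - p)  # scaled F critical value
  theta <- seq(0, 2 * pi, length.out = n_points)
  circle <- rbind(cos(theta), sin(theta))            # unit circle, 2 x n_points
  # Map the circle through the Cholesky factor of S and shift to xbar
  boundary <- t(xbar + sqrt(c2 / n) * t(chol(S)) %*% circle)
  list(center = xbar, S = S, c2 = c2, n = n, boundary = boundary)
}

# TRUE when mu0 lies inside (or on) the ellipse
inside_ellipse <- function(ell, mu0) {
  ell$n * mahalanobis(mu0, ell$center, ell$S) <= ell$c2
}

# Stand-in data only; replace with the two columns read from lumber.txt
set.seed(42)
X <- cbind(stiffness = rnorm(30, 1900, 300), strength = rnorm(30, 8500, 1800))
ell <- conf_ellipse(X)
print(inside_ellipse(ell, c(2000, 10000)))
# plot(ell$boundary, type = "l", xlab = "Stiffness", ylab = "Bending strength")
# points(2000, 10000, pch = 4)
```

If the hypothesized mean falls outside the ellipse, the data are inconsistent with that mean at the 5% level; the printed result here reflects the simulated stand-in data only.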

6. Consider the random vector X where

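$$X \sim N_3\left( \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 10 & 5 & 4 \\ 5 & 18 & 7 \\ 4 & 7 & 9 \end{pmatrix} \right)$$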

Below, you see 5 simulated samples from this distribution.

6.171516	4.605047	5.8303953
7.595643	1.754275	1.8826819
4.047683	1.791576	0.7613451
1.672295	3.434457	2.1768536
2.904052	3.906055	4.6161726

Of course, the choice of data is arbitrary. Here is how I generated the 5 observations above. Feel free to generate more observations, change the mean, covariance, etc.

library(mvtnorm)
mu <- c(3,-2,1)
Sigma <- matrix(c(10,5,4,5,18,7,4,7,9),nrow=3)
X <- rmvnorm(5,mu,Sigma)

Now, suppose two of the observations in the data set above are missing at random: the one in the first row and first column, and the one in the third row and third column. The data set with the missing components is shown below.

NA	4.605047	5.8303953
7.595643	1.754275	1.8826819
4.047683	1.791576	NA
1.672295	3.434457	2.1768536
2.904052	3.906055	4.6161726

Use the EM algorithm described in your textbook to estimate the missing data, the MLE for the mean vector, and the MLE for the covariance matrix. Be sure to run the algorithm long enough to reach convergence, say to within 1e-5. Also, consider the algorithm in which we only update the missing components $\tilde{x}_j^{(1)}$ for each subject/observation j = 1, ..., n and then recompute the MLEs directly from the updated dataset. In other words, we skip (5-39) and update $\tilde{\Sigma}$ from the entire dataset, as opposed to trying to separately estimate each $\widetilde{x_j^{(1)} x_j^{(1)T}}$ (note that the estimates for $\widetilde{x_j^{(1)} x_j^{(2)T}}$ are the same under both algorithms). Discuss your thoughts on the implications of both EM methods. Do you prefer one over the other? Discuss any theoretical benefits/downfalls that you see.
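The two variants can be sketched in one function, shown below under stated assumptions. `em_mvn` and its flag `add_cond_cov` are hypothetical names: with `add_cond_cov = TRUE` the conditional covariance of the missing block is added to the second-moment update (taken here to be the step the text calls (5-39), which is an assumption about the textbook numbering), and with `FALSE` the covariance is recomputed from the filled-in data alone. Starting values and the stopping rule are choices made for this sketch, and every row is assumed to have at least one observed value.

```r
em_mvn <- function(X, tol = 1e-5, max_iter = 1000, add_cond_cov = TRUE) {
  n <- nrow(X)
  p <- ncol(X)
  miss <- is.na(X)
  # Start from observed column means and the covariance of mean-imputed data
  mu <- colMeans(X, na.rm = TRUE)
  Xt <- X
  Xt[miss] <- mu[col(X)[miss]]
  Sigma <- crossprod(sweep(Xt, 2, mu)) / n
  for (iter in seq_len(max_iter)) {
    T2 <- matrix(0, p, p)  # accumulates expected second moments
    for (j in seq_len(n)) {
      m <- miss[j, ]
      C <- matrix(0, p, p)
      if (any(m)) {
        # Regression of the missing block on the observed block
        B <- Sigma[m, !m, drop = FALSE] %*% solve(Sigma[!m, !m, drop = FALSE])
        Xt[j, m] <- mu[m] + B %*% (X[j, !m] - mu[!m])
        if (add_cond_cov) {
          C[m, m] <- Sigma[m, m, drop = FALSE] - B %*% Sigma[!m, m, drop = FALSE]
        }
      }
      T2 <- T2 + tcrossprod(Xt[j, ]) + C
    }
    mu_new <- colMeans(Xt)
    Sigma_new <- T2 / n - tcrossprod(mu_new)
    delta <- max(abs(mu_new - mu), abs(Sigma_new - Sigma))
    mu <- mu_new
    Sigma <- Sigma_new
    if (delta < tol) break
  }
  list(mu = mu, Sigma = Sigma, X_filled = Xt, iterations = iter)
}

# The five observations listed above, with the two missing entries as NA
X <- matrix(c(NA,       4.605047, 5.8303953,
              7.595643, 1.754275, 1.8826819,
              4.047683, 1.791576, NA,
              1.672295, 3.434457, 2.1768536,
              2.904052, 3.906055, 4.6161726), ncol = 3, byrow = TRUE)
fit_full   <- em_mvn(X, add_cond_cov = TRUE)
fit_simple <- em_mvn(X, add_cond_cov = FALSE)
print(fit_full$X_filled)
print(fit_simple$X_filled)
```

In the simplified variant the returned `Sigma` is exactly the divisor-n covariance of the filled-in data, which treats imputed values as if they had been observed; comparing the two fitted covariance matrices is one way to start the requested discussion.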

7. The bootstrap is an efficient method for calculating the p-value of a test when the theoretical distribution of the test statistic is not available and/or the sample size is too small for the asymptotic approximations. The data file Test.txt includes 30 observations of 3 variables. Interest lies in testing the null hypothesis

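$$H_0: \mu = \begin{pmatrix} 4 \\ 8 \\ -2 \end{pmatrix} \quad \text{vs.} \quad H_a: \mu \neq \begin{pmatrix} 4 \\ 8 \\ -2 \end{pmatrix}$$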

To calculate the bootstrap p-value, generate 10,000 samples, each of size 30 (with replacement), from the original sample. For each sample, compute the test statistic:

$$W = -2 \log\left( \frac{\max_{\Sigma \in \Theta_0} L(\mu_0, \Sigma)}{\max_{\mu, \Sigma \in \Theta} L(\mu, \Sigma)} \right)$$

Let W_obs be the above computation for the originally observed dataset. Estimate the p-value Pr(W > W_obs) using the bootstrap samples. Compare your answer to the p-value calculated from the asymptotic distribution of the test statistic (Result 5.2 in the book). Provide a plot of your choice to compare the asymptotic distribution of W to its empirical distribution estimated from the bootstrap samples.
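The procedure might be sketched as follows. For multivariate normal data the statistic reduces to W = n log(|Σ̂₀| / |Σ̂|), where Σ̂₀ and Σ̂ are the divisor-n covariance estimates about µ₀ and about the sample mean. `lrt_stat`, `boot_pvalue`, and the flag `center_null` are hypothetical names. Shifting the data so that its mean equals µ₀ before resampling is an assumption added here so that the bootstrap distribution reflects H₀; the problem statement itself only says to resample from the original sample. Test.txt is not available here, so the example uses simulated stand-in data and a smaller B.

```r
# Likelihood ratio statistic W = n * log(|Sigma0_hat| / |Sigma_hat|)
lrt_stat <- function(X, mu0) {
  n <- nrow(X)
  S_hat  <- crossprod(sweep(X, 2, colMeans(X))) / n  # unrestricted MLE
  S0_hat <- crossprod(sweep(X, 2, mu0)) / n          # MLE with mu fixed at mu0
  as.numeric(n * (determinant(S0_hat)$modulus - determinant(S_hat)$modulus))
}

boot_pvalue <- function(X, mu0, B = 10000, center_null = TRUE) {
  n <- nrow(X)
  w_obs <- lrt_stat(X, mu0)
  # Assumption: recentre so the resampling population has mean mu0
  X_src <- if (center_null) sweep(X, 2, colMeans(X) - mu0) else X
  w_boot <- replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)
    lrt_stat(X_src[idx, , drop = FALSE], mu0)
  })
  list(w_obs = w_obs, w_boot = w_boot,
       p_boot = mean(w_boot > w_obs),
       p_asymp = pchisq(w_obs, df = ncol(X), lower.tail = FALSE))
}

# Stand-in data only; replace with the 30 x 3 matrix read from Test.txt
set.seed(7)
mu0 <- c(4, 8, -2)
X_demo <- sweep(matrix(rnorm(30 * 3), ncol = 3), 2, mu0, "+")
res <- boot_pvalue(X_demo, mu0, B = 2000)
print(c(bootstrap = res$p_boot, asymptotic = res$p_asymp))
# hist(res$w_boot, freq = FALSE, breaks = 50, main = "Bootstrap W")
# curve(dchisq(x, df = 3), add = TRUE, col = "red")
```

The asymptotic reference distribution has 3 degrees of freedom because the null fixes the three components of µ. The commented lines show one way to overlay the chi-square density on the bootstrap histogram.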

Attachment:- Assignment.rar


Reference No:- TGS01628860
