STA 141A Fall 2016 Homework



1. This is a basic illustration of using the bootstrap for inference.

(i) Generate a random sample of size n = 100 following the univariate regression model

Yi = -5 + 2Xi + εi

where the Xi's are independent chi-square random variables with 6 degrees of freedom, and the εi's are i.i.d. N(0, σ2) with σ = 1. Fix a random seed so that the results are reproducible.
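Step (i) can be sketched in R as below; the seed value and variable names are illustrative choices, not specified by the assignment.

```r
# Simulate n = 100 observations from Y_i = -5 + 2*X_i + eps_i
set.seed(141)                      # illustrative seed, fixed for reproducibility
n   <- 100
x   <- rchisq(n, df = 6)           # X_i: chi-square with 6 degrees of freedom
eps <- rnorm(n, mean = 0, sd = 1)  # eps_i: i.i.d. N(0, 1), so sigma = 1
y   <- -5 + 2 * x + eps
```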

(ii) Fit the least squares regression line to the data and obtain the estimate of (β0, β1, σ2).
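A minimal sketch of step (ii), reusing the illustrative simulation from step (i): `lm()` gives the least squares fit, and σ2 is estimated by the residual sum of squares over the residual degrees of freedom.

```r
set.seed(141)                     # same illustrative seed as in step (i)
x <- rchisq(100, df = 6)
y <- -5 + 2 * x + rnorm(100)

fit <- lm(y ~ x)                  # least squares regression line
beta_hat   <- coef(fit)           # estimates of (beta0, beta1)
sigma2_hat <- sum(residuals(fit)^2) / fit$df.residual  # estimate of sigma^2
```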

(iii) Obtain resampling-based 95% confidence intervals for β0 and β1 using a parametric (i.e., residual-based) bootstrap procedure with 400 bootstrap replicates.

(iv) How do the confidence intervals in (iii) compare with the theoretical confidence intervals for β0 and β1? [To compare the accuracy of the confidence intervals, repeat steps (i)-(iii) 10 times (using a different random seed for each simulation run) and report the average lengths of the bootstrap confidence intervals and those of the corresponding theoretical confidence intervals.]
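One way to organize step (iv) is sketched below: for each of 10 seeds, compute the theoretical interval lengths via `confint()` and the bootstrap interval lengths via the residual bootstrap, then average. The seed scheme (1 through 10) is an assumption for illustration.

```r
runs <- 10
lens <- t(sapply(seq_len(runs), function(r) {
  set.seed(r)                              # different seed per run (assumed scheme)
  x <- rchisq(100, df = 6)
  y <- -5 + 2 * x + rnorm(100)
  fit  <- lm(y ~ x)
  th   <- apply(confint(fit), 1, diff)     # theoretical 95% CI lengths
  yhat <- fitted(fit); res <- residuals(fit)
  bc <- replicate(400, {                   # residual bootstrap, 400 replicates
    y_star <- yhat + sample(res, 100, replace = TRUE)
    coef(lm(y_star ~ x))
  })
  bl <- apply(bc, 1, function(z) diff(quantile(z, c(0.025, 0.975))))
  c(theory = th, boot = bl)                # CI lengths for (beta0, beta1)
}))
avg_len <- colMeans(lens)                  # average lengths over the 10 runs
```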

2. In this problem, compare the k-NN classification method, linear discriminant analysis, and logistic regression in a two-class classification problem. For this, consider the iris data set available in R.

(i) Extract the data corresponding to the flower types setosa and versicolor, numbering 100 flowers in total. Set aside the last 10 measurements for each flower type as test data and use the remaining 80 measurements as training data.
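The split in step (i) can be sketched as below; within the 100-row two-class subset, setosa occupies rows 1-50 and versicolor rows 51-100, so the last 10 of each species are rows 41-50 and 91-100.

```r
data(iris)
two <- subset(iris, Species %in% c("setosa", "versicolor"))
two$Species <- droplevels(two$Species)   # drop the unused virginica level

test_idx <- c(41:50, 91:100)             # last 10 rows of each species
train <- two[-test_idx, ]                # 80 training measurements
test  <- two[test_idx, ]                 # 20 test measurements
```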

(ii) Fit a logistic regression model to the training data, using the variable Sepal.Length as predictor. Obtain the estimates of the model parameters. Compute the confusion matrix for the test data set.
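A sketch of step (ii) using `glm()` with a binomial family; the 0.5 probability threshold for classification is the conventional choice, assumed here.

```r
data(iris)
two <- subset(iris, Species %in% c("setosa", "versicolor"))
two$Species <- droplevels(two$Species)
test_idx <- c(41:50, 91:100)
train <- two[-test_idx, ]; test <- two[test_idx, ]

glm_fit <- glm(Species ~ Sepal.Length, data = train, family = binomial)
coef(glm_fit)                                  # model parameter estimates
p <- predict(glm_fit, newdata = test, type = "response")
pred <- ifelse(p > 0.5, "versicolor", "setosa")
cm <- table(Predicted = pred, Actual = test$Species)  # confusion matrix
```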

(iii) Compute the decision boundary for linear discriminant analysis, using Sepal.Length as the predictor variable. Compute the confusion matrix for the test data set.
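For step (iii), `MASS::lda()` fits the model; with a single predictor and equal class priors (40 training flowers of each type), the decision boundary reduces to the midpoint of the two class means. This midpoint formula is a sketch that assumes equal priors, which holds for this split.

```r
library(MASS)

data(iris)
two <- subset(iris, Species %in% c("setosa", "versicolor"))
two$Species <- droplevels(two$Species)
test_idx <- c(41:50, 91:100)
train <- two[-test_idx, ]; test <- two[test_idx, ]

lda_fit <- lda(Species ~ Sepal.Length, data = train)
boundary <- mean(lda_fit$means)   # midpoint of class means (equal priors)
pred <- predict(lda_fit, newdata = test)$class
cm <- table(Predicted = pred, Actual = test$Species)  # confusion matrix
```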

(iv) Use the k-nearest neighbors classification method with k = 3, 4, 5, again using Sepal.Length as the predictor variable. In each case, compute the confusion matrix for the test data set.
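Step (iv) can be sketched with `class::knn()`. Note that for even k (here k = 4), `knn()` breaks ties at random, so results may vary slightly across runs.

```r
library(class)

data(iris)
two <- subset(iris, Species %in% c("setosa", "versicolor"))
two$Species <- droplevels(two$Species)
test_idx <- c(41:50, 91:100)
train <- two[-test_idx, ]; test <- two[test_idx, ]

for (k in 3:5) {
  pred <- knn(train = train["Sepal.Length"],   # single-predictor feature matrix
              test  = test["Sepal.Length"],
              cl    = train$Species, k = k)
  cat("k =", k, "\n")
  print(table(Predicted = pred, Actual = test$Species))  # confusion matrix
}
```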

(v) Write a very brief summary of the comparative performance of different classification procedures.

Reference -

1. James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer. [Chapters 3, 4 & 5].
