Plot a scatter chart that summarizes a linear relation


Assignment: Simple Linear Regression

It is well known that the famous geyser Old Faithful in Yellowstone National Park erupts quite regularly, which has attracted millions of visitors. The data file "Geyser" gives information about eruptions during October 1980*. The variables are Duration, the length of the current eruption in seconds, and Interval, the time in minutes to the next eruption. The park service uses data like these to obtain a prediction equation for the time to the next eruption. Such predictions are shown to tourists who wait patiently for the next eruption of Old Faithful.

*The data were collected by volunteers and made public by R. Hutchinson. Apart from missing data for the period from midnight to 6 a.m., this is a complete record of eruptions for the month.

1. Using Descriptive Statistics in Data Analysis of Excel, find the 95% and 99% confidence intervals for the population means of Duration and Interval. Interpret these intervals. Compare the widths of the two Duration intervals and of the two Interval intervals, and comment on the differences in these widths.
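As a cross-check on the Descriptive Statistics output, each interval is the sample mean plus or minus $t \cdot s/\sqrt{n}$, and the "Confidence Level" row in Excel's output should be this half-width. The Python sketch below is illustrative only and not part of the required Excel procedure: the function name is hypothetical, and it assumes you supply the critical value yourself, for example from Excel's `T.INV.2T(0.05, n-1)` for 95% or `T.INV.2T(0.01, n-1)` for 99%.

```python
import math
import statistics


def mean_confidence_interval(data, t_crit):
    """Return (lower, upper) for the population mean (hypothetical helper).

    t_crit is assumed to be the two-tailed critical value with n - 1 degrees
    of freedom, e.g. Excel's T.INV.2T(0.05, n - 1) for a 95% interval.
    """
    n = len(data)
    mean = statistics.mean(data)
    # Sample standard deviation uses the n - 1 denominator (like Excel's STDEV.S).
    std_err = statistics.stdev(data) / math.sqrt(n)
    # Half-width of the interval: t_crit times the standard error of the mean.
    half_width = t_crit * std_err
    return mean - half_width, mean + half_width
```

Because the mean and standard error are the same at both confidence levels, the ratio of the 99% width to the 95% width for each variable equals the ratio of the two critical values, which is why the 99% intervals are wider.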

2. Using Excel, plot a scatter chart that summarizes a linear relation between Duration and Interval, with Duration on the horizontal axis and Interval on the vertical axis. Include the "trendline" and the coefficient of determination in this chart. How would you interpret the slope of this "trendline"?
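Excel's linear trendline is an ordinary least-squares fit, so its slope, intercept, and $R^2$ can be reproduced from the usual sums of squares. A minimal Python sketch, assuming paired lists `x` (Duration, seconds) and `y` (Interval, minutes); the function name is hypothetical:

```python
def least_squares_trendline(x, y):
    """Return (intercept, slope, r_squared) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered sums of squares and cross-products.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # For simple linear regression, R^2 = SSR / SST = Sxy^2 / (Sxx * Syy).
    r_squared = sxy ** 2 / (sxx * syy)
    return b0, b1, r_squared
```

With Duration in seconds and Interval in minutes, the slope is read as the estimated change in the time to the next eruption, in minutes, for each additional second of eruption.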

3. Using Regression in Data Analysis of Excel or Multiple Linear Regression in Predict of XLMiner, find the estimated regression equation for Interval from Duration. Compare it with the "trendline" found in Task 2. In the output, identify the coefficient of determination and compare it with the one found in Task 2. What is the interpretation of this coefficient? What is the estimated variance of the error term $\varepsilon$ in the assumed model $\text{Interval} = \beta_0 + \beta_1 \text{Duration} + \varepsilon$? What is the estimated standard deviation of this error term? Suppose a tourist has just arrived at the end of an eruption that lasted 3.5 minutes. What is his/her predicted waiting time, in hours, to the next eruption?
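The quantities in Task 3 follow from the residuals: the estimated error variance is $\text{MSE} = \text{SSE}/(n-2)$ (the Residual MS in Excel's ANOVA table), and its square root $s$ is the "Standard Error" under Regression Statistics. Because Duration is recorded in seconds, a 3.5-minute eruption corresponds to $x^* = 210$ seconds, and the predicted Interval in minutes must be divided by 60 to express it in hours. A hedged Python sketch; the function and variable names are hypothetical:

```python
import math


def fit_and_predict(duration_sec, interval_min, new_duration_sec):
    """Fit Interval = b0 + b1*Duration by least squares and predict at new_duration_sec.

    Returns (b0, b1, mse, s, predicted_interval_min). Hypothetical helper.
    """
    n = len(duration_sec)
    x_bar = sum(duration_sec) / n
    y_bar = sum(interval_min) / n
    sxx = sum((x - x_bar) ** 2 for x in duration_sec)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(duration_sec, interval_min))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # SSE: sum of squared residuals (Residual SS in Excel's ANOVA table).
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(duration_sec, interval_min))
    mse = sse / (n - 2)   # estimated variance of the error term
    s = math.sqrt(mse)    # estimated standard deviation of the error term
    y_hat = b0 + b1 * new_duration_sec
    return b0, b1, mse, s, y_hat


# Duration is in seconds, so a 3.5-minute eruption is x* = 3.5 * 60 = 210 seconds.
x_star = 3.5 * 60

# Hypothetical usage with the Geyser columns loaded as lists:
# b0, b1, mse, s, y_hat = fit_and_predict(duration, interval, x_star)
# predicted_hours = y_hat / 60  # Interval is in minutes
```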

4. Using the Excel or XLMiner output found in Task 3, conduct the t test for the significance of Duration at the 5% significance level. (Clearly specify your hypotheses and the symbols used, give the value of the test statistic and its distribution, identify the p-value of the test, state your conclusion, and interpret it.)
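For Task 4, the test statistic is $t = b_1 / \text{SE}(b_1)$ with $\text{SE}(b_1) = s/\sqrt{\sum (x_i - \bar{x})^2}$, which follows a $t$ distribution with $n - 2$ degrees of freedom when $H_0\colon \beta_1 = 0$ holds. The Python sketch below is a cross-check under stated assumptions, not the required Excel procedure: the function name is hypothetical, and it takes the critical value as input (for example Excel's `T.INV.2T(0.05, n-2)`). The two-tailed p-value can be obtained in Excel with `T.DIST.2T(ABS(t), n-2)`.

```python
import math


def slope_t_test(x, y, t_crit):
    """Two-sided t test of H0: beta_1 = 0 vs H1: beta_1 != 0 (hypothetical helper).

    t_crit is the two-tailed critical value with n - 2 df.
    Returns (b1, se_b1, t_stat, reject_h0).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    se_b1 = s / math.sqrt(sxx)  # standard error of the estimated slope
    t_stat = b1 / se_b1         # t distribution with n - 2 df under H0
    return b1, se_b1, t_stat, abs(t_stat) > t_crit
```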

5. Using the Excel or XLMiner output found in Task 3, identify the 95% confidence interval for the population slope (the coefficient $\beta_1$) of Duration. Interpret this interval. Conduct the test in Task 4 using this interval. Did you reach the same conclusion as in Task 4?
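The 95% interval in Task 5 is $b_1 \pm t_{0.025,\,n-2}\,\text{SE}(b_1)$, which Excel's Regression output reports in its "Lower 95%" and "Upper 95%" columns. A two-sided test at level $\alpha$ rejects $H_0\colon \beta_1 = 0$ exactly when 0 falls outside the $(1-\alpha)100\%$ interval. A minimal Python sketch, with hypothetical function names and the critical value supplied by the caller:

```python
import math


def slope_confidence_interval(x, y, t_crit):
    """Return (lower, upper) = b1 -/+ t_crit * SE(b1), with t_crit on n - 2 df."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b1 - t_crit * se_b1, b1 + t_crit * se_b1


def rejects_zero_slope(interval):
    """Reject H0: beta_1 = 0 exactly when 0 lies outside the confidence interval."""
    lower, upper = interval
    return not (lower <= 0 <= upper)
```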

6.

A. Consider a population of tourists who arrive at the end of an eruption that lasted 3.5 minutes. What is their average waiting time to the next eruption? What is the 95% confidence interval for their average waiting time to the next eruption? (Show all details of your calculations!)

B. Suppose a tourist (John Smith) has just arrived at the end of an eruption that lasted 3.5 minutes. What is his predicted waiting time to the next eruption? What is the 95% prediction interval for his waiting time to the next eruption? (Show all details of your calculations!)

C. Explain the difference in the widths of the intervals found in Tasks 6A and 6B.

Note: For the estimated regression equation $\hat{Y} = b_0 + b_1 X$ and $X = x^*$, the point prediction is $\hat{y}^* = b_0 + b_1 x^*$, which you already found in Task 3.

The $(1-\alpha)100\%$ confidence interval for the mean $E(Y \mid X = x^*)$ is

$$\hat{y}^* \pm t_{\alpha/2,\,n-2}\, s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{X})^2}{(n-1)s_X^2}},$$

where $P(t \ge t_{\alpha/2,\,n-2}) = \alpha/2$ with $df = n-2$, $s = S_{YX}$ is the standard error of the estimate, and $s_X^2$ is the sample variance of $X$.

The $(1-\alpha)100\%$ prediction interval for an individual value of $Y$ at $X = x^*$ is

$$\hat{y}^* \pm t_{\alpha/2,\,n-2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{X})^2}{(n-1)s_X^2}}.$$
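The confidence-interval and prediction-interval formulas for Task 6 differ only by the extra 1 under the square root, which accounts for the variance of an individual error term. A minimal Python sketch that evaluates both at $x^*$; the function name is hypothetical, $x^*$ must be in the same units as Duration (seconds), and `t_crit` is assumed to come from Excel's `T.INV.2T(0.05, n-2)`:

```python
import math
import statistics


def mean_ci_and_prediction_interval(x, y, x_star, t_crit):
    """Return (y_hat_star, ci, pi) at X = x_star (hypothetical helper).

    t_crit is t_{alpha/2, n-2}, e.g. Excel's T.INV.2T(0.05, n - 2) for 95%.
    """
    n = len(x)
    x_bar = statistics.mean(x)
    s_x2 = statistics.variance(x)  # sample variance of X (n - 1 denominator)
    y_bar = statistics.mean(y)
    sxx = (n - 1) * s_x2           # (n - 1) s_X^2 = sum of (x_i - x_bar)^2
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # s = S_YX, the standard error of the estimate.
    s = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y_hat = b0 + b1 * x_star
    leverage = 1 / n + (x_star - x_bar) ** 2 / sxx
    half_ci = t_crit * s * math.sqrt(leverage)      # interval for E(Y | X = x*)
    half_pi = t_crit * s * math.sqrt(1 + leverage)  # interval for one new Y at x*
    return y_hat, (y_hat - half_ci, y_hat + half_ci), (y_hat - half_pi, y_hat + half_pi)
```

Because the prediction interval adds $s^2$ to the variance of the estimated mean, it is always wider than the confidence interval at the same $x^*$, and unlike the confidence interval it does not shrink toward zero width as $n$ grows.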

Use Microsoft Word to write a report with your name shown on the first page. The report should include all your Excel outputs (copy and paste them), so do not attach any separate Excel files.

The response should include a reference list. Double-space the report and use Times New Roman 12-point font, one-inch margins, and APA style for writing and citations.

Attachment: Geyser.rar
