Simple Linear Regression

Dataset Needed: IQ

1. Relationship Between Eighth Grade IQ and Ninth Grade Math Score

For a statistics class project, students examined the relationship between x = 8th grade IQ and y = 9th grade math scores for 20 students. The data are displayed below.

Student   Math Score   IQ    Abstract Reas
1         33           95    28
2         31           100   24
3         35           100   29
4         38           102   30
5         41           103   33
6         37           105   32
7         37           106   34
8         39           106   36
9         43           106   38
10        40           109   39
11        41           110   40
12        44           110   43
13        40           111   41
14        45           112   42
15        48           112   46
16        45           114   44
17        31           114   41
18        47           115   47
19        43           117   42
20        48           118   49

Open the dataset IQ found in the Datasets folder in ANGEL. Perform a linear regression with math score as the Response (dependent variable) and IQ as the Predictor (independent variable). Store/Save the (unstandardized) Residuals and Fitted (Predicted) values. These will be stored in the fourth and fifth columns of the data worksheet. The output should look as follows:

SPSS: Regression Analysis: Math Score versus IQ
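
For readers working without SPSS or Minitab, the same fit can be sketched in plain Python. This is only an illustration under stated assumptions: the helper name `least_squares` and the variable names are hypothetical, and the data are transcribed from the table above. It computes the least-squares line and then stores the fitted (predicted) values and residuals, which is what the store/save option does.

```python
# Data transcribed from the table above (students 1-20).
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]
math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]


def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = least_squares(iq, math_score)

# Fitted (predicted) value: the height of the line at each student's IQ.
fitted = [b0 + b1 * xi for xi in iq]
# Residual: observed math score minus the fitted value.
residuals = [yi - fi for yi, fi in zip(math_score, fitted)]

print(f"math score = {b0:.3f} + {b1:.3f} * IQ")
for student, (f, r) in enumerate(zip(fitted, residuals), start=1):
    print(f"{student:2d}  fitted = {f:6.2f}  residual = {r:6.2f}")
```

The slope `b1` is the estimated change in math score for each one-point increase in IQ, which is the quantity part a asks you to interpret.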


a. Explain this equation. Discuss the slope as the change in Y per unit change in X, in the context of the variables used in this problem.



b. Create a scatter plot of the measurements by selecting Math Score for the y-axis (response) and IQ for the x-axis (predictor). Describe the relationship between math score and IQ. Minitab Users: Graph > Scatter Plot > Simple. SPSS Users: Graphs > Legacy Dialogs > Scatter/Dot > Simple Scatter



c. One of the students with a high IQ (number 17) appears to be an outlier. With a sample size of only 20, this can affect our normality assumption. Also, the constant variance assumption could be compromised. We can visually check for constant variance using a Residual Plot and test for normality using a Probability Plot (or Q-Q plot). To get a residual plot, simply create a Scatterplot using the Residuals as the y-variable and the Fitted (Predicted) Values as the x-variable. (Remember, these should have been stored/saved when you first performed the regression per the instructions above. If not, re-run the regression, click store/save, and check the boxes for unstandardized residuals and fits (predicted) values.)

Now create a probability plot (Q-Q plot if using SPSS) of the residuals. The output also provides the results of a test of the null hypothesis that the data follow a normal distribution. Based on these two graphs and what you have learned about hypothesis testing, what do you conclude about the assumptions of constant variance and normality?




Minitab Users: For the probability plot, go to Graph > Probability Plot > Single and select Residuals.

SPSS Users: For the Q-Q plot with a normality test, go to Analyze > Descriptive Statistics > Explore, enter Unstandardized Residuals in the Dependent List, click Plots, and select the box for Normality plots with tests.
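
The two diagnostic plots in part c can also be sketched numerically. The snippet below is a rough illustration, not the SPSS or Minitab procedure: it builds the point coordinates for a residual plot and for a normal Q-Q plot, and it reports which student has the largest residual in absolute value. The plotting positions (i - 0.5)/n are an assumption; statistical packages use slightly different conventions. All variable names are hypothetical.

```python
from statistics import NormalDist

# Data transcribed from the table in problem 1.
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]
math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]

n = len(iq)
x_bar = sum(iq) / n
y_bar = sum(math_score) / n
sxx = sum((x - x_bar) ** 2 for x in iq)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(iq, math_score))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in iq]
residuals = [y - f for y, f in zip(math_score, fitted)]

# Residual plot: fitted values on the x-axis, residuals on the y-axis.
residual_plot = list(zip(fitted, residuals))

# Normal Q-Q plot: sorted residuals against normal quantiles with mean 0
# and the residual standard deviation s (n - 2 degrees of freedom).
s = (sum(r ** 2 for r in residuals) / (n - 2)) ** 0.5
sorted_residuals = sorted(residuals)
theoretical = [NormalDist(0, s).inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
qq_points = list(zip(theoretical, sorted_residuals))

# Index of the observation with the largest absolute residual.
worst = max(range(n), key=lambda i: abs(residuals[i]))
print(f"Largest |residual|: student {worst + 1}, residual = {residuals[worst]:.2f}")
```

Points in `qq_points` that fall far from the line y = x suggest departures from normality, and a fan or funnel shape in `residual_plot` suggests non-constant variance.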


d. The least squares regression line for predicting math score from IQ is given in the above output. What is the fitted regression line (i.e. regression equation)?



e. What do the Fitted (predicted) values and Residuals represent?



f. Based on the output, what is the test of the slope for this regression equation? That is, provide the null and alternative hypotheses, the test statistic, p-value of the test, and state your decision and conclusion.
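
As a cross-check on the software output for part f, the test statistic for H0: slope = 0 versus Ha: slope ≠ 0 can be sketched by hand as t = b1 / SE(b1), with SE(b1) = s / sqrt(Sxx) and n - 2 degrees of freedom. The snippet below is a minimal sketch with hypothetical variable names. Python's standard library has no t-distribution, so instead of a p-value it compares |t| with the tabled two-sided 5% critical value for 18 degrees of freedom (2.101); the p-value itself should be read from the SPSS or Minitab output.

```python
from math import sqrt

# Data transcribed from the table in problem 1.
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]
math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]

n = len(iq)
x_bar = sum(iq) / n
y_bar = sum(math_score) / n
sxx = sum((x - x_bar) ** 2 for x in iq)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(iq, math_score))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual standard deviation s, based on n - 2 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(iq, math_score))
s = sqrt(sse / (n - 2))

# Standard error of the slope and the t statistic for H0: slope = 0.
se_b1 = s / sqrt(sxx)
t_stat = b1 / se_b1

T_CRIT = 2.101  # two-sided 5% critical value, t distribution with 18 df (from a t table)
reject_h0 = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.2f} on {n - 2} df; reject H0 at the 5% level: {reject_h0}")
```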




2. Although outliers should never be deleted without a reason, there are several reasons why it may be legitimate to conduct an analysis without them. Delete the data point in row 17 of the IQ dataset and recalculate the regression line for the remaining data. You should obtain the following output:



SPSS:
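
The effect of dropping row 17 can be sketched as follows. This is an illustrative comparison only, with a hypothetical helper `least_squares`; it refits the line with and without student 17 so the two slopes can be compared side by side.

```python
# Data transcribed from the table in problem 1.
iq = [95, 100, 100, 102, 103, 105, 106, 106, 106, 109,
      110, 110, 111, 112, 112, 114, 114, 115, 117, 118]
math_score = [33, 31, 35, 38, 41, 37, 37, 39, 43, 40,
              41, 44, 40, 45, 48, 45, 31, 47, 43, 48]


def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Fit using all 20 students.
b0_all, b1_all = least_squares(iq, math_score)

# Drop row 17 (list index 16) and refit on the remaining 19 students.
OUTLIER_INDEX = 16
iq_wo = [x for i, x in enumerate(iq) if i != OUTLIER_INDEX]
math_wo = [y for i, y in enumerate(math_score) if i != OUTLIER_INDEX]
b0_wo, b1_wo = least_squares(iq_wo, math_wo)

print(f"All 20 students:    math score = {b0_all:.3f} + {b1_all:.3f} * IQ")
print(f"Without student 17: math score = {b0_wo:.3f} + {b1_wo:.3f} * IQ")
```

Because student 17 has an above-average IQ and a low math score, removing that point steepens the fitted line.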

1.) Directions: Read the following problem. In your drop box submission, please include the following information:
a. Determine if the problem is either a test of hypothesis, a confidence interval or something else and specify the 'key words' found in the problem that demonstrate your choice.
b. Determine the procedure name and parameters involved for each problem (use the Stat 200 Formulas and Techniques Summary document.) Specify the 'key words' found in the problem that lead you to this choice.
c. If the problem is a hypothesis test, indicate whether it has a lower-tail, upper-tail, or two-tail alternative hypothesis and give the test statistic formula for the test. If the problem is a confidence interval, indicate the formula used for the margin of error.
You DO NOT need to do the ACTUAL test of hypothesis, confidence interval, etc. in order to submit in the dropbox, just answer parts a, b, and c for the problem.

A large organization is being investigated to determine if its recruitment is sex-biased. Tables 1 and 2, respectively, show the classification of applicants for sales and for secretarial positions according to gender and result of interview. Table 3 is an aggregation of the corresponding entries of Table 1 and Table 2.
a. According to the data in Tables 1 and 2, does there seem to be an association between gender and hiring status at the 5% significance level?
b. Do the data in Table 3 indicate the same result?

Table 1 Sales Positions

          Offered   Denied   Total
Male      25        50       75
Female    75        150      225
Total     100       200      300


Table 2 Secretarial Positions

          Offered   Denied   Total
Male      150       50       200
Female    75        25       100
Total     225       75       300


Table 3 Secretarial and Sales Positions

          Offered   Denied   Total
Male      175       100      275
Female    150       175      325
Total     325       275      600
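
Although the drop box submission does not require carrying out the test, a short sketch can help in checking your reasoning about the three tables. The snippet below assumes a Pearson chi-square statistic without continuity correction is the relevant measure of association; the function name `chi_square_stat` is hypothetical, and 3.841 is the tabled 5% critical value for 1 degree of freedom.

```python
def chi_square_stat(table):
    """Pearson chi-square statistic (no continuity correction) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of gender and hiring status.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows: Male, Female. Columns: Offered, Denied.
table1 = [[25, 50], [75, 150]]     # sales positions
table2 = [[150, 50], [75, 25]]     # secretarial positions
table3 = [[175, 100], [150, 175]]  # both combined

CHI2_CRIT = 3.841  # 5% critical value, chi-square with 1 df (from a table)

for name, table in [("Table 1", table1), ("Table 2", table2), ("Table 3", table3)]:
    stat = chi_square_stat(table)
    print(f"{name}: chi-square = {stat:.2f}, exceeds critical value: {stat > CHI2_CRIT}")
```

Comparing the statistic for the aggregated table with those for the separate tables shows why part b asks whether combining the data changes the conclusion.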





2.) Directions: Read the following problem. In your drop box submission, please include the following information:
a. Determine if the problem is either a test of hypothesis, a confidence interval or something else and specify the 'key words' found in the problem that demonstrate your choice.
b. Determine the procedure name and parameters involved for each problem (use the Stat 200 Formulas and Techniques Summary document.) Specify the 'key words' found in the problem that lead you to this choice.
c. If the problem is a hypothesis test, indicate whether it has a lower-tail, upper-tail, or two-tail alternative hypothesis and give the test statistic formula for the test. If the problem is a confidence interval, indicate the formula used for the margin of error.
You DO NOT need to do the ACTUAL test of hypothesis, confidence interval, etc. in order to submit in the dropbox, just answer parts a, b, and c for the problem.


An instructor at Arizona State University asked a random sample of eight students to record their study times in an elementary statistics course. She then made a table of total hours studied over 2 weeks and test scores at the end of the 2 weeks. Here are the data:

Study time 10 15 12 20 8 16 14 22
Test Scores 92 75 86 76 92 80 84 81

Assuming that a linear association exists, how are the data correlated? Determine the exact linear relationship between study time and test scores and use it to estimate the predicted test score for a student who studies 12 hours.
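
For checking work on this problem, the correlation and the least-squares prediction can be sketched in plain Python. The variable names are hypothetical, and treating study time as the predictor and test score as the response is an assumption based on the wording of the question.

```python
from math import sqrt

# Data from the problem statement.
study_time = [10, 15, 12, 20, 8, 16, 14, 22]
test_score = [92, 75, 86, 76, 92, 80, 84, 81]

n = len(study_time)
x_bar = sum(study_time) / n
y_bar = sum(test_score) / n
sxx = sum((x - x_bar) ** 2 for x in study_time)
syy = sum((y - y_bar) ** 2 for y in test_score)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(study_time, test_score))

# Pearson correlation coefficient: its sign gives the direction of the association.
r = sxy / sqrt(sxx * syy)

# Least-squares line: test score = b0 + b1 * study time.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Predicted test score for a student who studies 12 hours.
predicted_12 = b0 + b1 * 12

print(f"r = {r:.3f}")
print(f"test score = {b0:.2f} + {b1:.3f} * study time")
print(f"predicted score at 12 hours = {predicted_12:.1f}")
```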





3.) Directions: Read the following problem. In your drop box submission, please include the following information:
a. Determine if the problem is either a test of hypothesis, a confidence interval or something else and specify the 'key words' found in the problem that demonstrate your choice.
b. Determine the procedure name and parameters involved for each problem (use the Stat 200 Formulas and Techniques Summary document.) Specify the 'key words' found in the problem that lead you to this choice.
c. If the problem is a hypothesis test, indicate whether it has a lower-tail, upper-tail, or two-tail alternative hypothesis and give the test statistic formula for the test. If the problem is a confidence interval, indicate the formula used for the margin of error.
You DO NOT need to do the ACTUAL test of hypothesis, confidence interval, etc. in order to submit in the dropbox, just answer parts a, b, and c for the problem.

A company sells a strong commercial floor cleaner and claims that the flashpoint (the lowest temperature at which the vapor of a combustible liquid can be ignited in air) exceeds 200°F. A random sample of cleaner was obtained and the flashpoint of each was measured. The sample mean was 198.2°F and the sample standard deviation was 10°F. At the 1% significance level, is there sufficient evidence to support the company's claim?
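
The test statistic formula asked for in part c can be sketched as a small function, assuming a one-sample t statistic for a mean is the intended procedure. The problem statement does not give the sample size, so the values of n used below are purely hypothetical and serve only to show how the statistic depends on n; the function name `one_sample_t` is also hypothetical.

```python
from math import sqrt


def one_sample_t(sample_mean, mu0, sample_sd, n):
    """One-sample t statistic: (x_bar - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))


# Values from the problem statement.
SAMPLE_MEAN = 198.2
SAMPLE_SD = 10.0
MU0 = 200.0  # claimed flashpoint threshold

# HYPOTHETICAL sample sizes: the problem does not state n.
for n in (10, 25, 100):
    t = one_sample_t(SAMPLE_MEAN, MU0, SAMPLE_SD, n)
    print(f"n = {n:3d}: t = {t:.2f} on {n - 1} df")
```

Whatever the sample size, the statistic is negative here because the sample mean lies below 200, which is worth keeping in mind when deciding the direction of the alternative hypothesis.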


4.) Directions: Read the following problem. In your drop box submission, please include the following information:
a. Determine if the problem is either a test of hypothesis, a confidence interval or something else and specify the 'key words' found in the problem that demonstrate your choice.
b. Determine the procedure name and parameters involved for each problem (use the Stat 200 Formulas and Techniques Summary document.) Specify the 'key words' found in the problem that lead you to this choice.
c. If the problem is a hypothesis test, indicate whether it has a lower-tail, upper-tail, or two-tail alternative hypothesis and give the test statistic formula for the test. If the problem is a confidence interval, indicate the formula used for the margin of error.
You DO NOT need to do the ACTUAL test of hypothesis, confidence interval, etc. in order to submit in the dropbox, just answer parts a, b, and c for the problem.

A survey of 500 households found that the family room was the primary television location in 415 homes. At the 5% significance level, is there evidence that the true population proportion of households having the family room as the primary television location is actually less than 0.85?
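
For checking your formula in part c, the calculation can be sketched as below, assuming a one-proportion z statistic with the standard error computed under the null value 0.85 is the intended procedure. The variable names are hypothetical; `NormalDist` comes from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

# Values from the problem statement.
successes = 415  # households with the family room as primary TV location
n = 500          # households surveyed
p0 = 0.85        # null-hypothesis proportion

p_hat = successes / n

# Standard error under H0, then the z statistic.
se0 = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se0

# Lower-tail p-value for Ha: p < 0.85.
p_value = NormalDist().cdf(z)

print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, lower-tail p-value = {p_value:.3f}")
```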
