Conduct regression and chi-squared test of independence


Assignment: Regression and Correlation Analysis & ANOVA

Overview and Rationale

This assignment is designed to provide you with hands-on experience in performing regression and correlation analysis. The data set is provided in an Excel workbook and contains a wide range of data types that you will need to work with.

Course Outcomes

This assignment is directly linked to the following key learning outcomes from the course syllabus:

1: Explore the use of statistical software in data analysis through hands-on applications

2: Conduct regression and the chi-squared test of independence to study associations between numerical and categorical variables, respectively, and justify the legitimacy of the regression model.

Assignment Summary

Using the data provided in the attached Excel workbook, apply regression and correlation analysis to two data sets.

Follow the instructions in this project document to analyze the data presented in the Excel workbook. Then complete a report summarizing the results in your Excel workbook (or R script file). Submit both the report and the Excel workbook (or R script file).

Project Description

Using the Data worksheet found in the Module 6 Project_ US Occupations.xlsx Excel workbook, complete the following analyses of US occupation data. Place the results in the worksheet specified in each part of the assignment.

In some parts of this project, you are asked to create random samples from a given population. Random sampling methods were covered in Module 3, and tutorials are available in the Instructor Perspective folder on your Blackboard course page.

Part 1

The location quotients are given for NY (population 1) and LA (population 2) in columns B and D, respectively, of worksheet Q1. The location quotient (LOC_QUOTIENT) is the ratio of an occupation's share of employment in a given area to that occupation's share of employment in the U.S. as a whole. For example, an occupation that makes up 10 percent of employment in a specific metropolitan area compared with 2 percent of U.S. employment would have a location quotient of 5 for the area in question.
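
As a small illustration of the definition above, the ratio can be computed directly. This is only a sketch: the function name `location_quotient` and its parameter names are made up for this example and do not come from the workbook, and Python is used purely for illustration (the assignment itself expects Excel or R).

```python
def location_quotient(area_share: float, national_share: float) -> float:
    """Ratio of an occupation's local employment share to its national share.

    Both shares must be in the same unit (both percentages or both fractions).
    """
    if national_share <= 0:
        raise ValueError("national_share must be positive")
    return area_share / national_share


# The example from the text: 10 percent of local employment
# versus 2 percent of U.S. employment.
lq = location_quotient(10, 2)
print(lq)
```

A value above 1 means the occupation is more concentrated in the area than in the U.S. as a whole, and a value below 1 means it is less concentrated.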

1. Use the random sampling method explained in the Instructor Perspective of Module 3 to draw a random sample of size 350 from the NY LOC QUOTIENTs and a random sample of size 350 from the LA LOC QUOTIENTs.

2. Copy your samples into columns F and G of worksheet Q1.

3. Standardize both samples of LOC QUOTIENTs and display the standardized values (z) in columns I and J, respectively.

4. For each of the two sets of LOC QUOTIENTs values, partition the standardized values into seven groups according to the following group specifications:

Group 1: Standardized values that are less than or equal to -0.5 (that is, z ≤ -0.5)

Group 2: Standardized values that satisfy -0.5 < z ≤ ????

Group 3: Standardized values that satisfy ???? < z ≤ ????

Group 4: Standardized values that satisfy ???? < z ≤ ????

Group 5: Standardized values that satisfy ???? < z ≤ ????

Group 6: Standardized values that satisfy ???? < z ≤ ????

Group 7: Standardized values that satisfy z > ????

5. Next, count the number of NY and the number of LA standardized LOC QUOTIENT values that fall into each of the above seven groups and complete Table A.

6. Use alpha = 0.10 to perform a chi-squared test of independence to test the claim that the standardized LOC QUOTIENTs and the locations (NY and LA) are independent factors by completing Tables B, C, and D.
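
The sampling, standardizing, and counting steps above could be sketched as follows for anyone checking their worksheet results outside Excel. Several things here are assumptions rather than part of the assignment: the function names, the use of the sample standard deviation, the randomly generated placeholder population, and above all the cut points in `CUTS`. Only the first boundary (-0.5) is stated in the group specifications; the other values are hypothetical stand-ins and must be replaced with the boundaries given in the project document.

```python
import random
import statistics
from bisect import bisect_left

# HYPOTHETICAL cut points: only -0.5 comes from the assignment text.
CUTS = [-0.5, 0.0, 0.5, 1.0, 1.5, 2.0]


def draw_sample(population, n, seed=None):
    """Simple random sample of size n, drawn without replacement."""
    return random.Random(seed).sample(population, n)


def standardize(values):
    """z = (x - mean) / s, using the sample standard deviation (an assumption)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]


def group_counts(z_values, cuts=CUTS):
    """Count values per group: group 1 is z <= cuts[0], the last is z > cuts[-1]."""
    counts = [0] * (len(cuts) + 1)
    for z in z_values:
        # bisect_left puts a value equal to a cut point in the lower group,
        # which matches the "lower < z <= upper" pattern of the groups.
        counts[bisect_left(cuts, z)] += 1
    return counts


# Placeholder population standing in for column B of worksheet Q1.
rng = random.Random(0)
ny_population = [rng.lognormvariate(0, 0.8) for _ in range(1000)]

ny_sample = draw_sample(ny_population, 350, seed=1)
ny_counts = group_counts(standardize(ny_sample))
print(ny_counts, sum(ny_counts))
```

Running the same three functions on the LA sample gives the second row of counts for Table A.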

Important: In your report, explain your solution procedures and your findings.
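
For reference, the chi-squared test of independence in Part 1 can be sketched as below. The observed counts are invented for illustration and are not results from the workbook, and the function name is hypothetical. The layout of Tables B, C, and D is not described here, so the sketch simply returns the expected counts, the test statistic, and the degrees of freedom. The critical value 10.645 is the standard chi-squared table value for alpha = 0.10 with (2 - 1) x (7 - 1) = 6 degrees of freedom.

```python
def chi_squared_independence(observed):
    """Return (statistic, degrees of freedom, expected counts) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# HYPOTHETICAL counts for the seven groups (each row sums to 350).
observed = [
    [120, 90, 60, 35, 20, 15, 10],  # NY
    [110, 95, 65, 40, 20, 12, 8],   # LA
]

statistic, df, expected = chi_squared_independence(observed)
CRITICAL_VALUE = 10.645  # chi-squared table, alpha = 0.10, df = 6
reject_independence = statistic > CRITICAL_VALUE
print(round(statistic, 4), df, reject_independence)
```

The sketch assumes every group contains at least one observation; a group with a column total of zero would give an expected count of zero and must be merged with a neighboring group first.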

Part 2

The data in worksheet Q2 consist of a sample of LOC QUOTIENTs for both NY and LA for 317 randomly selected professions.

1. Complete Table A in the Q2 worksheet by calculating the slope and the intercept of the regression line and the coefficients of correlation and determination of the regression model.

2. Create a scatter plot of the LA LOC QUOTIENTs versus those of NY. Display the regression line along with its equation and the R² value on the graph.

3. Use the calculated slope and intercept in Table A to calculate the predicted Y values and the residuals in columns I and J respectively.

4. Complete Table B.

5. Perform the procedure for creating a normal probability plot of the residuals (v).

6. Check the independence of the residuals graphically (vi).

7. Check the homoscedasticity of the residuals graphically (vii).

8. Construct a frequency distribution of the residuals consisting of 18 bins (viii).

9. Then use the chi-squared goodness-of-fit test to test the normality of the residuals.

10. Complete your work in the designated cells of worksheet Q2, and complete Table D.
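
The regression quantities in steps 1 and 3 can be cross-checked with a short sketch like the one below. It assumes, from the wording "LA versus NY" in step 2, that NY is the X variable and LA is the Y variable. The function names and the five data pairs are made up for illustration and are not taken from worksheet Q2.

```python
import math
import statistics


def simple_regression(x, y):
    """Least-squares fit y = intercept + slope * x, plus r and R squared."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # coefficient of correlation
    return {"slope": slope, "intercept": intercept, "r": r, "r_squared": r * r}


def predictions_and_residuals(x, y, slope, intercept):
    """Predicted Y values and residuals (observed minus predicted)."""
    predicted = [intercept + slope * xi for xi in x]
    resid = [yi - pi for yi, pi in zip(y, predicted)]
    return predicted, resid


# HYPOTHETICAL location quotients, for illustration only.
ny = [0.6, 0.8, 1.2, 1.5, 2.0]
la = [0.7, 0.9, 1.0, 1.6, 1.8]

fit = simple_regression(ny, la)
predicted, resid = predictions_and_residuals(ny, la, fit["slope"], fit["intercept"])
print(fit)
```

With an intercept in the model, the residuals sum to zero up to rounding error, which is a quick sanity check on columns I and J.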

In your report, explain your solution procedures and your findings.
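
The normality checks on the residuals in Part 2 could be sketched as follows. This is one possible reading, not the worksheet's prescribed method: the function names are hypothetical, the plotting position (i - 0.5) / n is one common convention among several, the 18 bins are assumed to be of equal width between the smallest and largest residual, and the residuals used here are simulated rather than taken from worksheet Q2.

```python
import random
from statistics import NormalDist, fmean, stdev


def normal_plot_points(residuals):
    """Pairs (theoretical z quantile, sorted residual) for a normal probability plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    return [(NormalDist().inv_cdf((i - 0.5) / n), e)
            for i, e in enumerate(ordered, start=1)]


def normality_gof(residuals, n_bins=18):
    """Chi-squared goodness-of-fit statistic against a fitted normal distribution."""
    lo, hi = min(residuals), max(residuals)
    if hi == lo:
        raise ValueError("residuals must not all be equal")
    width = (hi - lo) / n_bins
    observed = [0] * n_bins
    for e in residuals:
        # Equal-width bins; the maximum value goes into the last bin.
        observed[min(int((e - lo) / width), n_bins - 1)] += 1
    fitted = NormalDist(fmean(residuals), stdev(residuals))
    # The outer bins are treated as open-ended so the probabilities sum to 1.
    inner_edges = [lo + k * width for k in range(1, n_bins)]
    cdf = [0.0] + [fitted.cdf(edge) for edge in inner_edges] + [1.0]
    n = len(residuals)
    expected = [n * (cdf[k + 1] - cdf[k]) for k in range(n_bins)]
    statistic = sum((o - ex) ** 2 / ex for o, ex in zip(observed, expected))
    df = n_bins - 1 - 2  # mean and standard deviation are estimated from the data
    return statistic, df, observed, expected


# SIMULATED residuals standing in for column J of worksheet Q2.
rng = random.Random(42)
simulated = [rng.gauss(0, 0.3) for _ in range(317)]

points = normal_plot_points(simulated)
statistic, df, observed, expected = normality_gof(simulated)
print(round(statistic, 3), df)
```

The statistic is compared with the chi-squared critical value for the reported degrees of freedom at the chosen significance level. Bins with very small expected counts, which are likely in the tails, are usually merged with neighboring bins before the test, and that reduces the degrees of freedom accordingly.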

Format your assignment according to the following formatting requirements:

1. The answer should be typed, double spaced, using Times New Roman font (size 12), with one-inch margins on all sides.

2. Include a cover page containing the title of the assignment, the student's name, the course title, and the date. The cover page is not included in the required page length.

3. Also include a reference page. Citations and references should follow APA format. The reference page is not included in the required page length.

Attachment: Module-Project-US-Occupations.rar
