Advanced quantitative methods - statistical case study


Analysis of Correlates of School Performance

You have been hired as a consultant to the California Department of Education to analyze elementary school outcome data in order to understand the predictors of school level variation in academic performance. A select group of demographic and outcome data are available. The R data file called "calschooldist.csv" contains these data (note: these are real data). This file contains school outcomes for 400 elementary schools randomly selected across the state of California. The data set contains the following variables:

Your task is to build a reliable regression model that explains the correlates of school performance. Your dependent variable is acadperf, which is a measure of schoolwide standardized test scores (higher is better).

Policy background
By looking at academic performance at the school level (rather than the individual level), we can obtain important insights into how context matters. Sadly, this analysis highlights how much student performance varies with characteristics outside a student's control, such as the poverty of the community they were born in. It also raises some interesting questions related to educational practice. The ideal way to conduct this analysis would be a two-level analysis that separates student-level from school-level factors. But I think you'll find the analysis of school-level factors interesting.

Please follow the instructions below carefully:

Memo:

Using a hierarchical modeling strategy (as described in class and in Field), build a regression model that best predicts school performance. Write a 3-4 page memo which describes your findings. Discuss what other data you would like to collect in order to strengthen your findings. Describe the main substantive results from your final regression model only. Leave the more technical details about model assumptions and the model building process to the technical appendix (described below). Remember: learning which variables are statistically insignificant can be just as important as learning which ones are statistically significant.

Model building documentation appendix:

Important: the following is a step-by-step description of the model building process. Regression can get unwieldy if you don't have a plan, so I would suggest that you closely follow the steps I outline below.

Then, in a couple of pages, prepare a concise technical appendix where you answer the questions below. In the technical appendix, you basically have to document how you went through the 10-step process. See below for more detail, by step. Note: brief answers are okay!

Step 1: Without looking at the data, record expectations: what factors are likely to explain school performance (make a "wish list" of independent variables)?

Step 2: Reconcile "wish list" with available data. Take note of variables that you can't measure because they aren't available (to gauge omitted variable bias). List those variables here.

Step 3: Create a list of the variables in your wish list that are available in the data (or have close proxies). These are your candidate independent variables.

Step 4: Perform basic checks of the candidate variables. Do you have any missing-value or out-of-range data problems? If so, what did you do to resolve them, if anything?
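One way to carry out the Step 4 checks is sketched below in plain Python (the lab's own code is not reproduced here). The rows, the column name meals_pct, and the valid ranges are all hypothetical placeholders; only acadperf is named in the assignment, and the real ranges should come from the codebook.

```python
# Made-up rows for illustration; these are NOT values from calschooldist.csv.
rows = [
    {"acadperf": 693, "meals_pct": 67.0},
    {"acadperf": 570, "meals_pct": None},   # missing value
    {"acadperf": 546, "meals_pct": 104.0},  # impossible for a percentage
    {"acadperf": 571, "meals_pct": 90.0},
]

# Assumed valid ranges (hypothetical); replace with the documented ones.
valid_range = {"acadperf": (0, 1000), "meals_pct": (0, 100)}


def basic_checks(rows, valid_range):
    """Count missing and out-of-range values for each variable."""
    report = {}
    for var, (lo, hi) in valid_range.items():
        values = [row.get(var) for row in rows]
        missing = sum(v is None for v in values)
        out_of_range = sum(v is not None and not (lo <= v <= hi) for v in values)
        report[var] = {"missing": missing, "out_of_range": out_of_range}
    return report


report = basic_checks(rows, valid_range)
print(report)
```

Whatever the counts turn out to be, the appendix should state what was done about them (for example, dropping or recoding the affected cases).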

Step 5: What did your check of the correlation matrix find? Did you add any variables to the end of your list based on it? Does it look like you need to worry about multicollinearity?
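A correlation matrix for Step 5 can be sketched as follows in plain Python. The data values and the names meals_pct and ell_pct are made up for illustration, and the 0.8 cutoff for flagging predictor pairs is an assumed rule of thumb, not a requirement of the assignment.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up values; column names other than acadperf are hypothetical.
data = {
    "acadperf": [693, 570, 546, 571, 478, 858],
    "meals_pct": [67, 92, 97, 90, 89, 10],
    "ell_pct": [9, 21, 29, 27, 30, 3],
}

names = list(data)
corr = {a: {b: pearson(data[a], data[b]) for b in names} for a in names}
for a in names:
    print(a, [round(corr[a][b], 2) for b in names])

# Flag predictor pairs that are strongly correlated with each other.
predictors = [n for n in names if n != "acadperf"]
flagged = [
    (a, b)
    for i, a in enumerate(predictors)
    for b in predictors[i + 1:]
    if abs(corr[a][b]) > 0.8  # assumed cutoff
]
print(flagged)
```

Pairs that end up flagged are the ones to watch for multicollinearity once both enter the model.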

Step 6: Write down the order of entry based on your best guess given your knowledge of the field (protection against specification error). If you added any variables based on the correlation analysis, add them to the end of your list. They should be given lowest priority since a priori expectations did not suggest their importance.

Step 7: Add your first independent variable. Show your bivariate model. Did it accord with your expectations?
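For reference, the bivariate model in Step 7 amounts to an ordinary least squares fit with one predictor. The sketch below computes it by hand in plain Python; the data values and the predictor name meals_pct are made up for illustration.

```python
def fit_bivariate(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)   # unexplained variation
    sst = sum((b - my) ** 2 for b in y)    # total variation
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": 1 - sse / sst,
        "residuals": residuals,
    }


# Made-up values; meals_pct is a hypothetical column name.
meals_pct = [67, 92, 97, 90, 89, 10]
acadperf = [693, 570, 546, 571, 478, 858]

model = fit_bivariate(meals_pct, acadperf)
print(round(model["slope"], 2), round(model["r_squared"], 2))
```

The sign of the slope is the first thing to compare with the expectation recorded in Step 1; the residuals returned here are what Step 8 examines.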

Step 8: Check for regression violations for this bivariate model. Did you find any major violations?

Step 9: Sequentially build up the model, adding variables in the order you specified (don't check regression assumptions at each stage).

Add variables one by one. As you add variables:

- Drop variables that are insignificant unless there is a strong theoretical reason to keep them.

- If an insignificant new variable makes an existing variable insignificant, just drop the new one.

- If the new variable is significant but adding it makes an old variable insignificant, keep both. Theory led you to think the old one was important, so keep it.

- Keep track of variables which are not significant. This is important to document.

Briefly document what you kept and what you dropped.

You do NOT need to check assumptions for each variable you add; only do this for the bivariate model and your final model. The one exception relates to multicollinearity. It can be useful to check for multicollinearity as you add variables.
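The multicollinearity check mentioned above is often done with the variance inflation factor (VIF). The sketch below covers only the special case of a model with exactly two predictors, where the VIF of each equals 1 / (1 - r^2) and r is the correlation between the two; with more predictors, each VIF instead comes from regressing that predictor on all the others. The data and column names are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model.

    Undefined (division by zero) if the predictors are perfectly correlated.
    """
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Made-up values; both column names are hypothetical.
meals_pct = [67, 92, 97, 90, 89, 10]
ell_pct = [9, 21, 29, 27, 30, 3]

vif = vif_two_predictors(meals_pct, ell_pct)
print(round(vif, 2))
```

A VIF of 1 means the predictors are uncorrelated; a commonly quoted rule of thumb treats values above about 10 as a concern.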

Step 10: Recheck model assumptions for your final model. The final model is the one you should write about.

Discuss your final model, reviewing the coefficient table in detail along with the other key statistics (Bs, R-squared, t statistics, F statistics, standardized Bs, etc.). Also, briefly discuss whether the final model satisfied regression assumptions overall. If not, what are some options for improving the model fit?

Review the distance measures and influence statistics that Field discusses for the final model (Cook's distance, etc.). What do they suggest?
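As an illustration of what Cook's distance measures, the sketch below computes it by hand for a one-predictor model, using D_i = (e_i^2 / (p * MSE)) * h_i / (1 - h_i)^2 with p = 2 estimated parameters and leverage h_i = 1/n + (x_i - mean(x))^2 / Sxx. The final model in this assignment will have several predictors, so in practice the statistics package computes this; the data here are made up, and the cutoff of 1 is a common rule of thumb.

```python
def cooks_distance(x, y):
    """Cook's distance and leverage for each case in a one-predictor OLS fit."""
    n = len(x)
    p = 2  # parameters estimated: intercept and slope
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    leverage = [1 / n + (a - mx) ** 2 / sxx for a in x]
    cooks = [
        (e ** 2 / (p * mse)) * h / (1 - h) ** 2
        for e, h in zip(resid, leverage)
    ]
    return cooks, leverage


# Made-up data: the last case sits far from the others on x and off the trend.
x = [1, 2, 3, 4, 5, 20]
y = [2, 4, 6, 8, 10, 10]

cooks, leverage = cooks_distance(x, y)
flagged = [i for i, d in enumerate(cooks) if d > 1]  # rule-of-thumb cutoff
print(flagged)
```

A flagged case is not automatically an error; it is a case whose removal would noticeably change the fitted coefficients, which is worth reporting in the appendix.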

Notes:

- The free meals variable is included both as a continuous and as a categorical variable. I would suggest starting with the continuous one and only using the categorical one if you want to explore the relationship between income and performance in more detail. If you do that, remember that categorical variables need to be dummy coded, and don't use both the continuous and categorical versions of the free meals variable in the same model. Also beware of the dummy variable trap: you can't enter a dummy variable for every level of a variable, so you need to drop at least one level.
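The dummy coding described in this note can be sketched as follows in plain Python. The levels 1, 2, 3 for mealcat are an assumption made for illustration (the actual coding is not given here), and dummy_code is a hypothetical helper, not a function from the lab.

```python
def dummy_code(values, reference, prefix):
    """Return one 0/1 column per level, omitting the reference level.

    Leaving out the reference level avoids the dummy variable trap: with an
    intercept in the model, a full set of dummies would be perfectly collinear.
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError("reference level not found in the data")
    return {
        f"{prefix}_{level}": [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


# Made-up category codes; the real levels of mealcat may differ.
mealcat = [1, 2, 3, 2]
dummies = dummy_code(mealcat, reference=1, prefix="mealcat")
print(dummies)
```

Each coefficient on a dummy is then read as the difference in expected acadperf between that level and the omitted reference level.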

Step 11: Advanced options (Please try at least one of these)

1. Explore the use of logarithms of the dependent variable. Do these improve your model?

2. For some of your predictors, create dummy variables for those that score "high" on the variable (that is, those in the top quartile). See code for how to do this (using the high ELL variable as an example). Do there appear to be threshold effects? In other words, do these dummy variables perform better than continuous versions of the same domains? (When you add these variables, remove the continuous versions.)

3. Create an interaction variable (by multiplying two dummy variables). Test for interaction effects. If you do this, make sure that the main effects are also included in the model. Alternatively, it could be interesting to run your final model separately by subgroups of a key variable (such as mealcat).

4. Use the visualization tools included in the lab (under "visualization extensions").
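Options 2 and 3 above can be sketched together in plain Python; this is not the lab code the assignment refers to. The data and the names ell_pct and meals_pct are made up. The sketch relies on the fact that, with two dummies and their product, the model is saturated, so its OLS coefficients can be read directly from the four cell means.

```python
from statistics import mean, quantiles


def top_quartile_dummy(values):
    """1 if the value lies above the 75th percentile, else 0."""
    q3 = quantiles(values, n=4)[2]  # third quartile cut point
    return [1 if v > q3 else 0 for v in values]


def saturated_2x2(y, a, b):
    """Coefficients of y ~ a + b + a*b for 0/1 dummies a and b.

    Raises statistics.StatisticsError if any of the four cells is empty.
    """
    def cell_mean(i, j):
        return mean(v for v, ai, bj in zip(y, a, b) if ai == i and bj == j)

    m00, m10, m01, m11 = (cell_mean(0, 0), cell_mean(1, 0),
                          cell_mean(0, 1), cell_mean(1, 1))
    return {
        "intercept": m00,
        "a": m10 - m00,                    # main effect of a when b = 0
        "b": m01 - m00,                    # main effect of b when a = 0
        "a_x_b": m11 - m10 - m01 + m00,    # interaction effect
    }


# Made-up values; column names other than acadperf are hypothetical.
ell_pct = [3, 9, 21, 27, 29, 30, 45, 60]
meals_pct = [10, 95, 40, 55, 90, 35, 50, 97]
acadperf = [850, 640, 780, 760, 700, 790, 690, 520]

high_ell = top_quartile_dummy(ell_pct)
high_meals = top_quartile_dummy(meals_pct)
interaction = [m * e for m, e in zip(high_meals, high_ell)]  # product of dummies

coefs = saturated_2x2(acadperf, high_meals, high_ell)
print(high_ell, high_meals, interaction, coefs)
```

With additional covariates in the model the coefficients no longer reduce to cell means, so the interaction column would simply be added to the regression alongside both main effects.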

Attachment: Statistical Case Study.zip
