Asymptotic variance of the estimator


Question 1:

Let X ~ Bin(n,p). As an example, X could be the number of people (out of n people) who recovered from a disease upon using some drug.

(a) X/n is a standard estimator for p. What is the asymptotic distribution of √n ((X/n) - p)? Hence, if X = 37 and n = 100, create a 95% confidence interval for p.
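As a minimal sketch (not part of the assignment), the standard Wald interval implied by part (a) can be computed as follows; by the CLT, √n((X/n) − p) → N(0, p(1−p)), and p(1−p) is estimated by plugging in p̂ = X/n. All names here are ours:

```python
import math

def wald_ci(x, n, z=1.96):
    """Normal-approximation (Wald) 95% CI for a binomial proportion.

    Uses sqrt(n) * (X/n - p) -> N(0, p(1-p)) with p(1-p) estimated at p_hat,
    giving p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
    """
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

lo, hi = wald_ci(37, 100)
print(f"({lo:.4f}, {hi:.4f})")  # roughly (0.2754, 0.4646)
```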

(b) In part (a) the asymptotic variance of the estimator involves the very parameter we are trying to estimate. Now show that √n (sin⁻¹(√(X/n)) − sin⁻¹(√p)) converges to a normal distribution whose variance does not depend on p. Using this fact, create a 95% confidence interval for p. (Hint: do you remember the chain rule and the derivative of sin⁻¹(x)?)
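A minimal sketch of the interval implied by part (b), assuming the standard delta-method result that √n (sin⁻¹(√(X/n)) − sin⁻¹(√p)) → N(0, 1/4): the interval is built on the transformed scale, where the half-width 1.96/(2√n) is free of p, and then mapped back with sin². Variable names are ours:

```python
import math

def arcsine_ci(x, n, z=1.96):
    """95% CI for p via the variance-stabilizing arcsine transform.

    sqrt(n) * (asin(sqrt(X/n)) - asin(sqrt(p))) -> N(0, 1/4), so the
    half-width on the transformed scale is z / (2 * sqrt(n)).
    """
    t = math.asin(math.sqrt(x / n))
    half = z / (2 * math.sqrt(n))
    # Clamp the transformed endpoints to [0, pi/2] before back-transforming
    # with sin^2, so the resulting interval always lies inside [0, 1].
    lo_t = max(t - half, 0.0)
    hi_t = min(t + half, math.pi / 2)
    return math.sin(lo_t) ** 2, math.sin(hi_t) ** 2

lo, hi = arcsine_ci(37, 100)
print(f"({lo:.4f}, {hi:.4f})")
```

Note that, unlike the Wald interval of part (a), the back-transform sin² guarantees endpoints in [0, 1], which is relevant to part (c).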

(c) Is either method of creating confidence intervals guaranteed to produce intervals that lie wholly inside (0, 1)?

Question 2:

One of the important questions an applied econometrician faces regarding model specification relates to nonlinearities in the right-hand-side variables. One might think the solution is simple: include higher-order terms and check the (joint) significance of the coefficients attached to them. This strategy may not be practical, however: if your model has 20 variables (including an intercept), a linear specification has 20 variables, while a quadratic specification has 210. With even higher powers, the loss of degrees of freedom and the concomitant imprecision in the estimates would be disastrous! Several tests have been proposed in the literature to get around this problem. One of them is based on the following idea: if a linear model is indeed valid for the entire dataset, then the fit obtained from the entire dataset should not be very different from the fit obtained from a smaller subset (in a sense to be defined) consisting of the more 'central' data points.

Using the above idea, the test orders the data points according to the distance of xᵢ from its mean x̄ and (assuming an even number of observations) divides the dataset into two groups: a 'central' group (observations for which the right-hand-side vector is closer to the mean vector) and a 'non-central' group (the other observations). Suppose e is the residual vector from a regression on the whole dataset, and let ẽ be the residual vector from a regression on only the n/2 'central' data points. Finally, we carry out an F-test using the statistic

[(e'e − ẽ'ẽ)/(n/2)] / [ẽ'ẽ/(n/2 − k)]

and conduct a one-tailed F-test (with degrees of freedom n/2 and n/2 − k), the critical region being everything to the right of the critical point. Here k is, of course, the number of right-hand-side variables in the original model.
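As a minimal sketch (not part of the assignment), the test statistic above can be computed on simulated data in which the linear null holds. All names and the simulated design here are ours, and centrality is measured by Euclidean distance from the mean regressor vector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data under the null: y is exactly linear in the regressors.
n, k = 200, 3                          # n even; k regressors incl. intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

def ssr(Xm, ym):
    """Residual sum of squares e'e from an OLS fit of ym on Xm."""
    beta, *_ = np.linalg.lstsq(Xm, ym, rcond=None)
    e = ym - Xm @ beta
    return e @ e

# Order observations by distance of x_i from the mean vector x-bar,
# then keep the n/2 'central' points (intercept column excluded).
d = np.linalg.norm(X[:, 1:] - X[:, 1:].mean(axis=0), axis=1)
central = np.argsort(d)[: n // 2]

ee = ssr(X, y)                         # e'e  from the full sample
ete = ssr(X[central], y[central])      # e~'e~ from the central half

# Test statistic; compare against the upper tail of F(n/2, n/2 - k).
F = ((ee - ete) / (n / 2)) / (ete / (n / 2 - k))
print(F)
```

Note that e'e ≥ ẽ'ẽ always holds, since the full-sample residuals over the central half cannot fit better than the central-half regression's own residuals, so the statistic is nonnegative by construction.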

(a) Rigorously justify the asserted distribution of the test statistic under the null (i.e., that a linear specification in the given right-hand-side variables is correct). Be specific about the assumptions you are using for your justification.

(b) Intuitively explain why, if the null indeed fails (particularly if E(y|x) is convex or concave in x), the test statistic is likely to be high (drawing a picture may help here).

(c) Using the data in the datafile "US City Temperatures", which provides typical January temperatures for several US cities as well as their latitudes and longitudes, implement the test (of course, the temperature is the left-hand-side variable here).

(d) (Open ended) Now knowing what you know about variable selection from Topic 4, choose a 'good model'.
