# ECO 440/640 — Problem Set 6 (IV concepts)

For this assignment, feel free to write out answers to questions individually rather than treating it as a professional report.

## Part A

As in many other examples we have seen, here you will want to explain wages as a function of education. I used the WAGE2 data to produce Table 1, and you may look at it for guidance if you want to. It might also be worth your time to look at the table and see if you would present anything differently (given the reading from last week).

### OLS

Suppose you use OLS to estimate a linear model of the log of wage on education and a set of demographic controls (IQ, experience, age, race, whether the person lives in the South, whether the person is an urban dweller, number of siblings, birth order, and mother's and father's education). You will often see this kind of approach in lay articles, which report that they found an effect "even controlling for _____!" Sometimes this is convincing, and sometimes it is not; it is up to you to decide whether you are convinced. Why might the OLS estimator for the effect of education on wage still be biased despite all these controls (examples are always better than vague generalities)? What direction of bias do you expect, and why?

### IV

For the rest of the question, ignore the controls listed in the previous paragraph to keep things simple. Sometimes controls are needed even with a valid instrument, and sometimes they are not. The next paragraph asks you to consider a case where a particular control variable is still necessary despite having a valid instrument.

There are two potential instruments we will consider here: the number of siblings a person has and their birth order (1 if the person is a first-born child, 2 if the person is a second-born child, 3 if the person is a third-born child, etc.). Why might the number of siblings not be a satisfactory IV for education? Why might birth order be a useful IV for education? Why might birth order fail as an IV for education if you do not include the number of siblings in the regression? (Make sure that you read that question carefully, since many of my students last semester answered a completely different question than the one I asked.)

Write out the model you would use for the effect of education on wages if you wanted to use birth order as an instrument for education (hint: what did the last paragraph say about siblings?). Assuming that number of siblings is exogenous, explain the steps you would go through to estimate the model consistently (in other words, what regressions would you “run”?). Your explanation should not be about R.

See the results of my estimation in Table 1 below. Does the first stage indicate that birth order could be a relevant IV (explain)? Does it show that it is a valid IV (explain)? Compare the IV estimates of the marginal effect of education to the OLS estimates (interpret at least one of them). True or false: the results show that the OLS estimator is downward (or “negatively”) biased (it may be useful to compare the results to what you expected when thinking about this).

| Dependent variable: | log(Wage) | Years of school | log(Wage) |
|---|---|---|---|
| | OLS (1) | OLS (2) | IV (3) |
| Years of school | 0.0579*** (0.0063) | | 0.137* (0.0747) |
| No. of siblings | -0.015** (0.006) | -0.153*** (0.040) | 0.0021 (0.017) |
| Birth order | | -0.153*** (0.057) | |
| Constant | 6.06*** (0.091) | 14.3*** (0.13) | 4.94*** (1.1) |
| Observations | 852 | 852 | 852 |
| R² | 0.112 | 0.0583 | -0.0543 |

Note: *p<0.1; **p<0.05; ***p<0.01
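
For anyone who wants to see the mechanics behind a first-stage column and an IV column like (2) and (3), here is a minimal sketch of two-stage least squares done by hand in Python with NumPy. It runs on simulated data, not WAGE2. All names (`ols`, `two_stage_least_squares`, `x`, `z`, `w`, `u`) and all coefficient values are assumptions made up for illustration, and the sign of the OLS bias in the simulation is an arbitrary choice that says nothing about the wage data.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients from regressing y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_stage_least_squares(y, x, z, w):
    """Manual 2SLS: one endogenous regressor x, one excluded instrument z,
    and one included exogenous control w.

    Returns coefficients ordered as [constant, x, w].
    """
    const = np.ones(len(y))
    # First stage: endogenous regressor on the instrument AND the control.
    Z = np.column_stack([const, z, w])
    x_hat = Z @ ols(Z, x)
    # Second stage: outcome on the first-stage fitted values and the control.
    X_hat = np.column_stack([const, x_hat, w])
    return ols(X_hat, y)


# Simulated data (hypothetical values, chosen only for illustration).
rng = np.random.default_rng(0)
n = 20_000
w = rng.normal(size=n)                  # exogenous control
z = 0.5 * w + rng.normal(size=n)        # instrument, correlated with the control
u = rng.normal(size=n)                  # unobserved confounder
x = 1.0 * z + 0.5 * w + u + rng.normal(size=n)
y = 1.0 + 0.1 * x + 0.2 * w + 0.5 * u + rng.normal(size=n)

beta_ols = ols(np.column_stack([np.ones(n), x, w]), y)
beta_iv = two_stage_least_squares(y, x, z, w)

print("OLS coefficient on x:", beta_ols[1])
print("IV  coefficient on x:", beta_iv[1])
```

In this simulation the true coefficient on `x` is 0.1, so the IV estimate should land near that value while OLS does not. The standard errors from a hand-run second stage are not correct, because they are computed from the wrong residuals; a dedicated IV routine is needed for inference.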

## Part B

Suppose that you wish to estimate the effect of class attendance on student performance. A basic model (at the level of individual students) is $$stndfnl = \beta_{0} + \beta_{1}atndrte + \beta_{2}priGPA + \beta_{3}ACT + u,$$ where the variables are defined as follows:

• $$stndfnl$$ is a standardized final exam score (the score is standardized by subtracting the mean from it and dividing that difference by the standard deviation)
• $$atndrte$$ is the percent of classes attended
• $$priGPA$$ is the prior grade point average
• $$ACT$$ is the ACT score
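
The standardization described in the first bullet can be written out as follows. Here $$final_i$$ is a hypothetical name for student $$i$$'s raw final exam score (the text does not name the raw variable), and the mean and standard deviation are assumed to be taken over the students in the sample:

$$stndfnl_i = \frac{final_i - \overline{final}}{sd(final)}$$

Under that reading, $$stndfnl$$ has mean zero and standard deviation one, so $$\beta_{1}$$ is measured in standard deviations of the exam score per percentage point of attendance.
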
1. Let $$dist$$ be the distance from the student's living quarters to the lecture hall. Do you think $$dist$$ is uncorrelated with $$u$$ (explain)?
2. Assuming that $$dist$$ and $$u$$ are uncorrelated, what other assumption must $$dist$$ satisfy to be a valid IV for $$atndrte$$? How could you justify this assumption?
3. Suppose we add the interaction term $$priGPA{\times}atndrte$$: $$stndfnl = \beta_{0} + \beta_{1}atndrte + \beta_{2}priGPA + \beta_{3}ACT + \beta_{4}priGPA{\times}atndrte + u.$$ If $$atndrte$$ is correlated with $$u$$, then, in general, so is $$priGPA{\times}atndrte$$. Thus, to estimate this model consistently, you need an instrument for $$priGPA{\times}atndrte$$. What might be a good IV for $$priGPA{\times}atndrte$$? [Hint: If $$E(u|priGPA, ACT, dist) = 0$$, as happens when $$priGPA$$, $$ACT$$, and $$dist$$ are all exogenous, then any function of $$priGPA$$ and $$dist$$ is uncorrelated with $$u$$.]