ECO 440/640 — Problem Set 1

Script to help (right click and “save as”)

ddply help (you should read it): http://stat545.com/block013_plyr-ddply.html

Write your answers clearly, using proper grammar, and explain every answer. Upload one copy of your work and bring one to class (a copy on your computer is fine). We are not using Excel because many of the tasks in this class are harder to do there. The tasks in this assignment may be easy in Excel, but calculating the entire distribution of marginal effects from a probit regression is extremely difficult in Excel. In R it is four lines of code.
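As a rough illustration of the probit claim, here is a minimal sketch in base R. It uses simulated data rather than fertil2, and the names n, x, y, fit, xb, and me are made up for this example. After the data setup, the fit and the per-observation marginal effects take about four lines.

```r
# Simulated data for illustration only (NOT the fertil2 data).
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, pnorm(-0.5 + 0.8 * x))   # outcomes drawn from a probit model

fit <- glm(y ~ x, family = binomial(link = "probit"))  # probit regression
xb  <- predict(fit, type = "link")                     # fitted index for each observation
me  <- dnorm(xb) * coef(fit)["x"]                      # marginal effect of x for each observation
summary(me)                                            # distribution of the marginal effects
```

Each entry of me is the normal density at that observation's fitted index times the coefficient on x, so a histogram of me (for example, hist(me)) shows how the effect of x varies across the sample.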

You will need this data file: http://randycragun.com/courses/640/fertil2.RData. "The data... were collected on women living in the Republic of Botswana in 1988. The variable children refers to the number of living children. The variable electric is a binary indicator equal to one if the woman's home has electricity, and zero if not." (Wooldridge)

The following questions are based on questions by Jeffrey Wooldridge in Introductory Econometrics: A Modern Approach, with some modifications.

  1. Suppose you want to know if smaller class sizes improve student outcomes.
    1. If you could do any study you wanted to answer this question, what would you do (some answers are better than others)?
    2. Suppose that you have observational data on several thousand fourth graders, including the size of each student's fourth-grade class and the student's score on a standardized test. You find a negative correlation between class size and test performance. Does this show that smaller classes improve performance? Explain.
    3. Suppose it were true that class size has no impact on student performance. Why might you still expect a negative correlation between class size and performance? Give specific examples for this case. (Notice that this does not simply ask you why you would expect a negative correlation in general!)
  2. Download the fertil2 dataset.
    1. Find the difference in the average number of children between the groups with and without electricity. Describe the difference in a sentence with units.
    2. Test whether the means are different.
      1. Write out the null and alternative hypotheses.
      2. Calculate the test statistic. What are the degrees of freedom (and how do you know)?
      3. Find the p-value. Interpret the p-value (this is not the same as telling me what you do with it).
      4. Are the means significantly different? (This is about telling me what you do with the p-value.)
      5. Construct a 95% confidence interval for the difference in means. Explain what the confidence interval means.
    3. Write out an expression for the expected value of the difference in the group averages. Split the expression into treatment effects and a selection bias term.
    4. Identify two potential sources of selection bias (where “selection bias” refers to the kind of selection bias we have been talking about in class).
    5. Describe an experimental design for eliminating the selection bias. Explain why it works. (“Experimental” is key here.)
    6. Describe the matching process we might use to try to reduce the selection bias (see MM chapters 1 and 2 if you are unsure). How good a match between the samples would you need in order to accept that a negative correlation between electricity and fertility represents a causal effect of electricity on fertility? (Whenever you face a problem, it is important to think about what evidence would actually convince you, rather than holding only abstract notions about the evidence.)
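
For the mechanics of questions 2.1 and 2.2, the following is a minimal sketch in base R on simulated data. The data frame toy is hypothetical: its column names follow the variable names quoted above, but its values are made up, so none of the numbers it produces are answers to this assignment. It uses the Welch test that t.test runs by default, which is an assumption on my part; check which version of the test the course expects.

```r
# Simulated stand-in data; the values are made up and are NOT the Botswana data.
set.seed(2)
toy <- data.frame(electric = rbinom(200, 1, 0.3))
toy$children <- rpois(200, lambda = ifelse(toy$electric == 1, 2, 3))

# Group means, named "0" (no electricity) and "1" (electricity).
means <- tapply(toy$children, toy$electric, mean)
diff_means <- means["1"] - means["0"]

# Two-sample t-test; t.test uses the Welch (unequal variance) version by default.
# With a formula, the estimate and interval are for group 0 minus group 1.
tt <- t.test(children ~ electric, data = toy)
tt$statistic   # test statistic
tt$parameter   # degrees of freedom
tt$p.value     # p-value
tt$conf.int    # 95% confidence interval for mean(group 0) - mean(group 1)
```

To run the same steps on the real data, load the file with load("fertil2.RData") and substitute whatever object name it creates for toy. The ddply page linked at the top shows another way to compute the group means.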