ECO 440/640 — Problem Set 2

Due 2017-02-15

Part A

  1. The following question is based on one by Jeffrey Wooldridge in his Introductory Econometrics: A Modern Approach. Some modifications have been made. Suppose you want to study the effects of various activities on GPA, and you survey a bunch of students on how much of their time they spend on four activities: work, study, sleep, and leisure. Every activity is put into one of these categories so that the total time spent on these activities is 168 hours for every student. You then construct the following model: $$GPA = \beta_0 + {\beta_1}study + {\beta_2}sleep + {\beta_3}work + {\beta_4}leisure + u$$
    1. Does it make sense to hold sleep, work, and leisure fixed while changing study? (This illustrates the problem we call “perfect multicollinearity.”)
    2. How could you reformulate the model so that the parameters have a useful interpretation? Write out the alternative empirical model, and interpret one of the coefficients.
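
If you want to see what the software does with the model as written, here is a minimal sketch in R. It uses simulated data, not the survey described in the question: the hours, the GPA process, and every number below are invented for illustration, and only the variable names come from the model above.

```r
# Simulated time-use data: each student's four activities sum to 168 hours.
# All values are invented for illustration only.
set.seed(440)
n <- 200
study <- sample(10:40, n, replace = TRUE)
sleep <- sample(40:63, n, replace = TRUE)
work  <- sample(0:30,  n, replace = TRUE)
leisure <- 168 - study - sleep - work   # forced by the weekly time budget

# Hypothetical GPA process, used only so there is an outcome to regress.
gpa <- 2 + 0.03 * study + 0.01 * sleep - 0.01 * work + rnorm(n, sd = 0.3)

# The four regressors add up to 168 times the intercept column,
# so lm() cannot separate their effects and reports NA for one of them.
fit_all <- lm(gpa ~ study + sleep + work + leisure)
print(coef(fit_all))
```

The sketch only shows the symptom. Deciding how to rewrite the model, and what the new coefficients mean, is still the question being asked.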

Part B

I want you to address the issue of the gender wage gap. Is there evidence of systematic discrimination in employment practices?


Here are the data: These are data from the 2016 Annual Social and Economic Supplement (ASEC) of the Current Population Survey (CPS) from the US Census. This is a survey of 94,097 households that asks for economic information from every member of the household in March of each year. I have pared the sample to people:

I constructed the following variables for you:

Some ideas to keep in mind in answering the question:

Write up your results

This is a little like practice for writing a research paper. Do not take the questions below and make each a bullet point. Write up your results like you are making a report to your employer about whether there is evidence of wage discrimination.

Start with the simple regression of log hourly wage on female: $$\ln(wage_i) = \alpha_0 + {\alpha_1}female_i + v_i$$ Make sure you write out this model in your report when you are describing your estimates. Make a regression results table (model it after the ones you see in Mastering 'Metrics). Interpret the coefficient on female. How does this coefficient relate to the mean log wages for men and women? Why? Clearly state the null and alternative hypotheses. Perform the hypothesis test and clearly state the results. Explain whether the coefficient is economically significant (not just statistically significant). Is there evidence of discrimination?
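
The mechanics of this first step might look like the following minimal sketch in R. It runs on simulated data so that it is self-contained: the data frame `cps_sim` and all of its numbers are invented, and only the names `wage` and `female` are taken from the model above. Swap in the real data and check the variable names before relying on it.

```r
# Simulated stand-in for the CPS extract; the numbers are invented.
set.seed(640)
n <- 500
female <- rbinom(n, 1, 0.5)
wage <- exp(3 - 0.2 * female + rnorm(n, sd = 0.5))  # hypothetical hourly wage
cps_sim <- data.frame(wage, female)

# Simple regression of log hourly wage on the female indicator.
m1 <- lm(log(wage) ~ female, data = cps_sim)
print(summary(m1)$coefficients)   # estimate, std. error, t value, p-value

# Mean log wage by group, for comparison with the coefficients above.
group_means <- tapply(log(cps_sim$wage), cps_sim$female, mean)
print(group_means)

# Optional: a formatted table, only if the stargazer package is installed.
if (requireNamespace("stargazer", quietly = TRUE)) {
  stargazer::stargazer(m1, type = "text")
}
```

The t value and p-value printed by `summary()` refer to the null hypothesis that the coefficient is zero; you still need to state the hypotheses and the conclusion in words in your report.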

Are you convinced by the “bivariate” regression above? Identify some potential confounds. Remember that a regression is an automatic match-maker; on which other variables do you want to match women and men? Write out the alternative model you will use, estimate it, and produce a regression table (put both regressions in the same table, which is really easy to do with stargazer()). Interpret the coefficient on female. Is there still evidence of discrimination (again, answering this should include a formal hypothesis test)? Produce and interpret a 95% confidence interval for the sex wage gap. Are you convinced by these results? What other information might you want in order to do a better job of addressing whether there is systematic discrimination in wages?
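
A minimal sketch in R of the two-model workflow, again on simulated data. The controls `educ` and `exper` are hypothetical placeholders that I made up for this example; they are not the variables supplied with the data set, and choosing the right controls is part of the exercise.

```r
# Simulated stand-in for the CPS extract. `educ` and `exper` are
# hypothetical control variables, not the variables supplied with the data.
set.seed(2016)
n <- 500
female <- rbinom(n, 1, 0.5)
educ   <- sample(10:18, n, replace = TRUE)
exper  <- sample(0:30,  n, replace = TRUE)
wage   <- exp(1.5 - 0.15 * female + 0.08 * educ + 0.01 * exper +
              rnorm(n, sd = 0.4))
cps_sim <- data.frame(wage, female, educ, exper)

m1 <- lm(log(wage) ~ female, data = cps_sim)                 # bivariate
m2 <- lm(log(wage) ~ female + educ + exper, data = cps_sim)  # with controls

# 95% confidence interval for the coefficient on female in the second model.
ci_female <- confint(m2, "female", level = 0.95)
print(ci_female)

# Both models side by side, only if the stargazer package is installed.
if (requireNamespace("stargazer", quietly = TRUE)) {
  stargazer::stargazer(m1, m2, type = "text")
}
```

Passing both fitted models to `stargazer()` puts them in adjacent columns of one table, which makes it easy to see how the coefficient on female changes once controls are added.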