# ECO 440/640 — Problem Set 5

For your assignment this week, I want you to answer the question “Does a year of education increase wages more for women than for men?”

### Data

• Use the CPS ASEC data from ipums.org. It is fine to pick just one year (2017 is a good year).
• Use the logarithm of the hourly wage as your dependent variable. You will need to construct the hourly wage from a combination of variables like incwage, uhrsworkly, and wkswork1.
• Read the descriptions of the variables.
• On IPUMS, before you "submit extract", change the "data format" to csv.
• Pay special attention to codes for "missing" or "not in universe" (NIU). Convert these to NA (R's missing value) before doing anything with those variables, or you will see a bunch of people with \$9999999 in income. The codes are usually listed under "codes" for each variable on IPUMS.
• Try to focus on full-time workers (this depends on both hours worked per week and number of weeks worked per year) who are adults of working age.
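
The cleaning steps in the list above could be sketched in R roughly as follows. This is a minimal sketch on a tiny made-up data frame, not the real extract. The special codes, the 35-hour and 50-week full-time cutoffs, and the 25 to 64 age range are all placeholder assumptions; check the codes against IPUMS and choose your own sample definition.

```r
# Toy stand-in for the IPUMS extract. With real data you would start from
# something like cps = read.csv("your_extract.csv") (file name is hypothetical).
cps = data.frame(
  AGE        = c(30, 45, 17, 52, 38),
  INCWAGE    = c(52000, 9999999, 8000, 61000, 40000),
  UHRSWORKLY = c(40, 999, 20, 45, 40),
  WKSWORK1   = c(52, 0, 30, 50, 52)
)

# ASSUMED special codes: confirm each one under "codes" on IPUMS.
niu_codes = list(INCWAGE = c(9999998, 9999999), UHRSWORKLY = 999)

# Recode NIU/missing values to NA before using the variables.
for (v in names(niu_codes)) {
  cps[[v]][cps[[v]] %in% niu_codes[[v]]] = NA
}

# Hourly wage = annual wage income / (usual weekly hours * weeks worked).
annual_hours = cps$UHRSWORKLY * cps$WKSWORK1
cps$hourly_wage = ifelse(annual_hours > 0, cps$INCWAGE / annual_hours, NA)
cps$lwage = ifelse(cps$hourly_wage > 0, log(cps$hourly_wage), NA)

# ASSUMED sample rule: ages 25-64, at least 35 hours/week, at least 50 weeks/year.
keep = !is.na(cps$lwage) & cps$AGE >= 25 & cps$AGE <= 64 &
  cps$UHRSWORKLY >= 35 & cps$WKSWORK1 >= 50
cps_ft = cps[keep, ]
```

Whatever cutoffs you settle on, report them in your write-up so a reader knows exactly who is in the sample.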

I put together the following two vectors for you. They are the same length and, entry by entry, map each category in EDUC to years of schooling. You will need to put them in a data frame (as I did in the third line of code below) and merge that data frame with your main data frame (as I did in the fourth line of code below). You will need to change the name of the "cps" data frame to whatever you named yours.

    EDUC = c(999,0,1,2,10:14,20:22,30:32,40,50,60,70:73,80,81,90:92,100,110,111,120:125)
    school = c(NA,NA,NA,0,2.5,1:4,5.5,5:6,7.5,7:11,rep(12,4),13,rep(14,4),15,16,16,17,rep(18,4),20)
    edyears = data.frame(EDUC,school)
    cps2 = merge(cps,edyears,by="EDUC")
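
By default, `merge()` keeps only rows whose key appears in both data frames. Any `EDUC` code missing from the lookup would therefore silently drop people from `cps2`. A quick sanity check might look like the sketch below. The small `cps` data frame and its `ID` column are made up for illustration; use your own data frame instead.

```r
EDUC = c(999,0,1,2,10:14,20:22,30:32,40,50,60,70:73,80,81,90:92,100,110,111,120:125)
school = c(NA,NA,NA,0,2.5,1:4,5.5,5:6,7.5,7:11,rep(12,4),13,rep(14,4),15,16,16,17,rep(18,4),20)
stopifnot(length(EDUC) == length(school))   # the two vectors must line up one-to-one
edyears = data.frame(EDUC, school)

# Made-up example rows; the ID column is hypothetical.
cps = data.frame(ID = 1:4, EDUC = c(73, 111, 999, 40))
cps2 = merge(cps, edyears, by = "EDUC")

# Every EDUC code in the data should appear in the lookup table...
unmatched = setdiff(unique(cps$EDUC), edyears$EDUC)
# ...so the merge should neither drop nor duplicate rows.
stopifnot(length(unmatched) == 0, nrow(cps2) == nrow(cps))

# Rows with missing/NIU education get school = NA; drop them before estimating.
cps2 = cps2[!is.na(cps2$school), ]
```

Note that `merge()` also sorts the result by `EDUC`, so do not rely on the original row order afterwards.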

### Write up your results

This is a little like practice for a paper or report. State the question you are answering in your own words (do not quote me) and explain why it matters. Show the econometric model you will use, describe each variable clearly, and point out and interpret the important coefficients. State the null and alternative hypotheses clearly, in terms of the econometric model.

Describe the data you used, including exactly which observations you included (for example, only people aged 15-64 who worked more than 35 weeks last year?). Make a regression results table and describe the important facts it communicates. Perform the hypothesis test and state the results clearly. Explain whether the coefficients are economically significant, not just statistically significant.

Plot the predicted wages for women and men against years of schooling (this is just two curves on the same graph), and comment on the graph. What is important? Does it help answer the question? Are there portions of the graph that are not useful because they fall outside your sample?

Finally, does your regression recover the causal effect of sex on returns to education? Did you leave out any variables that should be included? The goal is to be thorough in answering the question.
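
One possible setup, offered as a sketch rather than the required specification, is a log-wage regression with an interaction between years of schooling and a female indicator. The coefficient on the interaction is then the difference between women and men in the return to a year of schooling, so the hypotheses can be stated in terms of that single coefficient. The data here are simulated so the code runs on its own. The variable names (`lwage`, `school`, `female`) and the coefficient values used to generate the data are assumptions, not CPS results.

```r
set.seed(1)
# Simulated data, NOT CPS results: used only so the sketch is self-contained.
n = 500
sim = data.frame(school = sample(8:20, n, replace = TRUE),
                 female = rbinom(n, 1, 0.5))
sim$lwage = 1.5 + 0.08 * sim$school - 0.40 * sim$female +
  0.02 * sim$school * sim$female + rnorm(n, sd = 0.4)

# school * female expands to school + female + school:female.
fit = lm(lwage ~ school * female, data = sim)
summary(fit)$coefficients   # the school:female row is the difference in returns

# Predict only over the range of schooling observed in the sample.
grid = expand.grid(school = seq(min(sim$school), max(sim$school), by = 0.5),
                   female = c(0, 1))
# exp() of the predicted log wage; this ignores the retransformation adjustment.
grid$wage_hat = exp(predict(fit, newdata = grid))

men = grid[grid$female == 0, ]
women = grid[grid$female == 1, ]
plot(men$school, men$wage_hat, type = "l", lty = 1,
     ylim = range(grid$wage_hat),
     xlab = "Years of schooling", ylab = "Predicted hourly wage")
lines(women$school, women$wage_hat, lty = 2)
legend("topleft", legend = c("Men", "Women"), lty = c(1, 2))
```

With real data you would replace `sim` with your cleaned sample and add whatever controls you think belong in the model. The curves are only informative over schooling values that actually occur in your sample.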