# ECO 440/640 — Problem Set 2

Due 2017-02-15

## Part A

- The following question is based on one by Jeffrey Wooldridge in his Introductory Econometrics: A Modern Approach. Some modifications have been made. Suppose you want to study the effects of various activities on GPA, and you survey a bunch of students on how much of their time they spend on four activities: work, study, sleep, and leisure. Every activity is put into one of these categories so that the total time spent on these activities is 168 hours for every student. You then construct the following model:
$$GPA = \beta_0 + {\beta_1}study + {\beta_2}sleep + {\beta_3}work + {\beta_4}leisure + u$$
- Does it make sense to hold sleep, work, and leisure fixed while changing study? (This illustrates the problem we call “perfect multicollinearity.”)
- How could you reformulate the model so that the parameters have a useful interpretation? Write out the alternative empirical model, and interpret one of the coefficients.
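
The symptom behind the first question can be seen numerically. Below is a minimal R sketch on simulated data. The sample size, the hour ranges, and the GPA equation are all made up for illustration and are not taken from Wooldridge. Because the four activities always sum to 168, `lm()` cannot estimate all four slopes alongside an intercept.

```r
# Simulated illustration only: every number below is an assumption.
set.seed(440)
n <- 200

# Weekly hours in three activities, drawn as whole numbers
study <- sample(5:50, n, replace = TRUE)
sleep <- sample(40:65, n, replace = TRUE)
work  <- sample(0:40, n, replace = TRUE)

# Leisure is whatever remains of the 168 hours in a week
leisure <- 168 - study - sleep - work

# A made-up GPA equation, used only to have something to regress
gpa <- 2 + 0.02 * study + rnorm(n, sd = 0.3)

fit <- lm(gpa ~ study + sleep + work + leisure)

# One slope comes back NA because the regressors and the
# intercept are exactly linearly dependent
print(coef(fit))
n_dropped <- sum(is.na(coef(fit)))
```

R reports an `NA` rather than an error here; the write-up should still explain why the original model cannot be interpreted as stated.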

## Part B

I want you to address the issue of the gender wage gap. Is there evidence of systematic discrimination in employment practices?

### Data

Here are the data: http://randycragun.com/courses/640/asec2016.RData. These are data from the 2016 Annual Social and Economic supplement (ASEC) of the Current Population Survey (CPS) from the US Census. This is a survey of 94,097 households (https://cps.ipums.org/cps/sample_sizes.shtml) that asks for economic information from every member of the household in March of each year. I have pared the sample to people:

- Aged 16 to 67
- Not in the military
- Not living in "group quarters" (given by `GQ` in IPUMS)
- Who worked more than 0 weeks in the last year (from `WKSWORK1`)
- Who typically worked more than 1 hour per week of work last year (from `UHRSWORKLY`)
- For whom we have wage and salary income data for the last year (from `INCWAGE`)
- Who had more than $0 of wage and salary income over the last year

I constructed the following variables for you:

- `hourwage = INCWAGE/(UHRSWORKLY*WKSWORK1)`: hourly wage over the last year.
- `school`: years of schooling completed.
- `female`: constructed from the IPUMS `SEX`. 1 if the person is female; 0 otherwise.
- `white`: 1 if the person is white; 0 otherwise.
- `asian`: 1 if the person is Asian (by race or ethnicity); 0 otherwise.
- `marriedSpousePresent`: 1 if the person is married and their spouse is present in the household; 0 otherwise.
- `fulltime`: 1 if the person usually worked at least 35 hours in the weeks they worked last year.
- `FullTimeLastWeek`: 1 if the person worked at least 35 hours last week; 0 otherwise.
- `PartTimeLastWeekForFamily`: 1 if the reason for working part time last week was one of the following:
  - Too busy with house, school, etc.
  - Child care problems
  - Other family/personal obligations
- `OutOfWorkLastYearForFamily`: 1 if the person reported "taking care of home/family" as their activity when they were not in the labor force last year; 0 otherwise.
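
To make the `hourwage` construction concrete, here is a small R sketch on three made-up rows. The column names follow the IPUMS variables named above, but the values are invented and the data frame `toy` is hypothetical, not part of the course data.

```r
# Toy rows with invented values; only the column names come from IPUMS.
toy <- data.frame(
  INCWAGE    = c(52000, 18000, 31200),  # wage and salary income last year ($)
  UHRSWORKLY = c(40, 20, 30),           # usual hours worked per week last year
  WKSWORK1   = c(52, 45, 52)            # weeks worked last year
)

# Annual hours = usual weekly hours * weeks worked,
# so hourly wage = annual wage income / annual hours
toy$hourwage <- toy$INCWAGE / (toy$UHRSWORKLY * toy$WKSWORK1)
print(toy)
```

Note that this is an average over the year, so anyone whose usual hours or weeks are misreported will get a distorted hourly wage.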

Some ideas to keep in mind in answering the question:

- Should you use the hourly wage income or the logarithm of the hourly wage income?
- You definitely want demographic and education variables.
- Read the descriptions of the variables that you are using on IPUMS CPS.
- I removed most of the data that have missing values, but usually you need to pay special attention to codes for "missing" or "not in universe" (NIU). You would convert these to "NA" before doing anything with those variables or you would see a bunch of people with $9999999 in income. This information is usually under "codes" for each variable on IPUMS.
- `lm()` is the function for an ordinary least squares regression. `summary(lm())` will give you a summary of the regression that includes more information. You can also save regression results with a name (`MyRegression = lm(MyDepVar~MyIndepVar1+MyIndepVar2)`) and then get a summary of that (`summary(MyRegression)`). Regression outputs are lists with named components, so `summary(MyRegression)$coefficients[1:3,2:3]` will give you the first three rows and second through third columns of the coefficients table. (The table lives in the summary object; `MyRegression$coefficients` by itself is just a vector of estimates.)
- You do not need the `t.test()` function for this assignment. The summary of a regression already includes all the information you need to test null hypotheses that coefficients are zero (including t stats and p values).
- The `stargazer` package (you will need to install it) contains the `stargazer()` function, which can output regression tables from R into text, Word, TeX, or HTML files. Look at the documentation; it will be useful to you throughout the semester. `stargazer(MyRegression1, MyRegression2, type="text", out="Path/To/File/For/Output.txt")` will produce a regression table with the two regressions in two separate columns in a text file that you can copy wherever you need.
- If you are comparing males and females in a sample, you must tell your audience how many people in the sample are in each group (part of what we call “descriptive statistics”).
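
Several of the hints above can be tried out on simulated data before touching the real file. The R sketch below is illustrative only: the variables `female`, `school`, and `income` are generated here with made-up parameters, and the value 9999999 stands in for a "missing" code like the one mentioned above. It recodes that value to `NA`, counts the people in each group, and pulls columns out of the coefficient table.

```r
# Simulated stand-in for the course data; every value here is made up.
set.seed(640)
n <- 500
female <- rbinom(n, 1, 0.5)
school <- sample(10:18, n, replace = TRUE)
income <- round(exp(9.5 + 0.08 * school - 0.1 * female + rnorm(n, sd = 0.5)))

# Pretend the first five records carry a "missing" code
income[1:5] <- 9999999

# Recode the sentinel value to NA before using the variable
income[income == 9999999] <- NA

# Descriptive statistics: how many people are in each group
counts <- table(female)
print(counts)

# lm() drops rows with NA in any model variable by default
MyRegression <- lm(log(income) ~ female + school)

# The coefficient table (estimates, standard errors, t stats, p values)
# is a component of the summary object
coef_table <- summary(MyRegression)$coefficients
print(coef_table[1:3, 2:3])  # standard errors and t statistics
```

If the recoding step were skipped, the five sentinel values would be treated as real incomes of nearly ten million dollars and would distort the estimates.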

### Write up your results

This is a little like practice for writing a research paper. Do not take the questions below and make each a bullet point. Write up your results like you are making a report to your employer about whether there is evidence of wage discrimination.

Start with the simple regression of the log of the hourly wage on `female`:
$$\ln(wage_i) = \alpha_0 + {\alpha_1}female_i + v_i$$
Make sure you write out this model in your report when you are describing your estimates. Make a regression results table (model it after the ones you see in Mastering 'Metrics). Interpret the coefficient on `female`. How does this coefficient relate to the means of the wages for men and women? Why? Clearly state the null and alternative hypotheses. Perform the hypothesis test and clearly state the results. Explain whether the coefficients are economically significant (not just statistically significant). Is there evidence of discrimination?

Are you convinced by the “bivariate” regression above? Identify some potential confounds. Remember that a regression is an automatic match-maker; on which other variables do you want to match women and men? Write out the alternative model you will use, estimate it, and produce a regression table (put both regressions in the same table, which is really easy to do with `stargazer()`). Interpret the coefficient on `female`. Is there still evidence of discrimination (again, answering this should include a formal hypothesis test)? Produce and interpret a 95% confidence interval for the sex wage gap. Are you convinced by these results? What other information might you want in order to do a better job of addressing whether there is systematic discrimination in wages?
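
The mechanics of the confidence interval and the two-column table can be sketched in R as follows. This uses simulated data with made-up parameters, and the choice of `school` as the only added control is an assumption for illustration, not a recommendation for your model. The `stargazer()` call runs only if the package is installed.

```r
# Simulated data for illustration only; all numbers are made up.
set.seed(2017)
n <- 400
female <- rbinom(n, 1, 0.5)
school <- sample(10:18, n, replace = TRUE)
lnwage <- 1.5 + 0.09 * school - 0.15 * female + rnorm(n, sd = 0.4)

MyRegression1 <- lm(lnwage ~ female)           # simple regression
MyRegression2 <- lm(lnwage ~ female + school)  # with one control added

# 95% confidence interval for the coefficient on female
ci <- confint(MyRegression2, "female", level = 0.95)
print(ci)

# Both regressions side by side, printed as text,
# only if the stargazer package is available
if (requireNamespace("stargazer", quietly = TRUE)) {
  stargazer::stargazer(MyRegression1, MyRegression2, type = "text")
}
```

`confint()` returns a matrix with one row per requested coefficient and the lower and upper bounds in its two columns.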