ECO 440/640 — Problem Set 4
Pay attention to this paragraph, as it illustrates how to talk about a data set and define variables. This problem set uses data on the Agora marketplace collected by your instructor in December 2014. Agora is a Darknet market where people typically buy drugs and other illicit materials. Each observation in the data set is one offer of a drug for sale (similar to an eBay listing). I have limited the sample to a random selection of listings on two days and, for ease of exposition, to only “weed” versions of Cannabis. You may download the sample from http://randycragun.com/courses/640/AgoraData_small.csv (remember that CSV is a generic text format; to use these data in R, read the file with the read.csv() function and save the output to a name. I used the name agora). The data set contains the following variables.
|PriceBTC||Price in Bitcoin|
|Grams||Grams of weed offered|
|Type||String information about strain or quality|
|Seller||User name of the seller|
|Origin||Country from which the drugs would be shipped|
|OriginIsUSA||Recode of Origin: 1 if Origin is USA and 0 otherwise|
|To||Places to which seller will ship|
|Shipping||Information in the listing about shipping options and prices (a number means dollars unless otherwise specified)|
|FeedbackCount||Number of items of feedback the seller has received on the site|
|MostRecentFeedback_Days||Number of days since the seller last received feedback|
|OldestFeedback_Days||Number of days since the seller first received feedback|
|Score||Average feedback score for the seller (feedback scores can be 1, 2, 3, 4, or 5)|
|NumberOfDeals||String with bins showing the number of sales the seller has made on the site|
|NumberOfDeals_Continuous||Recode of NumberOfDeals into a continuous numeric variable with the midpoint of the range in NumberOfDeals (1000+ was coded to 1000)|
|Date||Date the listing was observed|
|Time||Time the listing was observed|
|DollarsPerBTC||Exchange rate (USD per BTC) on the day the listing was observed|
|URL||URL of the listing (at the time it was observed)|
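The loading step described above might be sketched as follows. The URL and the name agora come from the text; the helper names load_agora and recode_origin_is_usa are hypothetical, and the recode assumes that US listings are labelled exactly "USA" in Origin, which you should check against the data.

```r
# URL taken from the problem set text.
agora_url <- "http://randycragun.com/courses/640/AgoraData_small.csv"

# Read the CSV into a data frame; `path` can be the URL or a local copy.
# (load_agora is a hypothetical helper name, not part of the course materials.)
load_agora <- function(path = agora_url) {
  read.csv(path, stringsAsFactors = FALSE)
}

# Hypothetical helper: rebuild the OriginIsUSA recode from Origin,
# ASSUMING US listings are labelled exactly "USA" in the Origin column.
recode_origin_is_usa <- function(origin) {
  as.integer(!is.na(origin) & origin == "USA")
}

# Typical use (needs network access, so it is left commented out here):
# agora <- load_agora()
# str(agora)                                            # check names and types
# table(agora$OriginIsUSA, recode_origin_is_usa(agora$Origin))
```

Running str() right after loading is a quick way to confirm that numeric variables such as PriceBTC and Grams were read as numbers and not as strings.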
Table 1 is some R output (not “RStudio output”) that is missing some values. However, you have enough information in the table to fill in all the missing information, so do that. Include significance stars where appropriate. Explain how you get the values. The regression is similar to the ones in part A but uses separate indicators for multiple country origins (instead of just the US).
[Table 1: R regression output with missing values; table contents not available in this copy]
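Missing entries in a coefficient table like Table 1 can usually be recovered from the identity t value = estimate / standard error, rearranged as needed, together with the t distribution for the p-value. The sketch below assumes the table follows the default layout and star cutoffs of R's summary() for lm objects; the function name fill_row and the numbers passed to it are hypothetical and are not taken from Table 1. If the table uses different cutoffs (for example, those in the note to Table 2), change the breaks accordingly.

```r
# Fill in the t value, two-sided p-value, and significance stars for one row.
# `estimate`, `std_error`, and `df_resid` are read off the regression output.
fill_row <- function(estimate, std_error, df_resid) {
  t_value <- estimate / std_error
  p_value <- 2 * pt(-abs(t_value), df = df_resid)  # two-sided p-value
  # Star cutoffs printed by summary() for lm: 0.001, 0.01, 0.05, 0.1
  stars <- as.character(cut(p_value,
                            breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                            labels = c("***", "**", "*", ".", ""),
                            include.lowest = TRUE))
  data.frame(estimate, std_error, t_value, p_value, stars)
}

# HYPOTHETICAL numbers, not taken from Table 1
fill_row(estimate = -2.5, std_error = 1.0, df_resid = 100)

# The same identity recovers a missing estimate or standard error:
#   estimate  = t_value * std_error
#   std_error = estimate / t_value
```

The residual degrees of freedom are the number of observations minus the number of estimated coefficients (including the intercept), which R prints next to the residual standard error.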
Table 2 contains estimates of the economies of scale in marijuana purchases with data from Agora listings. All of the subsequent questions refer to Table 2.
|Dependent variable||Price Per Gram ($)||Price Per Gram ($)||Log of Price Per Gram ($)||Log of Price Per Gram ($)|
|Regression||(1)||(2)||(3)||(4)|
|Log (base 2) of grams||-1.296***||-1.199***|| || |
|Log of grams|| || ||-0.191***||-0.177***|
|Seller Rating (out of 5)|| ||7.685*|| ||0.545|
|Origin is USA|| ||-4.594***|| ||-0.401***|
|Number of deals by seller|| ||-0.0004|| ||-0.00000|
|Residual Std. Error||2.776||2.346||0.270||0.242|
|Note:||*p<0.1; **p<0.05; ***p<0.01|
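For orientation, regressions with the shape of the four columns in Table 2 could be fit in R roughly as follows. This is a sketch under assumptions, not necessarily how the table was produced: it assumes price per gram in dollars is PriceBTC * DollarsPerBTC / Grams, that "Seller Rating" corresponds to Score, and that "Number of deals by seller" corresponds to NumberOfDeals_Continuous. The function name fit_table2 is hypothetical.

```r
# Sketch: fit four regressions shaped like the columns of Table 2.
# ASSUMPTION: price per gram ($) = PriceBTC * DollarsPerBTC / Grams.
# ASSUMPTION: Score and NumberOfDeals_Continuous are the rating and deals controls.
fit_table2 <- function(agora) {
  agora$PricePerGram <- agora$PriceBTC * agora$DollarsPerBTC / agora$Grams
  list(
    # (1) and (2): price per gram in levels on log base 2 of grams
    m1 = lm(PricePerGram ~ log2(Grams), data = agora),
    m2 = lm(PricePerGram ~ log2(Grams) + Score + OriginIsUSA +
              NumberOfDeals_Continuous, data = agora),
    # (3) and (4): natural log of price per gram on natural log of grams
    m3 = lm(log(PricePerGram) ~ log(Grams), data = agora),
    m4 = lm(log(PricePerGram) ~ log(Grams) + Score + OriginIsUSA +
              NumberOfDeals_Continuous, data = agora)
  )
}

# Typical use, once the data are loaded under the name agora:
# fits <- fit_table2(agora)
# lapply(fits, summary)
```

Note that log() in R is the natural log and log2() is log base 2, matching the two row labels in the table.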
- For each of the estimated regressions, write out the implied regression model.
- Interpret the coefficient on Grams in each regression (they are not all the same). Use plain language (there should be nothing in your interpretation like “the log of grams increases by...”). You should also explain why I used log base 2 in the first two regressions and the natural log of grams in the second two regressions.
- Interpret the coefficient on OriginIsUSA. Use plain language.
- Why is the intercept in regression 2 negative? How would we interpret that number? Does it make sense?
- Interpret the p-value on the log of Grams in regression 4 and explain what you would do with that p-value.
- Construct a 95% confidence interval and a 99% confidence interval (using the correct degrees of freedom) for the coefficient on the log of Grams in regression 4. Interpret one of these confidence intervals.
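The mechanics of the confidence interval in the last question can be sketched as estimate ± (t critical value) × (standard error), with the critical value taken from the t distribution with the residual degrees of freedom. In the example below only the estimate -0.177 comes from Table 2; the standard error and degrees of freedom are hypothetical placeholders, and coef_ci is a hypothetical helper name.

```r
# Confidence interval for one coefficient: estimate +/- t critical value * SE.
coef_ci <- function(estimate, std_error, df_resid, level = 0.95) {
  t_crit <- qt(1 - (1 - level) / 2, df = df_resid)
  c(lower = estimate - t_crit * std_error,
    upper = estimate + t_crit * std_error)
}

# Estimate from Table 2, regression 4. The standard error and degrees of
# freedom are HYPOTHETICAL placeholders; substitute the values from the output.
b <- -0.177
coef_ci(b, std_error = 0.01, df_resid = 500, level = 0.95)
coef_ci(b, std_error = 0.01, df_resid = 500, level = 0.99)
```

With a fitted lm object in hand, confint(model, level = 0.99) computes the same kind of interval directly, which is a convenient check on a hand calculation.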