ECO 440/640 — Problem Set 4

Pay attention to this paragraph, because it illustrates how to describe a data set and define its variables. This problem set uses data on the Agora marketplace collected by your instructor in December 2014. Agora is a Darknet market where people typically buy drugs and other illicit materials. Each observation in the data set is one offer of a drug for sale (similar to an eBay listing). For ease of exposition, I have limited the sample to a random selection of listings observed on two days and to “weed” varieties of cannabis only. You may download the sample from (remember that CSV files are a generic text format, and that to use these data in R you need to read them with the read.csv() function and save the output to a name; I used the name agora). The data set contains the following variables.

Variable                  Description
PriceBTC                  Price in Bitcoin
Grams                     Grams of weed offered
Type                      String information about strain or quality
Seller                    User name of the seller
Origin                    Country from which the drugs would be shipped
OriginIsUSA               Recode of Origin: 1 if Origin is USA and 0 otherwise
To                        Places to which the seller will ship
Shipping                  Information in the listing about shipping options and prices (a number means dollars unless otherwise specified)
FeedbackCount             Number of items of feedback the seller has received on the site
MostRecentFeedback_Days   Number of days since the seller last received feedback
OldestFeedback_Days       Number of days since the seller first received feedback
Score                     Average feedback score for the seller (feedback scores can be 1, 2, 3, 4, or 5)
NumberOfDeals             String with bins showing the number of sales the seller has made on the site
NumberOfDeals_Continuous  Recode of NumberOfDeals into a continuous numeric variable equal to the midpoint of the range in NumberOfDeals (1000+ was coded to 1000)
Date                      Date the listing was observed
Date                      Time the listing was observed
DollarsPerBTC             Exchange rate (USD per BTC) on the day the listing was observed
URL                       URL of the listing (at the time it was observed)
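The loading step described above can be sketched in R as follows. The file name agora.csv is hypothetical (substitute the path where you saved the download), and the formula for PricePerGramDollars, which Table 1 uses but the variable list does not define, is an assumption: the listing's Bitcoin price converted to dollars at that day's exchange rate, divided by the grams offered.

```r
# Sketch: read the Agora CSV and add a price-per-gram variable.
# ASSUMPTION: PricePerGramDollars = PriceBTC * DollarsPerBTC / Grams.
# The variable list does not define it; treat this formula as a guess.
load_agora <- function(path) {
  agora <- read.csv(path, stringsAsFactors = FALSE)
  # Convert the Bitcoin price to dollars, then divide by grams offered
  agora$PricePerGramDollars <- agora$PriceBTC * agora$DollarsPerBTC / agora$Grams
  agora
}

# Usage (hypothetical file name):
# agora <- load_agora("agora.csv")
```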

Part A

Table 1 is R output (not “RStudio output”) with some values missing. The table contains enough information to fill in every missing value, so do that. Include significance stars where appropriate, and explain how you obtained each value. The regression is similar to the ones in Table 2 (Part B) but uses separate indicators for several countries of origin instead of a single indicator for the US.

Table 1: example R output with gaps
lm(formula = PricePerGramDollars ~ log2(Grams) + Score + FeedbackCount + factor(Origin) + NumberOfDeals_Continuous, data = agora)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.6305802 24.2787089 0.191 0.84946
log2(Grams) -1.1909527 0.1349346
Score 2.8133501 0.573 0.56929
FeedbackCount 0.0476203 -0.858 0.39457
factor(Origin)Canada -3.8790813 3.2676473 -1.187 0.24038
factor(Origin)Germany 2.9483056 1.6652960 1.770 0.08230 .
factor(Origin)USA -3.7883966 1.1314708 -3.348
NumberOfDeals_Continuous -0.0004706 0.38933
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.273 on 54 degrees of freedom
Multiple R-squared: 0.7267, Adjusted R-squared: 0.6913
F-statistic: 20.52 on 7 and 54 DF, p-value: 3.802e-13
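The arithmetic behind the blanks can be sketched in R. Every blank follows from t = estimate / SE and from the two-sided p-value of a t statistic with the 54 residual degrees of freedom reported in the table. The helper names (p_from_t, stars, and so on) are illustrative, not part of any package. Two readings are assumptions: that 2.8133501 in the Score row is the estimate rather than the standard error, and that 0.0476203 in the FeedbackCount row is the standard error (it cannot be the estimate, because a positive estimate with a positive standard error would give a positive t value).

```r
# Sketch: recover the missing Table 1 entries from the entries that are shown.
# Identities: t = estimate / SE, so SE = estimate / t and estimate = t * SE;
# two-sided p = 2 * P(T > |t|) with T ~ t(df), df = 54 residual degrees of freedom.
df_resid <- 54

# Two-sided p-value for a t statistic
p_from_t <- function(t, df = df_resid) 2 * pt(-abs(t), df)

# Significance codes as in R's legend: p < 0.001 '***', < 0.01 '**', < 0.05 '*', < 0.1 '.'
stars <- function(p) {
  as.character(cut(p, breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                   labels = c("***", "**", "*", ".", " "),
                   right = FALSE, include.lowest = TRUE))
}

# log2(Grams): estimate and SE shown; t and p missing
t_grams <- -1.1909527 / 0.1349346
p_grams <- p_from_t(t_grams)

# Score: ASSUMES 2.8133501 is the estimate, so the SE is missing
se_score <- 2.8133501 / 0.573

# FeedbackCount: 0.0476203 must be the SE (the t value is negative)
b_feedback <- -0.858 * 0.0476203

# factor(Origin)USA: p missing
p_usa <- p_from_t(-3.7883966 / 1.1314708)

# NumberOfDeals_Continuous: estimate and p shown; invert the p-value for |t|,
# then give t the sign of the estimate
b_deals <- -0.0004706
t_deals <- sign(b_deals) * qt(1 - 0.38933 / 2, df_resid)
se_deals <- b_deals / t_deals

print(signif(c(t_grams = t_grams, p_grams = p_grams, se_score = se_score,
               b_feedback = b_feedback, p_usa = p_usa,
               t_deals = t_deals, se_deals = se_deals), 4))
cat("log2(Grams):", stars(p_grams), " USA:", stars(p_usa),
    " NumberOfDeals_Continuous:", stars(0.38933), "\n")
```

Because the printed t values are rounded to three decimals, standard errors and estimates recovered from them are accurate to only about three significant digits.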

Part B

Table 2 contains estimates of the economies of scale in marijuana purchases with data from Agora auctions. All of the subsequent questions refer to Table 2.
Table 2: Estimates of the economies of scale in marijuana purchases with data from Agora auctions
                           Dependent variable:
                           Price Per Gram ($)      Log of Price Per Gram ($)
                           (1)         (2)         (3)         (4)
Log (base 2) of grams      -1.296***   -1.199***
Log of grams                                       -0.191***   -0.177***
Seller Rating (out of 5)                7.685*                  0.545
Feedback count                         -0.049                  -0.003
Origin is USA                          -4.594***               -0.401***
Number of deals by seller              -0.0004                 -0.00000
Adjusted R²                 0.540       0.671       0.565       0.650
Residual Std. Error         2.776       2.346       0.270       0.242
Note: *p<0.1; **p<0.05; ***p<0.01
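The four columns of Table 2 can be estimated, in outline, with lm() calls like the one shown in Table 1. The specifications below are inferred from the row labels and are an assumption: regressions 1 and 2 use log2(Grams) with price per gram in dollars, regressions 3 and 4 use natural logs of both price per gram and grams, and the seller controls enter only regressions 2 and 4. The list name table2_formulas is illustrative.

```r
# Sketch: specifications inferred from Table 2's rows (an assumption, not the
# instructor's code). In R, log() is the natural log and log2() is base 2.
table2_formulas <- list(
  reg1 = PricePerGramDollars ~ log2(Grams),
  reg2 = PricePerGramDollars ~ log2(Grams) + Score + FeedbackCount +
    OriginIsUSA + NumberOfDeals_Continuous,
  reg3 = log(PricePerGramDollars) ~ log(Grams),
  reg4 = log(PricePerGramDollars) ~ log(Grams) + Score + FeedbackCount +
    OriginIsUSA + NumberOfDeals_Continuous
)

# Usage, once the agora data frame (including PricePerGramDollars) is loaded:
# fits <- lapply(table2_formulas, lm, data = agora)
# lapply(fits, summary)
```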

  1. For each of the estimated regressions, write out the implied regression model.
  2. Interpret the coefficient on Grams in each regression (they are not all the same). Use plain language (there should be nothing in your interpretation like “the log of grams increases by...”). You should also explain why I used log base 2 in the first two regressions and the natural log of grams in the last two regressions.
  3. Interpret the coefficient on OriginIsUSA. Use plain language.
  4. Why is the intercept in regression 2 negative? How would we interpret that number? Does it make sense?
  5. Interpret the p-value on the log of Grams in regression 4 and explain what you would do with that p-value.
  6. Construct a 95% confidence interval and a 99% confidence interval (using the correct degrees of freedom) for the coefficient on the log of Grams in regression 4. Interpret one of these confidence intervals.
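The interval in question 6 is the estimate plus or minus a t critical value times the standard error. Because Table 2 as printed reports no standard errors, the sketch below illustrates the computation on the log2(Grams) row of Table 1 (estimate -1.1909527, standard error 0.1349346, 54 residual degrees of freedom). The function name conf_int is illustrative, and the degrees of freedom computed for regression 4 assume it uses the same 62 listings as Table 1.

```r
# Sketch: two-sided t-based confidence interval for one coefficient.
conf_int <- function(b, se, df, level = 0.95) {
  crit <- qt(1 - (1 - level) / 2, df)  # about 2.005 for 95% with 54 df
  c(lower = b - crit * se, upper = b + crit * se)
}

# Illustration with Table 1's log2(Grams) row
ci95 <- conf_int(-1.1909527, 0.1349346, df = 54, level = 0.95)
ci99 <- conf_int(-1.1909527, 0.1349346, df = 54, level = 0.99)
print(rbind(ci95, ci99))

# Residual df = n - (number of slopes) - 1. Table 1 implies n = 54 + 7 + 1 = 62.
# ASSUMPTION: regression 4 uses the same 62 listings with its 5 slopes.
n_obs <- 54 + 7 + 1
df_reg4 <- n_obs - 5 - 1
```

With a fitted lm object, stats::confint(fit, level = 0.99) returns the same t-based intervals using the model's residual degrees of freedom.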