ECO 440/640 — Problem Set 4
Pay attention to this paragraph, as it illustrates how to talk about a data set and define variables. This problem set uses data on the Agora marketplace collected by your instructor in December 2014. Agora is a Darknet market where people typically buy drugs and other illicit materials. Each observation in the data set is one offer of a drug for sale (similar to an eBay listing). For ease of exposition, I have limited the data to a random sample of listings observed on two days and to only “weed” versions of Cannabis. The data set contains the following variables.
PriceBTC  Price in Bitcoin 
Grams  Grams of weed offered 
Type  String information about strain or quality 
Seller  User name of the seller 
Origin  Country from which the drugs would be shipped 
OriginIsUSA  Recode of Origin: 1 if Origin is USA and 0 otherwise 
To  Places to which seller will ship 
Shipping  Information in the listing about shipping options and prices (a number means dollars unless otherwise specified) 
FeedbackCount  Number of items of feedback the seller has received on the site 
MostRecentFeedback_Days  Number of days since the seller last received feedback 
OldestFeedback_Days  Number of days since the seller first received feedback 
Score  Average feedback score for the seller (feedback scores can be 1, 2, 3, 4, or 5) 
NumberOfDeals  String with bins showing the number of sales the seller has made on the site 
NumberOfDeals_Continuous  Recode of NumberOfDeals into a continuous numeric variable with the midpoint of the range in NumberOfDeals (1000+ was coded to 1000) 
Date  Date the listing was observed 
Date  Time the listing was observed 
PricePerGram  PriceBTC/Grams 
DollarsPerBTC  Exchange rate (USD per BTC) on the day the listing was observed 
PricePerGramDollars  PricePerGram×DollarsPerBTC 
URL  URL of the listing (at the time it was observed) 
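The recoded and derived variables in this list follow directly from the raw columns. As a minimal sketch in R, the toy data frame below (two made-up rows, not observations from the Agora data) shows how OriginIsUSA, PricePerGram, and PricePerGramDollars could be constructed from their definitions. The object name toy and all numeric values are assumptions for illustration.

```r
# Hypothetical toy rows; every value here is made up for illustration only.
toy <- data.frame(
  PriceBTC      = c(0.10, 0.45),
  Grams         = c(3.5, 28),
  Origin        = c("USA", "Germany"),
  DollarsPerBTC = c(350, 350),
  stringsAsFactors = FALSE
)

# Recode of Origin: 1 if Origin is USA and 0 otherwise
toy$OriginIsUSA <- as.integer(toy$Origin == "USA")

# Price per gram in Bitcoin, then converted to dollars at the day's exchange rate
toy$PricePerGram        <- toy$PriceBTC / toy$Grams
toy$PricePerGramDollars <- toy$PricePerGram * toy$DollarsPerBTC

print(toy)
```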
Part A
Table 1 is some R output (not “RStudio output”) that is missing some values. However, the table contains enough information to fill in every missing value, so do that. Include significance stars where appropriate. Explain how you get the values. The regression is similar to the ones in Part B but uses separate indicators for multiple countries of origin (instead of just the US).
Call:
lm(formula = PricePerGramDollars ~ log2(Grams) + Score + FeedbackCount + factor(Origin) + NumberOfDeals_Continuous, data = agora)

Coefficients:
                          Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)               4.6305802   24.2787089  0.191    0.84946
log2(Grams)               1.1909527   0.1349346   ______   ______
Score                     2.8133501   ______      0.573    0.56929
FeedbackCount             ______      0.0476203   0.858    0.39457
factor(Origin)Canada      3.8790813   3.2676473   1.187    0.24038
factor(Origin)Germany     2.9483056   1.6652960   1.770    0.08230  .
factor(Origin)USA         3.7883966   1.1314708   3.348    ______
NumberOfDeals_Continuous  0.0004706   ______      ______   0.38933

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.273 on 54 degrees of freedom
Multiple R-squared: 0.7267, Adjusted R-squared: 0.6913
F-statistic: 20.52 on 7 and 54 DF, p-value: 3.802e-13
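Each row of the coefficient table ties its four numbers together: the t value is the estimate divided by its standard error, and the p-value is the two-sided tail probability of that t value under a t distribution with the residual degrees of freedom. The sketch below checks these relationships on the (Intercept) row, which Table 1 reports in full. The variable names are illustrative, and the numbers are used exactly as printed.

```r
# Values copied from the fully reported (Intercept) row of Table 1
estimate  <- 4.6305802
std_error <- 24.2787089
df_resid  <- 54   # residual degrees of freedom reported below the coefficients

# t value = estimate / standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with df_resid degrees of freedom
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

print(round(t_value, 3))   # 0.191, the t value printed in that row
print(signif(p_value, 5))  # should agree with the printed 0.84946 up to rounding

# Going the other way: the absolute t value implied by a two-sided p-value
t_from_p <- qt(p_value / 2, df = df_resid, lower.tail = FALSE)
print(t_from_p)
```

The same identities, rearranged, recover whichever cell is missing in another row: estimate = t value × standard error, and standard error = estimate / t value.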
Part B
Table 2 contains estimates of the economies of scale in marijuana purchases, using data from Agora listings. All of the subsequent questions refer to Table 2.
Dependent variable:
                            Price Per Gram ($)          Log of Price Per Gram ($)
                            (1)           (2)           (3)           (4)
Constant                    15.420***     18.673        2.776***      0.394
                            (0.749)       (18.988)      (0.073)       (1.960)
Log (base 2) of grams       1.296***      1.199***
                            (0.152)       (0.139)
Log of grams                                            0.191***      0.177***
                                                        (0.021)       (0.021)
Seller Rating (out of 5)                  7.685*                      0.545
                                          (3.891)                     (0.402)
Feedback count                            0.049                       0.003
                                          (0.049)                     (0.005)
Origin is USA                             4.594***                    0.401***
                                          (0.934)                     (0.096)
Number of deals by seller                 0.0004                      0.00000
                                          (0.001)                     (0.0001)
Observations                62            62            62            62
R²                          0.547         0.698         0.572         0.679
Adjusted R²                 0.540         0.671         0.565         0.650
Residual Std. Error         2.776         2.346         0.270         0.242
Note: *p<0.1; **p<0.05; ***p<0.01
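For orientation, the four columns of Table 2 appear to correspond to four lm() calls that differ in the dependent variable, the base of the logarithm of Grams, and whether seller controls are included. The sketch below fits analogous specifications to simulated data, since the Agora data are not reproduced here. Assumptions: the data frame agora_sim and every simulated value are made up; "Log of grams" is taken to be the natural log; and the mapping of the table labels to Score, FeedbackCount, OriginIsUSA, and NumberOfDeals_Continuous is inferred from the variable list, not confirmed by the table.

```r
set.seed(440)  # arbitrary seed so the simulated toy data are reproducible
n <- 62

# Simulated stand-in for the real data; all values are hypothetical.
agora_sim <- data.frame(
  Grams                    = sample(c(1, 3.5, 7, 14, 28, 56, 112), n, replace = TRUE),
  Score                    = runif(n, 4, 5),
  FeedbackCount            = sample(1:200, n, replace = TRUE),
  OriginIsUSA              = rbinom(n, 1, 0.5),
  NumberOfDeals_Continuous = sample(c(12, 37, 75, 150, 400, 1000), n, replace = TRUE)
)
agora_sim$PricePerGramDollars <-
  exp(2.8 - 0.15 * log(agora_sim$Grams) + rnorm(n, sd = 0.25))

# Columns (1) and (2): price per gram in dollars on log base 2 of grams
m1 <- lm(PricePerGramDollars ~ log2(Grams), data = agora_sim)
m2 <- lm(PricePerGramDollars ~ log2(Grams) + Score + FeedbackCount +
           OriginIsUSA + NumberOfDeals_Continuous, data = agora_sim)

# Columns (3) and (4): natural log of price per gram on natural log of grams
m3 <- lm(log(PricePerGramDollars) ~ log(Grams), data = agora_sim)
m4 <- lm(log(PricePerGramDollars) ~ log(Grams) + Score + FeedbackCount +
           OriginIsUSA + NumberOfDeals_Continuous, data = agora_sim)

print(summary(m4))
```

Estimates from these simulated data will not match Table 2; only the structure of the four specifications is meant to carry over.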
 For each of the estimated regressions, write out the implied regression model.
 Interpret the coefficient on Grams in each regression (they are not all the same). Use plain language (there should be nothing in your interpretation like “the log of grams increases by...”).
 Interpret the coefficient on OriginIsUSA. Use plain language.
 How would we interpret the intercept in regression 2? Does it make sense? What should we do about it?
 Interpret the p-value on the log of Grams in regression 4 and explain what you would do with that p-value.
 Construct a 95% confidence interval and a 99% confidence interval (using the correct degrees of freedom) for the coefficient on the log of Grams in regression 4. Interpret one of these confidence intervals.
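A confidence interval for a single coefficient is the estimate plus or minus a critical value from the t distribution times the standard error, where the degrees of freedom are the number of observations minus the number of estimated coefficients (including the constant). A minimal sketch in R follows; the helper ci_for_coef and the numbers passed to it are hypothetical and deliberately not taken from Table 2.

```r
# Hypothetical helper (not part of the problem set): t-based confidence interval
ci_for_coef <- function(estimate, std_error, df, level = 0.95) {
  alpha  <- 1 - level
  t_crit <- qt(1 - alpha / 2, df = df)  # two-sided critical value
  c(lower = estimate - t_crit * std_error,
    upper = estimate + t_crit * std_error)
}

# Illustration with made-up numbers
ci95 <- ci_for_coef(estimate = 0.50, std_error = 0.10, df = 30, level = 0.95)
ci99 <- ci_for_coef(estimate = 0.50, std_error = 0.10, df = 30, level = 0.99)

print(ci95)
print(ci99)  # wider than the 95% interval, since the critical value is larger
```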