Hi all,
I am using multiple imputation on a dataset of ~220 observations, of which ~40 require imputation for three active variables and a handful of passive variables. I transform the active variables to restrict their ranges: the number of wells must be positive, so I impute ln(wells); the proportion of agricultural wells must lie between 0 and 1, so I impute logit(percent_ag); and likewise for the proportion of wells near the coast.
The missing observations follow a monotone pattern, so I use mi impute monotone. Because the imputation procedure was giving me many extreme results (e.g., almost all imputed proportions coming out at 99% or <1%), I tried to restrict the range by using truncated regression for the transformed variables (see below, e.g., truncreg, ll(-2) ul(2)). This reduced the imputation sample slightly, but not by much.
The issue is that the imputed datasets are no longer complete. Of the 20 imputations, a handful have missing values for some observations in one or more of these variables. When I then run the estimation, I get the following error: "estimation sample varies between m=1 and m=2; click here for details r(459)". That makes sense given the missing values. Does anyone know why Stata does not impute complete datasets when I restrict the range of the dependent variables using truncated regression? Is it because it wants to impute values outside the truncated range and then drops the imputation (m) when it cannot do so? That would seem odd...
The important parts of the code are:
//Make necessary transformations
gen epsilon=.0001
gen ln_wells = ln(num_wells_exog+epsilon) //We don't want this to be ln(0)
//Logit undefined if p=0 or p=1
gen logit_ag = logit(percent_ag_wells_exog)
replace logit_ag = logit(percent_ag_wells_exog+epsilon) if percent_ag_wells_exog==0
replace logit_ag = logit(percent_ag_wells_exog-epsilon) if percent_ag_wells_exog==1
replace prop_wells_1000m_coast=0 if dum_coast==0 //Conditional Value
gen logit_prop_coast = logit(prop_wells_1000m_coast)
replace logit_prop_coast = logit(prop_wells_1000m_coast+epsilon) if prop_wells_1000m_coast==0
replace logit_prop_coast = logit(prop_wells_1000m_coast-epsilon) if prop_wells_1000m_coast==1
mi set wide
mi register imputed ln_wells logit_ag logit_prop_coast
mi register regular wellyieldavg mean_precip_19502014 dum_coast avggrowth_1950_2010 type_num swp_connect totalarea_acres mean_spatialvariance_19502014 nfarms_avg19401959
mi impute monotone ///
    (truncreg, ll(-10) ul(12)) ln_wells ///
    (truncreg, ll(-2) ul(2)) logit_ag ///
    (truncreg if dum_coast==1, ll(-10) ul(10)) logit_prop_coast ///
    = wellyieldavg mean_precip_19502014 dum_coast avggrowth_1950_2010 type_num swp_connect totalarea_acres mean_spatialvariance_19502014 nfarms_avg19401959, ///
    noisily force add(20) rseed(47)
mi passive: gen mi_wells_exog = exp(ln_wells)
mi passive: replace mi_wells_exog = num_wells_exog if num_wells_exog!=.
mi passive: gen mi_percent_ag_wells_exog = invlogit(logit_ag)
mi passive: replace mi_percent_ag_wells_exog = percent_ag_wells_exog if percent_ag_wells_exog!=.
mi passive: gen mi_prop_wells_1000m_coast = invlogit(logit_prop_coast)
mi passive: replace mi_prop_wells_1000m_coast = prop_wells_1000m_coast if prop_wells_1000m_coast!=.
mi passive: gen mi_prop_1000m_sq = mi_prop_wells_1000m_coast^2
mi passive: replace mi_prop_1000m_sq = prop_1000m_sq if prop_1000m_sq!=.
mi passive: gen mi_percent_nonag_wells_exog = (1-mi_percent_ag_wells_exog)
mi passive: replace mi_percent_nonag_wells_exog = percent_nonag_wells_exog if percent_nonag_wells_exog!=.
mi passive: gen mi_well_heterogeneity_exog = (mi_percent_ag_wells_exog)*(mi_percent_nonag_wells_exog)
mi passive: replace mi_well_heterogeneity_exog = well_heterogeneity_exog if well_heterogeneity_exog!=.
mi passive: gen mi_wells_per_acre_exog = (mi_wells_exog)/(totalarea_acres)
mi passive: replace mi_wells_per_acre_exog = wells_per_acre_exog if wells_per_acre_exog!=.
mi estimate, post: ologit type_num wellyieldavg mean_precip_19502014 dum_coast mi_wells_per_acre_exog avggrowth_1950_2010 mi_percent_ag_wells_exog, robust
//Here is where the error is encountered
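In case it helps with diagnosis, here is a minimal sketch of how the incomplete imputations could be located once mi impute has run. It assumes the mi data are still in memory with all 20 imputations added, and it only counts the remaining missing values in each imputation; I have not included any output.
//Count remaining missing values in each imputation (m=1,...,20)
mi xeq 1/20: count if missing(ln_wells); count if missing(logit_ag); count if missing(logit_prop_coast) & dum_coast==1
Any imputation with a nonzero count is one where the estimation sample would differ from the others.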