Interpretation of IV results using dummy variables as the Instrument

March 20, 2016, 11:45 am

I am currently running IV regressions in which the instruments are dummies for regions of the world. I have run a 2SLS regression and obtained the coefficient on the variable I was instrumenting. Both the dependent and independent variables are in levels. I have omitted one of the dummies from the regression, and I am wondering how to interpret this coefficient. The interpretation will change depending on whether I omit the dummy for Africa or the dummy for Europe.

↧

Variable and value labels with reshape

March 20, 2016, 12:40 pm

Hello, I have some data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(hhid mem dc01) long(dc02 dc03)
2 51 2001 2 9
3  5 1990 2 7
3 12 1997 2 9
4  9 2001 3 9
4 10 2000 3 9
end
label values dc02 dc02
label def dc02 2 "male", modify
label def dc02 3 "female", modify
label values dc03 dc03
label def dc03 7 "nephew/niece", modify
label def dc03 9 "grandchild", modify

I am trying to reshape the data to wide. I do:

Code:

by hhid, sort: gen number= _n
reshape wide dc*, i(hhid) j(number)

This works fine, but I lose the value and variable labels. I've tried following http://www.stata.com/support/faqs/da...after-reshape/, but it doesn't seem to work even with their example data. The variable labels end up as " : 80" instead of "year: 80"

Ultimately, I would like variable labels such as "Gender: Number *", where * is the corresponding value of the number variable I generated. This seems to be exactly what the above link does, but it is not working for me. Thanks for any help.
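One possible approach, shown here only as a minimal sketch, is to store the variable labels and value-label names in local macros before the reshape and reapply them afterwards. The variable labels ("Member ID", "Year", "Gender", "Relationship") and the macro names are assumptions made for illustration, since the example data carry no variable labels. mem is added to the reshape list here because it varies within hhid.

```stata
* Sketch only: the variable labels below are assumed for illustration
clear
input double(hhid mem dc01) long(dc02 dc03)
2 51 2001 2 9
3  5 1990 2 7
3 12 1997 2 9
4  9 2001 3 9
4 10 2000 3 9
end
label define dc02 2 "male" 3 "female"
label values dc02 dc02
label define dc03 7 "nephew/niece" 9 "grandchild"
label values dc03 dc03
label variable mem  "Member ID"
label variable dc01 "Year"
label variable dc02 "Gender"
label variable dc03 "Relationship"

bysort hhid (mem): gen number = _n
summarize number, meanonly
local jmax = r(max)

* Store each variable label and value-label name before reshaping
local stubs mem dc01 dc02 dc03
foreach v of local stubs {
    local varlab_`v' : variable label `v'
    local vallab_`v' : value label `v'
}

reshape wide `stubs', i(hhid) j(number)

* Reapply as "<original label>: Number <j>" and reattach value labels
foreach v of local stubs {
    forvalues j = 1/`jmax' {
        label variable `v'`j' "`varlab_`v'': Number `j'"
        if "`vallab_`v''" != "" {
            label values `v'`j' `vallab_`v''
        }
    }
}
describe
```

With these assumed labels, dc022 would end up labelled "Gender: Number 2" and keep the dc02 value label.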

↧

Updated version of -itsa- available on SSC

March 20, 2016, 12:43 pm

Thanks to Kit Baum, a revised version of itsa is now available on SSC.

This new version sets the time variable (_t) to start at 0, not 1, in order for the model estimate of the constant to represent the baseline level of the time series. I thank Nicola Orsini for noticing this and bringing it to my attention.

Additionally, I also set the version to 11.0, which seems to be what a lot of people wanted.

itsa performs interrupted time series analysis for single and multiple groups

itsa estimates the effect of an intervention when the outcome variable is ordered as a time series, and a number of observations are available in both pre- and post-intervention periods. The study design is generally referred to as an interrupted time series because the intervention is expected to "interrupt" the level and/or trend subsequent to its introduction. itsa is a wrapper program for, by default, newey, which produces Newey-West standard errors for coefficients estimated by OLS regression, or optionally prais, which uses the generalized least-squares method to estimate the parameters in a linear regression model in which the errors are assumed to follow a first-order autoregressive process. itsa estimates treatment effects for either a single treatment group (with pre- and post-intervention observations) or a multiple-group comparison (i.e., the single treatment group is compared with one or more control groups). Additionally, itsa can estimate treatment effects for multiple treatment periods.

↧

Problem with syntax

March 20, 2016, 2:09 pm

Dear all

I am having a strange problem with syntax. More specifically, Stata seems to get confused with the comma separating the options of a program, arbitrarily adding it in the last variable name of a varlist. To illustrate the problem, consider the following toy program and its output:

Code:

program define mycomma
version 13

syntax varlist(numeric min=3 max=3 fv), msg(name)

di `1'
di `2'
di `3'

end


. mycomma var1 var2 var3, msg(hello)
---------------------------------------------------------------------- begin mycomma ---
- version 13
- syntax varlist(numeric min=3 max=3 fv), msg(name)
- di `1'
= di var1
235
- di `2'
= di var2
1
- di `3'
= di var3,
0
------------------------------------------------------------------------ end mycomma ---

As you can see, Stata puts a comma after var3. In this case the program runs, but the added comma creates problems when I try to write more complicated programs. The issue is perhaps clearer if I put `3' in parentheses:

Code:

. mycomma var1 var2 var3, msg(hello)
--------------------------------------------------------------------- begin mycomma ---
- version 13
- syntax varlist(numeric min=3 max=3 fv), msg(name)
- di `1'
= di var1
235
- di `2'
= di var2
1
- di (`3')
= di (var3,)
var3, invalid name
---------------------------------------------------------------------- end mycomma ---
r(198);

Is this a known bug and is there a way around it? One solution is to put a space between the last variable and the comma but this also creates issues in more complicated programs with estimation commands, so it is not a good workaround.
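A likely explanation, offered as a sketch rather than a confirmed diagnosis of the original program: `1', `2' and `3' hold the command line as it was typed, split on spaces, and syntax does not reset them, so `3' is literally "var3,". The parsed variables are in `varlist', and tokenize can repopulate the positional macros from it. The rclass declaration and the returned macro r(third) below are additions for illustration only, as is the use of the auto dataset.

```stata
capture program drop mycomma
program define mycomma, rclass
    version 13
    syntax varlist(numeric min=3 max=3 fv), msg(name)

    * Reset `1', `2', `3' from the parsed varlist (no trailing comma)
    tokenize `varlist'

    display "`1'"
    display "`2'"
    display "`3'"

    * Illustration only: hand the third name back to the caller
    return local third "`3'"
end

sysuse auto, clear
mycomma price mpg weight, msg(hello)
```

Referring to `varlist' (or to the tokens created from it) also avoids having to type a space before the comma.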

Cynthia

↧

Problem Interpreting IV Results

March 20, 2016, 2:57 pm

I would appreciate any help on this issue. Thank you so much. I am running an IV regression using the command below. I instrument using dummies for French, UK, German and Scandinavian legal origin. I omit the dummy for Scandinavia in the analysis below because every country has a legal origin from exactly one of these regions, and I would otherwise have multicollinearity. This is what I have done in Stata:
. ivregress 2sls newgdpgrowth loginitialincome logsse Govtexp_GDP Inflation_CPI Trade_GDP (LLY_KL93 = legor_uk legor_fr legor_ge)

Instrumental variables (2SLS) regression Number of obs = 51
Wald chi2(6) = 24.44
Prob > chi2 = 0.0004
R-squared = 0.3390
Root MSE = 1.2635

----------------------------------------------------------------------------------
newgdpgrowth | Coef.
-----------------+----------------------------------------------------------------
LLY_KL93 | .0093698
loginitialincome | -.5764283
logsse | 1.066066
Govtexp_GDP | -.0374744
Inflation_CPI | -.0029508
Trade_GDP | .0070273
_cons | 5.664811
----------------------------------------------------------------------------------
Instrumented: LLY_KL93
Instruments: loginitialincome logsse Govtexp_GDP Inflation_CPI Trade_GDP
legor_uk legor_fr legor_ge
If I change the instrument dummies and omit UK instead of Scandinavia, I get a different coefficient, which would be interpreted differently. How, then, should I interpret the LLY_KL93 coefficient given that a particular region has been omitted from the instrument set?

↧

Creating a Retrospective Panel

March 20, 2016, 3:29 pm

Hello

I have some schooling data from which I am trying to create a retrospective panel of schooling history.

The schooling data has information on

1. Current age
2. Age at entry in school
3. Age at drop out (if dropped)

From this information I can create new variables that are

1. Year of entry in school
2. Year of exit from school (this will be equal to year of survey for those still in school at time of survey)

I want to be able to expand my data in a way that creates multiple observations for each individual wherein they enter the sample when they turn 6 (say 2000) and exit when they turn 18 (say 2012).

1. For someone who stays in school from 6 to 18 I want a variable called enrolled that is "1" for all years between 2000 and 2012.
2. For someone who enters school at 6 and drops out at 10, I want the variable enrolled to be "1" for 2000-2004 and "0" for 2005-2012.

Children who are younger than 18 at the time of the survey are in the panel from age 6 until the survey year. The same rules as above apply in creating the 0/1 enrolled variable. I also have some children who never enrolled in school, so their enrolled variable will be "0" for all the years between ages 6 and 18 (or age at survey, whichever is lower).

How can I use expand (or any other command) to change my data in such a way as to make this retrospective panel?

I really appreciate any help I can get on this!

An example of what my data looks like

PID	Age	Age_Entry	Age_exit	Year Entry	Year Exit	Year Survey	Remarks
1	10	6	10	2008	2014	2014	1 throughout [never dropped out]
2	12	5	10	2007	2012	2014	1 from 2007-2012, 0 from 2013-2014 [dropped out]
3	10	6	6	2010	2010	2014	0 throughout from 2010-2014 [never enrolled]
4	20	6	16	2000	2010	2014	1 from 2000-2010 0 from 2011-2012 AND drops from sample after 2012 (when becoming 18) [dropped out]
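The expansion described above could be sketched roughly as follows. Everything here is an assumption for illustration: the variable names, the toy data, and the coding convention that a missing age_entry means "never enrolled" and a missing age_exit means "still in school". Each child gets one row per year from age 6 up to the lower of age 18 and current age.

```stata
* Sketch only: names, toy data and missing-value conventions are assumed
clear
input pid age age_entry age_exit year_survey
1 20 6 10 2014
2 20 6  . 2014
3 10 6  . 2014
4 12 .  . 2014
end

gen birth_year = year_survey - age
gen last_age   = min(age, 18)          // leave the panel at 18
gen n_years    = last_age - 6 + 1      // rows needed per child
drop if n_years < 1                    // children younger than 6

expand n_years
bysort pid: gen age_t = 6 + _n - 1     // age in each panel year
gen year = birth_year + age_t

* Enrolled from entry age through exit age (inclusive).
* A missing age_exit compares as larger than any age, so the spell stays open.
gen byte enrolled = !missing(age_entry) & age_t >= age_entry & age_t <= age_exit

sort pid year
list pid year age_t enrolled, sepby(pid)
```

In this toy data, child 1 is in the panel for 2000-2012 with enrolled equal to 1 for 2000-2004 and 0 afterwards, which matches the rule stated in the post.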

↧

How do i generate a monthly variance from a daily dataset

March 21, 2016, 8:55 am

Hi, I have an exchange rate dataset at daily frequency and I would like to generate the monthly variance. How can I go about doing this? Does it involve a "foreach" loop?
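A loop should not be needed for this. As a minimal sketch, assuming a daily date variable named date and a series named rate (both hypothetical, as is the simulated data), the month can be derived with mofd() and the variance computed by month:

```stata
* Sketch only: simulated data; variable names are assumed
clear
set obs 90
set seed 12345
gen date = td(01jan2016) + _n - 1
format date %td
gen rate = 1 + 0.01*rnormal()          // stand-in for the exchange rate

* Monthly date from the daily date
gen month = mofd(date)
format month %tm

* Standard deviation within each month, squared to give the variance
egen sd_rate = sd(rate), by(month)
gen var_rate = sd_rate^2

list month var_rate if date == dofm(month), noobs
```

If one observation per month is wanted instead, collapse (sd) rate, by(month) followed by squaring the result is an alternative.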

↧

multiple mi imputes in one dataset

March 21, 2016, 9:01 am

Dear all,

Could you please help me with a multiple imputation issue? Several of my variables have missing values, and I am trying to fill them in with mi. My question concerns which variables should be included in the regression used by mi impute.
Suppose I impute one variable with

mi impute monotone (regress) X1 = Y X2 X3 X4

(X2, X3, and X4 have no missing values and explain the dependent variable well, which is why I chose them; Y is the dependent variable.)

For the next variable I want to impute, do I need to use the same Y X2 X3 X4, or can I also include, for example, the already imputed X1? I would guess the error gets larger if I do include it, right?

Can you please help
Thanks in advance

↧

Sargan and ar(1) test statistics - gmm one and two step - esttab

March 21, 2016, 9:21 am

Dear Stata Users,

I am trying to construct a table including the different specifications of a gmm model.
I want i) the coefficients estimated using the gmm one-step; and ii) sargan and AR tests obtained from two step estimation to appear in the table.

Is there any way to do this?

Here is my code:

***Specification 1;
xtdpd lrexp l1.lrexp mean_vol dlogindustry_rer_96_99_cst HDOL_D_RER_PS lrsale capin llabprod leverage2 logGDP_Partners logGDP_Domestic if year>2001,dgmmiv(lrexp , lag(3 3) ) dgmmiv( lrsale llabprod , lag(3 3)) div( HDOL_D_RER_PS dlogindustry_rer_96_99_cst mean_vol logGDP_Partners logGDP_Domestic) hascons twostep
estat sargan
estat abond

eststo gmm_imp_inp1: xtdpd lrexp l1.lrexp mean_vol dlogindustry_rer_96_99_cst HDOL_D_RER_PS lrsale capin llabprod leverage2 logGDP_Partners logGDP_Domestic if year>2001,dgmmiv(lrexp , lag(3 3) ) dgmmiv( lrsale llabprod , lag(3 3)) div( HDOL_D_RER_PS dlogindustry_rer_96_99_cst mean_vol logGDP_Partners logGDP_Domestic ) hascons

***Specification 2;

xtdpd lrexp l1.lrexp mean_vol dlogindustry_rer_96_99_cst HDOL_D_RER_PS lrsale capin dllabprod leverage2 logGDP_Partners logGDP_Domestic if year>2001,dgmmiv(lrexp , lag(3 3) ) dgmmiv( lrsale dllabprod , lag(3 3)) div( HDOL_D_RER_PS dlogindustry_rer_96_99_cst mean_vol logGDP_Partners logGDP_Domestic) hascons twostep
estat sargan
estat abond

eststo gmm_imp_inp2: xtdpd lrexp l1.lrexp mean_vol dlogindustry_rer_96_99_cst HDOL_D_RER_PS lrsale capin dllabprod leverage2 logGDP_Partners logGDP_Domestic if year>2001,dgmmiv(lrexp , lag(3 3) ) dgmmiv( lrsale dllabprod , lag(3 3)) div( HDOL_D_RER_PS dlog_industry_rer_all mean_vol logGDP_Partners logGDP_Domestic ) hascons

****Final table;

esttab gmm_imp_inp1 gmm_imp_inp2, b(a2) star(* 0.10 ** 0.05 *** 0.01) t margin label order( dlog_industry_rer_all log_industry_rer_all dlogindustry_rer_96_99_cst logindustry_rer_96_99_cst mean_vol capin leverage2 lrsale llabprod dllabprod HDOL_RER_PS HDOL_D_RER_PS logGDP_Partners logGDP_Domestic )

↧

Change axis when using tabout to summarize a continuous variable?

March 21, 2016, 10:11 am

Is it possible to change the axis when using the tabout command for summarizing a continuous variable (pulsenum_) by one binary variable (randomise_)?

The code below runs properly; however, I would prefer to have the binary variable "randomise_" in the columns and the mean, median, & N of pulsenum_ in the rows (rather than the opposite).

tabout randomise_ if itt_==1 using procedurelog_v1.xls, ///
sum cells(mean pulsenum_ sd pulsenum_ median pulsenum_ N pulsenum_) ///
rep f(1) oneway

This is what the table looks like with the code above. I want to know if it's possible to change the axis.

randomise_	Mean	Sd	Median	N
	pulsenum	pulsenum	pulsenum	pulsenum
Groupe 1: Actif	103.6	10.9	105	117
Groupe 2: Sham	104.6	7.4	105	58
Total	103.9	9.8	105	175

Thanks for any solutions you may have.
Chris

↧

bootstrap with -mi- and using e(b_mi)

March 21, 2016, 10:43 am

Hi,

I am running the following program to obtain estimates with bias-corrected bootstrapping. I get the estimate for x1 with the following program, but how do I obtain the estimates for the other variables (x2-x8) in the model, especially i.x8, which is a factor variable with 8 categories (the first as reference)?

program myprog , rclass

mi estimate, saving(miest1, replace) eform post: glm y x1 x2 x3 x4 x5 x6 x7 i.x8 c1 c2, fam(poisson) link(log) nolog vce(robust)

matrix bb_x1= e(b_mi)
scalar b_x1=(bb_x1[1,1])
return scalar b_x1=b_x1

end

bootstrap exp(r(b_x1)) , seed(12345) reps(50): myprog

estat bootstrap, all

Any advice is highly appreciated.

Best wishes,
Massao

↧

Creating expressions/variables with post-estimation matrix e(b) elements, or post-estimation scalars _b(varname)

March 21, 2016, 11:35 am

Dear Readers,

I am trying to generate new variables computed as a linear combination of the coefficients from a linear regression and previously existing variables.
I have 3 questions.
1. Is this the best way to do this, or should I be working with matrices instead?
2. How do I use _b(varname) with the gen function?
3. How do I access elements of the e(b) matrix created after the regression? And can those elements be used in creating the variable?
Note: The current error I'm getting is: command _b is unrecognized
Note 2: As background to Q.1, the objective is then to collapse by mean if the variable crisis==1 and then plot f_err1, f_err2, ..., f_err4 (this is why I think it could be a better idea to create a vector which has the mean of f_err`i' in its entry (1,i) and then just graph it).

A condensed version of the code (the full code is longer; I have replicated a small part to show what I'm trying to do):

set more off
sort id year
xtreg grrt_wb L(1/4).grrt_wb , r
predict g_hat, xb
bys ccode: gen f_err1_ba= grrt_wb - g_hat if crisis==1
bys ccode: gen f_err2_ba= grrt_wb - ( _b(cons) + _b(L.grrt_wb)*f_err1_ba) if crisis==1
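For reference, stored coefficients are addressed with square brackets, _b[varname] and _b[_cons], rather than parentheses, and elements of a copy of e(b) can be used directly in expressions. The sketch below uses the auto dataset and a plain regress purely as stand-ins for the xtreg model above; the variable names yhat1, yhat2 and xb are hypothetical. Lagged terms should be addressable in the same way (for example _b[L.grrt_wb]), though that is not tested here.

```stata
* Sketch only: auto data and regress stand in for the original model
sysuse auto, clear
regress price mpg weight

* Coefficients by name, using square brackets
gen double yhat1 = _b[_cons] + _b[mpg]*mpg + _b[weight]*weight

* Same calculation through a copy of the coefficient vector e(b),
* whose columns here are ordered mpg, weight, _cons
matrix b = e(b)
matrix list b
gen double yhat2 = b[1,3] + b[1,1]*mpg + b[1,2]*weight

* Both should agree with the linear prediction
predict double xb, xb
summarize yhat1 yhat2 xb
```

Whether working with matrices is preferable probably depends on how many such combinations are needed; for a handful of terms the _b[] form is usually easier to read.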

Any help would be greatly appreciated!

Regards,

CH

↧

Raykov composite reliability in stata?

March 21, 2016, 11:53 am

Hi,

I estimated Cronbach's alpha as a measure of reliability for a scale using the following command:

alpha x1 x2 x3 x4 x5 x6 x7 x8 x9 x10, std item

However, a reviewer asked me to report the Raykov composite reliability instead.

Any ideas how to do that in Stata?

Any suggestion is highly appreciated.

Best wishes,
Massao

↧

Collapse and missing values: Generate dates that do not appear and assign value 0

March 21, 2016, 12:12 pm

I have a dataset of fight incidents from Nov 2015 to Feb 2016 for different facilities. I need to create a collapsed dataset with the total number of fights per month by facility, so each facility should have data for 4 months. The problem is that some facilities have no fights in some months, so those months don't appear in my collapsed data. I would like the month to appear with a value of 0. Is there a way to generate the missing months and assign a value of 0 by facility? For example, facility EW needs two more months, Dec and Jan. Or is there perhaps a way to do this when collapsing the data in the first place?

aaa	2015m11	12
aaa	2016m2	15

Code:

* Example generated by -dataex-.    To install: ssc    install    dataex
clear
input str5 FACILITYCODE float ym    long fightmonth
"MOA"  670 78
"MOA"  671 92
"MOA"  672 76
"MOA"  673 68
"BKC" 672  1
"EW"  670  1
"EW"  673  1
end
format %tm ym
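One way to add the missing facility-month rows after collapsing is fillin, which creates every combination of the listed variables and flags the new rows in _fillin. A minimal sketch using the example data above:

```stata
clear
input str5 FACILITYCODE float ym long fightmonth
"MOA" 670 78
"MOA" 671 92
"MOA" 672 76
"MOA" 673 68
"BKC" 672  1
"EW"  670  1
"EW"  673  1
end
format %tm ym

* Create all FACILITYCODE x ym combinations; new rows have _fillin == 1
fillin FACILITYCODE ym
replace fightmonth = 0 if _fillin
drop _fillin

sort FACILITYCODE ym
list, sepby(FACILITYCODE)
```

Note that fillin only creates months that appear for at least one facility; a month with no fights anywhere would still be absent and would need to be added separately.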

↧

Random Effects & Pooled OLS results the same in panel data analysis

March 21, 2016, 12:39 pm

Hi,

This is my first post here and I'm relatively new to Stata. I'm doing some panel data analysis, and my fixed effects, random effects and pooled OLS results are all fairly similar (in fact the RE and OLS results are identical). Why might this be the case, and how can I correct for it?

↧

Maximum likelihood estimation

March 21, 2016, 12:57 pm

Dear all,

I am currently trying to estimate the parameters of a limited dependent variable model, as introduced by Lesmond et al. (1999).
y=b*x-c1 if y<0
y=b*x-c2 if y>0

This is the log likelihood function:

LNF = SUM(if y<0) [ln(1/(2*pi*(sigma^2))) - ((y+c1-b*x)^2)/(2*(sigma^2))]
    + SUM(if y>0) [ln(1/(2*pi*(sigma^2))) - ((y+c2-b*x)^2)/(2*(sigma^2))]
    + SUM(if y=0) [ln(NCDF((c2-b*x)/sigma) - NCDF((c1-b*x)/sigma))].

Here the y and x vectors represent the dependent and independent variables, respectively, and NCDF stands for the normal cumulative distribution function.

The parameters c1, c2, b, and sigma need to be estimated.

Below you find the .ado file and the command I used for this problem.

program LDV

version 12.0

args lnf mu1 mu2 sigma

quietly replace `lnf'=ln(1/(2*_pi*(`sigma'^2)))-(($ML_y1-`mu1')^2)/(2*(`sigma'^2)) if $ML_y1<0

quietly replace `lnf'=ln(1/(2*_pi*(`sigma'^2)))-(($ML_y1-`mu2')^2)/(2*(`sigma'^2)) if $ML_y1>0

quietly replace `lnf'=ln(normalden(-`mu2',0,`sigma')-normalden(-`mu1',0,`sigma')) if $ML_y1==0

end

Stata command:

constraint 1 [#1]x = [#2]x

ml model lf LDV (mu1: y=x) (mu2: y=x) (sigma

I am not sure whether this is correct or not. c1 c2 are needed. So I take c1 as the -coef(constant) for mu1 and c2 as the -coef(constant) for mu2.

I would really appreciate your help!

Lei Zou

↧

Massive Survival Model takes Forever!

March 21, 2016, 1:53 pm

We routinely process a file with 4.5g of data, and we are adding 80m of data every month. The processing time appears to increase exponentially. We believe an egen command is the issue. Is there another set of commands that would do the same thing as egen more efficiently? Does Stata itself have limitations with files this big? Thank you, Pete.

↧

stepchart graph with time series values by year

March 21, 2016, 4:51 pm

Hi stata users,
I have some time series values and I want a graph with steps like that in the link:

http://peltiertech.com/line-chart-without-risers/

Do you know if there is a graph function?
Regards

↧

Create the ROC curve for the binomial logit model

March 21, 2016, 4:57 pm

Dear Statalist,
I’m trying to plot the "Receiver Operating Characteristics curve" and get the area under the curve for my binomial logit model.
The AUC has been estimated using the Stata command "lroc". (I am using Stata 14.1 on Windows 7.) When I run the code below, I get a long line of errors.
Any help please?

logit Yvar x1var x2var…….
lroc

Error messages :
(note: scheme s2color not found, using s2color)
…………………………….
(note: scheme s2color not found, using s2color)
(note: scheme s2color not found, using s2color)
system limit exceeded - see manual
(note: default scheme s2color not found, ignored)
(note: _restyle could not find style indexed 7 in the current scheme for class gsize)
(note: _restyle could not find style indexed 1 in the current scheme for class margin)
(note: _restyle could not find style indexed 1 in the current scheme for class gsize)
(note: _restyle could not find style indexed 1 in the current scheme for class margin)
(note: _restyle could not find style indexed 9 in the current scheme for class compass2dir)
(note: textboxstyle not found in scheme, default attributes used)
(note: tickstyle not found in scheme, default attributes used)
(note: gridstyle not found in scheme, default attributes used)
(note: ticksetstyle not found in scheme, default attributes used)
(note: ticksetstyle not found in scheme, default attributes used)
(note: ticksetstyle not found in scheme, default attributes used)
(note: ticksetstyle not found in scheme, default attributes used)
(note: axisstyle not found in scheme, default attributes used)
(note: _restyle could not find style indexed 2 in the current scheme for class yesno)
(note: _restyle could not find style indexed 2 in the current scheme for class yesno)
(note: textboxstyle not found in scheme, default attributes used)
(note: _restyle could not find style indexed 1 in the current scheme for class tb_orientstyle)
(note: _restyle could not find style indexed 1 in the current scheme for class yesno)
(note: _restyle could not find style indexed 2 in the current scheme for class yesno)
(note: _restyle could not find style indexed 2 in the current scheme for class yesno)
(note: _restyle could not find style indexed 2 in the current scheme for class yesno)
(note: _restyle could not find style indexed 2 in the current scheme for class yesno)
(note: _restyle could not find style indexed 1 in the current scheme for class transformstyle)
(note: _restyle could not find style indexed 1 in the current scheme for class yesno)
(note: _restyle could not find style indexed 2 in the current scheme for class yesno)
(note: _restyle could not find style indexed 1 in the current scheme for class yesno)
(note: _restyle could not find style indexed 1 in the current scheme for class transformstyle)
(note: _restyle could not find style indexed 9 in the current scheme for class compass2dir)
(note: _restyle could not find style indexed 2 in the current scheme for class yesno)
(note: _restyle could not find style indexed 2 in the current scheme for class yesno)
(note: _restyle could not find style indexed 2 in the current scheme for class yesno)
(note: textboxstyle axis_title not found in scheme, default attributes used)
(note: textboxstyle not found in scheme, default attributes used)
(note: axisstyle not found in scheme, default attributes used)
(note: _restyle could not find style indexed 1 in the current scheme for class margin)
(note: shadestyle not found in scheme, default attributes used)
(note: linestyle foreground not found in scheme, default attributes used)
(note: linestyle not found in scheme, default attributes used)
(note: shadestyle foreground not found in scheme, default attributes used)
(note: shadestyle not found in scheme, default attributes used)
(note: linestyle foreground not found in scheme, default attributes used)
(note: linestyle not found in scheme, default attributes used)
(note: shadestyle foreground not found in scheme, default attributes used)
(note: shadestyle not found in scheme, default attributes used)
(note: _restyle could not find style indexed 2 in the current scheme for class horizontal)
(note: _restyle could not find style indexed 2 in the current scheme for class vertical)
(note: _restyle could not find style indexed 9 in the current scheme for class compass2dir)
………………………….
(note: _restyle could not find style indexed 2 in the current scheme for class yesno)
(note: _restyle could not find style indexed 1 in the current scheme for class yesno)
…………………..
(note: graphstyle graph not found in scheme, default attributes used)
(note: graphstyle not found in scheme, default attributes used)
(note: color background not found in scheme, default attributes used)
option seriesid() not allowed
invalid syntax
r(111);

Best Regards Sabrine

↧

zip or xtpoisson?

March 21, 2016, 6:35 pm

So I have a dilemma: a zip model with clustered standard errors (which takes into account my zeros), or use xtpoisson (which does my multiple levels correctly, but can't take into account the extra zeros).

I have data gathered from conference participants giving feedback on the sessions they attended. Fairly small conferences; with a 50% response rate, I get about 75-150 respondents per conference, with a given person probably attending four to six sessions. We are testing a new questionnaire format that is more smartphone-friendly and follows best practices in general -- currently I have about 500 respondents for each format, a little under 1000 respondents total. My dependent variables are missing data points on quantitative questions and length of an open-ended response. So my data is nested like this:

Conference
Person and Session crossed
response

Would it be reasonable to model it like below for a robustness check that I footnote? The effect of the new questionnaire format is strong enough that it shows up in just looking at the means (and t-tests), logit models, poisson, zip, and xtpoisson. I probably will report the simple differences in means (after all, the audience is more MBA-speak than Stats-geek), but want to be able to footnote that the effects were significant with more appropriate models. But there is no truly appropriate model that I can find. If I had a truly appropriate model, I might actually report its results along with the changes in means.

Any thoughts?

Code:

clear
set more off

input str2 conference byte(person session) str40 comment byte format
"FL" 1 10 ""                                         1
"FL" 1 11 "Dr. Keen was really Keen"                 1
"FL" 1 12 ""                                         1
"FL" 1 13 "fantastic"                                1
"FL" 1 14 ""                                         1
"FL" 2 10 "interesting"                              1
"FL" 2  9 "boring.  Nothing to see here, move along" 1
"FL" 2  6 "repetitive"                               1
"FL" 3 10 ""                                         1
"FL" 3  5 ""                                         1
"FL" 3  6 ""                                         1
"FL" 3  4 ""                                         1
"FL" 3  3 ""                                         1
"FL" 3  1 ""                                         1
"AL" 1 10 ""                                         0
"AL" 1 11 "guacamole!"                               0
"AL" 1 12 ""                                         0
"AL" 1 13 ""                                         0
"AL" 1 14 "food was good, presentation bad"          0
"AL" 2 10 ""                                         0
"AL" 2  9 ""                                         0
"AL" 2  6 ""                                         0
"AL" 2 10 "seen this one before"                     0
"AL" 3  5 ""                                         0
"AL" 3  6 ""                                         0
"AL" 4  4 ""                                         0
"AL" 4  3 ""                                         0
"AL" 4  1 ""                                         0
end

gen commentLength=length(trim(comment))
encode conference, gen(confNum)
gen confPerson=int(confNum*1000+person)
gen confSession=int(confNum*1000+session)
sum

zip commentLength format, vce(cluster confPerson) inflate(format)
zip commentLength format, vce(cluster confSession) inflate(format)

xtset confPerson
xtpoisson commentLength format

xtset confSession
xtpoisson commentLength format

↧