Channel: Statalist

Hausman test issues

Dear all,

As I've been running a couple of panel regressions, a problem often reported on Statalist occurred: the Hausman test produced the message

the rank of the differenced variance matrix (6) does not equal the number of coefficients being tested (7);
be sure this is what you expect, or there may be problems computing the test. Examine the output of your
estimators for anything unexpected and possibly consider scaling your variables so that the coefficients
are on a similar scale.

as well as

(V_b-V_B is not positive definite)

Similar messages came up when analyzing the control variables only. As suggested in previous threads, I have tried to solve the issue using the sigmamore and sigmaless options and by changing the total assets and GDP variables to different scales (thousands, millions, billions). However, none of these solutions, alone or in combination, solved my problem. Implementing xtoverid did give plausible results, but I am not sure whether I am always allowed to use it instead of the Hausman test. Am I right in this assumption? Does anyone have a suggestion for how to handle my problem?

I used the following code in a do-file to conduct the full analysis at once; perhaps there is a mistake there:

Code:
global id Bank1
global t Date1
global ylist Expenses
global xlist_contr Tot_assets_CS2 GDP_FRED2 Rep_senate_dummy Rep_house_dummy Rep_president_dummy
global xlist Tot_assets_CS GDP_FRED Rep_senate_dummy Rep_house_dummy Rep_president_dummy ROE_FR Tot_riskbased_capt_ratio_BR

describe $id $t $ylist $xlist
summarize $id $t $ylist $xlist
outreg2 using x.doc, replace sum(log)

* Test multicollinearity - control variables
quietly reg $ylist $xlist_contr
estat vif

* Test multicollinearity
quietly reg $ylist $xlist
estat vif

* Set data as panel data
sort $id $t
xtset $id $t
xtdescribe
xtsum $id $t $ylist $xlist

* Pooled OLS estimator
reg $ylist $xlist
outreg2 using myreg.doc, replace ctitle(OLS)

* Population-averaged estimator
xtreg $ylist $xlist, pa

* Between estimator
xtreg $ylist $xlist, be

* Fixed effects or within estimator - control variables
xtreg $ylist $xlist_contr, fe
outreg2 using myreg.doc, append ctitle(Fixed Effects control variables) addtext(Bank FE, YES)
* Test for heteroskedasticity
xttest3
* Test for autocorrelation
xtserial $ylist $xlist_contr

* Fixed effects or within estimator
xtreg $ylist $xlist, fe
outreg2 using myreg.doc, append ctitle(Fixed Effects) addtext(Bank FE, YES)
* Test for heteroskedasticity
xttest3
* Test for autocorrelation
xtserial $ylist $xlist

* First-differences estimator
reg D.($ylist $xlist), noconstant

* Random effects estimator - control variables
xtreg $ylist $xlist_contr, re theta
outreg2 using myreg.doc, append ctitle(Random Effects control variables)

* Random effects estimator
xtreg $ylist $xlist, re theta
outreg2 using myreg.doc, append ctitle(Random Effects)

* Hausman test for fixed versus random effects model - control variables
quietly xtreg $ylist $xlist_contr, fe
estimates store fixed
quietly xtreg $ylist $xlist_contr, re
estimates store random
hausman fixed random

* Sargan-Hansen test for fixed versus random effects model - control variables
quietly xtreg $ylist $xlist_contr, re
xtoverid

* Hausman test for fixed versus random effects model
quietly xtreg $ylist $xlist, fe
estimates store fixed
quietly xtreg $ylist $xlist, re
estimates store random
hausman fixed random

* Sargan-Hansen test for fixed versus random effects model
quietly xtreg $ylist $xlist, re
xtoverid

* Breusch-Pagan LM test for random effects versus OLS
quietly xtreg $ylist $xlist, re
xttest0

* Recovering individual-specific effects
quietly xtreg $ylist $xlist, fe
predict alphafehat, u
sum alphafehat

estat vce
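
One note on the xtoverid alternative (a sketch, not a definitive answer): a common reason to prefer xtoverid over hausman is that its Sargan-Hansen statistic remains valid with robust or cluster-robust variance estimates, which hausman does not accommodate. This assumes xtoverid and its dependency ivreg2 are installed from SSC.

Code:
* cluster-robust Sargan-Hansen test of fixed vs. random effects
quietly xtreg $ylist $xlist, re vce(cluster $id)
xtoverid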
Thank you very much in advance!

Combining datasets by multiple variables and a condition

Dear all,

For my research into earnings announcement dates (EADs) I need to combine the following two datasets:

Option prices:

Company ID   Date    Option price
Company 1    date1   price1
Company 1    date2   price2
Company 1    date3   price3
Company 2    date1   price4
...          ...     ...

EAD database:

Company ID   EAD
Company 1    EAD1
Company 1    EAD2
Company 1    EAD3
Company 2    EAD1
...          ...
They need to be combined in such a way that, for each Company ID, each observation gets the future EAD that is closest to Date. I am currently trying different types of merges, but I cannot get the conditionality in there. Hopefully someone out there has encountered this before or knows a way around it.

Please be advised that the dataset is quite large (around 45 million observations for 5,697 companies).
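
One possible route, sketched under assumed file names (options.dta holding the option prices, eads.dta the announcement dates) and the variable names above: joinby pairs every option row with every EAD for the same company, after which the nearest future EAD is kept. With 45 million rows the pairwise join can be memory-hungry; rangejoin from SSC is a leaner alternative built for exactly this kind of nearest-match problem.

Code:
use options, clear
joinby CompanyID using eads                     // all (Date, EAD) pairs within company
keep if EAD >= Date                             // future announcements only
gen gap = EAD - Date
bysort CompanyID Date (gap): keep if _n == 1    // keep the closest future EAD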

Thanks in advance!

Kind regards,

Marijn Baartmans
Vrije Universiteit Amsterdam

Calculate discrete failure percent per time interval?

Hi,
Does anyone know a way to make Stata calculate a discrete failure percentage for each interval (i.e., from t to t+1), complete with upper/lower bounds or standard errors, in a survival analysis?

I suppose I can do it manually as 1 - S(t+1)/S(t), but that won't give me the upper/lower bounds or the standard error.
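
A built-in starting point (a sketch, assuming analysis time in t and a failure indicator died): ltable with the hazard option reports an interval-by-interval actuarial hazard with standard errors and confidence bounds, which for unit-width intervals approximates the discrete conditional failure probability; it is worth checking against the manual 1 - S(t+1)/S(t) calculation.

Code:
ltable t died, intervals(1) hazard     // per-interval hazard, std. error, CI
ltable t died, intervals(1) failure    // cumulative failure with std. errors, for comparison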

Any thoughts?

poststratification for non-response and estimation of totals

Dear STATALIST users,

From a population of 7,396 units I randomly selected half, i.e., 3,698 units, for an opinion survey. This means my base weights equal 2. Of that sample of 3,698 units, 1,255 responded.

To correct for non-response bias, I used post-stratification. [Post-stratification can be used to weight sample data to conform to population data, but it can also be used to correct for non-response, interpreting "population data" as "data for the total sample" (see, e.g., Kalton and Flores-Cervantes, 2003).]

I've used the following command:

svyset [pw= base_weight], poststrata(strat_group) postweight(N_total)

with base_weight equal to 2, and N_total the poststratum sizes of the total sample.

So, when estimating a total (svy: total Q9_2_3_dummy), Stata indicates the population size is 3,698 (see below). The statistic therefore represents an estimate of the total for the total sample (n = 3,698), not for the population (N = 7,396 = 2 x 3,698).


svy: total Q9_2_3_dummy
(running total on estimation sample)

Survey: Total estimation

Number of strata = 1 Number of obs = 1,255
Number of PSUs = 1,255 Population size = 3,698
N. of poststrata = 8 Design df = 1,254

----------------------------------------------------------------------------------------
| Linearized
| Total Std. Err. [95% Conf. Interval]
-------------+-------------------------------------------------------------------------
Q9_2_3_dummy | 416.0893 31.74165 353.8167 478.3619
----------------------------------------------------------------------------------------


Question: In order to arrive at a population estimate, do I simply need to multiply the estimate by two, i.e., 416*2=832?
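
One alternative to post-hoc scaling (a sketch, relying on postweight() targets being what the calibrated weights sum to): supply poststratum sizes already scaled up to the population, so that svy: total estimates the population total directly. Since every poststratum total doubles, this is equivalent to the multiply-by-two shortcut.

Code:
gen N_pop = 2*N_total
svyset [pw=base_weight], poststrata(strat_group) postweight(N_pop)
svy: total Q9_2_3_dummy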


Many thanks,

Lode

Reference: Kalton, G. and Flores-Cervantes, I. (2003). Weighting Methods. Journal of Official Statistics 19: 81-97.

How to calculate the median for a group while excluding firms in the same industry as the acquirer?

Dear all,

I have a question that has bothered me for a while. For each acquirer, I would like to calculate the median leverage of the non-acquirer firms in its group, excluding firms in the same industry as the acquirer. A simplified sample of my data looks like the following:
FirmID   SIC2   AcquirerDummy   Leverage   group
10001    11     1               0.2        1
10003    12     1               0.3        1
10004    13     1               0.4        1
10005    11     0               0.5        1
10006    12     0               0.6        1
10007    13     0               0.7        1
10008    14     0               0.8        1
10009    11     1               0.2        2
10010    11     0               0.3        2
10011    12     0               0.4        2
For example, I would like to generate a variable for firm 10001 that equals the median leverage of firms 10006, 10007, and 10008: that is, the median leverage of firms that are non-acquirers, have a different SIC2 from the acquirer (firm 10001), but are in the same group (group 1) as the acquirer.
For firm 10003, the new variable's value should equal the median leverage of firms 10005, 10007, and 10008.
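
One transparent approach (a sketch using the variable names above; an observation-by-observation loop, so it will be slow on very large datasets): for each acquirer, take the median leverage of the non-acquirers in the same group with a different SIC2.

Code:
gen double med_lev = .
quietly forvalues i = 1/`=_N' {
    if AcquirerDummy[`i'] == 1 {
        summarize Leverage if group == group[`i'] & AcquirerDummy == 0 ///
            & SIC2 != SIC2[`i'], detail
        replace med_lev = r(p50) in `i'
    }
}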

Any help would be highly appreciated.

Bo

Difference in differences with one regression totally insignificant

Hello everybody,

As the name of the thread suggests, I'm currently trying to do a diff-in-diff analysis for my treatment group (some EU countries). The event is the European debt crisis starting in 2009, and the periods are therefore defined as 2001-2008 and 2009-2012. However, when I run the fixed-effects regression for 2009-2012, all coefficients become insignificant. So my question is whether I can still do a diff-in-diff analysis, and what could be the causes of this insignificance. Do I have too few observations for the 2009-2012 period?

Here are the regression results:

[The regression results for the pre-crisis and crisis periods were attached as images and are not reproduced here.]

Thank you for your help!

Gllapred for specific values of the covariates

Dear all,

I am trying to obtain predictions after gllamm at specific values of the covariates.

I am estimating a logit model with random effects (at the region and country level) and weights. I want to estimate the probability of success of an intervention at different values of the risk variable (categorical: 1 = very high risk; 2 = fairly high risk; 3 = fairly small risk; 4 = very small risk; 5 = don't know).

However, the predictions do not seem to update for the different covariate values I supply. I tried to follow http://www.stata.com/meeting/uk08/Sophia.predict5.pdf:
Code:
xi: gllamm success i.risky age, i(coren countries) link(logit) family(binom) nip(20) pweight(weight)
estimates store A
gen risk1 = risky
gen zeta1 = 0
gen zeta2 = 0
forval i = 1/5 {
    estimates restore A
    replace risky = `i'
    gllapred adjpred`i', mu us(zeta)
    gen me`i'1 = adjpred`i' - adjpred1
}

drop risky
rename risk1 risky
The results of the code:

Code:
 xi: gllamm success i.risky age, i(coren countries) link(logit) family(binom) nip(5)    pweight(weight)
i.risky           _Irisky_1-5         (naturally coded; _Irisky_1 omitted)

Iteration 0:   log likelihood =  -3849.956  (not concave)
Iteration 1:   log likelihood = -3845.7998  
Iteration 2:   log likelihood = -3845.6634  
Iteration 3:   log likelihood = -3844.4678  
Iteration 4:   log likelihood = -3844.4658  
Iteration 5:   log likelihood = -3844.4658  

number of level 1 units = 26052
number of level 2 units = 275
number of level 3 units = 28

Condition Number = 132.91531

gllamm model 

log likelihood = -3844.4658

Robust standard errors

success       Coef.   Std. Err.      z    P>z     [95% Conf. Interval]

_Irisky_2   -.3387562   .0691081    -4.90   0.000    -.4742057   -.2033067
_Irisky_3   -.1743749   .2936712    -0.59   0.553    -.7499598    .4012101
_Irisky_4    .5610155   .1080437     5.19   0.000     .3492538    .7727773
_Irisky_5   -.2142177   .1181688    -1.81   0.070    -.4458243    .0173889
age   -.0402096   .0037367   -10.76   0.000    -.0475334   -.0328858
_cons   -1.213365   .1699397    -7.14   0.000     -1.54644   -.8802891



Variances and covariances of random effects



***level 2 (coren)

var(1): .36912712 (.12842124)

***level 3 (countries)

var(1): .2353787 (.05440445)

. sum adjpred*

    Variable |    Obs        Mean    Std. Dev.       Min        Max
-------------+-----------------------------------------------------
    adjpred1 |  26,052    .0424598    .0308485   .0046232   .2217506
    adjpred2 |  26,052    .0424598    .0308485   .0046232   .2217506
    adjpred3 |  26,052    .0424598    .0308485   .0046232   .2217506
    adjpred4 |  26,052    .0424598    .0308485   .0046232   .2217506
    adjpred5 |  26,052    .0424598    .0308485   .0046232   .2217506
My ultimate aim is to obtain output similar to what the margins command would give. However, margins is not available after gllamm. xtmelogit would do the trick if I did not have to account for the weights, but I need both weights and random effects.

The equivalent of what I am trying to do, for a plain logit model, is given by the code below. I am basically aiming to do the same thing after gllamm.

Code:
logit success i.risky age [pw=weight]

gen risk1=risky
forval i=1/5 {
replace risky=`i'
predict adjpred`i'
gen me`i'1=adjpred`i'-adjpred1
}
drop risky
rename risk1 risky
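
A possible explanation and fix (a sketch, on the assumption that this is the cause): the model was fit on the xi-generated dummies _Irisky_2 through _Irisky_5, not on risky itself, so replacing risky leaves the prediction data unchanged and every adjpred comes out identical. Resetting the dummies before each gllapred call should make the predictions differ:

Code:
forval i = 1/5 {
    estimates restore A
    forval j = 2/5 {
        replace _Irisky_`j' = (`i' == `j')
    }
    gllapred adjpred`i', mu us(zeta)
}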
Thank you in advance.

adjustrcspline using imputed dataset?

Is it possible to use adjustrcspline for data that has been multiply imputed? Does anybody have a good resource?

Code:
mi estimate, dots: regress sumscore rcs_size_* bmi aua volume age i.stitch i.nerve i.damico

adjustrcspline
I keep getting the following error:

all variables created in the last call to mkspline2 must be
independent variables in the last estimation command
r(198).

It works when I don't use the -mi estimate- prefix.

Thank you in advance for your consideration of my question and for any advice that you can give.

Mark Tyson

for each observation, find the first change in variables tracked by years

I am conducting a study to track changes in people's health conditions. The data look like the example below. My goal is to find the first year in which the condition changed. For example, for person 1 the first year of change is 2001; person 2 never changed; person 5 changed in 2002; and so on. It is probably a good idea to create a variable at the end, say "first_yr", which contains the year of the first change. The same analysis will be conducted for the subsample of individuals without missing information. My real data have more years and observations; I tried to use a loop and lost track. Any help will be great! Thanks.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str1(h_2000 h_2001 h_2002 h_2003)
1 "a" "b" "b" "a"
2 "c" ""  "c" "c"
3 "d" "e" "d" "d"
4 "f" ""  "f" "f"
5 "g" ""  "h" "g"
6 "b" "b" "b" "g"
end
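
A reshape-based sketch, assuming (as in the example) that years with missing values are simply skipped when looking for a change:

Code:
reshape long h_, i(id) j(year)
drop if h_ == ""                                   // skip years with missing information
bysort id (year): gen byte change = h_ != h_[_n-1] & _n > 1
egen first_yr = min(cond(change, year, .)), by(id) // missing if the condition never changed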

Finding patterns from listwise deletion

Hello everyone,

I recently ran a few regressions on General Social Survey data, and Stata automatically did some listwise deletion in my analyses. This was fine, but some analyses had far fewer respondents after the listwise deletion, and the share of observations carrying the iap ("inapplicable") missing-data label was rather high for a few of these.

So I am hoping to double-check that the missing data do not show any significant patterns. For example, I don't want the missingness to be related to age, race, or gender. To do this, I created a binary variable that grouped all the data into missing or non-missing and checked whether there were any significant associations with age, race, and gender. However, I have two questions about this.

1) Is this an acceptable way to find patterns in the missing data?

2) When I tried to lump in the "iap" data, I got an error message from Stata. It cannot recode the iap data, so I am now unsure how to analyze it. (For other missing data labeled ".b", this was not a problem.)

Thank you!


. recode nataid (1=1) (2=1) (3=1) (iap=2), generate (nataidmiss)
ERROR: unknown el iap in rule
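
A sketch of a possible fix, assuming (as in many GSS extracts) that "iap" is the value label attached to the extended missing value .i: recode rules accept missing-value codes, not their labels, so refer to .i directly.

Code:
recode nataid (1=1) (2=1) (3=1) (.i=2), generate(nataidmiss)

* or lump all missing codes together for the pattern check:
gen byte anymiss = missing(nataid)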

Stata programming issue

Dear Stata users,
When I run the following code directly (PROGRAM A), it works fine. But when I run the same code inside a program (see PROGRAM B),

cap prog drop WASH
prog def WASH
....PROGRAM A

it gives me the following error message when it reaches the line at the end "putexcel A`row'=(variable) B`row'=(b) C`row'=(ll) D`row'=(ul) E`row'=(size) F`row'=(wsize) G`row'=(sd) H`row'=(se) I`row'=(deft)":

A: invalid cell name
r(198);

Any idea how I could solve this problem?
Thanks for your time.

Nizam

*************************
****PROGRAM A*****
*************************
use "wash_analytical_wgts",clear
svyset [pw=hh_wgt], psu (hhea) strata (project)

global WASH improved_water correct_watertreat boiling bleaching filtering time_water solar_disinfect ///
improved_sanitation open_defecation proper_handwashing

putexcel set "WASH.xlsx", sheet ("COMBINED") replace
putexcel A1=("variable") B1=("mean") C1=("min") D1=("max") E1=("size") F1=("wsize") G1=("sd") H1=("se") I1=("deft")

local row = 2

foreach x of varlist $WASH {
svy: mean `x'
ret list
ereturn list
matlist r(table)
mat basic = r(table)
matlist basic
scalar b= basic[1,1]
scalar se=basic[2,1]
scalar ll=basic[5,1]
scalar ul=basic[6,1]
scalar wsize=round(e(N_pop),1)
scalar size=e(N)
scalar variable=e(varlist)

estat effects, deff deft meff meft
return list
mat deft=r(deft)
scalar deft=deft[1,1]

estat sd
return list
mat sd=r(sd)
scalar sd=sd[1,1]

putexcel A`row'=(variable) B`row'=(b) C`row'=(ll) D`row'=(ul) E`row'=(size) F`row'=(wsize) G`row'=(sd) H`row'=(se) I`row'=(deft)

local row=`row' + 1
scalar drop _all
}

***********************************
**PROGRAM B
*************************************
use "wash_analytical_wgts",clear
svyset [pw=hh_wgt], psu (hhea) strata (project)

global WASH improved_water correct_watertreat boiling bleaching filtering time_water solar_disinfect ///
improved_sanitation open_defecation proper_handwashing

putexcel set "WASH.xlsx", sheet ("COMBINED") replace
putexcel A1=("variable") B1=("mean") C1=("min") D1=("max") E1=("size") F1=("wsize") G1=("sd") H1=("se") I1=("deft")

local row = 2

cap prog drop WASH
prog def WASH
foreach x of varlist $WASH {
svy: mean `x'
ret list
ereturn list
matlist r(table)
mat basic = r(table)
matlist basic
scalar b= basic[1,1]
scalar se=basic[2,1]
scalar ll=basic[5,1]
scalar ul=basic[6,1]
scalar wsize=round(e(N_pop),1)
scalar size=e(N)
scalar variable=e(varlist)

estat effects, deff deft meff meft
return list
mat deft=r(deft)
scalar deft=deft[1,1]

estat sd
return list
mat sd=r(sd)
scalar sd=sd[1,1]

putexcel A`row'=(variable) B`row'=(b) C`row'=(ll) D`row'=(ul) E`row'=(size) F`row'=(wsize) G`row'=(sd) H`row'=(se) I`row'=(deft)

local row=`row' + 1
scalar drop _all
}
end

WASH
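
The likely culprit (a sketch of a fix, condensed to the essentials): local macros do not cross program boundaries, so inside WASH the local row defined outside the program is empty, A`row' expands to just "A", and putexcel rejects "A" as a cell name. Defining row inside the program should cure the error; the putexcel set and header lines can stay outside as in PROGRAM B.

Code:
cap prog drop WASH
prog def WASH
    local row = 2                      // define the counter INSIDE the program
    foreach x of varlist $WASH {
        svy: mean `x'
        mat basic = r(table)
        putexcel A`row'=("`x'") B`row'=(basic[1,1]) H`row'=(basic[2,1]) ///
            E`row'=(e(N)) F`row'=(round(e(N_pop),1))
        local row = `row' + 1
    }
end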

Long Data: Reshape wide with 2 levels

I googled how to do this but couldn't find a solution. Basically, I have data in long format with different levels. The data record a series of misconducts for students over two periods: 1 = before and 2 = after some time range around the start date of a class. A missing value in the "when" variable means that the misconduct happened outside the date parameters (neither before nor after). I need to reshape these data to wide format. See below.

My data:

ID      type           when   count
21647a  Arrest            .       1
21647a  Bad Behavior      .       1
21647a  Force             .       2
28878a  Arrest            1       1
28878a  Infraction        1       3
28878a  Bad Behavior      1       1
28878a  Force             1       7
28878a  Force             2       1
28878a  Force             .       5

What I need:

ID      Arrest_Pre  Arrest_Post  BadBehavior_Pre  BadBehavior_Post  Force_Pre  Force_Post  Infraction_Pre  Infraction_Post  SeriousInjury_Pre  SeriousInjury_Post
21647a       0           0              0                0              0          0             0               0                 0                  0
28878a       1           0              1                0              7          1             3               0                 0                  0
Data sample:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 id str14 type byte(when count)
"21647a" "Arrest"         . 1
"21647a" "Bad Behavior"   . 1
"21647a" "Force"          . 2
"28878a" "Arrest"         1 1
"28878a" "Infraction"     1 3
"28878a" "Bad Behavior"   1 1
"28878a" "Force"          1 7
"28878a" "Force"          2 1
"28878a" "Force"          . 5
"32260a" "Infraction"     1 1
"32260a" "Serious Injury" . 1
"32260a" "Bad Behavior"   1 1
"32260a" "Force"          1 4
end
I think this gets even more complicated because not all students experienced all the types, so I guess we need to expand the data.
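
One possible route (a sketch, assuming when 1 = Pre and 2 = Post): park the out-of-window rows (when == .) as zero-count "Pre" rows so that every student survives the reshape, then reshape wide on a combined type/period stub.

Code:
replace count = 0 if missing(when)               // out-of-window events count as zero...
replace when  = 1 if missing(when)               // ...but keep the student in the data
gen str20 stub = subinstr(type, " ", "", .) + cond(when == 1, "_Pre", "_Post")
collapse (sum) count, by(id stub)
reshape wide count, i(id) j(stub) string
mvencode count*, mv(0) override                  // zero-fill never-observed combinations
rename count* *                                  // countArrest_Pre -> Arrest_Pre, etc.

Types that no student in the data ever experienced will still have no column; those can be added afterwards with generate if needed.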

Thank you so much for any help,
Marvin



Comparing results of mixed-effects logistic regression with crossed random effects: Stata and R (lme4)

I have fit a mixed-effects logistic model with crossed random effects to my data in both Stata and R (using lme4). The two models, presented below, yield very similar coefficient estimates, but it appears my method of calculating standard errors and p-values in R differs from Stata's. Could anyone shed light on the discrepancy between these two models?

Stata:

Code:
. melogit A B i.C D || _all: R.E || F:

Mixed-effects logistic regression               Number of obs      =       553

-----------------------------------------------------------
                |   No. of       Observations per Group
 Group Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
           _all |        1        553      553.0        553
              F |       23         14       24.0         40
-----------------------------------------------------------

Integration method:     laplace

                                                Wald chi2(3)       =     29.21
Log likelihood = -337.64555                     Prob > chi2        =    0.0000
--------------------------------------------------------------------------------------
                      A |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
                      B |   .0521762   .0259335     2.01   0.044     .0013474    .1030049
                   1.C |  -2.563292   .7732082    -3.32   0.001    -4.078753   -1.047832
                      D |  -4.070081   1.131891    -3.60   0.000    -6.288546   -1.851616
               _cons |   2.693203   1.467959     1.83   0.067    -.1839439    5.570349
---------------------+----------------------------------------------------------------
_all>E                |
           var(_cons)|   .9765646   .3263157                      .5073117    1.879867
---------------------+----------------------------------------------------------------
F                        |
           var(_cons)|   .0615515   .0880975                      .0037233    1.017531
--------------------------------------------------------------------------------------
LR test vs. logistic regression:     chi2(2) =    48.95   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.


R (lme4):

Code:
> my.glmm = glmer(A ~ B + D + C + (1|E) + (1|F), data = my.data, family = 'binomial')
> my.glmm
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: A ~ B + D + C + (1 | E) + (1 |  
    F)
   Data: my.data
      AIC       BIC    logLik  deviance  df.resid 
 687.2924  713.1846 -337.6462  675.2924       547 
Random effects:
 Groups Name        Std.Dev.
 E  (Intercept) 0.9876  
 F   (Intercept) 0.2479  
Number of obs: 553, groups:  E, 44; F, 23
Fixed Effects:
       (Intercept)                 B                D                  C  
           2.69337             0.05206            -4.06681            -2.55907  
> coefs <- data.frame(coef(summary(my.glmm)))
> coefs$p.z <- 2 * (1 - pnorm(abs(coefs$t.value)))
Error in abs(coefs$t.value) : 
  non-numeric argument to mathematical function
> coefs
                   Estimate      Std..Error      z.value     Pr...z..
(Intercept)       2.69337259 2.16447149  1.244356 0.2133686532
B                 0.05206151 0.02424857  2.146992 0.0317938861
D                -4.06681259 2.13085477 -1.908536 0.0563220015
C                -2.55906567 0.74622222 -3.429361 0.0006050034


I know that I am using a roundabout way to get z-scores and p-values for the coefficients in R, but it would seem that Stata's method must somehow differ. I hope I have specified the model correctly in both programs. Are there estimation methods or assumptions that differ between Stata and R for this type of model? Thank you!

Tobit returning negative coefficients

Hello,

I have panel data from an RCT with two observations per subject (i.e., each subject has two rows in my data file). Subjects were asked to donate money via two channels; call them channel A and channel B. Some subjects gave money via both channels and some via only one. My dependent variable is how much money they gave ('Amtgiven'). If a person gave $10 via channel A and $20 via channel B, that person has one row with $10 and one row with $20 in 'Amtgiven'. If a person gave, say, $50 via channel A only, she has $50 in one row and $0 in the other. And so on.

Subjects were randomly allocated to two treatment groups, denoted by dummies in the dataset.

My independent variables are a series of dummies that aim to cover all possible scenarios as follows:

X1 = (Treatment1 * ChannelA)

X2 = (Treatment2* ChannelB)

X3 = (Treatment1 * ChannelB)

X4 = (Treatment2*ChannelA)

So, for example, X1 = 1 when the subject was in treatment group 1 and gave a positive amount of money via channel A, and so on. I now want to estimate the following model:

tobit Amtgiven X1 X2 X3 X4, ll(0) vce(cluster ID) noconstant

With the above model I cluster the errors on ID, so Stata knows there are two rows per subject.

Now, my problem is that the coefficients for X3 and X4 from this model are negative! This is really odd, since all my dependent and independent variables have 0 as a minimum and can never take negative values. When I run the same model with OLS, the coefficients are in fact positive. But I need to use tobit, since I have many observations that are left-censored.

I also saw some more complicated ways to do tobit with fixed effects and so on, but I think what I am trying to do here should be fairly simple...
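
One point worth noting (a sketch, not a diagnosis of this particular dataset): tobit coefficients act on the latent index, which can be negative even when the observed outcome cannot be; the expected outcome implied by the model, respecting the lower limit, is recovered with margins.

Code:
tobit Amtgiven X1 X2 X3 X4, ll(0) vce(cluster ID) noconstant
margins, predict(ystar(0,.))    // E(Amtgiven | X), accounting for censoring at 0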

Thanks,

Bill

P.S.: I have already requested my name to be updated to full name.

help with putexcel and creating a loop with foreach

Hi,

I have a health dataset which contains information for some 350 clinics. I would like to export basic descriptive data for each clinic to excel.

How do I use putexcel and foreach to generate a loop which exports descriptive data for each clinic in a separate sheet in excel?


Currently my code looks like this:


putexcel set ZQD.xlsx, sheet("All") replace
putexcel C1=("All participants")
putexcel C2=("N") D2=("%")

* number of participants
count
putexcel A3=("Total participants") C3=(r(N))

* sex
putexcel A10=("Gender")
tab sex, matcell(cell) m
putexcel B10=("male") B11=("female") B12=("missing") ///
    C10=matrix(cell) D10=matrix(cell/r(N)*100)


How do I write this so that a loop is created and, for each hospital, the number and sex of participants are recorded in a new sheet in Excel?

I would like to find the code which says:

"For each hospital coded in the variable called Hospital please record the outputs on number of pariticpants and the sex of participants in 350 excel sheets each representing outputs of a different hospital."

I hope this question makes sense.

Thanks
Ingrid

Difference-in-difference estimation: multiple pre/post-periods, lags, state-specific time trends, treatment intensity

Dear Statalist community,

I want to execute a Diff-in-diff estimation.

My data are at the state level and comprise the years 2005-2013:
state-year observations;
state = 1,...,16;
year = 2005,...,2013.


My treatment:
2 of 16 states implemented a policy:
state_1 in 2009, state_2 in 2010.
I expect this policy change (which stays in place from the time of implementation onward) to affect the outcome variable not in the year of implementation, but with a two-year lag, a three-year lag, and so on.

1. Question)
Supposing the treatment effect is equal for both states, I model:
Code:
gen D = 0
replace D = 1 if state==1 & year==2009
replace D = 1 if state==2 & year==2010

reg outcome_ij i.state i.year l2.D l3.D l4.D, vce(cluster state)
Is this specification correct?

If I excluded the terms "l3.D l4.D":
Would Stata then assume that in 2012/2013 the outcome of state_1 returns to its state- and year-fixed-effect level, with no treatment effect expected?

2. Question)
How could I include state-specific time trends instead of assuming that they are independent of the state?
Code:
reg outcome_ij i.state i.year state#year l2.D l3.D l4.D, vce(cluster state)
Is this specification correct?
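
On question 2, a sketch: state-specific linear time trends are usually written as i.state#c.year (a separate slope for each state) rather than the full i.state#i.year interaction, which would absorb all state-year variation and leave nothing with which to identify the treatment effect.

Code:
reg outcome_ij i.state i.year i.state#c.year l2.D l3.D l4.D, vce(cluster state)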

3. Question)
Some of my covariates are distorted after 2010, some after 2011.
Is it possible to include them even if they only cover the pre-treatment period (as in the synthetic control method approach)?


4. Question)
Public health insurance (share): variable of treatment intensity OR important covariate?

The treatment only affects individuals that are publicly insured.
I do not expect the share of public HI to directly affect the outcome.
But the share is expected to affect the treatment effect.

Some papers interact the share of individuals with public HI with the treatment indicator.
But one author states that "controlling for .. health insurance coverage (as a predictor of a closely related outcome) is important .. given that uninsured individuals are not directly affected by the mandates".
Is it enough to interact the share of publicly insured with my treatment indicator, or should I also include it as a covariate?
I think: if the share is exceptionally high in the two treated states, the treatment effect could be inflated.
How would I include the interaction term in my specification?
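
One way to write it (a sketch; pubHI_share is an assumed variable name for the share of publicly insured): build the lagged treatment indicator and its interaction with the share explicitly, keeping the share's main effect as a covariate.

Code:
xtset state year
gen D_l2 = l2.D                       // two-year-lagged treatment indicator
gen shareXD = pubHI_share * D_l2      // treatment-intensity interaction
reg outcome_ij i.state i.year pubHI_share D_l2 shareXD, vce(cluster state)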


Thank you very much for your support/help in advance!

Kind regards,
Mischa

Stata Conference 2017 Call for Presentations

Dear Stata Community,

I am pleased to announce that presentation abstracts are now being accepted for Stata Conference 2017, to be held in Baltimore, Maryland, on July 27-28, 2017. Go to http://www.stata.com/meeting/baltimore17/ to get more details about the conference and to submit an abstract. Presentations are typically 15 or 25 minutes and can be on any topic involving a creative or innovative use of Stata. Abstracts must be 200 words or less and are due by March 31, 2017. Priority will be given to presentations involving the use of Stata to better predict presidential election results (just kidding).

Even if you don't have anything to present, please consider attending the conference. It is a great venue for interacting with other Stata users and with StataCorp staff. A highlight every year is an opportunity at the end of the conference to provide feedback to StataCorp on things that need to be fixed or new features that would be helpful. Details about the location of the conference can be found at the above link and more information about hotel reservations and conference registration will be posted there soon.

Please feel free to let me know if you have any questions or comments about the conference and I will direct them as appropriate. I am a native of the Baltimore area, so please don't hesitate to ask if you have questions about the conference location, things to do in Baltimore, travel arrangements, etc.

Looking forward to seeing you here in July!

Sincerely,
Joe Canner
on behalf of the Scientific Committee

addressing left censoring in survival analysis with the enter() option?

Dear all,

I work on a single record per id, single spell survival dataset with late entry and try to do everything I need in this framework with the built in Stata commands. This means the standard hazard-/survival function graphs by groups as well as (semi-)parametric regressions.

Quickly, the setting: call a person's patronage of a certain store a relationship. I want to model the survival of these relationships over the last 12 quarters before the specific store closes. In this respect I am not worried about right censoring, because either the spell is complete (the relationship terminated before the store closed forever) or the spell lasts until t = 12. There is nothing after the "right end" of my study time.

My question relates to the "left end" of my sample, i.e., the relationships that existed before the last 12 quarters of the store's existence and survived into my analysis time. I am in the fortunate position of having a very large dataset that goes back all the way, so I know when every relationship started.

My question is how to take this information into account properly, so that no problem arises from left censoring.

Can this be accomplished by simply typing

stset timevar, failure(D_CENSORED==1) origin(time 0) enter(time relastartdate)

where D_CENSORED is an indicator variable equal to 1 when the spell ends before the store closes, and relastartdate gives the start quarter of the relationship relative to the beginning of analysis time (i.e., it is negative for otherwise left-censored observations)? I tried this with a dataset I created just for this purpose, but including the enter() option does not seem to alter the Kaplan-Meier survival function or anything else.
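
Two quick diagnostics (a sketch): note that entry times at or before the origin are equivalent to entering at analysis time 0, which would explain why negative values of relastartdate leave the results unchanged; enter() handles delayed entry (left truncation), not left censoring as such.

Code:
stset timevar, failure(D_CENSORED==1) origin(time 0) enter(time relastartdate)
stdescribe      // distribution of entry times across subjects
sts list        // size of the risk set at each analysis time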

Any help is appreciated!
Swati


/edit: I also just tried stsetting the dataset with positive values of relastartdate, but it still had no effect. I did this because I realized that timevar should be positive. Is this because all observations before becoming at risk are ignored in the analysis? But how can I then address the left censoring without leaving these observations out or integrating them out?


Heckman probit error

Hello.

When the following commands are run, error messages appear:
. gen capphi=norm(p1)
. gen invmills=phi/capphi

How can one fix this? Stata did not seem to recognize the norm() function, and phi is never created beforehand.
I need this for some research in social studies.
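
A sketch of the likely fix: the old norm() function name has been replaced in modern Stata; normal() is the standard normal CDF and normalden() the density, so the inverse Mills ratio for a probit index p1 becomes:

Code:
gen double phi      = normalden(p1)    // standard normal density
gen double capphi   = normal(p1)       // standard normal CDF
gen double invmills = phi/capphi       // inverse Mills ratio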

Thank you for answers.

Supercolumn position in table command

Dear All,

I need a table with the same layout as shown below, but with the supercolumn totals placed first (before "cheap") in the order of the supercolumns.
Just in case: I am looking for a solution that uses the existing options of the -table- command, not a total rewrite of the command (unless a drop-in replacement is already available).

Thank you, Sergiy Radyakin


Code:
--------------------------------------------------------------------------
Repair    |                  Price category and Car type                  
Record    | ------ cheap -----    ---- expensive ---    ------ Total -----
1978      | Domestic   Foreign    Domestic   Foreign    Domestic   Foreign
----------+---------------------------------------------------------------
        1 |                        4,564.5               4,564.5          
        2 |    3,667               6,296.3               5,967.6          
        3 |    3,515     3,895     6,993.6   5,295.5     6,607.1   4,828.7
        4 |    3,829     3,995     6,138.1   6,544.8     5,881.6   6,261.4
        5 |    3,984     3,773       4,425   7,012.6     4,204.5   6,292.7
--------------------------------------------------------------------------
Obtained with the following code:
Code:
version 13.0
clear all
sysuse auto
recode price (1/4000=1 "cheap") (4001/16000=2 "expensive"), generate(pcateg)
label variable pcateg "Price category"
table rep78 foreign pcateg, c(mean price) scol