stset: Cannot wrap my head around how to explain death to Stata

June 19, 2016, 12:51 pm

Dear all,

I am trying to stset my data, and I cannot quite figure out how to do what I want to do.

My data concern head coaches who work for sports teams. I have multiple observations of each coach at irregular intervals (where I record performance data etc.). I would like to stset the data so that I can run all sorts of survival-time commands on them.

For each one of the coaches, I know the date he gets fired. But: I do not have an observation which is from the exact day of firing. So I wonder how I can tell this to Stata?

Simply adding a new variable "fired" as my event variable and setting it to 1 on each coach's last observation feels wrong because this observation can of course be quite some time before the coach gets fired.

Do you have any ideas of how I need to do this?

Your help is greatly appreciated!
Michael

↧

impossible calculation with %fc format

June 19, 2016, 1:08 pm

Hi Statalist
I have run into something odd that I cannot figure out.
I am currently working with monetary values and would like to use the %w.dfc format.
I need to store several local macros to build a table showing the results I want. I use the following code:

Code:

foreach x of numlist 0 1{
    foreach var of varlist totalcharge_ccr ly_gained {
        sum `var' if SummaryCPR    ==    `x' , detail
        local mean_`var'_CPR`x'    =    trim("`:display %12.1fc r(mean)'")
    }
    local    cer_cpr`x' = trim("`: display %3.2fc `mean_totalcharge_ccr_CPR`x''/`mean_ly_gained_CPR`x'''")
}

local incrm_health    =    trim("`: display %5.1fc (`mean_ly_gained_CPR1'-`mean_ly_gained_CPR0')'")
local incrm_cost    =    trim("`: display %10.1fc (`mean_totalcharge_ccr_CPR1'-`mean_totalcharge_ccr_CPR0')'")
local icer             =    trim("`:display %10.1fc (`incrm_cost'/`incrm_health')'")

The Stata output is as follows:

Code:

. local incrm_health = trim("`: display %5.1fc (`mean_ly_gained_CPR1'-`mean_ly_gained_CPR0')'")

. local incrm_cost = trim("`: display %10.1fc (`mean_totalcharge_ccr_CPR1'-`mean_totalcharge_ccr_CPR0')'")
61,676.5 invalid name

. local icer = trim("`:display %10.1fc (`incrm_cost'/`incrm_health')'")
/3.7 invalid name

When I use the format without the comma, it works.
Do you have any idea how to fix it? I could of course fix it after the table is created, but I am trying to build finished tables directly from Stata.
Thanks for your help
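
One possible reading of the error: once a local holds text such as 61,676.5, substituting it into a later expression puts a comma inside the arithmetic, which Stata cannot parse. A minimal sketch of a workaround follows, assuming it is acceptable to keep an unformatted copy of each number for calculations and to apply %fc only for display. The auto data and the raw_* macro names are illustrative stand-ins, not part of the original code.

```stata
* Sketch only: auto.dta and the raw_* macro names are illustrative stand-ins.
sysuse auto, clear
foreach x of numlist 0 1 {
    foreach var of varlist price mpg {
        quietly summarize `var' if foreign == `x'
        local raw_`var'_`x' = r(mean)                    // unformatted, safe for arithmetic
        local mean_`var'_`x' : display %12.1fc r(mean)   // formatted, for the table only
        local mean_`var'_`x' = trim("`mean_`var'_`x''")
    }
}
* Do the arithmetic on the raw values, then format the result once.
local incrm_cost : display %10.1fc (`raw_price_1' - `raw_price_0')
local incrm_cost = trim("`incrm_cost'")
display "Incremental cost: `incrm_cost'"
```

The design choice here is simply to never feed a comma-formatted string back into an expression; the formatted macros are used only where text is written out.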

↧

local (macro) question

June 19, 2016, 1:41 pm

I have a local macro containing several terms, and each time I reference it I want it to include more of the terms. I was unable to find an elegant way to do this. For example, I have a local containing regressors, and each time I reference it I want it to include all the regressors up to that point plus one more.

So I want it to translate to something like the following, but using the local in the estimation commands.

Code:

local vars mpg weight trunk

reg price mpg
reg price mpg weight
reg price mpg weight trunk
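
A minimal sketch of one way to get this behavior, assuming the variables come from the auto data shipped with Stata (they match its variable names, but that is a guess) and using a second local, here called sofar, which is a hypothetical name:

```stata
* Sketch only: auto.dta is assumed; "sofar" is an illustrative macro name.
sysuse auto, clear
local vars mpg weight trunk
local sofar                        // start with an empty list
foreach v of local vars {
    local sofar `sofar' `v'        // append the next regressor
    regress price `sofar'          // mpg; then mpg weight; then mpg weight trunk
}
```

The loop runs the same three regressions as the code above, adding one term from `vars' on each pass.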

↧

correlated, ar1, or psar1?

June 19, 2016, 1:56 pm

Hello,

I am working on panel data in Stata 14.1.
I have run the following tests:
lrtest: p = 0.000, indicating heteroskedasticity
xtserial: p = 0.000, indicating serial correlation

Based on these test results, I am using the following option:
xtgls y $xlist, panels(correlated)
How can I test whether I should opt for one of the following instead of my current specification?
xtgls y $xlist, panels(heteroskedastic) corr(ar1)
xtgls y $xlist, panels(heteroskedastic) corr(psar1)

I get some strange results for one of my control variables when I run the last two FGLS regressions.

Thanks, Robert

↧

Ordered Probit: Interpret, test and compare coefficients

June 19, 2016, 3:43 pm

Dear colleagues,

I estimate the following model:
oprobit swb ib2.marital_status if syear==2005, vce(cluster pid)
Coding:
Dependent variable - swb: discrete values from 0 - 10.
Predictor - marital status: categorical variable 1 married, 2 single, 3 widowed, 4 divorced, 5 separated.

Code:

oprobit swb ib2.marital_status if    syear==2005, vce(cluster pid)

Iteration 0:   log pseudolikelihood    = -10375.864  
Iteration 1:   log pseudolikelihood    = -10347.239  
Iteration 2:   log pseudolikelihood    = -10347.239  

Ordered probit regression                       Number of obs     =       5519
                                                Wald chi2(4)      =      51.92
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -10347.239               Pseudo R2         =     0.0028

                                 (Std. Err. adjusted for 5519 clusters in pid)
--------------------------------------------------------------------------------
               |               Robust
           swb |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
marital_status |
   [1] Married |   .0758386   .0326055     2.33   0.020     .0119331    .1397441
   [3] Widowed |  -.0873954   .1555698    -0.56   0.574    -.3923067    .2175159
  [4] Divorced |   -.217214   .0585252    -3.71   0.000    -.3319213   -.1025066
 [5] Separated |  -.3878464   .0974401    -3.98   0.000    -.5788255   -.1968673

I would like to avoid evaluating (average) marginal effects (at means) for all 10 values of swb.

Question 1) Does the sign of the coefficients give the direction of the effects?
Is it possible to make a statement such as "compared to being single, being married increases the probability of scoring high in SWB" or "is associated with higher SWB scores than being single"?

Question 2) How can I test whether the effects on SWB are equal?
E.g., is the command "test _b[1.marital_status] = _b[5.marital_status]" valid to make statements?

Thank you very much in advance!

Kind regards,
Mischa

↧

Probit with first differences: modeling and interpretation

June 19, 2016, 4:44 pm

Dear colleagues,

Starting situation:

- Balanced sample on individual level for two survey years.
- Variable "Happiness" coded from 0 - 10.
- Other variables: Age (continuous), marital_status (categorical: Single, Married, Widowed, Divorced, Separated).

I would like to know whether the ΔHappiness between two points in time is related to Δage, Δmarital status etc.

Proceeding: Define ΔHappiness (dependent variable):
Model (I): ΔHappiness = 1 if positive change in Happiness, 0 otherwise.
Model (II): ΔHappiness = 1 if negative change in Happiness, 0 otherwise.
This (1/0) coding is a demand of my supervisor.

probit ΔHappiness Δage Δmarital_status

Question 1)
Δage is equal for all individuals.
Does the coefficient for Δage simply reflect the constant of the model? Does the coefficient of Δage have any explanatory power?

Question 2)
Variable "marital_status" has 5 categories => There are 20 (I guess) combinations for changes between two points in time: Single -> Married, Single -> Divorced etc.
Too many to include all combinations into the model.

Suppose I generate the variables:
single_to_married = 1 if single_year1==1 & married_year2==1
(single_year1 indicates whether the individual was single in year 1; married_year2 is defined analogously)
other_changes = 1 if marital_status_year1 != marital_status_year2 & single_to_married != 1
(marital_status_year1/2 give the marital status of the individual in year 1/2)

and my model would be:
ΔHappiness = beta1 * Δage + beta2 * single_to_married + beta3 * other_changes ..

Would this model be valid?
Is there a sensible alternative to including some combinations of marital statuses?
Would I be able to interpret this model as
"Individuals being single in year1 and married in year2 have a higher probability of experiencing a positive (negative if model II) change in happiness"?

Thanks so much in advance!

Kind regards,
Mischa

↧

A Latent Variable Multilevel Model (Croon and Veldhoven 2007)

June 20, 2016, 8:37 am

Hi Everyone!

I want to use Croon and Veldhoven's (2007) latent variable multilevel model ("Predicting Group-Level Outcome Variables from Variables Measured at the Individual Level: A Latent Variable Multilevel Model"). I am looking for Stata code for obtaining the 'adjusted group means'. I would appreciate your help with this.

Thanks,

Moeen

↧

3SLS and instrumental variables

June 20, 2016, 9:30 am

Hello everyone,
Before I start, I would like to thank you all for this website. It is very well done and is helping me a lot with my thesis. As the deadline is approaching, I have some last doubts to clear up.
My aim is to measure the impact of broadband on GDP. Unfortunately, this kind of study is heavily affected by endogeneity, so I decided to follow the model used in a paper by Koutroumpis (2009), which controls for it. To do so, the paper uses a 3SLS model with a set of four equations (you can find them in the attached document, along with the link to the paper). I am not very familiar with this econometric model, as it was not part of my econometrics course, but I tried to implement it by following your threads. Luckily, it worked, but now I am not sure about its validity (the outputs for both the single equations and the 3SLS are in the attached document).
Firstly, I was wondering whether the very insignificant results I got in the third and fourth equations invalidate the 3SLS. Can I still use it with such insignificant outcomes? Secondly, I only specified exogenous and endogenous variables for the 3SLS, but I read online that it is also possible to specify instrumental variables in the regression. I have not understood how they differ from endogenous variables or when they should be used. Finally, the econometrics support at my university said that I cannot use fixed effects with the 3SLS model, but I saw that the paper does so. So is my supervisor wrong?
Please tell me if something is not clear, if I should add something, or if I should modify the thread somehow (it is my first one).
Thanks in advance for your help and availability!

↧

Interacting Fixed Effects Model Question

June 20, 2016, 9:30 am

Hello,

I am trying to run a fixed effects regression on panel data from 3 different decades, using the xi and xtreg commands. The data are in long form, and the census tract is my unit of observation. So far I have been using county-level effects interacted with time to control for broader time-varying trends, but I want to include tract-level, time-varying effects instead (interacting the tract as a categorical variable with time as a categorical variable). However, I have something like 13,000 tracts (observations), and when I try to run the regression, Stata says I don't have enough space, maxvariables is not high enough, etc. I even upgraded to Stata/SE to try to handle it, but it still says I can't do it.
Code:

xi: quietly xtreg av trctpop shrblk shrhsp child shrfor ffh hsdrop ///
unemprt povrat welfare avin area propocc propsoon ///
housden popdens hsdropper msdropper nocollper somcollper collegeper perattach ///
permobile f0_2bdr f3_4bdr mvd_5 f5_yr_blt f10_yr_blt i.county i.geo2010*i.year i.aoc*i.year if year < 9, fe vce(robust)

"characteristic contents too long
The maximum value of the contents is 67,784.
characteristic contents too long
The maximum value of the contents is 67,784.
i.year _Iyear_7-9 (naturally coded; _Iyear_7 omitted)
characteristic contents too long
The maximum value of the contents is 67,784."

I feel like there should be a way to do this, maybe without actually generating the variables or something like that. Any advice?

Thanks,
Julian

↧

Replacing missing values by existing values

June 20, 2016, 9:40 am

I have a dataset that looks like the following:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(Individual Loan)
1 23
1 23
1  .
1  .
1 23
1  .
2  .
2 13
2 13
2 13
2  .
2  .
end

Here, the variable Individual denotes an individual, and the loan denotes the loan the individual made. The loan is invariant within an individual (constant over different observations). However, there are quite a few missing observations. How would I go about populating the loan value by individuals depending on the existing value? The issue is that the existing values are not systematic, i.e. it is not the case that each individual has the existing value as the 1st observation, otherwise I could replace the missing values by the value in the first observation. Existing values exist at random observation points throughout the dataset. Any help is greatly appreciated!
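
A minimal sketch of one common approach, assuming (as stated above) that Loan really is constant within Individual and that at least one nonmissing value exists per individual. The data below are a shortened copy of the example, used only so the snippet runs on its own.

```stata
* Sketch only: shortened copy of the example data.
clear
input float(Individual Loan)
1 23
1  .
1 23
2  .
2 13
2  .
end
* Within each individual, sort so nonmissing Loan values come first
* (missing sorts last), then copy the first value into the missing ones.
bysort Individual (Loan): replace Loan = Loan[1] if missing(Loan)
list, sepby(Individual)
```

Note that bysort changes the order of observations within each individual; if the original order matters, an observation counter (for example gen long obsno = _n) could be created first and used to sort back afterwards.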

↧

Granger Causality with panel data, using gcause

June 20, 2016, 10:04 am

Hey,

I am trying to test Granger causality with panel data. I installed gcause in order to do so, but when I use the usual syntax:

gcause var1 var2, lag (2) regress

I keep getting the error "option / required".

The only way it works is with gcause2, which only allows estimation for one panel unit at a time...

Any thoughts ?

Thx,
Kristina

↧

bootstrap SE for SEM latent growth mediation model

June 20, 2016, 11:02 am

I have created an SEM latent growth mediation model using FIML (Stata 14). I need to get bootstrap standard errors (SE) and confidence intervals for various paths in these models. As far as I can tell, using the "estat teffects" command after the SEM model only gives me the Sobel SE. I need bootstrap SE and confidence intervals, as this is the publication standard in most journals these days.

If you have suggestions or syntax for obtaining the bootstrap SE and confidence intervals for the direct, indirect, and total paths of an SEM mediation model in Stata, I would be very grateful.

Thank you!

In case it is helpful, here is the syntax for the SEM model.

sem (Outcome1<-I@1 S@0 _cons@0) ///
(Outcome2<-I@1 S@1 _cons@0) ///
(Outcome4<-I@1 S@2 _cons@0) ///
(Outcome6<-I@1 S@3 _cons@0) ///
(Mediator1 <- IV1 control1 control2 _cons) ///
(Mediator1 <- IV2 control1 control2 _cons) ///
(Mediator2 <- IV1 control1 control2 _cons) ///
(Mediator2 <- IV2 control1 control2 _cons) ///
(Mediator3 <- IV1 control1 control2 _cons) ///
(Mediator3 <- IV2 control1 control2 _cons) ///
(I <- IV1 IV2 control1 control2 _cons) ///
(S <- IV1 IV2 control1 control2 _cons), ///
latent(I S) ///
var(e.Outcome1@var e.Outcome2@var e.Outcome4@var e.Outcome6@var) ///
method(mlmv) vce(cluster schoolvar)

estat teffects

↧

Drop duplicate observations by another variable

June 20, 2016, 11:05 am

Hi there, as indicated in the title, I have a data set like this:

      ID     var
 1    1001   k1
 2    1001   k1
 3    1001   k2
 4    1001   k3
 5    1002   k1
 6    1002   k2
 7    1002   k2
 8    1002   k2
 9    1003   k3
10    1003   k3
...
...

I hope to delete duplicate values of var within ID so that it looks like this:

      ID     var
 1    1001   k1
 2    1001   k2
 3    1001   k3
 4    1002   k1
 5    1002   k2
 6    1003   k3
...
...

I use Stata 14.1 and have checked some information but have not found an appropriate answer. The one I think could be correct is this:
duplicates drop var, by(ID)
But Stata shows that "option by() not allowed".
So I am wondering if anyone knows a magic command to deal with this problem?
Thanks!
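
A minimal sketch of one way to do this, assuming "duplicate" means the same combination of ID and var: listing both variables in duplicates drop makes the by() option unnecessary. The data below are a shortened copy of the example, with var assumed to be a string variable.

```stata
* Sketch only: shortened copy of the example; var is assumed to be a string.
clear
input long ID str2 var
1001 "k1"
1001 "k1"
1001 "k2"
1001 "k3"
1002 "k1"
1002 "k2"
1002 "k2"
end
* Keep one observation per distinct (ID, var) pair.
* force is required because the varlist does not cover every variable.
duplicates drop ID var, force
list, sepby(ID)
```

If the data contain other variables that differ across the dropped observations, that information is lost, so it may be worth inspecting the duplicates before dropping them.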

↧

Updated version of -itsa- available on SSC

June 20, 2016, 12:47 pm

Thanks to Kit Baum, a revised version of itsa is now available on SSC.

This new version allows users to specify additional "two_way" graphing options. Additionally, a bug was fixed that impacted how the posttrend option was calculated when there were multiple intervention periods.

itsa performs interrupted time series analysis for single and multiple groups

itsa estimates the effect of an intervention when the outcome variable is ordered as a time series, and a number of observations are available in both pre- and post-intervention periods. The study design is generally referred to as an interrupted time series because the intervention is expected to "interrupt" the level and/or trend subsequent to its introduction. itsa is a wrapper program for, by default, newey, which produces Newey-West standard errors for coefficients estimated by OLS regression, or optionally prais, which uses the generalized least-squares method to estimate the parameters in a linear regression model in which the errors are assumed to follow a first-order autoregressive process. itsa estimates treatment effects for either a single treatment group (with pre- and post-intervention observations) or a multiple-group comparison (i.e., the single treatment group is compared with one or more control groups). Additionally, itsa can estimate treatment effects for multiple treatment periods.

As always, please contact me if you find any bugs in the program.

Ariel

↧

I have a 2 id panel data set and I want to fill down/expand observations with respect to a time variable.

June 20, 2016, 2:56 pm

Hello everyone,

I have encountered a problem and could not find a solution in past threads, so I decided to post a new thread to ask for advice.

I have a panel data set regarding the holding information of an institutional investor at a certain time point. The sample period spans from 2002Q1 to 2007Q2. For each stock held by investor i, I want to fill in the time gaps. Secondly, if the last period of a stock is not 2007Q2, then I want to expand one extra period for that stock held by investor i. The variables used are: manager number (mgrno), CUSIP, date, and shares (shares held at time t).

For example:

mgrno	cusip	date	shares
110	00184A10	2002m3	49825
110	00184A10	2002m6	56325
110	00184A10	2002m12	56625
110	00184A10	2003m3	56625
110	00206R10	2005m12	28111
110	00206R10	2006m3	27711
110	00206R10	2006m12	17691
110	00206R10	2007m3	23423
500	26101810	2003m6	158060
500	26101810	2003m9	57760
500	26101810	2003m12	18710
500	26101810	2004m3	18310
500	26101810	2004m6	21210
500	26157010	2007m3	3700
500	26157010	2007m6	3700

For stock 00184A10 held by investor 110, the holding period begins from 2002m3 to 2003m3. I want to fill the time gap between 2002m6 and 2002m12, which is 2002m9. Also, I want to add an extra period after 2003m3, since it doesn’t meet the limitation of the sample period.

The expected result (partial) will be:

mgrno	cusip	date	shares
110	00184A10	2002m3	49825
110	00184A10	2002m6	56325
110	00184A10	2002m9	0
110	00184A10	2002m12	56625
110	00184A10	2003m3	56625
110	00184A10	2003m6	0

The second example:

500	26157010	2007m3	3700
500	26157010	2007m6	3700

Since there is no time gap between the 2 observations and the last period is 2007m6, there is no need to do anything to this stock held by investor 500.

I have tried the tsfill command, but I could not declare the dataset as a panel dataset. The reason is that at time t, stock x can be held by numerous investors, so there are several observations for a given stock at time t. mgrno and cusip must therefore be combined into a composite categorical variable in order to uniquely identify an observation. I also tried the command: egen both = group(mgrno cusip), label. However, there are too many observations in my dataset (13,148,727 observations), so the software could not generate the result I want. I have searched for relevant material for a while but have not found anything useful, perhaps due to my limited experience. I hope someone can offer some suggestions for my problem. Thank you.

Kind regards,

Chihhao
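
A minimal sketch of the gap-filling step only, under several assumptions: date is a Stata monthly date taking quarter-end values (so gaps are multiples of 3), the group identifier is created as a long without the label option, and pair is a hypothetical variable name. The extra trailing period described above is not covered here. The small data set is a copy of the first example, entered with separate year and month columns so the snippet runs on its own.

```stata
* Sketch only: first example stock; "pair", "yr" and "mo" are illustrative names.
clear
input long mgrno str8 cusip int(yr mo) long shares
110 "00184A10" 2002  3 49825
110 "00184A10" 2002  6 56325
110 "00184A10" 2002 12 56625
110 "00184A10" 2003  3 56625
end
gen date = ym(yr, mo)
format date %tm
drop yr mo

egen long pair = group(mgrno cusip)     // one id per investor-stock pair
xtset pair date, delta(3)               // observations are 3 months apart
tsfill                                  // adds the missing 2002m9 record

* tsfill leaves the other variables missing in the new rows; fill them in.
bysort pair (mgrno): replace mgrno = mgrno[1]
bysort pair (cusip): replace cusip = cusip[_N]   // empty strings sort first
replace shares = 0 if missing(shares)
sort pair date
list, sepby(pair)
```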

References
1. tsfill
2. composite categorical variables

↧

Nardl

June 20, 2016, 3:21 pm

Hi everybody,
Does anyone know how to decide which constraints to impose in a NARDL model? More generally, can anybody explain in plain English, step by step, how NARDL should be implemented in Stata? I am currently working on nonlinearity between oil price changes, financial development and economic growth (where economic growth is the dependent variable). Thank you.
Muhammed Benli

↧

Gravity Model of Trade with Unbalanced Panel using fixed effects - help!

June 20, 2016, 3:23 pm

Hi,

I am estimating a gravity model using an unbalanced country-pair panel data set covering European countries from 1948 to 2006, with about 48,000 observations. The dataset is the one used in Head et al. (2010), "The erosion of colonial trade linkages after independence". I am looking at the EU membership effect, i.e., the gains in trade due to EU membership. I am attempting to run the following regression:

xi: reg lnexports lndist GDP_exp GDP_imp lang intra_EU EUtoROW landlock RTA impyear_* expyear_* year_*, robust cluster(distw)

* Here impyear_* are tabulated, time-varying importer fixed effects, and expyear_* are tabulated, time-varying exporter fixed effects.

I get the error that matsize is too small.

When I run the following regression, I don't receive the above error. But, I don't know if it makes sense to run this regression, given that it is panel data that I am dealing with.

xi: reg lnexports lndist GDP_exp GDP_imp lang intra_EU EUtoROW landlock RTA imp_* exp_* year_*, robust cluster(distw)

This is the same as above, but includes importer and exporter fixed effects that are not time-varying.

I also ran the same regression using country-pair fixed effects, but ran into a "no room to add more variables" error. This is the following regression.

xi: reg lnexports lndist GDP_exp GDP_imp lang intra_EU EUtoROW landlock RTA countrypair_* year_*, robust cluster(distw)

My question is the following.

Is there any other method of estimating my equation? I looked at reghdfe and reg2hdfe commands but I am a bit confused on how to use them.

Any comments, questions or advice would be very much appreciated.

Thanks, Amran
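
A hedged sketch of how the first specification might look with the community-contributed reghdfe command, which absorbs the fixed effects instead of creating dummy variables. The identifier variables importer, exporter and year are hypothetical names for whatever the dataset uses. GDP_exp and GDP_imp are left out on the assumption that they vary only by country and year, in which case they would be collinear with the absorbed effects.

```stata
* Sketch only: run on the data described above.
* importer, exporter and year are hypothetical identifier variables.
* One-time installation of the community-contributed commands:
*   ssc install ftools
*   ssc install reghdfe
reghdfe lnexports lndist lang intra_EU EUtoROW landlock RTA, ///
    absorb(importer#year exporter#year) vce(cluster distw)
```

Because the effects are absorbed rather than estimated as separate coefficients, matsize and the variable limit should no longer be the binding constraint, although the individual fixed-effect estimates are not displayed.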

↧

Interpreting contradicting Dickey-Fuller results

June 20, 2016, 5:16 pm

I am getting some contradictory results from ADF unit root tests (attached), with two rejecting H0 and two not. How should I interpret these results?
Is it also possible in Stata to get the actual estimate of the coefficient (gamma) and the Dickey-Fuller tau statistic?

↧

Appending dataset where value labels differ across variables

June 20, 2016, 5:50 pm

I am attempting to create a longitudinal dataset from multiple survey files. My problem is that for some of the variables the value labels are not the same across files, so if I append the files, only the value labels of the master file are kept. I am not sure this is clear, so let me illustrate with an example.

Say we have two files, file1 and file2, with same variable named var1 but different value labels as follows:

File1           File2

var1            var1
1 "2006"        1 "2004"
2 "2007"        2 "2005"
3 "2008"        3 "2009"

So I have the same variable but different value labels. If I append, one of the sets of value labels is dropped. Is there a way to convert the value labels into plain values? Say, convert 1 "2006" and 1 "2004" to just 2006 and 2004 respectively.

Thanks in advance
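
A minimal sketch of one possible route, assuming the label text is purely numeric (as in the example) and that it is acceptable to convert each file before appending. decode copies the label text into a string variable and destring turns that into a number. The file names, label names, and the new variable year are hypothetical, and the two small files are built here only so the snippet runs on its own.

```stata
* Sketch only: build two small stand-in files (replace with the real survey files).
clear
input byte var1
1
2
3
end
label define yr1 1 "2006" 2 "2007" 3 "2008"
label values var1 yr1
save file1, replace

clear
input byte var1
1
2
3
end
label define yr2 1 "2004" 2 "2005" 3 "2009"
label values var1 yr2
save file2, replace

* Convert labels to plain numbers in each file, then append.
foreach f in file1 file2 {
    use `f', clear
    decode var1, generate(var1_txt)      // label text as a string
    destring var1_txt, generate(year)    // "2006" becomes 2006
    drop var1 var1_txt
    save `f'_clean, replace
}
use file1_clean, clear
append using file2_clean
list
```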

↧

duplicates

June 20, 2016, 6:23 pm

Hello,
My data have some duplicates. Below you can see the code I used to tag them.

My question is: how can I list the duplicates? Thanks.

duplicates tag TICKER Year_Plus, g(duplicate)
tab duplicate
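
A minimal sketch of two ways to list them, using a tiny made-up data set with the same variable names (the values are invented for illustration only):

```stata
* Sketch only: invented stand-in data with the variable names used above.
clear
input str4 TICKER int Year_Plus
"AAA" 2001
"AAA" 2001
"BBB" 2002
end
duplicates tag TICKER Year_Plus, generate(duplicate)

* Option 1: let duplicates do the listing.
duplicates list TICKER Year_Plus

* Option 2: use the tag variable, which is 0 for unique observations.
sort TICKER Year_Plus
list if duplicate > 0, sepby(TICKER Year_Plus)
```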

↧