Regression Results with "If" Statement

April 5, 2020, 4:25 pm

I am trying to run a simple regression for a school assignment. The question asks me to run a regression and restrict my estimation to var == 1.

Initially, I wrote the following code:

Code:

keep if var == 1
reg y x

and then ran my regressions.

However, my classmate answered the question differently. Rather than dropping observations where var != 1, he wrote

Code:

reg y x if var == 1

We get different results for our regressions.

Why do we get different results?
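Both commands should restrict estimation to exactly the same rows (in Stata, a missing value of var fails var == 1 in either form), so the coefficients should match. Differing results suggest the two estimation samples were not the same, for example because the data changed between the two runs. The Python sketch below (not Stata; the data and the helper `ols_fit` are hypothetical) illustrates that an OLS fit depends only on which rows enter it:

```python
def ols_fit(xs, ys):
    """Simple OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical rows of (var, x, y); None plays the role of a missing value.
rows = [
    (1, 1.0, 2.1),
    (1, 2.0, 3.9),
    (0, 3.0, 9.0),
    (1, 4.0, 8.2),
    (None, 5.0, 1.0),
    (1, 6.0, 12.1),
]

# Approach A, like `keep if var == 1` then `reg y x`: drop rows first.
kept = [r for r in rows if r[0] == 1]
fit_keep = ols_fit([r[1] for r in kept], [r[2] for r in kept])

# Approach B, like `reg y x if var == 1`: select rows at estimation time.
mask = [r[0] == 1 for r in rows]
fit_if = ols_fit([r[1] for r, m in zip(rows, mask) if m],
                 [r[2] for r, m in zip(rows, mask) if m])

print(fit_keep, fit_if)  # identical, because the same four rows are used
```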

IV Fixed Effects Model combined with Heckit

April 5, 2020, 6:13 pm

Dear Statalist,

Following this discussion:

https://www.statalist.org/forums/for...quations-model

I want to know if it is possible to estimate an IV Fixed Effects Model combined with a Heckit. Is there any command to do this?

Thanks a lot!

Predictions after mlogit

April 5, 2020, 6:37 pm

The margins command calculates predicted probabilities for each covariate across groups (outcomes). For example, if you estimate a multinomial model with 3 choices, the predicted probabilities for each covariate value sum to one across the outcome groups. This is equivalent to running crosstabs with the row option. However, is there an equivalent in margins of running crosstabs with the column option? In other words, is there an option for margins that calculates predicted probabilities that sum to one for each covariate within, as opposed to across, groups?
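The difference between the two normalizations is plain arithmetic on the predicted probabilities. A minimal Python sketch with hypothetical numbers: the probabilities and group sizes are made up, and weighting each row by its number of observations is an assumption chosen so that the result mirrors crosstab column percentages.

```python
def row_shares(probs):
    """Rescale each row (one covariate value) to sum to one across outcomes."""
    return [[p / sum(row) for p in row] for row in probs]


def column_shares(probs, sizes):
    """Column-percentage analogue: expected counts n_i * p_ij, rescaled so that
    each outcome (column) sums to one across covariate values (rows)."""
    counts = [[n * p for p in row] for n, row in zip(sizes, probs)]
    col_totals = [sum(col) for col in zip(*counts)]
    return [[c / t for c, t in zip(row, col_totals)] for row in counts]


# Hypothetical predicted probabilities: rows = two covariate values,
# columns = three outcomes; each row sums to one, as margins reports them.
probs = [[0.5, 0.3, 0.2],
         [0.2, 0.3, 0.5]]
sizes = [100, 300]  # hypothetical number of observations per covariate value

for row in column_shares(probs, sizes):
    print([round(s, 3) for s in row])
```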

how to solve those questions about difference in difference

April 5, 2020, 7:15 pm

This question investigates whether restricting youth access to alcohol affects motor vehicle death rates for young people. We restrict attention to death rates of those aged 18-20 (the age group affected by the MLDA, the minimum legal drinking age). The key variable is `legal1820`, the fraction of 18-20 year olds in a state who can buy alcohol legally. It is 1 if the MLDA is 18 and 0 if it is 21 for an entire year; for states that changed mid-year, the variable is scaled. Many states had MLDAs within this range. We exploit the over-time, within-state variation in a difference-in-differences design.

## Difference in Difference

Since the data are a panel of states that vary the drinking age limit, a difference-in-differences strategy to estimate the effect of drinking age limits on death rates seems natural here.

Long data is great for figures, but doesn't always work for tables or regressions. Thus, I convert it to wide form here. This creates separate variables for each cause of death. The main dependent variable will be `MVA`, deaths from motor vehicles.
```{r}
df <- df %>%
  ungroup() %>%
  # one column per cause of death (dtype), holding the death rate (mrate)
  pivot_wider(names_from = dtype,
              id_cols = c(state, year, pop, legal1820, legal, beertaxa,
                          beerpercap, winepercap, spiritpercap, totpercap),
              values_from = mrate) %>%
  rename(other_external = `other external`) %>%
  group_by(state) %>%
  # treat = 1 if the state's first legal1820 value differs from its 1979 value
  mutate(treat = ifelse(first(legal1820) != legal1820[year == 1979], 1, 0))

```

1. I have created a variable called `treat` that is equal to 1 for states that responded to the 1971 constitution change and 0 otherwise. Add two more variables: `post` if the year is `>=1975` and an interaction between the variable `post` and `treat`.

```{r}
df <- df %>%
  mutate(post = ifelse(year >= 1975, 1, 0),  # 1 from 1975 onward
         interaction = post * treat)        # post x treat interaction
```
(I am not sure about this part.)

2. Run a simple difference in difference regression where the dependent variable is `MVA` and the right hand side has `post`, `treat`, and the interaction term you created above. Interpret your result: do states that lower their drinking age have more motor vehicle deaths?

```{r}

```

3. The simple difference in difference above doesn't use all the information available. Instead of including a dummy variable `treat`, we could include state fixed effects. Likewise, instead of a dummy variable `post`, we could include year fixed effects. The variable `legal1820` varies across states and over time, so it uses the data more efficiently than a post-treatment dummy. Using the data frame `df`, add two new variables: a `factor` variable called `year`, created with `year = as.factor(year)` inside `mutate`, and similarly for state, `state = as.factor(state)`. These can now easily be added to a regression as categorical variables. Run a difference-in-differences regression of `MVA` on `legal1820` with state and year fixed effects. Save this as `mod1`.

4. Repeat the above regression, but weight it by the variable `pop` using the weights argument: `lm(y ~ x, data = df, weights = pop)`. Save this as `mod2`.

5. Repeat your above two regressions (with and without weights) using the control variables `beertaxa`, `beerpercap`, `winepercap`, `spiritpercap`, `totpercap`. Save these as `mod3` and `mod4`.

6. Output your regression results using `stargazer`, but only keep the variable `legal1820` using the `stargazer` option `keep`. Interpret the output from your table.

7. Repeat the above steps, but use the dependent variable `internal`. This is death from internal causes, thought to be unrelated to alcohol consumption. Thus, it serves as a **falsification** test: we should not find that drinking laws are correlated with internal death rates.
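In the 2x2 design of question 2 (`post`, `treat` and their interaction), the regression is saturated, so its OLS coefficients are simple functions of the four cell means, and the interaction coefficient equals the difference-in-differences of means. A Python sketch with hypothetical numbers (not the MLDA data; `did_coefficients` is an illustrative helper):

```python
from statistics import mean


def did_coefficients(rows):
    """rows: (treat, post, y). Returns (b0, b_post, b_treat, b_int) for
    y = b0 + b_post*post + b_treat*treat + b_int*post*treat, which in this
    saturated 2x2 model are simple functions of the four cell means."""
    cells = {}
    for treat, post, y in rows:
        cells.setdefault((treat, post), []).append(y)
    m = {key: mean(values) for key, values in cells.items()}
    b0 = m[(0, 0)]                                             # control, pre
    b_post = m[(0, 1)] - m[(0, 0)]                             # control change
    b_treat = m[(1, 0)] - m[(0, 0)]                            # pre-period gap
    b_int = (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])  # the DiD
    return b0, b_post, b_treat, b_int


# Hypothetical death rates: (treat, post, rate)
rows = [(0, 0, 10), (0, 0, 12), (0, 1, 13), (0, 1, 15),
        (1, 0, 20), (1, 0, 22), (1, 1, 26), (1, 1, 28)]
print(did_coefficients(rows))
```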

I would appreciate any advice you can give me.

Dropping leading zero observations within several variables

April 5, 2020, 9:55 pm

Dear Statalisters,

I need to drop the leading zeros of lots of separate variables. I have data like this:

days	a	b	c	d	e
1	3	0	0	0	0
2	1	0	0	0	0
3	0	0	0	0	0
4	0	0	0	0	0
5	1	0	0	0	0
6	1	0	0	0	1
7	0	0	0	0	0
8	3	0	0	1	1
9	2	0	1	0	0
10	1	0	0	0	0
11	6	0	2	1	0
12	6	2	0	1	0
13	9	0	2	0	1
14	4	0	0	0	0
15	5	0	0	0	1
16	5	0	0	0	0
17	4	0	0	0	0
18	0	0	0	0	0

And I would like to have data like this:

days	a	b	c	d	e
1	3	2	1	1	1
2	1	0	0	0	0
3	0	0	2	0	1
4	0	0	0	1	0
5	1	0	2	1	0
6	1	0	0	0	0
7	0	0	0	0	0
8	3		0	0	1
9	2		0	0	0
10	1		0	0	1
11	6			0	0
12	6				0
13	9				0
14	4
15	5
16	5
17	4
18	0

Please help!
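The requested transformation drops each column's initial run of zeros, shifts the remaining values up, and leaves missing values at the bottom. A minimal Python sketch of that logic (not Stata), using columns a to e from the example above; treating an all-zero column as entirely missing is an assumption.

```python
def drop_leading_zeros(values):
    """Return the column with its initial run of zeros removed."""
    for i, v in enumerate(values):
        if v != 0:
            return values[i:]
    return []  # the column is zero throughout


def shift_up(columns):
    """Trim leading zeros in every column and pad the bottom with None (missing)."""
    n_rows = max(len(col) for col in columns.values())
    result = {}
    for name, col in columns.items():
        trimmed = drop_leading_zeros(col)
        result[name] = trimmed + [None] * (n_rows - len(trimmed))
    return result


# Columns a-e from the example above (days 1-18).
data = {
    "a": [3, 1, 0, 0, 1, 1, 0, 3, 2, 1, 6, 6, 9, 4, 5, 5, 4, 0],
    "b": [0] * 11 + [2] + [0] * 6,
    "c": [0] * 8 + [1, 0, 2, 0, 2, 0, 0, 0, 0, 0],
    "d": [0] * 7 + [1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    "e": [0] * 5 + [1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
}

for day, row in enumerate(zip(*shift_up(data).values()), start=1):
    print(day, *("" if v is None else v for v in row))
```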

covid sandbox

April 6, 2020, 12:43 am

Being locked up at home due to the COVID-19 crisis, I developed an interest in models for the spread of a disease. The way I like to learn these things is to play with them. So, I created a little Agent Based Model in Stata (a lot of it in Mata). The first (very crude) version is done, and it is available here:

https://github.com/maartenteaches/covid-sandbox

The way I work with Agent Based Models is I get a very basic version running, then I can experiment with it and gradually make it more complex. This is at the "got the basic model running" stage. To emphasize the obvious, this model is for playing and learning, not for making life or death decisions.

Still, I got some interesting results from this very basic model. Or at least I found them interesting; I suspect most professionals will find them self-evident. I made a video on those results, which is available here: https://youtu.be/cV6xKMjiwFE and a video explaining the code behind the simulation: https://youtu.be/i6-U0sl78-Q

Anyone who wishes to join me in playing and learning is welcome to fork the model. If you find some interesting experiments and/or create some interesting changes, I would love to hear about them through a pull request to this repository.

Extracting year from Date variable

April 6, 2020, 1:13 am

Hi all,

I want to extract the year from a date variable but am not sure how to do this. I appreciate your assistance in advance.

. describe date

              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------
date            long    %10.0g                DATE_Birth

The dates in the Stata file look like this:

date
26112007
5072007
4032008
7032008
7042008
30012008
25072007
19122007
14092007
10122007
7112007
8112007
20122007

Thanks,
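The listed values look like long integers in DDMMYYYY form with the day's leading zero dropped (e.g. 5072007 would be 5 July 2007); that reading is an assumption based on the examples. Under it, the year is the last four digits, i.e. the remainder after dividing by 10,000. In Stata, `gen year = mod(date, 10000)` should give it, and `mdy(mod(floor(date/10000), 100), floor(date/1000000), mod(date, 10000))` should build a proper daily date (display it with a `%td` format). A Python sketch of the same arithmetic (the function names are illustrative):

```python
from datetime import date


def split_ddmmyyyy(value):
    """Split an integer coded as DDMMYYYY (leading zero of the day dropped)."""
    day = value // 1_000_000          # everything before the last six digits
    month = (value // 10_000) % 100   # the two digits before the year
    year = value % 10_000             # the last four digits
    return day, month, year


def to_date(value):
    """Convert a DDMMYYYY integer to a datetime.date (ValueError if invalid)."""
    day, month, year = split_ddmmyyyy(value)
    return date(year, month, day)


for raw in [26112007, 5072007, 4032008, 30012008]:  # values from the listing above
    print(raw, split_ddmmyyyy(raw)[2], to_date(raw).isoformat())
```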

Interpreting Depend Variable in Percent (i.e. Growth Rate)

April 6, 2020, 2:36 am

Dear Stat List,

I know this question has been raised before in various forms; however, I haven't been able to find a comprehensive answer that covers all parts of the problem.

I have a regression for which the dependent variable is a growth rate and measured in percent. I calculated it as follows: (ln(Y_t) - ln(Y_{t-1})) * 100.
You could also derive it like this: ((Y_t - Y_{t-1}) / Y_{t-1}) * 100.
Both versions correlate at 0.99 and yield nearly the same regression output. My variable is the growth rate of an index. However, you could also imagine the growth of GDP in percent or the sales growth of a firm (it's the general concept I'm interested in).
(Side note: "percent" here really means "growth rate" or "return", not "percentage share" (e.g. expenditure share on GDP) which also gets asked a lot in this context).
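The two definitions are linked by log growth = 100 * ln(1 + g/100), where g is the simple percentage change, so they nearly coincide for small changes and drift apart for large ones, consistent with the 0.99 correlation. A Python sketch with hypothetical index levels:

```python
import math


def log_growth(prev, curr):
    """Growth in percent as a log difference: 100 * (ln(curr) - ln(prev))."""
    return 100 * (math.log(curr) - math.log(prev))


def simple_growth(prev, curr):
    """Growth in percent as a simple change: 100 * (curr - prev) / prev."""
    return 100 * (curr - prev) / prev


# Hypothetical index levels: small, moderate and large one-period changes.
for prev, curr in [(100.0, 101.0), (100.0, 105.0), (100.0, 120.0)]:
    print(f"{simple_growth(prev, curr):6.2f}  {log_growth(prev, curr):6.2f}")
```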

I run an OLS panel regression with this growth rate as the dependent variable, here called IndexGrowth. My right-hand-side variable of interest is Temperature (which is not transformed). Here is the output:

Code:

. reghdfe IndexGrowth Temperature, a(Country_ID Region_ID##Month_Year) cl(Country_ID)
(dropped 49 singleton observations)
(MWFE estimator converged in 9 iterations)

HDFE Linear regression                            Number of obs   =      9,963
Absorbing 2 HDFE groups                           F(   1,     53) =       8.94
Statistics robust to heteroskedasticity           Prob > F        =     0.0042
                                                  R-squared       =     0.5220
                                                  Adj R-squared   =     0.4373
                                                  Within R-sq.    =     0.0007
Number of clusters (Country_ID) =         54      Root MSE        =     2.9610

                            (Std. Err. adjusted for 54 clusters in Country_ID)
------------------------------------------------------------------------------
             |               Robust
 IndexGrowth |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 Temperature |  -.2352437   .0786779    -2.99   0.004    -.3930516   -.0774357
       _cons |   .7727861    .032899    23.49   0.000     .7067991    .8387731
------------------------------------------------------------------------------

Absorbed degrees of freedom:
----------------------------------------------------------------+
            Absorbed FE | Categories  - Redundant  = Num. Coefs |
------------------------+---------------------------------------|
             Country_ID |        54          54           0    *|
   Region_ID#Month_Year |      1446           0        1446     |
----------------------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

.
end of do-file

My question concerns the interpretation of the temperature coefficient. Does it imply that a 1 unit increase in temperature leads to ...
a) a 0.23% decrease of the index level
b) a 0.23% decrease of the index growth rate
c) a 0.23%-point decrease of the index level
d) a 0.23%-point decrease of the index growth rate?

I apologize for this basic question, but I looked at many published articles with growth rates as dependent variables, and all of these interpretations circulate in the literature. Econometrics textbooks cover the "linear-linear", "log-linear", "linear-log" and "log-log" cases at great length, but percentage variables less often (or they do not say explicitly which category percentage variables belong to).

Therefore, I would be grateful for your help!

PS: I am aware that this is not strictly a Stata question, but with so much know-how in econometrics here, I am sure this question can be answered.

MGARCH-DCC model with "could not calculate numerical derivatives -- discontinuous region with missing values encountered r(430)".

April 6, 2020, 2:48 am

Hi,
I have an issue with the same error: "could not calculate numerical derivatives -- discontinuous region with missing values encountered r(430)".
I am trying to estimate the conditional variances and conditional correlations with the MGARCH-DCC model. My data sample consists of banks' returns at daily frequency.
I run the following commands:

mgarch dcc (banki market =, arch(1) garch(1))
predict var*, variance
predict corr*, correlation
predict res*, residuals

However, the model worked for only half of my sample; for the other half I got the same error, r(430). I computed some basic statistics, and the data are similar to those for which the model worked. Do you have any idea why?

As solutions, I have already tried:

1) increasing the ARCH lags, the GARCH lags, and both;
2) changing the distribution from Gaussian to Student's t;
3) increasing the number of iterations, e.g. mgarch dcc (banki market =, arch(1) garch(1)), iterate(500).

It worked for a few banks but I still have some problem with the others. I would like to understand what the problem could be.

Thanks in advance,
AV

STATA packages for population-adjusted indirect comparisons (MAIC and STC)

April 6, 2020, 3:11 am

Does anyone have experience conducting population-adjusted indirect comparisons (MAIC and STC) in Stata? I could not find any existing Stata packages. Could you please advise?

Independent T-test for two samples

April 6, 2020, 3:12 am

Hello,

I am doing a bachelor project on the effectiveness of an empathy class at school. To do so, we gave an "empathy course" to a class of 10-year-old children.
We then gave the children in this class a test to measure their level of empathy and compared those results with those of another class that did not have the empathy class.
I want to compare the means of those two classes, but I am not sure which test to use. I was thinking of using the "independent t-test for two samples" (the two samples being: 1) the class that took the empathy classes and 2) the class that did not).

What do you think of using this test?

Thank you very much

Abi
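For reference, the statistic behind an independent two-sample t-test can be computed by hand. Below is a Python sketch of the Welch (unequal-variance) version with hypothetical scores, not data from the study. In Stata the corresponding command would be `ttest score, by(class)`, adding the `unequal` option for the Welch version, where `score` and `class` are placeholder variable names.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic (unequal variances) and its approximate
    degrees of freedom (Welch-Satterthwaite)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical empathy scores for the two classes.
trained = [1, 2, 3, 4, 5]
control = [2, 3, 4, 5, 6]
print(welch_t(trained, control))
```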

IV Tobit Type-2 Command

April 6, 2020, 4:28 am

Long time reader, first-time poster!

I am analysing a data set where my dependent variable (expenses claimed) is censored at 0 and thus wish to use a Tobit model to analyse my results. I am interested in testing both a univariate version (i.e. Tobit type-1 or one-step Tobit) and bivariate version (i.e. Tobit type-2 or double hurdle model). These are the current commands I am using:

Code:

* Tobit Type-1
tobit depvar varlist1 varlist2, ll(0)
* Tobit Type-2
probit depvar varlist1 varlist2
truncreg depvar varlist1 varlist2, ll(0)

However, this becomes complicated when I try to instrument varlist2, which I believe is endogenous. Whilst Stata appears to have a command for IV Tobit Type-1, I cannot find the equivalent for Tobit Type-2 in Stata or its extensions (e.g. craggit).

Code:

* IV Tobit Type-1
ivtobit depvar varlist1 (varlist2 = varlistiv), ll(0)

Would it be valid to estimate my two steps separately using the same instruments?

Code:

* IV Tobit Type-2
ivprobit depvar varlist1 (varlist2 = varlistiv)
ivregress 2sls depvar varlist1 (varlist2 = varlistiv) if depvar>0

Many thanks!

Luca

Scaling a variable by another variable

April 6, 2020, 4:34 am

Hi everyone,

I need to calculate a variable as the standard deviation of another variable scaled by yet another variable, requiring at least 3 years of data.
This is how I calculate the standard deviation:

egen cfv = sd(oibdp)

But then I don't know how to proceed to scale the variable as explained above. Can anyone help me?

Thank you in advance!
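As written, `egen cfv = sd(oibdp)` without a `by` prefix returns a single standard deviation over the whole dataset. A per-firm version in Stata might look like `bysort firm: egen cfv = sd(oibdp)`, `bysort firm: egen nyears = count(oibdp)`, `replace cfv = . if nyears < 3`, then `gen cfv_scaled = cfv / scalevar`, where `firm` and `scalevar` are placeholder names. The Python sketch below shows that logic under the same assumptions: it uses each firm's full history rather than a rolling window, and divides by the scaling value on each row.

```python
from statistics import stdev


def scaled_sd(rows, min_obs=3):
    """rows: (firm, oibdp, scale) tuples. For each row, return the standard
    deviation of oibdp over that firm's non-missing years divided by the row's
    scale value, or None if the firm has fewer than min_obs years of oibdp
    (or the scale value is missing or zero)."""
    history = {}
    for firm, value, _ in rows:
        if value is not None:
            history.setdefault(firm, []).append(value)
    sd = {firm: stdev(vals) for firm, vals in history.items() if len(vals) >= min_obs}
    return [sd[firm] / scale if firm in sd and scale else None
            for firm, _, scale in rows]


rows = [  # hypothetical firm-year data: (firm id, oibdp, scaling variable)
    ("A", 1.0, 2.0), ("A", 2.0, 4.0), ("A", 3.0, 2.0),
    ("B", 5.0, 10.0), ("B", 7.0, 10.0),  # only two years, so None
]
print(scaled_sd(rows))
```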

Margins after Mixed

April 6, 2020, 4:34 am

Hey all,

I'm running a 2-level model using the - mixed - command.
Can I use the regular - margins - command afterwards, or are there any additional options that I should use in the case of a multilevel model?

Thanks,
Eran

Interaction effect outliers?

April 6, 2020, 4:40 am

I am very new to Stata and to doing research.

There is something I do not quite understand. I am looking into an interaction effect between two variables (UAI and Altman).
I had to winsorize the Altman variable (highonly, i.e. the upper tail only) in order to get a realistic result (I created the variable Altman_high).

I multiplied UAI (which is on a scale of 1-120) by the winsorized Altman variable (ranging from -3.85 to 14).
When I summarize the interaction variable (Interaction), I notice at first glance that there are outliers.

Do I have to winsorize this interaction effect once again? Or can I just use it in my model?

Code:

. gen Interaction= UAI* Altman_high

. sum Altman_win Altman_high UAI Interaction

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
  Altman_win |        176    5.688867    10.86269  -3.853731   46.44995
 Altman_high |        176    3.734336    4.828677  -3.853731   14.24408
         UAI |        176    66.88636    22.04952          8         96
 Interaction |        176     243.902    299.8308  -327.5671   1310.455
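Winsorizing only the upper tail means replacing values above a chosen high percentile with that percentile, leaving the lower tail untouched. A Python sketch of that step with hypothetical data; the nearest-rank percentile rule and the 0.85 cut-off are assumptions for illustration and need not match what the Stata command used here does.

```python
import math


def winsorize_high(values, p):
    """Cap values above the p-th quantile at that quantile (upper tail only).
    The quantile uses the nearest-rank rule, an assumption for illustration."""
    ranked = sorted(values)
    rank = max(1, math.ceil(p * len(ranked)))  # 1-based nearest rank
    cap = ranked[rank - 1]
    return [min(v, cap) for v in values]


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]  # hypothetical data, one extreme value
print(winsorize_high(scores, 0.85))
```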

clustering robust standard errors in one wave dataset

April 6, 2020, 5:32 am

Hello,

I have a one-wave dataset of 158 firms. Is it useful to cluster the standard errors? If so, I want to cluster them by industry. However, I have 35 industry dummies. I have tried to do it as described online:

Code:

. regress abschangecarbonintensity firmsize profitability leverage age capitalintensity CAPEX KZindex elektrici
> tygenerator Carbonleakage industry10 industry11 industry13 industry16 industry17 industry19 industry20 indust
> ry21 industry22 industry23 industry24 industry25 industry28 industry29 industry30 industry35 industry42 indus
> try46 industry47 industry49 industry52 industry63 industry70 industry72 industry81 WestFlanders Hainaut Antwe
> rp Brussels FlemishBrabant Limbourg Liege Namur WalloonBrabant Luxembourg SME  publicfirm, robust cluster  in
> dustry10 industry11 industry13 industry16 industry17 industry19 industry20 industry21 industry22 industry23 i
> ndustry24 industry25 industry28 industry29 industry30 industry35 industry42 industry46 industry47 industry49 
> industry52 industry63 industry70 industry72 industry81
option cluster incorrectly specified
r(198);

but I get an error. Do you have any advice?

Kind regards,
Timea De Wispelaere

Update to -survsim- on SSC: Simulating time-to-event data from custom distributions, competing risk models and general multi-state models

April 6, 2020, 5:37 am

Thanks to Kit Baum, the -survsim- package has been updated on SSC. This is a complete re-write, with many new features.

-survsim- simulates survival data from a parametric distribution, a user-defined distribution, a cause-specific hazards competing risks model, a general multi-state model, or from an estimated -merlin- model. Baseline covariates and time-dependent effects can be specified when defining a data-generating model. Delayed entry/left truncation is allowed.

For more details and lots of examples, a pre-print is available here:
www.mjcrowther.co.uk/publication/survsim

Thanks,
Michael

Obtaining pooled OLS estimates

April 6, 2020, 6:51 am

I'm trying to compare Oaxaca-Blinder decomposition results to OLS estimates regarding the male wage gap between work-limited disabled (DISTYPE = 1) and non-disabled (DISTYPE = 3) workers (note: DISTYPE is a categorical variable, not a dummy).

I want to obtain pooled, quarter 1 and quarter 5 OLS estimates for both DISTYPE = 1 and DISTYPE = 4. My dataset is a five-quarter longitudinal panel, with variables ending in, e.g., 5 to indicate that they are quarter 5 variables. Thus I had to 'reshape long'.

I have run quarter 1 and quarter 5 estimates (shown below only for DISTYPE = 1):

Code:

regress logGRSSWK WHITE i.AGE i.RESIDENCE i.INDUSTRY i.EDUCATION i.WORKREGION i.JOBTENURE if DISTYPE == 1 & quarter == 1

regress logGRSSWK WHITE i.AGE i.RESIDENCE i.INDUSTRY i.EDUCATION i.WORKREGION i.JOBTENURE if DISTYPE == 1 & quarter == 5

However, I am struggling to understand what 'pooled' would relate to here? This may be a foolish question but any help would be greatly appreciated

conditional pricing model using GMM

April 6, 2020, 6:59 am

Hi everyone!
I have a question about Stata's gmm command.
Following Choi, Hiraki, and Takezawa (1998), I use the conditional pricing model together with the pricing error and the innovations.
The three equations are as follows:
1. Conditional model: $E(R_{i,t} \mid \omega_{t-1}) = \lambda_0(\omega_{t-1}) + \gamma_{LM}\,\mathrm{cov}(R_{i,t}, R_{LM,t} \mid \omega_{t-1}) + \gamma_{WM}\,\mathrm{cov}(R_{i,t}, R_{WM,t} \mid \omega_{t-1}) + \gamma_{FX}\,\mathrm{cov}(R_{i,t}, R_{FX,t} \mid \omega_{t-1})$
2. Pricing error: $u_t = -Z_{t-1} r_0 + Z_{t-1} r_{LM} R_{LM,t} + Z_{t-1} r_{WM} R_{WM,t} + Z_{t-1} r_{FX} R_{FX,t}$
3. Innovation: $h_{i,t} = R_{i,t} - R_{i,t} u_t$

where Z is the set of instrumental variables. The pricing error should be zero, and the expected value of the innovations is zero. Choi, Hiraki, and Takezawa (1998) applied GMM to the set of equations for the pricing error and the innovations, and thus obtained estimates of $r_0$, $r_{LM}$, $r_{WM}$, and $r_{FX}$. I want to analyze the conditional model following them.

I use Hansen's (1982) GMM method, applying the pricing kernel.
The Stata code I executed is as follows.
Code:

.gmm(eq1:rit_a-{a0}-{a1}*cov1-{a2}*cov2-{a3}*cov3)
(eq2:rit_a-{a0}-{a1}*cov1-{a2}*cov2-{a3}*cov3), instruments(eq1:lag1 lag2 lag3 Jandum) instruments(eq2:lag1) winitial(identity)
warning: 359978 missing values returned for equation 1 at initial values
warning: 359978 missing values returned for equation 2 at initial values

Step 1
Iteration 0: GMM criterion Q(b) = 1.9949011
Iteration 1: GMM criterion Q(b) = 1.537e-06
Iteration 2: GMM criterion Q(b) = 1.537e-06

Step 2
Iteration 0: GMM criterion Q(b) = 8.837e-07
Iteration 1: GMM criterion Q(b) = 7.546e-07
Iteration 2: GMM criterion Q(b) = 7.546e-07

GMM estimation

Number of parameters =   4
Number of moments    =   7
Initial weight matrix: Identity                 Number of obs   =     575207
GMM weight matrix:     Robust

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         /a0 |  -.4532532   .0876545    -5.17   0.000    -.6250529   -.2814535
         /a1 |  -.0992952   .0192035    -5.17   0.000    -.1369334    -.061657
         /a2 |   .0293273    .005053     5.80   0.000     .0194235     .039231
         /a3 |  -64.23148   9.697605    -6.62   0.000    -83.23844   -45.22452
------------------------------------------------------------------------------
Instruments for equation 1: lag1 lag2 lag3 Jandum _cons
Instruments for equation 2: lag1 _cons

. estat overid

Test of overidentifying restriction:

Hansen's J chi2(3) = .434048 (p = 0.9331)

But I'm not sure whether this is done correctly.
I want to analyze all three of the above equations.
Any help is highly appreciated.
Thank you for your time.

Help with converting string to numeric variables

April 6, 2020, 8:15 am

I have a dataset on infant and child ages, where age is stored as a string variable in weeks, months, or years. I need a way to convert all of them to age in weeks.

Here is a sample of the data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str9 age
"5 weeks"  
"12 weeks"
"5 months"
"3 year"   
"7 week"   
"2 year"   
"19 months"
"2 months"
"3 months"
"6 year"   
"2 months"
"10 months"
"2 year"   
"3 year"   
"4 year"   
"6 week"   
"4 year"   
"5 year"   
"4 month"  
"8 month"  
end

I appreciate help with this problem. Hope everyone is doing well. Like many, I am working from home.
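The conversion splits each string into a number and a unit, normalizes the unit (singular or plural), and multiplies by weeks per unit. A Python sketch of that logic, assuming 52 weeks per year and 12 equal months per year (both choices are assumptions; 365.25 / 7 weeks per year is a common alternative). In Stata, `real(word(age, 1))` and `word(age, 2)` should extract the number and the unit in the same way.

```python
WEEKS_PER_YEAR = 52  # assumption; 365.25 / 7 (about 52.18) is an alternative


def age_in_weeks(text):
    """Convert strings such as '5 weeks', '4 month' or '3 year' to weeks."""
    number, unit = text.split()          # split() also drops the padding spaces
    n = float(number)
    unit = unit.lower().rstrip("s")      # 'weeks' -> 'week', 'months' -> 'month'
    if unit == "week":
        return n
    if unit == "month":
        return n * WEEKS_PER_YEAR / 12   # assumes 12 equal months per year
    if unit == "year":
        return n * WEEKS_PER_YEAR
    raise ValueError(f"unrecognised age unit in {text!r}")


for age in ["5 weeks", "12 weeks", "5 months", "3 year   ", "7 week", "19 months"]:
    print(repr(age), round(age_in_weeks(age), 2))
```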
