Channel: Statalist

Drop every id on a panel that meets certain variable value

Hello, I have a panel like the example below, and I want to drop an entire id (all of its observations) if it has no event_time equal to 1 or no event_time equal to -1, but keep ids whose event_time is entirely missing. In Stata words, I want to keep an id only if both event_time==1 and event_time==-1 occur for that id (or all of its event_time values are missing).

Code:
id    year  event_time 
701 1928   -6
701 1929   -5
701 1930   -4
701 1931   -3
701 1932   -2
701 1933   -1
701 1934    0
701 1935    .
702 1928    .
702 1929    .
702 1930    .
702 1931    .
703 1928   -2
703 1929   -1
703 1930   0
703 1931   1
703 1932   2
Thanks.
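
A minimal sketch of one way to do this with egen, assuming the goal stated above (drop any id that lacks either event_time==1 or event_time==-1, unless its event_time is entirely missing):

Code:
bysort id: egen has_plus1  = max(event_time == 1)
bysort id: egen has_minus1 = max(event_time == -1)
bysort id: egen n_nonmiss  = total(!missing(event_time))
* keep all-missing ids; otherwise require both event_time==1 and event_time==-1
drop if n_nonmiss > 0 & !(has_plus1 & has_minus1)
drop has_plus1 has_minus1 n_nonmiss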

Creating a 3-dimensional map in MDS

I have been creating 2-dimensional maps in non-metric MDS with the commands below. However, if I identify 3 or 4 dimensions from the loss criterion value, I would still have to create several 2-D maps to represent all the dimensions: for example, a map for Dim1 and Dim2, then Dim1 and Dim3, and finally Dim2 and Dim3. Is there a way in Stata to create a 3-D map by just amending the commands below, or should I use an ado-file? Thank you for any suggestion, and/or for pointers to where to get the ado-file.

Jose

Here are the commands, run after reviewing the stress values and settling on 3 dimensions:

* This saves the dimension scores into variables Dim1, Dim2, and Dim3
mat score = e(Y)
svmat score
rename score* Dim*

* This produces a "pretty" MDS Map with Dim1 and Dim2.
twoway scatter Dim1 Dim2, msymbol(oh) mcolor(navy) mlab(Factors) mlabsize(tiny) xline(0) yline(0) scheme(s1color) name(labeldim1dim2, replace)

* This produces a "pretty" MDS Map with Dim1 and Dim3.
twoway scatter Dim1 Dim3, msymbol(oh) mcolor(navy) mlab(Factors) mlabsize(tiny) xline(0) yline(0) scheme(s1color) name(labeldim1dim3, replace)

* This produces a "pretty" MDS Map with Dim2 and Dim3.
twoway scatter Dim2 Dim3, msymbol(oh) mcolor(navy) mlab(Factors) mlabsize(tiny) xline(0) yline(0) scheme(s1color) name(labeldim2dim3, replace)
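
Not a true 3-D map, but a minimal sketch showing how the three pairwise maps created above could at least be combined into a single figure (the graph names are the ones assigned in the commands above):

Code:
graph combine labeldim1dim2 labeldim1dim3 labeldim2dim3, rows(2) scheme(s1color)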

rdplot line misfit

Hi everyone! I have a problem with the graphic illustration of the linear RDD. The graph looks fine when I use a simple specification - just the dependent variable and one independent (running) variable. However, when I include covariates to see how they affect the point estimate and the slope of the lines on both sides of the cutoff, the plotted line is suddenly well below the points on the graph. It looks like the intercept is not adjusted after the inclusion of the covariates, but I can't find any way to fix it. I would be grateful for help. Below I paste the syntax and the graph.
Thank you!
Mik

rdplot wynik_gm_m_std2015 dist_rus_aus_bord if abs(dist_rus_aus_bord)<50& ordinary==1&(ktory_zabor==2|ktory_zabor==3), p(1) q(2) h(50) covs (perc_boys l_uczG perc_dysl log_popul proc_pom_spol_2008 log_doch_wlas_pc_2015 perc_higher wyniks_narybku) weights(l_uczG)

[attached rdplot graph not shown]

IV Probit Model Interpretation of margins

Dear all,

I am a graduate student and want to estimate an ivprobit model to account for simultaneous causality. My research question is how heart conditions (heart) affect labor supply (inlf). My dependent variable is therefore a dummy for labor force participation (=1 if working, 0 otherwise) and my main independent variable is a dummy for having a heart condition (=1 if heart condition, 0 otherwise). I use panel data and adult body height as an instrument for heart conditions.

My command is : ivprobit inlf age male low_work_sat med_work_sat ever_married diabetes stroke high_blood_pres depression i.year (heart = height_new), first vce(cluster pid)


Using the margins command "margins, dydx(heart)" gives the "fitted values". These are predictions of the underlying latent variable, right?

margins, dydx(heart)

Average marginal effects Number of obs = 75,120
Model VCE : Robust

Expression : Fitted values, predict()
dy/dx w.r.t. : heart

------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
heart | -3.752775 .026704 -140.53 0.000 -3.805114 -3.700436
------------------------------------------------------------------------------



So in my example this would be something like the utility difference between working and not working, and I am not interested in that.

Using the margins command with predict option: "margins, dydx(heart) predict(pr)" gives the probability of a positive outcome:



. margins, dydx(heart) predict(pr)

Average marginal effects Number of obs = 75,120
Model VCE : Robust

Expression : Probability of positive outcome, predict(pr)
dy/dx w.r.t. : heart

------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
heart | -.1270953 .0521788 -2.44 0.015 -.2293639 -.0248266
------------------------------------------------------------------------------


My interpretation is that a heart condition decreases the probability of being employed by 12.7 percentage points (c.p.), and that this represents an average marginal effect. Is this the correct way to interpret this margins command?

Further, this does not account for the panel structure of my data. Is there any way to take the panel structure into account? I tried to estimate both stages by hand with the following xtprobit commands:


* 1st stage
xtprobit heart height_new age male low_work_sat med_work_sat ever_married diabetes ///
stroke depression i.year high_blood_pres, re vce(cluster pid)
predict heart_hat, pr
corr heart_hat heart


* 2nd stage, using the predicted probability from the 1st stage
xtprobit inlf heart_hat age male low_work_sat med_work_sat ever_married diabetes ///
stroke depression high_blood_pres i.year, re vce(cluster pid)

margins, dydx(heart_hat) predict(pr)


Unfortunately, I get very different and confusing results:


. margins, dydx(heart_hat) predict(pr)

Average marginal effects Number of obs = 75,120
Model VCE : Robust

Expression : Pr(inlf=1), predict(pr)
dy/dx w.r.t. : heart_hat

------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
heart_hat | -4.528016 .0984684 -45.98 0.000 -4.721011 -4.335022
------------------------------------------------------------------------------


Here, I would interpret this as: getting a heart condition decreases the probability of being employed by 453 percentage points (c.p.). That is clearly unrealistic, and something seems to be wrong with the model. The effect is the same when I do not include random effects while estimating both stages by hand.

Any ideas why the effect increases so heavily, and how I can directly account for the panel structure in the ivprobit?

I thank you for any comments and help.

Best,
Claudio Schiener

Summary statistics according to quartiles

Dear users,

I need summary statistics, such as the mean, of one variable computed within quartiles of another variable.
For instance, I have GDP at the municipality level (a continuous variable), and I would like the mean of another variable (say, investment) within each quartile of GDP.
I cannot find a suitable command.
Thanks for your help.
Adam
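
A minimal sketch of one way to do this, assuming the variables are named gdp and investment (hypothetical names): xtile assigns quartile groups and tabstat summarizes within them.

Code:
xtile gdp_q4 = gdp, nq(4)                              // GDP quartile of each municipality
tabstat investment, by(gdp_q4) statistics(mean sd n)   // mean investment within each quartile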

Xtreg analysis

Hi everyone. Since I'm new here, I apologize in advance for any strange questions.

I'm analyzing panel data on an industry comprising roughly 450 firms.
The variables in my dataset are:
- total sales
- operating profit
- debts (less than 12 months)
- return on sales index
- labour productivity
- patents
To analyze the effects of patents and labour productivity on the firms' profitability, I first ran a correlation matrix.
I then deepened the analysis by running a pooled OLS regression with ROS as Y and the other variables as Xs, adding some further control variables to the model.
Hoping that, up to this point, everything is more or less right...
Now here's the problem. Unfortunately I'm not very familiar with the xtreg command. I decided to estimate the model with both fixed effects and random effects, and then run a Hausman test to choose between them.
Is that right? Since I've never worked with econometric models before, could someone give a really basic explanation of the main differences between the two sets of effects?
My Hausman test suggests using the random-effects model, but I don't understand how to interpret random effects.

Thank you very much for any insight about this topic

Andrea
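
A minimal sketch of the fixed-effects / random-effects comparison described above; the panel identifiers and variable names (firm_id, year, ros, patents, labprod) are illustrative assumptions:

Code:
xtset firm_id year
xtreg ros patents labprod, fe    // fixed effects (within estimator)
estimates store fe
xtreg ros patents labprod, re    // random effects (GLS)
estimates store re
hausman fe re                    // compare the two sets of estimates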

Nonlinear interactions in Stata

Hi,

I have two variables, X and Z, that are hypothesized to influence Y. The effect of X on Y is nonlinear; that is, X squared is also significant. So we have:

Y = b0 + b1*X + b2*X^2 + b3*Z

The Stata code (a mixed-model regression) is as follows: mixed Y c.X##c.X Z

Now, if I want to estimate the interaction effect of X and Z, that is, how Z moderates the impact of X on Y, which of the following commands is correct:

1. mixed Y c.X##c.X c.X##c.Z

2. mixed Y c.X##c.X##c.Z

In other words, should I interact Z with X, or should I interact it with X#X again?

As usual, I tag dear Clyde since his comments have always been helpful for me.

Mundlak approach for the time dimension (i.e. considering years as first-level unit)

Hi all!

I'm trying to show that the year fixed effects, which some papers have interpreted as rather unrealistic behaviour, are actually driven by time variation in other company-specific variables and/or macroeconomic factors. This is therefore one very specific situation where I am interested in estimating the higher-level variation due to cross-section-invariant factors (such as GDP, which affects all companies equally in a given year, ceteris paribus).
This is how I discovered the Mundlak model, which has so far been used for estimating the effects of time-invariant factors.
I'm pretty bad with the technicalities, so I wouldn't be able to follow the mathematical steps of the estimation, and I (perhaps stupidly?) assumed that I could do the same for the time dimension and get a model equivalent to one using time fixed effects. Hence I calculated the yearly means of my company-specific variables and figured that I could use i(year) to get a period random-effects estimation (which I'm not sure is the correct way to write it).
When I compare the models, however, I don't get the same coefficients (and standard errors), and my question is: why? I would really appreciate any input on this, since many hours of googling couldn't turn up any example for the time dimension (or I have no idea what to look for).
It also looks like with one variable the results are almost identical, but they diverge more the more variables I add, so perhaps it really is an estimation issue. However, with my limited technical knowledge I have a hard time understanding why Mundlak would work only in the cross-section but not in time.


So here's the output (I picked only two variables to keep it as simple as possible):

Year FE (I have no idea how to use xtreg in this context, since writing xtreg with i.year would automatically add company random effects, right?):

Code:
. reg rating capex_w lev_w i.year

      Source |       SS           df       MS      Number of obs   =    26,561
-------------+----------------------------------   F(35, 26525)    =    296.59
       Model |  101874.948        35  2910.71279   Prob > F        =    0.0000
    Residual |  260310.482    26,525  9.81377877   R-squared       =    0.2813
-------------+----------------------------------   Adj R-squared   =    0.2803
       Total |   362185.43    26,560  13.6364996   Root MSE        =    3.1327

------------------------------------------------------------------------------
      rating |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     capex_w |  -.0035686   .3275162    -0.01   0.991    -.6455179    .6383807
       lev_w |   9.213182   .0955961    96.38   0.000     9.025808    9.400555
             |
        year |
       1984  |  -3.211079   3.836789    -0.84   0.403    -10.73139    4.309233
       1985  |  -2.722106    3.13585    -0.87   0.385    -8.868539    3.424328
       1986  |  -2.176657   3.135453    -0.69   0.488    -8.322313    3.968999
       1987  |  -2.184854   3.135357    -0.70   0.486    -8.330321    3.960613
       1988  |  -2.568382   3.135455    -0.82   0.413    -8.714041    3.577277
       1989  |  -2.858261   3.135588    -0.91   0.362    -9.004181    3.287658
       1990  |  -3.012191   3.135746    -0.96   0.337     -9.15842    3.134038
       1991  |  -3.031841   3.135674    -0.97   0.334    -9.177929    3.114248
       1992  |  -2.901572   3.135491    -0.93   0.355    -9.047302    3.244159
       1993  |  -2.679974   3.135191    -0.85   0.393    -8.825115    3.465168
       1994  |  -2.566824    3.13508    -0.82   0.413    -8.711748    3.578099
       1995  |  -2.526583   3.134879    -0.81   0.420    -8.671114    3.617948
       1996  |  -2.403677   3.134599    -0.77   0.443    -8.547659    3.740305
       1997  |  -2.347413   3.134418    -0.75   0.454    -8.491039    3.796214
       1998  |  -2.462711   3.134271    -0.79   0.432     -8.60605    3.680628
       1999  |   -2.26744   3.134294    -0.72   0.469    -8.410823    3.875943
       2000  |  -1.927371   3.134283    -0.61   0.539    -8.070734    4.215992
       2001  |  -1.875278   3.134352    -0.60   0.550    -8.018775     4.26822
       2002  |  -1.690621   3.134331    -0.54   0.590    -7.834077    4.452835
       2003  |  -1.475715   3.134319    -0.47   0.638    -7.619148    4.667717
       2004  |  -1.220183    3.13429    -0.39   0.697     -7.36356    4.923193
       2005  |  -1.153913    3.13433    -0.37   0.713    -7.297367    4.989541
       2006  |  -1.038949    3.13434    -0.33   0.740    -7.182423    5.104524
       2007  |  -1.146393   3.134395    -0.37   0.715    -7.289976    4.997189
       2008  |  -1.264413   3.134461    -0.40   0.687    -7.408124    4.879299
       2009  |  -.9995234   3.134537    -0.32   0.750    -7.143384    5.144337
       2010  |  -1.042056   3.134526    -0.33   0.740    -7.185894    5.101782
       2011  |  -1.117649   3.134483    -0.36   0.721    -7.261403    5.026105
       2012  |  -1.148772   3.134424    -0.37   0.714    -7.292411    4.994867
       2013  |  -1.191228   3.134365    -0.38   0.704    -7.334751    4.952295
       2014  |  -1.290347   3.134275    -0.41   0.681    -7.433693    4.852999
       2015  |  -1.477527   3.134301    -0.47   0.637    -7.620925     4.66587
       2016  |  -1.423933   3.134322    -0.45   0.650    -7.567371    4.719505
             |
       _cons |   8.922157   3.133133     2.85   0.004     2.781048    15.06327
------------------------------------------------------------------------------
And here comes the Mundlak equivalent (I first generate the yearly means):

Code:
. bysort year: egen mean_capex_w=mean(capex_w)
. bysort year: egen mean_lev_w=mean(lev_w)
. xtreg rating capex_w lev_w mean_capex_w mean_lev_w, i(year)

Random-effects GLS regression                   Number of obs     =     26,561
Group variable: year                            Number of groups  =         34

R-sq:                                           Obs per group:
     within  = 0.2602                                         min =          1
     between = 0.1962                                         avg =      781.2
     overall = 0.2710                                         max =      1,024

                                                Wald chi2(4)      =    9441.85
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
      rating |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     capex_w |  -.0034932   .3277473    -0.01   0.991    -.6458661    .6388798
       lev_w |   9.212831   .0956635    96.30   0.000     9.025334    9.400328
mean_capex_w |  -45.04788   4.116032   -10.94   0.000    -53.11515   -36.98061
  mean_lev_w |  -2.917378   1.917391    -1.52   0.128    -6.675396      .84064
       _cons |   11.01925   .6841172    16.11   0.000     9.678405     12.3601
-------------+----------------------------------------------------------------
     sigma_u |  .24895015
     sigma_e |  3.1326951
         rho |  .00627559   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Ideally, I would also like to add company fixed effects to this, but I assume I would then have to use a multilevel model, which is perhaps too complicated for a paper that needs to be finished within the next two weeks. However, if that is even possible (is a two-way Mundlak even a thing?), it would be extremely useful to get at least some references on it (or an answer on whether it is possible or not).

Thanks a lot!!
Anamaria

recode command labeling issue

I am attempting to recode a variable, generating a new variable with different value labels. My code is:

recode aidunemp (1 = 1 “defin should be”) (2/4= 0) (.=.), gen(stsupport_aid)

The error message I get is:

) expected, "“defin" found
r(198);


The problem seems to be the use of a recode label containing spaces. There is no error if I drop the quotes and make my label one word.

E.g.,

recode aidunemp (1 = 1 definitely) (2/4= 0) (.=.), gen(stsupport_aid)


Further, ...

recode aidunemp (1 = 1 defin should be) (2/4= 0) (.=.), gen(stsupport_aid)

...not surprisingly generates this error message:

) expected, " defin " found
r(198);



The manual entry for -recode- pretty clearly suggests that I should be able to use quotes and have a label with spaces (see the quoted example below).
Quoting
----------------------------------------------------------------------------
Setup
. webuse fullauto, clear

For rep77 and rep78, collapse 1 and 2 into 1, change 3 to 2, collapse 4
and 5 into 3, store results in newrep77 and newrep78, and define a new
value label newrep
. recode rep77 rep78 (1 2 = 1 "Below average") (3 = 2 Average) (4 5 = 3 "Above average"), pre(new) label(newrep)



Am I missing something here?
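
A minimal sketch of the same command with plain ASCII double quotes around the multi-word label; the quotes in the failing command above appear to be directional ("smart") quotes, which Stata does not recognize as string delimiters:

Code:
recode aidunemp (1 = 1 "defin should be") (2/4 = 0) (. = .), gen(stsupport_aid)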

Recoding Racial Groups; Latino =2, but I want Latino =1, etc.

My current dataset has racial groups coded as: White = 0, Black = 1, Latino = 2, Multi-Racial = 3, Filipino = 4.
However, I would like White = 0, Latino = 1, Black = 2, Asian = 3, Multi-Racial = 4.
How do I reorder these codes so that each matches the correct racial group, and how do I tell Stata to treat Filipino as Asian?

I've spent several hours looking at recode, rename, gen, and I'm lost. Your help is greatly appreciated.

Thank you,
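
A minimal sketch with recode, assuming the existing variable is named race (a hypothetical name) and that Filipino (4) should be folded into the new Asian category:

Code:
recode race (0 = 0 "White") (2 = 1 "Latino") (1 = 2 "Black") ///
    (4 = 3 "Asian") (3 = 4 "Multi-Racial"), gen(race_new) label(race_lbl)
tabulate race race_new, missing    // check the mapping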

How to calculate expected idiosyncratic skewness?

Hello everyone,

I want to calculate expected idiosyncratic skewness as in Boyer, B.H., Mitton, T. and Vorkink, K. (2010), "Expected idiosyncratic skewness", The Review of Financial Studies, Vol. 23 No. 1, pp. 169-202. The problem I'm having is calculating the historical estimates of idiosyncratic volatility and skewness.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(date ym return5 return6 rf rmrf smb hml)
10503 345   -.01894089646965491 -.008849557522123908  .000422 -.01127306  .00525148   .0006878
10504 345   .004851915495805104                    0 .0004233  .00170627 -.00085576 -.00139754
10505 345                     0                    0 .0004251  .00996818 -.00446347  .00140606
10506 345                     0  .008928571428571397 .0004254   .0059196 -.00249881  .00250139
10507 345                     0 -.008849557522123908 .0004254  .00319251  .00030991 -.00277563
10510 345                     0                    0 .0004223 -.00024731  .00422974 -.00149079
10511 345                     0                    0 .0004216 -.00274863  .00377518  .00549557
10512 345                     0                    0 .0004223 -.01161097  .00538493  .00050242
10513 345                     0                    0 .0004223  .00748774 -.00172834  .00512555
10514 345                     0  .008928571428571397 .0004223  .00450467 -.00003376  .00478568
10517 345   .024041846896690533 -.008849557522123908 .0004254  .01022073 -.00470417  .00443819
10518 345   .014047151277013725                    0 .0004264  -.0011541  .00067627 -.00090353
10519 345   .013900997771965518  .008928571428571397 .0004264  .00363072  .00327984 -.00298672
10520 345                     0                    0 .0004251  .00043836 -.00042181  .00095446
10521 345   .004586060287584015 -.008849557522123908 .0004254 -.00152961   .0025553   .0011501
10524 345   .004517570973417762                    0 .0004233 -.00565686  .00289626 -.00235665
10525 345   -.00449725430789627                    0 .0004227 -.00111718 -.00253788  .00240047
10526 345                     0  .008928571428571397 .0004223  .00032483  -.0014962  .00228125
10527 345   .018165390651005753   .07079646017699126 .0004206  .00044753  .00076314  .00001995
10528 345   .008920648264910458  .008264462809917328 .0004244  .00309713  .00082262  .00215505
10531 345                     0 -.016393442622950838 .0004216 -.00123408  .00574452  .00172679
10532 346                     0                    0 .0004251  .00287933 -.00061362     .00232
10533 346                     0                    0 .0004233 -.00697585  .00488589  .00050676
10534 346                     0 -.008333333333333304 .0004244 -.00285437  .00281202   .0029033
10535 346   .004444032959910915                    0 .0004254 -.00192175   .0009521  .00358065
10538 346                     0                    0 .0004254 -.00776155   .0027741  .00311697
10539 346                     0  .008403361344537785 .0004264  .00952688 -.00659687  .00322087
10540 346                     0  -.01666666666666672 .0004254 -.00755174  .00424434  .00203433
10541 346                     0  .008474576271186418 .0004244   .0000658 -.00040594  .00213749
10542 346                     0  -.01680672268907568 .0004227 -.01219847  .00586826  .00206838
10545 346  -.013227025532307035  -.03418803418803418 .0004227 -.00454815 -.00256307  .00250378
10546 346                     0  .017699115044247815 .0004233  .00395039  .00248942  .00142837
10547 346                     0 -.004347826086956497 .0004213  .00237563 -.00113014  .00247123
10548 346                     0    .0393013100436681 .0004223  .00695646 -.00743721 -.00091841
10549 346                     0                    0 .0004244 -.00040684  .00169483  .00343632
10552 346  .0044836766148239615  -.01680672268907568 .0004227 -.00631036  .00254483 -.00003972
10553 346  -.004463662993443918   .02564102564102555  .000423  .00421284 -.00169779  .00349698
10554 346                     0  .008333333333333304  .000424  .00715559 -.00209694  .00426515
10555 346 -.0044836766148241836 -.016528925619834656 .0004244 -.00287966   .0015931  .00042118
10556 346  -.013417780905465748 -.008403361344537785 .0004233 -.02226113  .00126729 -.00038323
10559 346                     0 -.025423728813559365 .0004548 -.00686149  .00134545  .00068337
10560 346  -.004565124352085115                    0 .0004582  .00304422 -.00087426 -.00150591
10561 346                     0  .017391304347825987 .0004606  .00311911  .00006603  .00334213
10562 347  -.004586060287584237 -.017094017094017144 .0004595 -.00880611 -.00284232  .00227736
10563 347 -.0045591975812258045  -.02608695652173909 .0004592 -.00939757 -.00306623 -.00021765
10566 347                     0  .008928571428571397 .0004585 -.00323499 -.00183136  .00373087
10567 347   -.00920836949185222                    0 .0004595  .00212448 -.00235887  .00124775
10568 347  -.004671305532577441  .008849557522123908 .0004595  .00061133 -.00497766  -.0044441
10569 347  -.004644341236861527  -.02631578947368418 .0004578 -.00950524 -.00064092  .00103131
10570 347  -.004715127701375277  -.05405405405405406 .0004575 -.00722724 -.00339905  .00152378
end
format %td date
format %tm ym
I would like to briefly describe my calculation steps.
For each month t, I should:
1. run a Fama-French regression on daily data from the last 60 months, that is, from month t-59 to t
2. compute the residual for each day in the last 60 months using the regression coefficients from step 1
3. calculate the standard deviation and skewness of those residuals

I wrote the following code, but the calculation of the residual might be wrong: for each month, the residuals over the previous 60 months should be recalculated, which means I should get a different set of residuals for each month rather than a single residual series for all months.
Code:
            gen excessreturn5= return5- rf
            rangestat (reg) excessreturn5 rmrf smb hml, interval(ym -59 0) 
            gen Resid5= excessreturn5 - b_cons - b_hml*hml - b_smb*smb - b_rmrf*rmrf
            rangestat (sd) IV5=Resid5, int(ym -59 0)
            rangestat (skewness) IS5=Resid5, int(ym -59 0)
Best Regards and many thanks in advance!
Yao JIN

Using the local macro in loop

I am getting caught up at the last point in my code and would like some advice.

I have a breast cancer dataset.
Two categorical variables:
stage - Stage 1, 2, or 3
type - Ductal (1), Medullary (2), Mucinous (3), Papillary (4)

One indicator variable:
sex - Male (1), Female (2)

One continuous variable:
survtime

I want to work out the average survival time by stage and sex, and by type and sex.

I have tried to write the code but get tripped up at the end.

The following works and gives me overall means stratified by sex:

local vars stage type
foreach x in local vars {
forvalues i=1/2 {
summarize survtime if sex==`i' , detail
return list
}
}


But when I try to split the groups by their individual categories I keep getting errors

local vars stage type
foreach x in local vars {
forvalues i=1/2 {
forvalues j=1/4{
capture noisily summarize survtime if sex==`i' & `x”==`j', detail
return list
}
}
}

`x” invalid name ???

Thanks
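
A minimal sketch of a corrected loop, assuming the two issues are the loop syntax (-foreach x of local vars-, not "in local vars") and the closing quote on the macro reference, which should be a plain right single quote, `x', rather than a curly quote:

Code:
local vars stage type
foreach x of local vars {
    forvalues i = 1/2 {
        forvalues j = 1/4 {
            capture noisily summarize survtime if sex == `i' & `x' == `j', detail
            return list
        }
    }
}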

Multilevel analysis

I am analyzing a global sample and wanted to carry out a multilevel analysis by stratifying according to regional variation ("region"). While this is possible for Cox regression, I could not find any code for the Fine-Gray model for competing-risks analysis.
Even though we carry out stratification for nested models, is it possible to simply adjust for the regional variation?
I carried out Cox regression using two models: first stratifying by "region" and second adjusting for "region". The results were very similar.
Would it be reasonable to carry out the Fine-Gray regression by only adjusting for "region" instead of doing any stratification?
Thank you.
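
A minimal sketch of the two Cox specifications described, plus a covariate-adjusted Fine-Gray model fitted with stcrreg; the variable names (time, event, x1, x2, region) and the competing-event coding are assumptions:

Code:
stset time, failure(event == 1)
stcox x1 x2, strata(region)                    // Cox, stratified by region
stcox x1 x2 i.region                           // Cox, adjusted for region
stcrreg x1 x2 i.region, compete(event == 2)    // Fine-Gray, adjusted for region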

SEM Model Identification-- fixing variances

Dear Statalist users, I am using Stata 14 and working with cross-sectional data. I am trying to run a confirmatory factor analysis (CFA) on nine items using the 'sem' command. The items are 4-category ordinal variables. The items are called i1, i2, ..., i9, and the factors are called f1, f2, and f3. My goal is to fit a second-order model where a fourth latent variable, f4, is an overarching construct (second-order factor) on which f1, f2, and f3 load strongly. The command I use is:
Code:
sem (f1 -> i1 i2 i3) (f2 -> i4 i5 i6 i7) (f3 -> i7 i8 i9) (f4 -> f1 f2 f3), ///
    latent(f1 f2 f3 f4) cov(e.f1@1 e.f2@1 e.f3@1 f4@1) nocapslatent difficult ml
This is to override Stata's default anchoring of some factor loadings: I would like to see all the factor loadings, which is why I am trying to fix the variances at 1, but the model does not converge. Could you help me figure out what I could do differently? Thanks, Sule

Convergence problems

Hi,

I am trying to estimate risk ratios (RR) with a log-binomial model (glm with a binomial family and log link), but there are apparently some convergence problems:

. glm resistance_new i.sa4code_analysis, family (binomial) link(log) eform


Iteration 0: log likelihood = -4999.9042 (not concave)
Iteration 1: log likelihood = -3998.9689 (not concave)
Iteration 2: log likelihood = -3961.6864 (not concave)
Iteration 3: log likelihood = -3954.0891 (not concave)
Iteration 4: log likelihood = -3953.9001 (not concave)
Iteration 5: log likelihood = -3953.8996 (not concave)
Iteration 6: log likelihood = -3953.8996 (not concave)
Iteration 7: log likelihood = -3953.8996 (not concave)
Iteration 8: log likelihood = -3953.8996 (not concave)
Iteration 9: log likelihood = -3953.8996 (not concave)
Iteration 10: log likelihood = -3953.8996 (not concave)
Iteration 11: log likelihood = -3953.8996 (not concave)
Iteration 12: log likelihood = -3953.8996 (not concave)
Iteration 13: log likelihood = -3953.8995 (not concave)
Iteration 14: log likelihood = -3953.8995 (not concave)
Iteration 15: log likelihood = -3953.8995 (not concave)
Iteration 16: log likelihood = -3953.8995 (not concave)
Iteration 17: log likelihood = -3953.8995 (not concave)
Iteration 18: log likelihood = -3953.8995 (not concave)


When I specify number of iterations at 20, I get this:

. glm resistance_new i.sa4code, family (binomial) link(log) eform iter(20)


Iteration 0: log likelihood = -4999.9042 (not concave)
Iteration 1: log likelihood = -3998.9689 (not concave)
Iteration 2: log likelihood = -3961.6864 (not concave)
Iteration 3: log likelihood = -3954.0891 (not concave)
Iteration 4: log likelihood = -3953.9001 (not concave)
Iteration 5: log likelihood = -3953.8996 (not concave)
Iteration 6: log likelihood = -3953.8996 (not concave)
Iteration 7: log likelihood = -3953.8996 (not concave)
Iteration 8: log likelihood = -3953.8996 (not concave)
Iteration 9: log likelihood = -3953.8996 (not concave)
Iteration 10: log likelihood = -3953.8996 (not concave)
Iteration 11: log likelihood = -3953.8996 (not concave)
Iteration 12: log likelihood = -3953.8996 (not concave)
Iteration 13: log likelihood = -3953.8995 (not concave)
Iteration 14: log likelihood = -3953.8995 (not concave)
Iteration 15: log likelihood = -3953.8995 (not concave)
Iteration 16: log likelihood = -3953.8995 (not concave)
Iteration 17: log likelihood = -3953.8995 (not concave)
Iteration 18: log likelihood = -3953.8995 (not concave)
Iteration 19: log likelihood = -3953.8995 (not concave)
Iteration 20: log likelihood = -3953.8995 (not concave)
convergence not achieved

Generalized linear models No. of obs = 7,651
Optimization : ML Residual df = 7,633
Scale parameter = 1
Deviance = 7907.799058 (1/df) Deviance = 1.036001
Pearson = 6400006511 (1/df) Pearson = 838465.4

Variance function: V(u) = u*(1-u) [Bernoulli]
Link function : g(u) = ln(u) [Log]

AIC = 1.038269
Log likelihood = -3953.899529 BIC = -60351

--------------------------------------------------------------------------------
| OIM
resistance_new | Risk Ratio Std. Err. z P>|z| [95% Conf. Interval]
---------------+----------------------------------------------------------------
sa4code |
302 | .3186184 .0543363 -6.71 0.000 .2280916 .4450742
303 | .27928 .0513765 -6.93 0.000 .1947387 .4005229
304 | .4046953 .1011731 -3.62 0.000 .2479301 .6605824
305 | .2603081 .0624054 -5.61 0.000 .1627134 .4164395
306 | .231827 .0414789 -8.17 0.000 .1632547 .3292019
307 | .4443705 .1560037 -2.31 0.021 .2233144 .8842472
308 | .5627156 .0598641 -5.40 0.000 .4568089 .6931758
309 | .2273888 .0338319 -9.95 0.000 .1698727 .304379
310 | .5681479 .0649963 -4.94 0.000 .4540292 .71095
311 | .3718394 .0459352 -8.01 0.000 .2918784 .4737058
312 | 2.470708 .0673254 33.19 0.000 2.342214 2.60625
313 | .2835836 .0781954 -4.57 0.000 .165185 .4868458
314 | .327395 .059736 -6.12 0.000 .2289618 .4681458
315 | .6347662 .0340739 -8.47 0.000 .5713758 .7051894
316 | 2.231104 .0917053 19.52 0.000 2.058415 2.418282
317 | 1.011726 .2652771 0.04 0.965 .6051682 1.691415
318 | 3.322281 . . . . .
319 | .59012 .0511128 -6.09 0.000 .4979826 .6993048
|
_cons | .3088778 3.86e-10 -9.4e+08 0.000 .3088778 .3088778
--------------------------------------------------------------------------------
Note: _cons estimates baseline risk.
Warning: parameter estimates produce inadmissible mean estimates in one or
more observations.
Warning: convergence not achieved


Apparently, there is a problem with sa4code=318. Is there any way I can correct this? Converting this categorical variable to a continuous one is not an option in this case. I tried changing the base level, but that didn't work either.

Thanks in advance.

Andrea
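
A minimal sketch of a common fallback when a log-binomial model will not converge: a Poisson model with a log link and robust standard errors (the "modified Poisson" approach) also reports risk ratios with eform.

Code:
glm resistance_new i.sa4code_analysis, family(poisson) link(log) vce(robust) eform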

Fama MacBeth regression using xtfmb

Hello everyone,

I would like to run cross-sectional regressions over 60 months following the Fama-MacBeth procedure. However, I can't figure out how to run it correctly: when I enter "xtfmb x y" with x and y as my variables, I just get a long series of "x" characters. Do you know why?

On the other hand, do you know how to simply ask Stata to run a cross-sectional regression on a particular date? I mean, if I want to run a cross-sectional regression for month number 12, for example, using my entire data set, is that possible?

Thanks in advance,

Geolien,
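
A minimal sketch, assuming the panel is declared with a firm identifier and a monthly time variable (firm_id and month are hypothetical names); note that xtfmb expects the dependent variable first, followed by the regressors:

Code:
xtset firm_id month
xtfmb y x                // Fama-MacBeth: averages the monthly cross-sectional estimates
reg y x if month == 12   // a single cross-sectional regression for month 12 only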

ICD9 Issue

Hi, I am new here. I am working on HCUP data and have run into invalid ICD-9 codes in one of my variables. When I browse the data in the Data Editor, I see that some of the ICD-9 values are "invl", while all other values are numerical, correct codes. How can I fix this?

This is what I get when I run icd9p check pr1, any:


1. Invalid placement of period 0
2. Too many periods 0
3. Code too short 0
4. Code too long 0
5. Invalid 1st char (not 0-9) 27,472
6. Invalid 2nd char (not 0-9) 0
7. Invalid 3rd char (not 0-9) 0
8. Invalid 4th char (not 0-9) 0
-----------
Total 27,472
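
A minimal sketch of one way to handle this, assuming pr1 is a string variable and that "invl" is the file's placeholder for an invalid or unavailable procedure code:

Code:
count if pr1 == "invl"               // how many placeholder codes there are
replace pr1 = "" if pr1 == "invl"    // treat them as missing
icd9p check pr1, any                 // re-check the cleaned variable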

problem with rdplot (RDD graph) when covariates included

Hi everyone! I have a problem with the graphic illustration of an RDD using rdplot. The graph looks fine when I use a simple specification - just the dependent variable and one independent (running) variable. However, when I include covariates to see how they affect the point estimate and the slope of the lines on both sides of the cutoff, the plotted line is suddenly well below the points on the graph. It looks like the intercept is not adjusted after the inclusion of the covariates. I know I am probably making some stupid mistake, but I can't find any way to fix it, and the help file is not very informative.
I would be grateful for help. Below I paste the syntax and the graph:
rdplot wynik_gm_m_std2015 dist_rus_aus_bord if abs(dist_rus_aus_bord)<50& ordinary==1&(ktory_zabor==2|ktory_zabor==3), p(1) covs (perc_boys l_uczG perc_dysl log_popul proc_pom_spol_2008 log_doch_wlas_pc_2015 perc_higher ) weights(l_uczG) h(50)


[attached rdplot graph not shown]

Question about new variable generation by group

Hi all,

How can I generate a new variable, countyindicator, such that:
if, for a person, all values of county equal 25, then countyindicator = 1;
if, for a person, at least one (but not all) of the county values equals 25, then countyindicator = 2;
if, for a person, no value of county equals 25, then countyindicator = 3.

Here is the example below,

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id county)
1 25
1 25
1 25
1 25
2 78
2 29
3 64
3 25
3 97
4 25
end
Here is the expected dataset.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id county countyindicator)
1 25 1
1 25 1
1 25 1
1 25 1
2 78 3
2 29 3
3 64 2
3 25 2
3 97 2
4 25 1
end

Thanks

Jack Liang
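
A minimal sketch using egen to count matches within each id; n25 and nobs are temporary helper variables:

Code:
bysort id: egen n25  = total(county == 25)    // rows for this person with county 25
bysort id: egen nobs = count(county)          // non-missing county rows for this person
gen countyindicator = cond(n25 == nobs, 1, cond(n25 > 0, 2, 3))
drop n25 nobs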

Question in Testing Joint Inequality

Dear all,

I've been trying to use the supermodularity approach to check for pairwise complementarities between innovations.
My question is about testing a joint inequality in Stata.

In the supermodularity approach, there is complementarity if C(1,1) - C(0,1) ≥ C(1,0) - C(0,0), where the binary arguments indicate whether a firm conducts a given type of innovation.
To test the joint inequality, recent studies follow Kodde and Palm's (1986) approach and use a Wald test. They put the inequality in their null hypothesis: C(11XX) - C(01XX) - C(10XX) + C(00XX) ≥ 0

The problem is that Stata's test command, which performs the Wald test, doesn't allow joint inequalities, so I can only test joint equalities in my null hypothesis: C(11XX) - C(01XX) - C(10XX) + C(00XX) = 0
My Stata code looks like:

Code:
test (C1111 - C0111 - C1011 - C0011 = 0) (C1110 - C0110 - C1010 - C0010 = 0) (C1101 - C0101 - C1001 - C0001 = 0) (C1100 - C0100 - C1000 - C0000 = 0)

chi2(4)     =    1.57
Prob > chi2 =  0.8137
Currently, I run test command for Wald test after the xtprobit command.

I'm looking for a way to do this:
Code:
test (C1111 - C0111 - C1011 - C0011 >= 0) (C1110 - C0110 - C1010 - C0010 >= 0) (C1101 - C0101 - C1001 - C0001 >= 0) (C1100 - C0100 - C1000 - C0000 >= 0)
I wonder if you could let me know how to perform a Wald test with multiple inequality constraints (a joint hypothesis).

Also, I've checked the FAQ "How can I perform a one-sided test?"
https://www.stata.com/support/faqs/s...-coefficients/
But this applies only to a single hypothesis, not a joint hypothesis.