Channel: Statalist

Diff command - Kernel-based Propensity Score Matching DID in the context of RCS data

Hello,

Hope you all are keeping safe. I hope some of you are well placed to answer a few queries about the diff command (written by Juan M. Villa and available from SSC), which can perform PSM and DID analyses.
  1. How are the standard errors computed in the context of kernel PSM for repeated cross-sectional (RCS) data? The help file states that under certain circumstances parametric estimation can be problematic and that bootstrapped standard errors can be used as an alternative. What are those circumstances? It would be nice if someone could elaborate a bit and suggest an appropriate reference or two.
  2. Can the number of observations vary with the use of sampling weights in the context of RCS data? If so, why?
Thanks in anticipation.

Best,
Jaya

Change scores in two wave panel data

Hello,

I have a two-wave balanced panel dataset and I am trying to model change in employment status: whether a person remained unemployed (consistently not in employment), remained employed (consistently in employment), was unemployed and became employed, or was employed and became unemployed.

I would like to model the four probabilities, and I have both time-invariant and time-varying predictors. For example, does a person's education or family size make them more likely to be continuously in employment, or to be employed in the first wave and unemployed in the second wave? My observations are nested within participants, which are nested within different districts, so my data have a hierarchical structure.

I am not sure how to calculate the change score and predict the four probabilities. My outcome is a binary variable for whether a person is employed/unemployed at wave 1 and 2. I have calculated the change in the outcome (“change”) and generated another variable (“empchange”) to see within person change in my data:

Code:
xtset ID year 
       panel variable:  ID (strongly balanced)
        time variable:  year, 2012 to 2018, but with gaps
                delta:  1 unit

. bysort ID (year): gen change = cremp2-cremp2[_n-1]
(5,588 missing values generated)

. ta change

     change |      Freq.     Percent        Cum.
------------+-----------------------------------
         -1 |        664       11.88       11.88
          0 |      3,641       65.16       77.04
          1 |      1,283       22.96      100.00
------------+-----------------------------------
      Total |      5,588      100.00

gen empchange=.
recode empchange .=0 if change==-1
recode empchange .=1 if change==1
recode empchange .=2 if change==0 & cremp2==0
recode empchange .=3 if change==0 & cremp2==1
label define chnge 0 "Employed - unemployed" 1 "Unemployed - employed" 2 "Consistently unemployed" 3 "Consistently employed"
label values empchange chnge
My "change" variable then represents whether one's employment status changed (-1, 1) or stayed the same (0), measured at wave 2. I also have my predictors, which are measured at waves 1 and 2. I am not sure which modelling strategy would be suitable, or whether this is the correct way to calculate the change score. I have thought about multilevel mixed-effects models, but I am not sure if that is possible with a nominal outcome variable.
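A rough, untested sketch of one option I have been considering, assuming the wave-1 predictors have first been carried onto the wave-2 row (e.g. via reshape wide): since the transition is a single nominal outcome per person, it can be modelled as a multinomial logit on the wave-2 rows, with standard errors clustered by district to acknowledge the hierarchical structure (a random-intercept version would, I believe, need gsem with a multinomial family):

Code:
* keep one row per person: the change is measured at wave 2 (assumption: 2018 is wave 2)
keep if year == 2018
* multinomial logit for the four transition categories, base = consistently employed
mlogit empchange i.educ2 hhsize, baseoutcome(3) vce(cluster district)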

An example of my data:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double ID int year byte(district educ2 hhsize) float(change empchange) byte cremp2
601004004 2012  1 1 6  . . 0
601004004 2018  1 1 7  0 2 0
601004603 2012  1 0 5  . . 0
601004603 2018  1 1 5  0 2 0
601005902 2012  1 3 4  . . 1
601005902 2018  1 3 4  0 3 1
601012406 2012  1 2 4  . . 0
601012406 2018  1 2 4  0 2 0
601020204 2012  1 3 4  . . 0
601020204 2018  1 3 3  0 2 0
601029206 2012  1 0 4  . . 0
601029206 2018  1 0 2  0 2 0
601032102 2012  1 3 2  . . 1
601032102 2018  1 3 2 -1 0 0
601032309 2012  1 2 5  . . 0
601032309 2018  1 2 6  0 2 0
601037505 2012 21 3 4  . . 0
601037505 2018 21 3 5  0 2 0
601038302 2012  1 3 3  . . 0
601038302 2018  1 3 5  0 2 0
601045003 2012  1 2 4  . . 0
601045003 2018  1 2 5  0 2 0
601050804 2012  1 3 6  . . 0
601050804 2018  1 3 7  0 2 0
601051404 2012  1 2 5  . . 0
601051404 2018  1 2 6  0 2 0
601054402 2012  1 2 4  . . 0
601054402 2018  1 2 5  0 2 0
601055607 2012  1 2 2  . . 0
601055607 2018  1 2 1  1 1 1
601056302 2012  1 2 4  . . 0
601056302 2018  1 2 5  0 2 0
601058302 2012  1 3 4  . . 1
601058302 2018  1 3 5  0 3 1
601058602 2012  1 2 5  . . 0
601058602 2018  1 2 5  0 2 0
601060002 2012  1 0 5  . . 0
601060002 2018  1 0 6  1 1 1
601060202 2012  1 2 4  . . 0
601060202 2018  1 2 4  0 2 0
601060302 2012  1 1 4  . . 0
601060302 2018  1 1 4  0 2 0
601061602 2012  1 3 2  . . 0
601061602 2018  1 3 2  0 2 0
601062002 2012  1 2 5  . . 0
601062002 2018  1 2 5  0 2 0
601062602 2012  1 0 5  . . 0
601062602 2018  1 1 6  0 2 0
601063102 2012  1 0 4  . . 0
601063102 2018  1 2 5  1 1 1
601067002 2012  1 3 4  . . 1
601067002 2018  1 3 4 -1 0 0
601072002 2012  1 2 6  . . 0
601072002 2018  1 2 6  1 1 1
601072702 2012  1 0 3  . . 0
601072702 2018  1 1 4  0 2 0
601072902 2012  1 1 4  . . 0
601072902 2018  1 1 4  1 1 1
601073102 2012  1 1 4  . . 0
601073102 2018  1 1 4  0 2 0
601074502 2012  1 3 4  . . 0
601074502 2018  1 3 4  1 1 1
601075902 2012  1 2 4  . . 0
601075902 2018  1 2 5  0 2 0
601077402 2012  1 2 6  . . 0
601077402 2018  1 2 7  0 2 0
601078802 2012  1 3 5  . . 1
601078802 2018  1 1 5  0 3 1
601080302 2012  1 2 5  . . 0
601080302 2018  1 2 5  1 1 1
601082202 2012  1 2 4  . . 0
601082202 2018  1 2 4  0 2 0
601083402 2012  1 2 5  . . 0
601083402 2018  1 3 5  0 2 0
601085202 2012  1 2 4  . . 0
601085202 2018  1 2 5  0 2 0
601085602 2012  1 2 5  . . 0
601085602 2018  1 2 5  0 2 0
601085703 2012  1 2 4  . . 0
601085703 2018  1 2 6  0 2 0
601086402 2012  1 0 4  . . 0
601086402 2018  1 0 6  0 2 0
601088002 2012  1 3 4  . . 0
601088002 2018  1 3 4  0 2 0
601088202 2012  1 2 5  . . 0
601088202 2018  1 2 5  0 2 0
601088305 2012  1 2 5  . . 0
601088305 2018  1 2 2  0 2 0
601088306 2012  1 2 5  . . 0
601088306 2018  1 2 6  0 2 0
601089002 2012  1 2 4  . . 0
601089002 2018  1 2 5  0 2 0
601089303 2012  1 2 4  . . 1
601089303 2018  1 3 4  0 3 1
601090502 2012  1 3 5  . . 1
601090502 2018  1 3 4  0 3 1
601090602 2012  1 2 5  . . 0
601090602 2018  1 2 5  0 2 0
601091602 2012  1 2 6  . . 0
601091602 2018  1 2 5  0 2 0
end
label values year code
label values district Lgov
label def Lgov 1 "Cairo", modify
label def Lgov 21 "Giza", modify
label values educ2 educ
label def educ 0 "Illiterate", modify
label def educ 1 "Less than vocational secondary", modify
label def educ 2 "Vocational secondary", modify
label def educ 3 "University & post-grad", modify
label values empchange chnge
label def chnge 0 "Employed - unemployed", modify
label def chnge 1 "Unemployed - employed", modify
label def chnge 2 "Consistently unemployed", modify
label def chnge 3 "Consistently employed", modify
label values cremp2 emp
label def emp 0 "No", modify
label def emp 1 "Yes", modify

Matching country specific data with relationship specific data

Hello,

I am trying to build a panel dataset for my gravity model. The dependent variable is the trade flow between 23 countries from 1980 to 2015. I created IDs and managed to set the time variable, since there are repeated time values (e.g. the exports from Albania to Greece in 1980, the exports from Albania to Turkey in 1980, etc.). However, I now need to add country-specific data such as GDP (as an independent variable), which appears only once per country in 1980, whereas 1980 has 22 relationship-specific observation points for each country. How do I match/merge the data?
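A rough sketch of what I think should work, assuming the dyadic file has variables exporter, importer, and year, and that the country-level file (gdp.dta, a hypothetical name) has one row per country-year with variables country, year, and gdp; the country-year data are merged in twice, once for each side of the dyad:

Code:
use dyads.dta, clear
* attach GDP of the exporter
rename exporter country
merge m:1 country year using gdp.dta, keep(master match) nogenerate
rename (country gdp) (exporter gdp_exporter)
* attach GDP of the importer
rename importer country
merge m:1 country year using gdp.dta, keep(master match) nogenerate
rename (country gdp) (importer gdp_importer)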

Reshape Problem

Hello Statalisters,

I have downloaded some information from NZ.Stat (New Zealand Quarterly Work Statistics) and am having trouble reshaping it because of the format of the data. I've posted an example so you can see what I mean. The data is downloadable (the easy part), but the format is layered in a way where I can't retain the region-specific information along with the industry sector. Is there some way I can take the region (in row 2), attach it to the industry type (in row 3), and then reshape from wide to long so I can collapse to annual figures for each year in the data? I know each of these commands, so I don't need help with anything past the first (and most challenging) step. See the thumbnail for the format of the data as it shows in Stata, and the csv file for the raw data that was imported into Stata. I have put in a request for NZ Stats to create this for me, but custom files incur a cost, and since I already have the data it doesn't make sense to pay for something that just needs a bit of manipulation to be useful for my project.

Best,

Davia Downey

Graphs domain-codomain


Hello all,

Right now I'm using the margins and marginsplot commands to look at the graphs of two of my functions.
Both let me look at the relationship between two explanatory terms (HHI2 and its square, c.HHI2#c.HHI2) and two dependent variables (FSTS_5 and FATA_5).
This is the result:
Code:
. reg FSTS_5 c.HHI2 c.HHI2#c.HHI2 emp xad FSTS i.fyear i.naicsh2, robust

Linear regression                               Number of obs     =      2,153
                                                F(31, 2121)       =     298.99
                                                Prob > F          =     0.0000
                                                R-squared         =     0.3360
                                                Root MSE          =     244.71

------------------------------------------------------------------------------
             |               Robust
      FSTS_5 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        HHI2 |   5.748962   5.958279     0.96   0.335    -5.935718    17.43364
             |
      c.HHI2#|
      c.HHI2 |  -6.141813   2.916205    -2.11   0.035    -11.86073   -.4228923
             |
         emp |   .0175882   .0280968     0.63   0.531    -.0375119    .0726883
         xad |  -.0046756   .0023271    -2.01   0.045    -.0092393    -.000112
        FSTS |  -.9846829   .0135233   -72.81   0.000    -1.011203   -.9581625
             |
       fyear |
       2010  |   28.35477    21.9708     1.29   0.197    -14.73179    71.44133
       2011  |   15.00335    17.3051     0.87   0.386     -18.9334    48.94009
       2012  |   9.730732   17.83667     0.55   0.585    -25.24845    44.70992
       2013  |   10.75918   16.95665     0.63   0.526    -22.49421    44.01258
       2014  |  -120.2575   138.6513    -0.87   0.386    -392.1641    151.6492
             |
     naicsh2 |
         21  |    14.1921    8.22762     1.72   0.085    -1.942946    30.32715
         23  |   1131.067   921.8457     1.23   0.220    -676.7488    2938.883
         31  |   18.74618   6.566223     2.85   0.004     5.869268    31.62308
         32  |   9.011241   4.928855     1.83   0.068     -.654653    18.67714
         33  |   20.72985   12.44844     1.67   0.096    -3.682577    45.14228
         42  |   12.48527   6.227395     2.00   0.045      .272827    24.69771
         44  |  -9.355896   18.03097    -0.52   0.604    -44.71612    26.00433
         45  |   26.66661   29.08853     0.92   0.359    -30.37842    83.71163
         48  |   4.820748   5.842496     0.83   0.409    -6.636873    16.27837
         49  |  -3.108557   9.183999    -0.34   0.735    -21.11914    14.90203
         51  |   14.96792   6.146632     2.44   0.015     2.913862    27.02198
         52  |   6.339456   4.996703     1.27   0.205    -3.459493     16.1384
         53  |   8.078983   5.903122     1.37   0.171    -3.497529     19.6555
         54  |   6.421113   5.475224     1.17   0.241    -4.316257    17.15848
         56  |   5.639013   5.717088     0.99   0.324     -5.57267     16.8507
         61  |  -3.834198   5.770391    -0.66   0.506    -15.15041    7.482019
         62  |   7.551174   8.546145     0.88   0.377    -9.208525    24.31087
         71  |   1.529445   9.937071     0.15   0.878    -17.95798    21.01687
         72  |   3.060731   10.66999     0.29   0.774      -17.864    23.98546
         81  |   3.465968   10.77805     0.32   0.748    -17.67069    24.60263
         99  |   13.02404   10.43827     1.25   0.212    -7.446282    33.49436
             |
       _cons |    -8.4586   14.86106    -0.57   0.569    -37.60237    20.68517
------------------------------------------------------------------------------

. margins, at(HHI2=(-2423761(1117477)3163628))

Predictive margins                              Number of obs     =      2,153
Model VCE    : Robust

Expression   : Linear prediction, predict()

1._at        : HHI2            =    -2423761

2._at        : HHI2            =    -1306284

3._at        : HHI2            =     -188807

4._at        : HHI2            =      928670

5._at        : HHI2            =     2046147

6._at        : HHI2            =     3163624

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |  -3.61e+13   1.71e+13    -2.11   0.035    -6.97e+13   -2.48e+12
          2  |  -1.05e+13   4.98e+12    -2.11   0.035    -2.02e+13   -7.22e+11
          3  |  -2.19e+11   1.04e+11    -2.11   0.035    -4.23e+11   -1.51e+10
          4  |  -5.30e+12   2.52e+12    -2.11   0.035    -1.02e+13   -3.65e+11
          5  |  -2.57e+13   1.22e+13    -2.11   0.035    -4.97e+13   -1.77e+12
          6  |  -6.15e+13   2.92e+13    -2.11   0.035    -1.19e+14   -4.23e+12
------------------------------------------------------------------------------

. marginsplot, noci

  Variables that uniquely identify margins: HHI2

. reg FATA_5 c.HHI2 c.HHI2#c.HHI2 ppent xad FATA i.fyear i.naicsh2, robust

Linear regression                               Number of obs     =        792
                                                F(27, 763)        =          .
                                                Prob > F          =          .
                                                R-squared         =     0.0595
                                                Root MSE          =     1.1202

------------------------------------------------------------------------------
             |               Robust
      FATA_5 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        HHI2 |  -.0165341   .0359016    -0.46   0.645    -.0870117    .0539435
             |
      c.HHI2#|
      c.HHI2 |   .0539992   .0216495     2.49   0.013     .0114995    .0964989
             |
       ppent |  -7.95e-06   6.37e-06    -1.25   0.212    -.0000204    4.55e-06
         xad |   .0000378    .000062     0.61   0.542     -.000084    .0001596
        FATA |   .1343711   .2019304     0.67   0.506    -.2620341    .5307762
             |
       fyear |
       2010  |  -.2459593   .2176589    -1.13   0.259    -.6732407    .1813222
       2011  |  -.2333696   .1971044    -1.18   0.237    -.6203008    .1535617
       2012  |  -.3467578   .2111575    -1.64   0.101    -.7612764    .0677608
       2013  |  -.3213393   .1899402    -1.69   0.091    -.6942068    .0515282
       2014  |  -.2064751   .2127234    -0.97   0.332    -.6240677    .2111174
             |
     naicsh2 |
         31  |  -.0743192   .2753656    -0.27   0.787    -.6148835     .466245
         32  |  -.1966656   .1492444    -1.32   0.188    -.4896441    .0963129
         33  |   -.019096   .1576282    -0.12   0.904    -.3285325    .2903405
         42  |  -.0332439   .1507483    -0.22   0.826    -.3291746    .2626869
         44  |  -.0854561   .1480687    -0.58   0.564    -.3761265    .2052143
         45  |   .0687517   .2181854     0.32   0.753    -.3595633    .4970667
         48  |   .3265516   .6709194     0.49   0.627    -.9905156    1.643619
         49  |  -.2243832   .3350705    -0.67   0.503    -.8821527    .4333863
         51  |  -.0832483   .1714566    -0.49   0.627     -.419831    .2533344
         52  |  -.1686548    .170126    -0.99   0.322    -.5026255    .1653158
         53  |   .0673404    .210384     0.32   0.749    -.3456597    .4803406
         54  |  -.0741498   .1613922    -0.46   0.646    -.3909753    .2426756
         56  |   .5535467   .3152264     1.76   0.079    -.0652673    1.172361
         61  |   .4271323   .2017446     2.12   0.035     .0310919    .8231727
         62  |   .5869191   .1646887     3.56   0.000     .2636223    .9102159
         71  |  -1.044368   1.085733    -0.96   0.336    -3.175747     1.08701
         72  |   .5358754   .3583476     1.50   0.135    -.1675888     1.23934
         99  |   .0488011   .1782962     0.27   0.784    -.3012082    .3988104
             |
       _cons |   .1388832   .2227807     0.62   0.533    -.2984527    .5762191
------------------------------------------------------------------------------

. reg FATA_5 c.HHI2 c.HHI2#c.HHI2 ppent xrd FATA i.fyear i.naicsh2, robust

Linear regression                               Number of obs     =      1,275
                                                F(27, 1246)       =          .
                                                Prob > F          =          .
                                                R-squared         =     0.0961
                                                Root MSE          =      .8586

------------------------------------------------------------------------------
             |               Robust
      FATA_5 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        HHI2 |   .0195055   .0213385     0.91   0.361    -.0223578    .0613687
             |
      c.HHI2#|
      c.HHI2 |   .0495493   .0145082     3.42   0.001     .0210862    .0780125
             |
       ppent |  -3.15e-06   1.82e-06    -1.73   0.085    -6.73e-06    4.31e-07
         xrd |   .0000206   .0000115     1.80   0.072    -1.84e-06    .0000431
        FATA |   .1481938    .228158     0.65   0.516    -.2994225      .59581
             |
       fyear |
       2010  |  -.2366492   .1873272    -1.26   0.207    -.6041608    .1308625
       2011  |   -.229709   .1766047    -1.30   0.194    -.5761844    .1167663
       2012  |  -.2354147   .1751642    -1.34   0.179    -.5790641    .1082346
       2013  |  -.3049266   .1750912    -1.74   0.082    -.6484328    .0385795
       2014  |  -.2289092   .1882133    -1.22   0.224    -.5981592    .1403407
             |
     naicsh2 |
         21  |  -.2419569   .8036023    -0.30   0.763     -1.81852    1.334606
         23  |   .1010646   .1894885     0.53   0.594    -.2706872    .4728164
         31  |   .0828105   .2214138     0.37   0.708    -.3515745    .5171956
         32  |  -.0947726   .1571047    -0.60   0.546    -.4029917    .2134464
         33  |   .0363552   .1754657     0.21   0.836    -.3078856    .3805961
         42  |   .0456493   .1812305     0.25   0.801    -.3099014    .4011999
         44  |   .0290768   .1645694     0.18   0.860    -.2937869    .3519405
         45  |   .0810528   .1920919     0.42   0.673    -.2958066    .4579121
         51  |  -.0342049   .1866898    -0.18   0.855    -.4004659    .3320561
         52  |   .8368939   .4879775     1.72   0.087    -.1204544    1.794242
         53  |  -.0331553   .2014176    -0.16   0.869    -.4283105    .3619999
         54  |   .1043922   .2340104     0.45   0.656    -.3547056      .56349
         56  |   1.284458   .7020799     1.83   0.068    -.0929312    2.661847
         62  |   .4936155   .2511334     1.97   0.050     .0009244    .9863066
         71  |  -1.716256   1.933938    -0.89   0.375    -5.510391     2.07788
         72  |   .4055285   .4908074     0.83   0.409    -.5573716    1.368429
         81  |   .3935147   3.828506     0.10   0.918    -7.117516    7.904545
         99  |   1.312217   .2205636     5.95   0.000     .8794997    1.744934
             |
       _cons |   .0099779   .2152347     0.05   0.963    -.4122845    .4322403
------------------------------------------------------------------------------

. margins, at(HHI2=(-2423761(1117477)3163628))

Predictive margins                              Number of obs     =      1,275
Model VCE    : Robust

Expression   : Linear prediction, predict()

1._at        : HHI2            =    -2423761

2._at        : HHI2            =    -1306284

3._at        : HHI2            =     -188807

4._at        : HHI2            =      928670

5._at        : HHI2            =     2046147

6._at        : HHI2            =     3163624

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |   2.91e+11   8.52e+10     3.42   0.001     1.24e+11    4.58e+11
          2  |   8.45e+10   2.48e+10     3.42   0.001     3.60e+10    1.33e+11
          3  |   1.77e+09   5.17e+08     3.42   0.001     7.52e+08    2.78e+09
          4  |   4.27e+10   1.25e+10     3.42   0.001     1.82e+10    6.73e+10
          5  |   2.07e+11   6.07e+10     3.42   0.001     8.83e+10    3.27e+11
          6  |   4.96e+11   1.45e+11     3.42   0.001     2.11e+11    7.81e+11
------------------------------------------------------------------------------

.
As you can see, the two functions share the same domain but have completely different codomains (the margins values differ by many orders of magnitude), which seems suspicious to me.
Do you happen to know what might be the cause and a possible solution to this problem?
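A rough sketch of a check I have been considering, not a diagnosis: the at() values above are in the millions, so the quadratic term produces enormous linear predictions; rescaling HHI2 before squaring and evaluating margins over a range the rescaled variable actually covers may make the two plots comparable (HHI2_std is a hypothetical name):

Code:
summarize HHI2, detail
gen HHI2_std = (HHI2 - r(mean)) / r(sd)      // standardised version of HHI2
reg FSTS_5 c.HHI2_std##c.HHI2_std emp xad FSTS i.fyear i.naicsh2, robust
margins, at(HHI2_std = (-2(1)2))             // evaluate within the observed range
marginsplot, noci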



Choosing reference/base periods in panel data where the base period varies across firms

Hi

I am looking to do two things with a panel dataset of firm-year observations.

1. I would like to set the DealYear as the base period for the variable i.Year in the following regression and normalize the corresponding coefficient to zero. I am interested in the coefficients on the year dummy variables produced.

I understand the command for this if the base year is the same for all firms, i.e. if the base year is 2010, then we have:

Code:
xtreg investment ib2010.Year i.firmid, fe cluster(firmid)
However, I want the base period to equal the DealYear (see the dataex example below), which varies across firms, and I'm not sure how to incorporate this.


2. I would like the omitted category of the Year variable in the interaction below to be the DealYear, so that coefficients are estimated for (t-2)*eba, (t-1)*eba, (t+1)*eba, (t+2)*eba, etc., but not for t*eba, where t = DealYear. Again, given that the DealYear varies across firms, I'm unsure how to incorporate this.

If the DealYear were constant across all firms, such as 2010, I would have:

Code:
xtreg investment i.eba##ib2010.Year, fe cluster(firmid)
Data sample:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(firmid Year DealYear Investment eba)
 1 2005 2006 4 0
 1 2006 2006 9 0
 1 2007 2006 1 0
 1 2008 2006 3 0
 1 2009 2006 1 0
 1 2010 2006 7 0
 1 2011 2006 7 0
 1 2012 2006 4 0
 2 2005 2009 3 1
 2 2006 2009 4 1
 2 2007 2009 6 1
 2 2008 2009 7 1
 2 2009 2009 4 1
 2 2010 2009 2 1
 2 2011 2009 5 1
 2 2012 2009 7 1
 3 2005 2009 3 1
 3 2006 2009 5 1
 3 2007 2009 2 1
 3 2008 2009 1 1
 3 2009 2009 1 1
 3 2010 2009 4 1
 3 2011 2009 5 1
 3 2012 2009 6 1
 4 2005 2011 7 0
 4 2006 2011 3 0
 4 2007 2011 8 0
 4 2008 2011 4 0
 4 2009 2011 7 0
 4 2010 2011 8 0
 4 2011 2011 3 0
 4 2012 2011 2 0
 5 2005 2009 5 1
 5 2006 2009 8 1
 5 2007 2009 3 1
 5 2008 2009 3 1
 5 2009 2009 9 1
 5 2010 2009 9 1
 5 2011 2009 1 1
 5 2012 2009 7 1
 6 2005 2006 4 1
 6 2006 2006 9 1
 6 2007 2006 1 1
 6 2008 2006 3 1
 6 2009 2006 1 1
 6 2010 2006 7 1
 6 2011 2006 7 1
 6 2012 2006 4 1
 7 2005 2008 3 1
 7 2006 2008 4 1
 7 2007 2008 6 1
 7 2008 2008 7 1
 7 2009 2008 4 1
 7 2010 2008 2 1
 7 2011 2008 5 1
 7 2012 2008 7 1
 8 2005 2007 3 1
 8 2006 2007 5 1
 8 2007 2007 2 1
 8 2008 2007 1 1
 8 2009 2007 1 1
 8 2010 2007 4 1
 8 2011 2007 5 1
 8 2012 2007 6 1
 9 2005 2011 7 0
 9 2006 2011 3 0
 9 2007 2011 8 0
 9 2008 2011 4 0
 9 2009 2011 7 0
 9 2010 2011 8 0
 9 2011 2011 3 0
 9 2012 2011 2 0
10 2005 2010 5 1
10 2006 2010 8 1
10 2007 2010 3 1
10 2008 2010 3 1
10 2009 2010 9 1
10 2010 2010 9 1
10 2011 2010 1 1
10 2012 2010 7 1
end
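A rough, untested sketch of one common approach I have in mind, assuming relative event time in this sample runs from -6 to +6: build years-relative-to-deal, shift it so it is non-negative (factor variables cannot take negative values), and make the deal year itself the base level:

Code:
gen rel = Year - DealYear
gen relsh = rel + 6                                       // 6 = longest lead in this sample (assumption)
xtset firmid Year
xtreg Investment ib6.relsh, fe cluster(firmid)            // question 1: base period = DealYear
xtreg Investment i.eba##ib6.relsh, fe cluster(firmid)     // question 2: omit the DealYear*eba term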
Any advice appreciated

Interaction effect

I am curious about the difference between factor-variable interaction notation (X##M) and entering a hand-made product term (X + M + X_M, where X_M = X*M). Sometimes I find that the two methods produce different results.
I am not sure whether there should be any differences, or whether it depends on X and M. If either X or M is a continuous variable, is the product-term method (X_M) the right one?
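A minimal sketch with the auto data of the comparison I have in mind; as far as I understand, the two approaches give identical coefficients when both variables are declared continuous and all main effects are included, and differences usually appear when ## treats a variable as categorical (the default without the c. prefix) or when margins is used afterwards:

Code:
sysuse auto, clear
* factor-variable notation, both variables declared continuous
regress price c.weight##c.length
* hand-made product term: coefficients identical to the regression above
gen wxl = weight*length
regress price weight length wxl
* without the c. prefix, ## would treat weight and length as categorical,
* which is one common reason the two approaches appear to differ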

command and interpretation for categorical dummy interaction

Dear Statalist,

I am using ordered logistic regression; depvar: 5-scale happiness level (women only); indepvars: employment status (employed, unemployed, housewife, other employment status) and educational level (low edu, mid edu, high edu), as well as their interactions. Is the command below correct? I obviously leave a reference category out for employment status and education level as independent variables (reference categories: employed and high edu); should I do the same for the interactions (i.e. not include employed or highedu in any interaction term)?

ologit happiness unemployed housewife otherstatus lowedu midedu unemployed#lowedu unemployed#midedu housewife#lowedu housewife#midedu other#lowedu other#midedu if gender==2

As per interpretation, which one (or if neither what) is correct?

E.g regarding the coefficient for unemployed#lowedu interaction:

a) it is the impact of being unemployed AND low educated compared to all other employment statuses and educational levels on happiness level
b) if it is correct to keep employed and highedu out of the interaction terms as well, should I say being unemployed rather than employed and being low educated rather than highly educated is ...

Do you think I should follow up with "margins unemployed#lowedu unemployed#midedu housewife#lowedu housewife#midedu other#lowedu other#midedu" as well?
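An untested sketch of what I mean, assuming single categorical variables empstat (1=employed, 2=unemployed, 3=housewife, 4=other) and edu (1=low, 2=mid, 3=high) instead of separate dummies; factor-variable notation then handles the reference categories for both the main effects and the interactions automatically:

Code:
ologit happiness ib1.empstat##ib3.edu if gender == 2
* predicted probabilities for every employment-by-education cell
margins empstat#edu
marginsplot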

Thank you SO much in advance!

A.

Re coding many variable simultaneously

Hi all,
I am using a dataset which has many categorical variables, v1, v2, v3, v4, v5, ..., v60, taking only the values 0, 1, 8, and 9, and also a variable named "stage of disease" (with 3 categories: mild, moderate, and severe).
I want to replace the values of v1 to v20 with 9 (here 9 means not applicable) if "stage of disease" is mild.
It would be most helpful if anyone could help me do this, as typing the replace command so many times is very time-consuming.
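A minimal sketch of the kind of loop I am after, assuming the stage-of-disease variable is named stage and that mild is coded 1 (both are assumptions to be adjusted to the actual names and codes):

Code:
foreach v of varlist v1-v20 {
    replace `v' = 9 if stage == 1    // 9 = not applicable
}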

thanks in advance...

Coefplot: arrow on confidence intervals

Hi all,

I am performing a logistic regression analysis using the xtlogit command. I am trying to plot the results using coefplot in Stata 15.0, comparing the results of two different models for two different age groups. The problem is that some of the confidence intervals are too wide, making it difficult to compare all the results, so I am trying to reduce the x-axis scale and trim the confidence intervals, in the same way as can easily be achieved in forestplot using "force", e.g. forestplot, xlab(0 5 10, force). I would also like the wide CIs to have an arrow at the end indicating that they extend beyond the x-axis scale.

Code:
coefplot A, bylabel(18-49 years) || B, bylabel(50-64 years) ||, eform

[attached graph]


I have tried various combinations of pstyle and ciopts, and none worked, for example:

Code:
coefplot A, bylabel(18-49 years) || B, bylabel(50-64 years) || ///
    (., pstyle(p1) if(@ll>0|@ul<8) ciopts(recast(rcap) lcolor(gs1))) (., pstyle(p1) if(@ul>=8)  ciopts(recast(pcarrow) lcolor(gs1))) ///
    (., pstyle(p1) if(@ul>=8) ciopts(recast(pcbarrow) lcolor(gs1))), ///
    eform base drop(_cons) byopts(compact rows(1)) ///
        nooffsets transform(*= min(max(@,0),8))
[attached graph]

This works for trimming my x axis and CIs, but it
- fails to put an arrow at the end of them, and
- creates a new third column that I cannot identify.

Any help would be appreciated! Thanking you in advance!

Keeping Observations According to Age

Hi,

In my data, there is an age variable. I want to keep observations with age between 20 and 70. How should I delete the other observations?

I have tried the keep command but it did not work for me.
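A minimal sketch of what I was trying to write, assuming the variable is named age; keep if and drop if both need the full logical condition:

Code:
keep if inrange(age, 20, 70)
* equivalently:
* drop if age < 20 | age > 70
* note: a missing age is treated as larger than any number, so it is dropped by either form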

Duration of non-employment spell

Hi All,

I have to compute the duration of non-employment spells. Currently, I have identified the non-employment spells (defined as spells of unemployment and non-participation). However, when computing the duration, I am unable to get a cumulative sum of the number of months across consecutive unemployment and non-participation spells.

Below is an example of my dataset:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str16 idww int year byte(month begin) float(spell spellemp spellue spellnp duration durationemp durationue durationnp spellnonemp durationnonemp)
"0190037546300201" 1996  8 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1996  9 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1996 10 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1996 11 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1996 12 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1997  1 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1997  2 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1997  3 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1997  4 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1997  5 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1997  6 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1997  7 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1997  8 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1997  9 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1997 10 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1997 11 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1997 12 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1998  1 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1998  2 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1998  3 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1998  4 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1998  5 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1998  6 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1998  7 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1998  8 0 0 . . 0 25  . . 25 0 25
"0190037546300201" 1998  9 1 1 1 . . 16 16 .  . .  .
"0190037546300201" 1998 10 0 1 1 . . 16 16 .  . .  .
"0190037546300201" 1998 11 0 1 1 . . 16 16 .  . .  .
"0190037546300201" 1998 12 0 1 1 . . 16 16 .  . .  .
"0190037546300201" 1999  1 0 1 1 . . 16 16 .  . .  .
"0190037546300201" 1999  2 0 1 1 . . 16 16 .  . .  .
"0190037546300201" 1999  3 0 1 1 . . 16 16 .  . .  .
"0190037546300201" 1999  4 0 1 1 . . 16 16 .  . .  .
"0190037546300201" 1999  5 0 1 1 . . 16 16 .  . .  .
"0190037546300201" 1999  6 0 1 1 . . 16 16 .  . .  .
"0190037546300201" 1999  7 0 1 1 . . 16 16 .  . .  .
"0190037546300201" 1999  8 0 1 1 . . 16 16 .  . .  .
"0190037546300201" 1999  9 0 1 1 . . 16 16 .  . .  .
"0190037546300201" 1999 10 0 1 1 . . 16 16 .  . .  .
"0190037546300201" 1999 11 0 1 1 . . 16 16 .  . .  .
"0190037546300201" 1999 12 0 1 1 . . 16 16 .  . .  .
"0190037546300203" 1996  8 0 0 0 . . 15 15 .  . .  .
"0190037546300203" 1996  9 0 0 0 . . 15 15 .  . .  .
"0190037546300203" 1996 10 0 0 0 . . 15 15 .  . .  .
"0190037546300203" 1996 11 0 0 0 . . 15 15 .  . .  .
"0190037546300203" 1996 12 0 0 0 . . 15 15 .  . .  .
"0190037546300203" 1997  3 0 0 0 . . 15 15 .  . .  .
"0190037546300203" 1997  4 0 0 0 . . 15 15 .  . .  .
"0190037546300203" 1997  5 0 0 0 . . 15 15 .  . .  .
"0190037546300203" 1997  6 0 0 0 . . 15 15 .  . .  .
"0190037546300203" 1997  7 0 0 0 . . 15 15 .  . .  .
"0190037546300203" 1997  8 0 0 0 . . 15 15 .  . .  .
"0190037546300203" 1997  9 0 0 0 . . 15 15 .  . .  .
"0190037546300203" 1997 10 0 0 0 . . 15 15 .  . .  .
"0190037546300203" 1997 11 0 0 0 . . 15 15 .  . .  .
"0190037546300203" 1997 12 0 0 0 . . 15 15 .  . .  .
"0190037546300203" 1998  1 1 1 . . 1  4  . .  4 1  4
"0190037546300203" 1998  2 0 1 . . 1  4  . .  4 1  4
"0190037546300203" 1998  3 0 1 . . 1  4  . .  4 1  4
"0190037546300203" 1998  4 0 1 . . 1  4  . .  4 1  4
"0190037546300203" 1998  5 1 2 . 1 .  1  . 1  . 2  1
"0190037546300203" 1998  6 1 3 . . 2 19  . . 19 3 19
"0190037546300203" 1998  7 0 3 . . 2 19  . . 19 3 19
"0190037546300203" 1998  8 0 3 . . 2 19  . . 19 3 19
"0190037546300203" 1998  9 0 3 . . 2 19  . . 19 3 19
"0190037546300203" 1998 10 0 3 . . 2 19  . . 19 3 19
"0190037546300203" 1998 11 0 3 . . 2 19  . . 19 3 19
"0190037546300203" 1998 12 0 3 . . 2 19  . . 19 3 19
"0190037546300203" 1999  1 0 3 . . 2 19  . . 19 3 19
"0190037546300203" 1999  2 0 3 . . 2 19  . . 19 3 19
"0190037546300203" 1999  3 0 3 . . 2 19  . . 19 3 19
"0190037546300203" 1999  4 0 3 . . 2 19  . . 19 3 19
"0190037546300203" 1999  5 0 3 . . 2 19  . . 19 3 19
"0190037546300203" 1999  6 0 3 . . 2 19  . . 19 3 19
"0190037546300203" 1999  7 0 3 . . 2 19  . . 19 3 19
"0190037546300203" 1999  8 0 3 . . 2 19  . . 19 3 19
"0190037546300203" 1999  9 0 3 . . 2 19  . . 19 3 19
"0190037546300203" 1999 10 0 3 . . 2 19  . . 19 3 19
"0190037546300203" 1999 11 0 3 . . 2 19  . . 19 3 19
"0190037546300203" 1999 12 0 3 . . 2 19  . . 19 3 19
"0190037546300204" 1996  8 0 0 . 0 .  2  . 2  . 0  2
"0190037546300204" 1996  9 0 0 . 0 .  2  . 2  . 0  2
"0190037546300204" 1996 10 1 1 1 . . 39 39 .  . .  .
"0190037546300204" 1996 11 0 1 1 . . 39 39 .  . .  .
"0190037546300204" 1996 12 0 1 1 . . 39 39 .  . .  .
"0190037546300204" 1997  1 0 1 1 . . 39 39 .  . .  .
"0190037546300204" 1997  2 0 1 1 . . 39 39 .  . .  .
"0190037546300204" 1997  3 0 1 1 . . 39 39 .  . .  .
"0190037546300204" 1997  4 0 1 1 . . 39 39 .  . .  .
"0190037546300204" 1997  5 0 1 1 . . 39 39 .  . .  .
"0190037546300204" 1997  6 0 1 1 . . 39 39 .  . .  .
"0190037546300204" 1997  7 0 1 1 . . 39 39 .  . .  .
"0190037546300204" 1997  8 0 1 1 . . 39 39 .  . .  .
"0190037546300204" 1997  9 0 1 1 . . 39 39 .  . .  .
"0190037546300204" 1997 10 0 1 1 . . 39 39 .  . .  .
"0190037546300204" 1997 11 0 1 1 . . 39 39 .  . .  .
"0190037546300204" 1997 12 0 1 1 . . 39 39 .  . .  .
"0190037546300204" 1998  1 0 1 1 . . 39 39 .  . .  .
"0190037546300204" 1998  2 0 1 1 . . 39 39 .  . .  .
"0190037546300204" 1998  3 0 1 1 . . 39 39 .  . .  .
end
label values month rhcalmn
label def rhcalmn 1 "January", modify
label def rhcalmn 2 "February", modify
label def rhcalmn 3 "March", modify
label def rhcalmn 4 "April", modify
label def rhcalmn 5 "May", modify
label def rhcalmn 6 "June", modify
label def rhcalmn 7 "July", modify
label def rhcalmn 8 "August", modify
label def rhcalmn 9 "September", modify
label def rhcalmn 10 "October", modify
label def rhcalmn 11 "November", modify
label def rhcalmn 12 "December", modify
The variables were generated according to the following code

Code:
*Compute a variable to compute the beginning of a new spell
by ssuid eentaid epppnum: gen byte begin = jobesr != jobesr[_n-1]
*Removing those who enter the sample with a certain spell.
by ssuid eentaid epppnum: replace begin=0 if _n==1 

*Identify distinct spells 
by ssuid eentaid epppnum: gen spell= sum(begin)
by ssuid eentaid epppnum: gen spellemp = sum(begin) if jobesr==1
by ssuid eentaid epppnum: gen spellue = sum(begin) if jobesr==2
by ssuid eentaid epppnum: gen spellnp = sum(begin) if jobesr==3

*Non-employment spell
by ssuid eentaid epppnum: gen spellnonemp = sum(begin) if jobesr==2 | jobesr==3

*Generate the duration of each spell
sort ssuid eentaid epppnum spell year month
by ssuid eentaid epppnum spell: gen duration = _N
by ssuid eentaid epppnum spell: gen durationemp = duration if jobesr==1
by ssuid eentaid epppnum spell: gen durationue = duration if jobesr==2
by ssuid eentaid epppnum spell: gen durationnp = duration if jobesr==3

*Non-employment spell
by ssuid eentaid epppnum spell: gen durationnonemp = duration if jobesr==2 | jobesr==3
Here the variable JOBESR describes whether a person is employed (jobesr==1), unemployed (jobesr==2), or not participating (jobesr==3) in the labour market. The variables SSUID, EENTAID, and EPPPNUM identify the individual in the dataset. The variable BEGIN flags any change in labour market status, SPELL counts the spells of each labour market status, and DURATION computes the duration of each labour market spell.

The variable DURATIONNONEMP currently computes the duration of each separate spell of non-employment (unemployment or non-participation). However, I would like the variable to give me 26 for individual ID "0190037546300203" instead of 4, 1, and 19, since that individual has consecutive unemployment and non-participation spells for which I need the combined duration.
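A rough, untested sketch of the fix I am considering: build the spell counter from a combined non-employment indicator rather than from jobesr itself, so that consecutive unemployment and non-participation months form a single spell (the names nonemp, beginne, spellne, and durationne are placeholders):

Code:
sort ssuid eentaid epppnum year month
gen byte nonemp = inlist(jobesr, 2, 3)                              // combined non-employment indicator
by ssuid eentaid epppnum: gen byte beginne = nonemp != nonemp[_n-1] // flag transitions into/out of non-employment
by ssuid eentaid epppnum: replace beginne = 0 if _n == 1
by ssuid eentaid epppnum: gen spellne = sum(beginne) if nonemp      // one counter per non-employment spell
sort ssuid eentaid epppnum spellne year month
by ssuid eentaid epppnum spellne: gen durationne = _N if nonemp     // months in the combined spell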

Any help on this would be great.

Thank you very much.

Mridula

Importing an excel file

I have an Excel file that I need to import with the correct variable names.
The data in the Excel sheet start in the third row, so I ran the following command:


import excel "$raw_data\2018.xlsx", sheet("NC") cellrange(A3:G7) firstrow allstring


When I browse the data that Stata has imported, I see that it did not pick up some of the variable names; instead, it ignored them but kept them as variable labels.
My excel file looks like this:
   |  A        |  B           |  C      |  D      |  E      |  F      |  G
 1 |           |              |         |         |         |         |
 2 |           |              |         |         |         |         |
 3 | Cliente   | Numero ID    | 2018/01 | 2018/02 | 2018/03 | 2018/04 | 2018/05
 4 | Abel      | 000000000001 | 156546  | 0       | 654     | 654     | 74564
 5 | Eriberto  | 000000000002 | 564     | 56146   | 654     | 654     | 5648
 6 | Herrera   | 000000000003 | 5156    | 564     | 86      | 48625   | 5646
 7 | Adelayda  | 000000000004 | 4       | 564     | 6564    | 4686    | 7423
So basically, Stata recognized the variable names "Cliente" and "Numero ID", but ignored the names "2018/01", "2018/02", "2018/03", "2018/04", and "2018/05".
I can change the variable names with separate commands, but I wonder why Stata does not recognize these other names and how I can correct this.
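A sketch of one workaround I am considering, assuming the problem is that headers such as 2018/01 are not valid Stata variable names (they start with a digit and contain "/"), so import excel keeps only the variable labels for those columns; the idea is to rebuild legal names from the labels (it is an assumption that the affected columns keep their spreadsheet letters C-G as names):

Code:
import excel "$raw_data\2018.xlsx", sheet("NC") cellrange(A3:G7) firstrow allstring
foreach v of varlist C-G {
    local lbl : variable label `v'
    rename `v' `=strtoname("y`lbl'")'    // e.g. 2018/01 -> y2018_01
}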
Thank you in advance for your advice.

Calculate number of companies in the panel data with at least one turnover

Hello Stata community,

I have the following panel dataset:
NAME YEAR TURN
Apple 2010 0
Apple 2011 1
Google 2010 0
Google 2011 0
Google 2012 1
IBM 2010 0
IBM 2011 0
TURN is a dummy variable equal to 1 if there was a management turnover in that particular year.
I want to get the number of companies that have at least one turnover in the data.
I was looking around the forums and playing with count and if, but did not get the result I need.
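A minimal sketch of what I was trying to do: flag, within each company, whether any year has TURN equal to 1, and then count each company only once:

Code:
bysort NAME: egen anyturn = max(TURN)   // 1 if the company ever had a turnover
egen firsttag = tag(NAME)               // marks one row per company
count if firsttag & anyturn == 1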

Thank you.

Code folding hotkeys in code editor (doeditor)

Dear All,

Before suggesting a feature for Stata 17, I wanted to double check that it doesn't exist yet.

So are there any code-folding shortcuts (keyboard hotkeys) in Stata's do-editor?
Equivalent of https://docs.microsoft.com/en-us/vis...g?view=vs-2019

I commonly need to switch between the top-level outline (e.g. all programs defined in a particular file) and their implementation.
Each of them can be individually folded with a mouse-based action, so it's only a matter of accessing this via a shortcut.

Thank you, Sergiy



Discrete variables - Mean comparison and model

Hi.
Within the explanatory variables of a multiple regression model I am working on, there are two variables:

1. Number of chronic illnesses - this variable was created from the number of illnesses a person reported in other questions (e.g., do you have hypertension? do you have asthma? If they answered "yes" to both of these questions, 2 was added to the variable "number of chronic diseases" for that person, and so on).

Code:
 tab cd
   cd |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        807       33.05       33.05
          1 |        761       31.16       64.21
          2 |        505       20.68       84.89
          3 |        225        9.21       94.10
          4 |        106        4.34       98.44
          5 |         38        1.56      100.00
------------+-----------------------------------
      Total |      2,442      100.00
2. Number of medications. This variable was collected by asking respondents how many medications they usually use (how many pills, creams, or other medications do you currently use?).

Code:
tab med

med |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        297       12.16       12.16
          1 |        366       14.99       27.15
          2 |        409       16.75       43.90
          3 |        399       16.34       60.24
          4 |        326       13.35       73.59
          5 |        213        8.72       82.31
          6 |        144        5.90       88.21
          7 |         94        3.85       92.06
          8 |         73        2.99       95.05
          9 |         57        2.33       97.38
         10 |         47        1.92       99.30
         11 |          9        0.37       99.67
         12 |          4        0.16       99.84
         13 |          2        0.08       99.92
         14 |          1        0.04       99.96
         18 |          1        0.04      100.00
------------+-----------------------------------
      Total |      2,442      100.00
My questions are:

1. Could these variables (which appear to me to be discrete variables - is that so, or are they categorical variables instead?) be treated as continuous in the model?

2. What would be better: mean (SD) or median (IQR) for these variables?

3. If I wanted to compare means across the groups of a variable expressed in tertiles, could I use the Kruskal-Wallis test?

Thank you in advance.

Multilevel binary logistic Regression

I wish to run a multilevel logistic regression with suffering from hypertension (0=no, 1=yes) as the dependent variable for a population residing in different states. I ran the following command in Stata 14.
melogit hyper || shdistri:

However, I got the following error message
Unknown #command
unexpected end of file
(error occurred while loading ssd.ado)

I have even increased the memory allocated to ado-files. It still shows the same error. Please let me know how to deal with this.

fcolor questions for spmap / maptile

Hi everyone! :D I have a question regarding the color scheme of spmap / maptile. I want to ask if there is a color scheme that is green for positive values and red for negative values, with the color intensity varying with the magnitude of the values. The data I have are as follows; they range between -1 and 0.1 and are skewed towards the negative values:

pref x1
1 -.4469565
2 -.24058
3 -.0500387
4 -.9554904
5 -.7973559
6 -.5381312
7 -.2876952
8 .0013459
9 -.2509857
10 -.057429
11 -.929378
12 -.0770269
13 -.3170978
14 -.5554068
15 .0765003
16 -.6009551
17 -.6547931
18 -.2981216
19 -.7943069
20 -.4441176
21 -.4196064
22 -.1361244
23 -.4810928
24 -.7470174
25 -.122549
26 -.8185665
27 .0262395
28 -.5600753
29 .0870186
30 .0216642
31 -.2695584
32 -.9595158
33 -.6329852
34 -.1393275
35 -.9812946
36 -.7493975
37 -.3639289
38 -.1713046
39 -.0572921
40 .0255021
41 -.6431083
42 -.8198518
43 .0540211
44 -.3390531
45 .0706824
46 -.739979
47 -.2558736

The map I created from maptile is as follows (it is a map for Japan):
[attached map of Japan]


The code to produce the graph is as follows:

use temp1.dta, clear
keep pref
duplicates drop
* my data may be skewed towards -1, but there are some positive values, and I want to label the positive values in green and the negative values in red
gen x1 = runiform(-1,0.1)
maptile x1 , geography(jpn_pref) compressed simple fcolor(RdYlGn)
graph export ./figure/trial.png, as(png) name("Graph") replace
However, what I want is for the positive values to be painted in green and the negative values in red, with the color intensity of the red changing with the magnitude of the values. I learned that there is a cutvalues() option that I can use to set the cutoffs manually. However, I am using a loop to create the graphs, so I want to ask if there is an automatic way to assign the colors? Thank you very much in advance!
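A rough sketch of the automatic assignment I am after, assuming maptile's cutvalues() option accepts a numlist: recompute the break points from the variable's own minimum inside the loop, and always include 0 as a break, so negative values fall in the red bins and positive values in the green bin (this assumes the minimum is negative):

Code:
summarize x1, meanonly
local lo = r(min)
* four evenly spaced breaks between the minimum and 0, plus 0 itself
local cuts `=`lo'*4/5' `=`lo'*3/5' `=`lo'*2/5' `=`lo'/5' 0
maptile x1, geography(jpn_pref) fcolor(RdYlGn) cutvalues(`cuts')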

Stata Interface Preference

Hello everyone! :D I have a question regarding the user interface. I am used to setting up the do-file editor, the main window, and the data editor on the same screen in the following arrangement:

[attached screenshot of the window arrangement]

I want to ask if there is a way to fix this arrangement every time I open Stata, so I don't need to manually adjust the positions of the do-file editor, the data editor, and the main window.

Moreover, I want to ask if there is a way to create a new do-file through right-click? I tried this method but it seems to fail: https://stackoverflow.com/questions/...ht-click-optio

Dropping a sequence of variables, ignore if not present

Hi All,

Is there an option in Stata whereby one can drop multiple variables without the execution being interrupted if one of the variables has already been dropped? Perhaps like a
Code:
force
or
Code:
ignore
type option? It seems like a useful feature to have.
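A minimal sketch of the usual workaround I have been using in the meantime: capture swallows the error when a variable is absent (the variable names here are hypothetical):

Code:
foreach v in price weight foreign {
    capture drop `v'
}
* -capture drop- also works on a whole varlist in one line, but then a single
* missing variable aborts the entire drop, which is why the loop is safer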

Best,
CS