Channel: Statalist

ivreg, ivprobit and biprobit which one to use? (any theoretical reasoning?)

As I do not have an econometrics background, I find it difficult to decide which estimator to use when their results differ so much. My dependent variable (female labor-force participation), endogenous regressor (has more than three children), and instrument (a combination of the genders of the first two children) are all binary. OLS, logit, and probit give similar results (as noted in "Mostly Harmless Econometrics", the average marginal effects from these models are nearly identical). But when I run ivreg (2SLS), ivprobit, and biprobit, the results are totally different. The previous literature simply uses OLS and 2SLS, but I cannot find a convincing reason for that choice. (I checked "Mostly Harmless Econometrics" by Angrist and Pischke and the second edition of Wooldridge's "Econometric Analysis of Cross Section and Panel Data", but the explanations there were quite vague.) What are the trade-offs between these models, how should I interpret the differences in the estimates, and how do I choose the right one?
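For reference, a minimal sketch of the three estimators being compared, using ivregress 2sls in place of the older ivreg; the variable names lfp (participation), kids3plus (more than three children), and samesex (the instrument) are hypothetical stand-ins for the variables described above.

Code:
* hypothetical names: lfp = outcome, kids3plus = endogenous regressor, samesex = instrument
ivregress 2sls lfp (kids3plus = samesex)           // linear IV / 2SLS
ivprobit lfp (kids3plus = samesex)                 // note: -ivprobit- formally requires a continuous endogenous regressor
biprobit (lfp = kids3plus) (kids3plus = samesex)   // recursive bivariate probit for a binary endogenous regressor
The continuity requirement flagged on the ivprobit line is one commonly cited reason these estimators can disagree in a setup like this; it does not by itself say which set of estimates to prefer.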

Betareg

Hi all,

I have a proportion dependent variable that is strictly greater than zero and less than one. The mean is 0.1240891, the SD is 0.1363, and it is positively skewed. All of the independent variables are dummy variables. I have read that betareg is the most appropriate model for proportion data.
I want to check whether I used the right model, and also whether I need to check any assumptions before running betareg.

betareg dep i.var1 i.var2 i.var3 i.var4
margins, dydx(*)    // average marginal effects of the dummies (margins cannot be applied to the dependent variable itself)

After running estat ic to check model fit, the AIC and BIC are both approximately -4400.

Thank you in advance.

parsing with variant length strings; overcoming ustrregexra greediness

I'm currently trying to parse text from string variables like this:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str1964 FieldLabel
`"<div style="background:tomato; padding:5px; margin:5px;"><font color="black"> <font size="5">Verify alpha codes are equal, values are not the same </font></strong></div>"'
`"<div style="background:lightgrey; padding:5px; margin:5px;"><font color="black"><font size="5">Please answer the following questions. </font>"'                                   
end
In an earlier post of mine on a similar issue, William Lisowski recommended using -ustrregexra- before -split- to parse variable-length substrings that are book-ended by matching symbols. The problem with that strategy here is that -ustrregexra- is "greedy": if I use "<" and ">" to create new symbols to split on, the entire string gets replaced. To illustrate,

using:
Code:
g newtext = ustrregexra(FieldLabel,"<.*>","!!split!!")
I want:
Code:
!!split!!!!split!!!!split!!Verify alpha codes are equal, values are not the same !!split!!!!split!!!!split!!
but I get:
Code:
!!split!!
Can anyone recommend a solution to the greedy problem, or perhaps another strategy entirely?

Thank you!

-Reese

v.14.2
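A minimal sketch of one possible fix, making the quantifier lazy (or, equivalently, excluding ">" from the match) so that each tag is replaced separately; the variable names follow the -dataex- example above, and newtext1/newtext2 are arbitrary.

Code:
* non-greedy version: "<.*?>" stops at the first ">", so each HTML tag is replaced on its own
g newtext1 = ustrregexra(FieldLabel, "<.*?>", "!!split!!")
* equivalent approach using a negated character class instead of a lazy quantifier
g newtext2 = ustrregexra(FieldLabel, "<[^>]*>", "!!split!!")
Either pattern should leave the visible text intact so that -split- on "!!split!!" can be applied afterwards.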

LSDV, problem with dummy variables and significance

Hello,

My data: a panel of 260 observations covering 20 regions over the period 2004 to 2016.

I found endogeneity between corruption and GDP growth, and I chose FE over RE based on the results of the Hausman test. I also chose LSDV over FE-2SLS because I would like to see the dummy coefficients.

Hence, I am running an LSDV model in which I want to see the effect that the corruption level (coded Cor) of each region (coded countrynum) has on Y (the GDP growth rate).

I do not understand why corruption is significant with i.countrynum but becomes insignificant when I also include i.Year. Also, with i.countrynum included, why does corruption become insignificant when I use log(corruption) instead of the level?

Also, I have run testparm on i.countrynum and i.Year, and the p-values are close to 0 (a sketch of those joint tests appears after the regression output below).

Code:
 . reg  Y I logYlevel_1 n H Cor i.countrynum, robust

Linear regression                               Number of obs     =        240
                                                F(24, 215)        =       4.04
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2233
                                                Root MSE          =     .02424

------------------------------------------------------------------------------
             |               Robust
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           I |   .0421328   .0235268     1.79   0.075      -.00424    .0885055
 logYlevel_1 |  -.2725869   .0477639    -5.71   0.000    -.3667323   -.1784415
           n |  -.0011075   .0002068    -5.36   0.000    -.0015151      -.0007
           H |  -.5249337   .1279176    -4.10   0.000     -.777067   -.2728005
         Cor |   152.1827     43.705     3.48   0.001     66.03756    238.3279
             |
  countrynum |
          2  |  -.0568746    .018295    -3.11   0.002     -.092935   -.0208142
          3  |  -.1177966   .0196517    -5.99   0.000    -.1565313    -.079062
          4  |   -.132077   .0269762    -4.90   0.000    -.1852487   -.0789053
          5  |   .0762482   .0194006     3.93   0.000     .0380084    .1144879
          6  |   .0528407   .0149889     3.53   0.001     .0232966    .0823847
          7  |   .0857303   .0238629     3.59   0.000     .0386951    .1327655
          8  |   .0685547   .0154954     4.42   0.000     .0380124     .099097
          9  |   .0485808   .0342438     1.42   0.157    -.0189157    .1160773
         10  |   .0277995     .01135     2.45   0.015     .0054279    .0501711
         11  |  -.0446546   .0128474    -3.48   0.001    -.0699775   -.0193316
         12  |   .0164898   .0176551     0.93   0.351    -.0183094     .051289
         13  |   -.129616   .0228253    -5.68   0.000    -.1746061    -.084626
         14  |  -.0697799   .0127563    -5.47   0.000    -.0949234   -.0446365
         15  |  -.1306724   .0235665    -5.54   0.000    -.1771235   -.0842214
         16  |   .0428457   .0137236     3.12   0.002     .0157956    .0698958
         17  |   .1167549    .021726     5.37   0.000     .0739316    .1595782
         18  |   .0173388   .0115789     1.50   0.136    -.0054839    .0401616
         19  |   .0970503   .0240081     4.04   0.000     .0497288    .1443717
         20  |   .0306902   .0163222     1.88   0.061    -.0014819    .0628623
             |
       _cons |   2.812614   .4884072     5.76   0.000     1.849934    3.775293
------------------------------------------------------------------------------
Code:
. reg  Y I logYlevel_1 n H logCor i.countrynum,robust

Linear regression                               Number of obs     =        225
                                                F(24, 200)        =       4.83
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2391
                                                Root MSE          =      .0232

------------------------------------------------------------------------------
             |               Robust
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           I |   .0438102    .023208     1.89   0.061    -.0019535    .0895738
 logYlevel_1 |  -.2952572   .0474944    -6.22   0.000    -.3889112   -.2016031
           n |  -.0010141   .0001769    -5.73   0.000     -.001363   -.0006652
           H |  -.5967442   .1222761    -4.88   0.000      -.83786   -.3556284
      logCor |  -.0001871    .002452    -0.08   0.939    -.0050222     .004648
             |
  countrynum |
          2  |  -.0776557   .0163377    -4.75   0.000     -.109872   -.0454395
          3  |  -.1267072   .0195275    -6.49   0.000    -.1652135   -.0882009
          4  |  -.1387356   .0270324    -5.13   0.000    -.1920407   -.0854305
          5  |   .0833008   .0189025     4.41   0.000      .046027    .1205746
          6  |   .0572389   .0149254     3.83   0.000     .0278075    .0866703
          7  |   .0958031   .0238787     4.01   0.000     .0487167    .1428895
          8  |   .0765805   .0158141     4.84   0.000     .0453967    .1077642
          9  |   .0565331   .0340127     1.66   0.098    -.0105365    .1236026
         10  |    .030444    .011336     2.69   0.008     .0080905    .0527975
         11  |  -.0304818   .0129706    -2.35   0.020    -.0560585   -.0049051
         12  |   .0196087   .0175451     1.12   0.265    -.0149885    .0542058
         13  |   -.139783   .0227806    -6.14   0.000    -.1847039   -.0948621
         14  |  -.0761153   .0127299    -5.98   0.000    -.1012175   -.0510132
         15  |  -.1402884   .0235798    -5.95   0.000    -.1867854   -.0937915
         16  |   .0473936   .0139128     3.41   0.001      .019959    .0748281
         17  |   .1254732    .021914     5.73   0.000      .082261    .1686854
         18  |   .0189846   .0122194     1.55   0.122    -.0051108    .0430799
         19  |   .1095675   .0269841     4.06   0.000     .0563576    .1627774
         20  |   .0338714   .0162419     2.09   0.038     .0018441    .0658987
             |
       _cons |   3.049778   .4834469     6.31   0.000     2.096471    4.003085
------------------------------------------------------------------------------
Code:
. reg  Y I logYlevel_1 n H Cor i.countrynum i.Year, robust

Linear regression                               Number of obs     =        240
                                                F(35, 204)        =      19.07
                                                Prob > F          =     0.0000
                                                R-squared         =     0.7447
                                                Root MSE          =     .01427

------------------------------------------------------------------------------
             |               Robust
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           I |  -.0017374   .0152018    -0.11   0.909    -.0317101    .0282353
 logYlevel_1 |  -.1344197    .043968    -3.06   0.003    -.2211097   -.0477297
           n |  -.0000933   .0003057    -0.31   0.761    -.0006961    .0005095
           H |   .1089849   .1461164     0.75   0.457     -.179107    .3970768
         Cor |   42.22968   27.68753     1.53   0.129    -12.36074    96.82009
             |
  countrynum |
          2  |  -.0151863   .0160518    -0.95   0.345     -.046835    .0164625
          3  |  -.0482293   .0157838    -3.06   0.003    -.0793496    -.017109
          4  |  -.0381451   .0174077    -2.19   0.030    -.0724672    -.003823
          5  |   .0448873   .0187381     2.40   0.018     .0079421    .0818325
          6  |   .0300398   .0107687     2.79   0.006     .0088075    .0512721
          7  |   .0336804   .0214411     1.57   0.118    -.0085941     .075955
          8  |   .0259668    .011634     2.23   0.027     .0030284    .0489052
          9  |   .0567836   .0316731     1.79   0.074     -.005665    .1192323
         10  |   .0101893   .0064318     1.58   0.115    -.0024919    .0228706
         11  |  -.0263025   .0101509    -2.59   0.010    -.0463165   -.0062884
         12  |   .0268867   .0158145     1.70   0.091    -.0042941    .0580675
         13  |  -.0394516   .0159554    -2.47   0.014    -.0709102    -.007993
         14  |  -.0212305   .0097003    -2.19   0.030    -.0403562   -.0021048
         15  |  -.0406477   .0160236    -2.54   0.012    -.0722408   -.0090546
         16  |    .026545   .0117709     2.26   0.025     .0033367    .0497533
         17  |   .0621137   .0196756     3.16   0.002       .02332    .1009073
         18  |  -.0060011   .0064424    -0.93   0.353    -.0187033     .006701
         19  |    .049247   .0188249     2.62   0.010     .0121307    .0863634
         20  |   .0361759   .0169272     2.14   0.034     .0028012    .0695507
             |
        Year |
       2006  |   .0145628   .0034744     4.19   0.000     .0077125    .0214131
       2007  |   .0062804   .0040101     1.57   0.119    -.0016263     .014187
       2008  |  -.0235486    .005161    -4.56   0.000    -.0337243    -.013373
       2009  |  -.0631985   .0054573   -11.58   0.000    -.0739584   -.0524386
       2010  |  -.0056302   .0071231    -0.79   0.430    -.0196746    .0084142
       2011  |  -.0116207   .0053297    -2.18   0.030     -.022129   -.0011123
       2012  |  -.0420052    .006943    -6.05   0.000    -.0556944   -.0283159
       2013  |  -.0412766   .0104555    -3.95   0.000    -.0618912   -.0206619
       2014  |  -.0257527   .0091024    -2.83   0.005    -.0436995   -.0078059
       2015  |  -.0074938   .0120268    -0.62   0.534    -.0312067    .0162191
       2016  |  -.0120389   .0093528    -1.29   0.199    -.0304795    .0064017
             |
       _cons |   1.349846   .4464339     3.02   0.003     .4696303    2.230063
------------------------------------------------------------------------------
Code:
 reg  Y I logYlevel_1 n H Cor  i.Year, robust

Linear regression                               Number of obs     =        240
                                                F(16, 223)        =      36.46
                                                Prob > F          =     0.0000
                                                R-squared         =     0.7030
                                                Root MSE          =     .01472

------------------------------------------------------------------------------
             |               Robust
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           I |   .0002541   .0022204     0.11   0.909    -.0041216    .0046298
 logYlevel_1 |   .0047408   .0051855     0.91   0.362    -.0054781    .0149596
           n |  -.0002482   .0002666    -0.93   0.353    -.0007736    .0002772
           H |  -.0516428   .0513114    -1.01   0.315    -.1527602    .0494745
         Cor |   .9710216   20.83673     0.05   0.963    -40.09107    42.03312
             |
        Year |
       2006  |   .0150445   .0029372     5.12   0.000     .0092562    .0208328
       2007  |   .0066831   .0030496     2.19   0.029     .0006734    .0126929
       2008  |  -.0241049   .0040395    -5.97   0.000    -.0320654   -.0161444
       2009  |  -.0608203   .0044387   -13.70   0.000    -.0695674   -.0520731
       2010  |   .0056591   .0052307     1.08   0.280    -.0046489    .0159672
       2011  |  -.0011653   .0038145    -0.31   0.760    -.0086823    .0063518
       2012  |  -.0292677   .0043704    -6.70   0.000    -.0378802   -.0206552
       2013  |  -.0219385   .0081194    -2.70   0.007     -.037939    -.005938
       2014  |  -.0047909   .0041872    -1.14   0.254    -.0130425    .0034608
       2015  |   .0144844   .0063197     2.29   0.023     .0020304    .0269385
       2016  |   .0083592   .0044414     1.88   0.061    -.0003933    .0171117
             |
       _cons |  -.0385648   .0492351    -0.78   0.434    -.1355904    .0584608
------------------------------------------------------------------------------
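For reference, a minimal sketch of the joint tests mentioned above, run after the specification that includes both sets of dummies; the variable names follow the output.

Code:
reg Y I logYlevel_1 n H Cor i.countrynum i.Year, robust
testparm i.countrynum    // joint significance of the region dummies
testparm i.Year          // joint significance of the year dummies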

Storing results with a large loop (>11,000)

I am trying to store results across a large loop. I have previously done as follows but am limited to the matrix size (11000).

matrix z = J(1,6,.)
matrix Catch = J(`n',1,.)

forval i = 1/`n' {
    quietly reg y x`i'
    quietly estat ic
    matrix z = r(S)
    matrix Catch[`i',1] = z[1,5]    // column 5 of r(S) is the AIC
}

Here `n' is a number considerably larger than 11,000.

Is there a way to store these values elsewhere or output them quickly? I know I can use the "putexcel" command but it is slow having to call the command in each loop. Are there other options?
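A minimal sketch of one alternative that avoids the matrix-size limit entirely: -postfile- streams each result to a new dataset on disk as the loop runs, so nothing like putexcel is called inside the loop. The file and variable names (aic_results, regressor_index, aic) are hypothetical.

Code:
tempname handle
postfile `handle' long regressor_index double aic using aic_results, replace
forval i = 1/`n' {
    quietly reg y x`i'
    quietly estat ic
    matrix z = r(S)
    post `handle' (`i') (z[1,5])
}
postclose `handle'
use aic_results, clear    // one row per regression, with its AIC
The saved dataset has no practical limit on the number of rows and can be exported to Excel in a single step afterwards if needed.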

Collapse (mean) and data ordering

Hi statalist,

I'm working with a data set that resembles the following:

ID Income
1 50
1 40
1 20
2 10
2 40
2 50
3 60
3 20
3 10

I used collapse (mean) Income, by (ID)

Now, after collapsing, the data appear in the following form:
ID Income
2 Mean(2)
3 Mean(3)
1 Mean (1)

I need the output in the same order as before, i.e, in the form of
ID Income
1 Mean(1)
2 Mean(2)
3 Mean(3)
What should I do to obtain the means in the same ID order as the original data?

Thanks.
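A minimal sketch of the usual fix: sort on ID after collapsing. collapse normally leaves the result sorted by its by() variables, so if the order looks scrambled an explicit sort (after converting a string ID to numeric, if that is the issue) should restore it. Variable names follow the post.

Code:
collapse (mean) Income, by(ID)
sort ID                 // restores ascending ID order (1, 2, 3, ...)
* if ID is stored as a string, convert it first so it sorts numerically
* destring ID, replace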

How to avoid losing variable labels when using collapse command

I have smallholder agriculture plot-level commercialization data which I want to collapse to household-level data. To avoid losing the variable labels, I resorted to the foreach-loop method below, by Nick Cox:

Copy variable labels before collapse:
Code:
foreach v of var * {
    local l`v' : variable label `v'
    if `"`l`v''"' == "" {
        local l`v' "`v'"
    }
}
Attach the saved labels after collapse:
Code:
foreach v of var * {
    label var `v' "`l`v''"
}
Whenever I try running this code, I get the error message:

"foreach command may not result from a macro expansion interactively or in do files"


What could be the reason? Thank you in advance.

Takesure Tozooneyi

Drawing a line through the outer XY combinations below the trend line

The eventual goal is to assign a minimum value of actl to a new exp value (one where actl is missing). I think I first need to work out how to draw a line passing through the lowest observations to use as a decision rule, but I am not really sure how to attack this problem.
Thank you in advance

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(actl exp) byte minactl
 .  .4306503 .
.6 .43129745 .
.4  .4318457 .
.4   .432377 .
.4  .4326184 .
.4  .4332968 .
.3  .4353066 .
.3  .4364009 .
.4  .4389387 .
 .  .4391871 .
 .  .4402585 .
.4  .4403833 .
 . .44135585 .
 .  .4416456 .
 .  .4417434 .
 .  .4419167 .
 .  .4438399 .
.4  .4453161 .
 .  .4481203 .
 .  .4515771 .
.4  .4517835 .
 .  .4522247 .
.4  .4527117 .
.3  .4529815 .
.3  .4544157 .
.5  .4677871 .
.4  .4767839 .
.6  .4869735 .
.6  .4926984 .
.4 .51891696 .
.3 .52825445 .
.4 .53248686 .
.4  .5328437 .
 .  .5332037 .
.6  .5334881 .
.6  .5335155 .
.5  .5338645 .
.5 .53483564 .
.6 .53746325 .
.5  .5404169 .
 .  .5407301 .
 . .54236794 .
.4  .5438738 .
.5  .5484574 .
.5 .55163646 .
.4 .55418557 .
 .  .5592771 .
 . .56299645 .
.8  .5669411 .
 .  .5672508 .
 .  .5673133 .
 . .56743836 .
 . .56973493 .
 .  .5727739 .
.8 .57350165 .
.6 .57357556 .
 .   .573942 .
 .  .5740153 .
 . .57531005 .
 .  .5799186 .
.6  .5843775 .
.5 .58538944 .
.6 .58968616 .
.5 .59198374 .
 . .59305274 .
.4  .5949966 .
.4  .5956155 .
.7  .5968927 .
.8  .5977873 .
.7  .5997788 .
 .  .6584147 .
 1  .6827189 .
 1  .6881836 .
 .  .6882014 .
 .  .6898127 .
 1  .6934553 .
 .  .7054474 .
.7   .707083 .
 .  .7075651 .
 .  .7090087 .
.9  .7098652 .
 .  .7100893 .
 .  .7105488 .
 .  .7108207 .
 .  .7110068 .
 .  .7116353 .
 .  .7138406 .
.6  .7139603 .
 .  .7140106 .
 .    .71864 .
 .  .7186521 .
.4  .7188953 .
.5  .7188953 .
 .  .7204999 .
.7   .720687 .
 .  .7210785 .
 .  .7215325 .
.7  .7218328 .
 .  .7218585 .
 .   .728121 .
end
Enter

Code:
scatter actl exp
to see the scatter plot. Visually, I am trying to get a line passing through the lowest XY points so that I can assign a minimum value of actl to a new exp value that does not yet have a known actl value.
actl is always discrete between 0 and 1, in multiples of 0.1, and exp is continuous on the 0-1 domain.
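A minimal sketch of one way to approximate that lower envelope: bin exp and take the minimum observed actl within each bin, then carry that bin minimum over to observations with missing actl. The 0.05 bin width and the new variable names are arbitrary assumptions.

Code:
* bin exp into intervals of width 0.05 and take the lowest actl seen in each bin
gen double expbin = floor(exp/0.05)*0.05
bysort expbin: egen minactl_bin = min(actl)     // egen min() ignores missing actl
replace minactl = minactl_bin if missing(actl)  // bins with no non-missing actl stay missing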


Identifying recession points

Hello,

I have an indicator variable taking the values 1, -1, and 0: 1 marks a peak and -1 marks a trough. I want to replace the points between a 1 and the following -1 (i.e., the recession) with 1, and all other points with 0. That is, I'm interested only in the points between a peak (1) and the next trough (-1). Here's an example of the data:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(per_q logGDP_point)
120  0
121  0
122  0
123  0
124  0
125  0
126  0
127  0
128  0
129  0
130 -1
131  0
132  0
133  0
134  0
135  0
136  0
137  1
138  0
139 -1
140  0
141  0
142  0
143  0
144  0
145  0
146  1
147  0
148  0
149  0
150  0
151  0
152  0
153  0
154  0
155  0
156  0
157 -1
158  0
159  0
160  0
161  0
162  0
163  0
164  0
165  0
166  0
167  0
168  0
169  0
170  0
171  0
172  0
173  0
174  0
175  0
176  0
177  0
178  0
179  0
180  0
181  0
182  0
183  0
184  0
185  0
186  0
187  0
188  0
189  0
190  0
191  0
192  0
193  0
194  1
195  0
196  0
197  0
198  0
199  0
200 -1
201  0
202  0
203  0
204  0
205  0
206  0
207  0
208  0
209  0
210  0
211  0
212  0
213  0
214  0
215  0
216  0
217  0
218  0
219  0
end
I would appreciate any help.
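A minimal sketch of one way to build the indicator, assuming the data are sorted by per_q. Whether the peak and trough quarters themselves should be coded 1 or 0 is a judgment call; here the peak quarter is included and the trough quarter is not.

Code:
sort per_q
gen byte recession = 0
replace recession = 1 if logGDP_point == 1
* carry the 1 forward until a trough (-1) is reached; -replace- works top to bottom,
* so recession[_n-1] already holds the updated value of the previous row
replace recession = 1 if logGDP_point == 0 & _n > 1 & recession[_n-1] == 1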

Command to obtain effect sizes.

Hello,
I am running two models on panel data covering 50 firms over 15 years (one using xtnbreg and the other using xtreg). I tried to obtain effect sizes using esize, esizei, and estat esize, but all of them returned error messages. Could you please help me identify a command that will produce effect sizes for these models?
Many Thanks.

Regular expression help

Hello,

Apologies for making a second regular expressions-related post today, but the topic is new to me and I've struggled to answer my current problem with the info online.

I have the following string values, say, for variable Branching:

Code:
st12a(5)==1 & st13b == 3
st8a(88) == 1
(e1 == 1  | e1 == 2) & e2==0
and I want to convert the parenthesized (one- or two-digit) number, which may be followed by " ==" or "==", into two underscores plus the number, with the parentheses removed:

Code:
st12a__5==1 & st13b == 3
st8a__88 == 1
(e1 == 1  | e1 == 2) & e2==0
(notice last value unchanged)


Any help?

Thanks a lot,
Reese

v 14.2
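A minimal sketch of one pattern that should do this, assuming the parentheses to be converted always enclose only one or two digits (as in the examples) and that ustrregexra accepts ICU-style $1 backreferences in the replacement; newBranching is an arbitrary new variable name.

Code:
* replace "(5)" or "(88)" with "__5" / "__88"; parentheses enclosing anything other
* than 1-2 digits, such as "(e1 == 1 | e1 == 2)", are left untouched
gen newBranching = ustrregexra(Branching, "\((\d{1,2})\)", "__$1")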

Comparing each observation from one variable to a few hundred thousand in another

Hi all,

I am wondering if anyone has ideas on how I can compare each observation of one variable to every observation of another variable. Essentially, imagine you have 10 observations in variable A and another 1,000 in variable B. For each observation in A, I would like to compare it to every value in B. In reality, I have over 300,000 observations for each variable, so the computation becomes cumbersome quickly.

I have already solved the problem in Python, but it takes over 2 minutes to run through 300 observations (roughly 10 hours for the whole dataset). The algorithm is straightforward: fix the first value of variable A, compare it to every observation of variable B; move to the next value of A and compare again; and so on. Is there anything in Stata that handles this a bit more elegantly?

I am currently using StataIC 15 on MacOS.
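A minimal sketch of the brute-force pairing in Stata using -cross-, which forms every pairwise combination of the observations in two datasets; the file names a.dta and b.dta and the comparison are hypothetical. Note that 300,000 x 300,000 pairs is roughly 9e10 rows, far more than will fit in memory, so in practice the pairing would need to be restricted on some key (for example with -joinby-, or the community-contributed -rangejoin- from SSC) or processed in chunks.

Code:
* put variable A in a.dta and variable B in b.dta (hypothetical file names), then:
use a, clear
cross using b                // every observation of A paired with every observation of B
gen byte equal = (A == B)    // or whatever comparison is needed for each pair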

Calculation of directly standardized rates: "distrate" or "dstdize" commands

Hello,

First of all, I don't know if this is the proper place to post this; correct me if it's not.

Need some help:

I had used the "distrate" command on a database, DATA.dta, before.

I then cleaned the database and modified the pop.dta file that holds the person-years, since the catchment area had changed.

The original code was:
Code:
 
 distrate cases pop using pop.dta, standstrata(age_grp) popstand(pop) by(year sex) format(%8.1f) mult(100000)
The results were fine, expressed this way:
Code:
| year sex cases N crude rateadj lb_gam ub_gam se_gam |
I did it twice, with two different standard-population files (pop.dta and pop2.dta), since I needed both.

The code was saved in the do-file.

Now the issue I am having is that when I try to run the code, it says:

Code:
variable pop not found
r(111);
I need help: I have tried using the old database and making a new pop.dta file with the person-years, but nothing works.

Thanks for your help.
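A minimal check worth running, on the assumption that the error means the rebuilt pop.dta no longer contains a variable literally named pop (which is what the popstand(pop) option of distrate is looking for here):

Code:
describe using pop.dta    // confirm the standard-population variable is still called pop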

Analyzing length of stay in clustered data (back transform)

Hello,

I'm a relatively new Stata user working on a project. I'm looking at length of stay (LOS, in days), which is heavily right-skewed, and analgesic usage (in days). The data are clustered within hospitals, so I'm using a mixed-effects model with a hospital random intercept.

Code:
 xtmixed  log_LOS ty_iv_usage opioid_usage keto_usage age_year sex ib1.race ib2.ethnicity ib0.insurance open perf year  ib3.region || hospital_number :, mle variance nostderr
I think this is a fair approach; however, the interpretation is not as intuitive as being able to say that a change in a drug's usage increases LOS by X days. I'm curious about using Duan's smearing estimator to back-transform the coefficients as a way of making the interpretation more audience-friendly.

One of the issues is the clustering within hospitals, which to me adds a layer of complexity.

Another thought was to leave LOS untransformed and run a median (quantile) regression with clustered bootstrapping.

Code:
bootstrap, cluster(hospital_number) reps(100) seed(5) : qreg length_of_stay post_ty_iv_usage post_opioid_usage post_keto_usage age_year sex ib1.race ib2.ethnicity ib0.insurance open perf year  ib3.region, quantile(.5)
This seems like a reasonable approach, allowing the results to be interpreted as: a change in drug use shifts median LOS by X days. However, I have not used the clustered bootstrap in Stata before and want to verify that it accurately accounts for patients being clustered within hospitals.

Hopefully I've given any readers enough information. Any suggestions/advice is welcome.

Thank you,
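A minimal sketch of a Duan smearing back-transformation after the log-LOS model above. For simplicity it forms residuals against the fixed portion of the linear predictor only (so the hospital effects are absorbed into the residuals), which is an assumption one may or may not want to make in a multilevel setting; the new variable names are arbitrary.

Code:
xtmixed log_LOS ty_iv_usage opioid_usage keto_usage age_year sex ib1.race ib2.ethnicity ///
    ib0.insurance open perf year ib3.region || hospital_number:, mle variance nostderr
predict xb_fixed, xb                        // fixed-portion linear prediction
gen double smear_resid = log_LOS - xb_fixed
gen double smear_exp   = exp(smear_resid)
summarize smear_exp, meanonly
local smear = r(mean)                       // Duan smearing factor
display "Duan smearing factor = `smear'"
gen double LOS_hat_days = exp(xb_fixed) * `smear'   // predicted LOS on the day scale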

Predicted probabilities with mimrgns after xtgee

Hello,

This question refers to Daniel Klein's excellent mimrgns program. I am attempting to obtain predicted probabilities from a GEE model with a logit link / binomial family. I would use a regular logit, but there is some clustering in the sample. The data is multiply imputed. I have read Daniel's useful helpfile, but continue to get an error. This is the code I am attempting to run.

Code:
mi est, eform: xtgee y i.x1##c.x2##c.x3, family(binomial) link(logit) corr(exch)
mimrgns, dydx(x2) at(x1=(1) x3=(-5(.5)1.5)) predict(pr) post
I get an error that says:
Code:
option pr not allowed
an error occurred when mi estimate executed mimrgns_estimate on m=1

I am able to run this without the "predict(pr)" and get the linear predictions, as is the default. However, because the model is nonlinear, I'd like to be able to see the predicted probabilities. I would appreciate any advice!

Robbie Dembo
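A minimal sketch of one thing worth trying, on the assumption that the error arises because predict after -xtgee- has no pr option; with a binomial family and logit link, the predicted mean (which is the probability) is requested with mu instead:

Code:
mi est, eform: xtgee y i.x1##c.x2##c.x3, family(binomial) link(logit) corr(exch)
mimrgns, dydx(x2) at(x1=(1) x3=(-5(.5)1.5)) predict(mu) post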

Help with spmap polygon option

Hi, I'm working on a map of the metro area of Medellín, Colombia. So far so good, but when I try to superimpose a polygon of the built-up area of the metro area, the maps don't overlay each other.

Here is an example of the two maps shown separately (map images attached in the original post; not reproduced here).

Further investigation showed me that the shapefiles use different projections: the first is in "4170: MAGNA-SIRGAS" and the second is in "EPSG: Datum: D_WGS_1984. Coordinate system: WGS_1984_UTM_Zone_18N. Projection: Transverse_Mercator", so the coordinate databases for the two look very different.

I was wondering whether there is a way to change the projection of one of these two shapefiles so that I can overlay the second on the first.

Thanks for your help.

Help with scheme entries affecting rarea plots using marginsplot

I am trying to create a scheme that changes the intensity, opacity, or color of the confidence intervals (CIs) in a marginsplot when they are recast as an rarea plot. In the first plot created by the code below, you can see the outline of the CIs recast as an area; in the second, I set the outline opacity to zero so the outlines are not visible.

Can someone point me to the scheme entries that control the line color or line width of rarea plots so I can make this behavior the default in my marginsplots?


Code:
sysuse auto
regress mpg weight
margins, at(weight=(1800(25)4825))
marginsplot, recast(line) recastci(rarea) ciopts(fcolor(*.5)) name(g1)
marginsplot, recast(line) recastci(rarea) ciopts(fcolor(*.5) lcolor(%0)) name(g2)

Graphs g1 and g2 (images attached in the original post; not reproduced here).

Best,
Alan

Calculations of quarterly stock holding data by shareholder

Dear Statalists,

I am dealing with unbalanced panel data on the shareholdings of a particular investment company. As shown in the table below, I would like to calculate the quarterly change in holdings by the investment company (id = 1001) for each portfolio firm (a, b, c, d, e), and then multiply that difference by the stock price of the portfolio firm in the current quarter.
year | quarter | investment company | portfolio firm | number of shares | stock price of holding firm
1996 1 1001 a 1000 3.33
1996 1 1001 b 1200 3.33
1996 1 1001 c 1300 3.33
1996 1 1001 d 1100 3.33
1996 1 1001 e 1050 3.33
1996 2 1001 a 1100 4.44
1996 2 1001 b 1100 4.44
1996 2 1001 c 1200 4.44
1996 2 1001 d 1400 4.44
1996 2 1001 e 0 4.44
1996 3 1001 a 900 5.55
1996 3 1001 b 1300 5.55
1996 3 1001 c 1200 5.55
1996 3 1001 d 1400 5.55
1996 4 1001 a 1200 6.66
1996 4 1001 b 1030 6.66
1996 4 1001 c 1000 6.66
1996 4 1001 d 1409 6.66
1997 1 1001 a 2000 7.77
1997 1 1001 b 1700 7.77
1997 1 1001 c 1344 7.77
1997 1 1001 d 1278 7.77
1997 2 1001 a 1900 8.88
1997 2 1001 b 2000 8.88
1997 2 1001 c 1300 8.88
1997 2 1001 d 700 8.88
For example, for portfolio firm a: first, I need the difference in shares held by investment firm 1001 between the first and second quarters of 1996, which is 1100 - 1000 = 100; second, I multiply that difference by the stock price of firm a in the second quarter of 1996, giving 100 * 4.44. The same process should be carried out for the other portfolio firms; for firm b, for instance, the calculation is (1100 - 1200) * 4.44, and so on.

In general, the calculation is

(N_{j,i,t} - N_{j,i,t-1}) * P_{j,t}, where N_{j,i,t} is the number of shares held by investment firm i in portfolio firm j in quarter t, and P_{j,t} is the stock price of portfolio firm j in quarter t.

It would be much appreciated if someone could show me the Stata code for dealing with this.

Best,
Cong
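A minimal sketch, assuming the variables are named year, quarter, id (investment company), firm (portfolio firm), shares, and price; these names are hypothetical stand-ins for the column headings in the table above.

Code:
gen qdate = yq(year, quarter)                 // quarterly date
format qdate %tq
egen pair = group(id firm)                    // one panel per investor-firm pair
xtset pair qdate
gen double value_change = (shares - L.shares) * price   // (N_t - N_t-1) * P_t
The first quarter of each investor-firm pair has no lag and therefore gets a missing value_change, which matches the definition above.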

Categorical dependent variable, how to choose the right model

I would like to regress the likelihood that collateral is pledged on firm variables, loan contract variables, and my test variables.
I have in mind to control for time-varying variation by using time dummies, and to control for the heterogeneity of banks by setting the bank as the panel variable. However, I would like to compare the pooled probit model with the panel probit model (which uses random effects by default) and with the fixed-effects logit model, so that I can justify my choice by the same comparisons I used in my other analysis, where the loan rate was the dependent variable (there I could compare the fixed- and random-effects models with OLS using the regress and xtreg commands). Is this approach to justifying my model choice sensible? This is my dataset:
input float Collateraldummy int Age long Totalassets byte Numberofemployees float Corporationdummy long Grossprofit double(Profitability Leverage) long Loansize byte(Maturity g1 g2 g3) double Duration byte Housebank str6 Loantype
1 8 1500000 28 1 1600000 .0625 .95 475000 10 0 0 1 0 0 "Credit"
0 8 1500000 28 1 1600000 .0625 .95 475000 10 0 0 1 0 0 "Credit"
1 6 500000 15 1 800000 .0875 .5 150000 10 0 0 1 5.75 1 "Credit"
1 6 500000 15 1 800000 .0875 .5 30000 1 0 0 1 5.75 1 "LC"
1 6 500000 15 1 800000 .0875 .5 20000 1 0 0 1 6 1 "LC"
1 23 387000 10 0 815000 .0343558282208589 .72 80000 1 0 1 0 10 1 "LC"
1 24 415000 10 0 830000 .05060240963855422 .77 80000 1 0 1 0 11 1 "LC"
1 25 400000 10 0 850000 .03529411764705882 .9 120000 1 0 1 0 12 1 "LC"
0 24 415000 10 0 830000 .05060240963855422 .77 60000 6 0 1 0 1 0 "Credit"
1 15 800000 25 1 3500000 .03428571428571429 .2 100000 1 0 0 1 4.666666666666667 0 "LC"
1 15 800000 25 1 3500000 .03428571428571429 .2 620000 20 0 0 1 0 0 "Credit"
1 15 800000 25 1 3500000 .03428571428571429 .2 230000 3 0 0 1 5 0 "LC"
0 7 130000 8 0 300000 .23333333333333334 .4 50000 10 1 0 0 4.75 1 "Credit"
0 1 60000 3 0 190000 0 0 20000 10 1 0 0 0 1 "Credit"
0 7 130000 8 0 300000 .23333333333333334 .4 15000 3 1 0 0 3 0 "LC"
1 20 450000 12 1 800000 .08125 .26 50000 10 0 1 0 10.083333333333334 0 "Credit"
1 18 462000 12 1 830000 .0819277108433735 .32 125000 5 0 1 0 8 0 "Credit"
1 19 438000 12 1 755000 .07549668874172186 .3 100000 5 0 1 0 0 0 "Credit"
1 20 450000 12 1 800000 .08125 .26 15000 1 0 1 0 10 0 "LC"
1 19 438000 12 1 755000 .07549668874172186 .3 15000 1 0 1 0 9 0 "LC"
1 18 462000 12 1 830000 .0819277108433735 .32 15000 1 0 1 0 8 0 "LC"
1 19 438000 12 1 755000 .07549668874172186 .3 120000 1 0 1 0 10 0 "LC"
1 18 462000 12 1 830000 .0819277108433735 .32 120000 1 0 1 0 9 0 "LC"
0 20 450000 12 1 800000 .08125 .26 10000 1 0 1 0 10.583333333333334 0 "LC"
1 15 320000 10 1 1000000 .08 .55 70000 6 1 0 0 7 0 "Credit"
1 15 320000 10 1 1000000 .08 .55 100000 5 1 0 0 5.166666666666667 0 "Credit"
1 10 277000 12 1 800000 .09375 .6 150000 4 1 0 0 5.083333333333333 1 "Credit"
1 18 720000 25 1 1800000 .11388888888888889 .45 350000 3 1 0 0 12 1 "Credit"
0 20 695000 25 1 2000000 .105 .45 300000 6 1 0 0 14 1 "Credit"
1 3 248000 3 1 500000 .11 .44 30000 4 0 1 0 0 0 "Credit"
1 4 250000 3 1 600000 .08333333333333333 .5 50000 5 0 1 0 1.33 0 "Credit"
0 3 248000 3 1 500000 .11 .44 8000 1 0 1 0 0 0 "LC"
0 4 250000 3 1 600000 .08333333333333333 .5 8000 1 0 1 0 1 0 "LC"
0 4 250000 3 1 600000 .08333333333333333 .5 10000 3 0 1 0 1.083 0 "LC"
1 2 462000 25 1 1750000 .022857142857142857 .45 100000 1 0 1 0 0 0 "LC"
1 3 450000 29 1 1900000 .027105263157894736 .5 200000 3 0 1 0 .5833333333333334 0 "LC"
1 3 450000 29 1 1900000 .027105263157894736 .5 100000 1 0 1 0 1 0 "LC"
1 2 462000 25 1 1750000 .022857142857142857 .45 250000 5 0 1 0 0 0 "Credit"
1 4 440000 29 1 2000000 .025 .5 200000 5 0 1 0 1.4166666666666667 0 "Credit"
1 7 360000 9 1 415000 .18795180722891566 .25 15000 1 0 1 0 5 1 "LC"
1 8 350000 9 1 435000 .18620689655172415 .25 25000 1 0 1 0 6 1 "LC"
1 9 345000 9 1 430000 .18604651162790697 .3 15000 1 0 1 0 7 1 "LC"
1 45 1000000 14 0 1450000 .07931034482758621 .6 350000 7 1 0 0 15 1 "Credit"
0 50 1050000 15 0 1500000 .06666666666666667 .7 300000 10 1 0 0 20 1 "Credit"
1 45 1000000 14 0 1450000 .07931034482758621 .6 150000 1 1 0 0 15 1 "LC"
1 46 970000 15 0 1400000 .06785714285714285 .7 150000 1 1 0 0 16.5 1 "LC"
1 47 960000 15 0 1475000 .06779661016949153 .7 150000 1 1 0 0 17.75 1 "LC"
1 7 350000 3 0 400000 .125 .5 20000 1 0 1 0 7 1 "LC"
1 7 350000 3 0 400000 .125 .5 15000 5 0 1 0 7 1 "Credit"
0 25 500000 25 1 1100000 .18181818181818182 .8 150000 10 0 1 0 15 1 "Credit"
0 25 500000 25 1 1100000 .18181818181818182 .8 400000 15 0 1 0 15 1 "Credit"
0 25 500000 25 1 1100000 .18181818181818182 .8 50000 1 0 1 0 15 1 "LC"
0 40 620000 25 0 2000000 .15 .2 150000 10 0 1 0 20 1 "Credit"
0 40 620000 25 0 2000000 .15 .2 50000 1 0 1 0 20 1 "LC"
0 35 380000 12 1 1500000 .06666666666666667 .3 25000 5 0 1 0 15 1 "Credit"
1 4 400000 7 0 950000 .1368421052631579 .25 300000 5 0 1 0 3 1 "Credit"
0 7 425000 9 0 1000000 .123 .2 250000 7 0 1 0 6 1 "Credit"
1 4 400000 7 0 950000 .1368421052631579 .25 50000 1 0 1 0 3 1 "LC"
1 5 415000 8 0 975000 .14358974358974358 .2 80000 1 0 1 0 4.333333333333333 1 "LC"
1 6 410000 9 0 935000 .13368983957219252 .2 80000 1 0 1 0 5.333333333333333 1 "LC"
1 7 425000 9 0 1000000 .123 .2 80000 1 0 1 0 6 1 "LC"
1 102 370000 6 0 427000 .14285714285714285 .42 80000 5 0 1 0 23 1 "Credit"
1 102 370000 6 0 427000 .14285714285714285 .42 30000 1 0 1 0 8 0 "LC"
1 103 375000 6 0 430000 .13953488372093023 .45 45000 1 0 1 0 8.75 0 "LC"
0 102 370000 6 0 427000 .14285714285714285 .42 80000 5 0 1 0 0 0 "Credit"
0 17 3500000 28 1 2875000 .05495652173913043 .38 500000 10 0 0 1 14 1 "Credit"
0 22 3625000 30 1 3000000 .05 .4 400000 7 0 0 1 4 0 "Credit"
1 22 3625000 30 1 3000000 .05 .4 60000 2 0 0 1 5 0 "LC"
1 22 3625000 30 1 3000000 .05 .4 50000 2 0 0 1 .16666666666666666 0 "LC"
0 18 3100000 15 1 2600000 .06538461538461539 .5 150000 3 0 0 1 5 0 "Credit"
0 18 3100000 15 1 2600000 .06538461538461539 .5 130000 4 0 0 1 4 0 "Credit"
0 18 3100000 15 1 2600000 .06538461538461539 .5 50000 2 0 0 1 4 0 "LC"
1 26 2650000 35 1 2300000 .09 .21 300000 5 0 0 1 22 1 "Credit"
1 27 2710000 35 1 2425000 .09278350515463918 .28 250000 7 0 0 1 23 1 "Credit"
0 29 2665000 33 1 2400000 .0875 .25 50000 9 0 0 1 25.25 1 "Credit"
0 30 2700000 33 1 2350000 .08297872340425531 .25 80000 10 0 0 1 26.333333333333332 1 "Credit"
1 27 2710000 34 1 2425000 .09278350515463918 .28 80000 1 0 0 1 23.166666666666668 1 "LC"
1 17 1980000 26 1 1650000 .0893939393939394 .26 325000 10 0 1 0 16 1 "Credit"
0 19 2050000 26 1 1700000 .08941176470588236 .31 150000 8 0 1 0 18.333333333333332 1 "Credit"
0 20 1930000 26 1 1750000 .08857142857142856 .33 220000 5 0 1 0 19.166666666666668 1 "Credit"
0 19 2050000 26 1 1700000 .08941176470588236 .31 80000 1 0 1 0 18.166666666666668 1 "LC"
end
and the code:
Code:
probit Collateraldummy Age Totalassets Numberofemployees Corporationdummy Grossprofit Profitability Leverage Loansize Maturity g1 g3 Duration Housebank if Loantype!="Credit"
which yields a pseudo-R-squared of 100% that I cannot explain. Also, to use clustered standard errors, which variable should I supply to the vce() option?
Code:
xtprobit Collateraldummy Age Totalassets Numberofemployees Corporationdummy Grossprofit Profitability Leverage Loansize Maturity g1 g3 Duration Housebank if Loantype!="Credit"
for which the iterations never converge. Could you please tell me why?
Thanks in advance for your help.

Interpreting panel data coefficient estimates where the variable doesn't change across observations

Hi,

I am looking at the effect of different variables on the recycling rate in England.
I have 311 local authorities over 20 quarters and am running a regression that includes income, population density, and household size. I have income and population density by quarter for each local authority.
However, I was only able to obtain household size by year, and it does not vary by local authority; it is an average for the whole UK (so I have only 5 distinct values for household size).

How can I interpret the coefficient on household size?