Channel: Statalist

Heteroskedastic data

Hi,

What exactly does 'panels(heteroskedastic) - use heteroskedastic but uncorrelated error structure' mean? Does it mean that when we use xtgls with this option, the heteroskedasticity in the model is handled? Kindly confirm.
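For reference, a minimal sketch of the syntax the option belongs to (y, x1, x2 are placeholder names):

Code:
* GLS with a panel-specific error variance for each panel, but no correlation across panels
xtgls y x1 x2, panels(heteroskedastic)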

Thanks,
Gopal

Autocorrelation

Hi,

I checked for serial correlation in my panel using the xtserial command. The results show that my panel has serial correlation. Which commands or options can I use to deal with this problem?
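For reference, two commands that are often mentioned in this situation (a sketch only; y, x1, x2 are placeholders):

Code:
* feasible GLS allowing a common AR(1) process within panels
xtgls y x1 x2, corr(ar1)
* or a fixed-effects regression with an AR(1) disturbance
xtregar y x1 x2, fe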

Thanks
Gopal

Help with interrupted time series margins plot

Hello all,

I am working on graphing the predicted values (yhat) of death by month to illustrate the impact of a health care intervention (implemented in month 10), but I am having trouble putting together the appropriate code.

1. The model is a mixed effects logistic regression model [using meglm] and the design is interrupted time series

2. The below code is essentially what I want because it shows the predicted values of the outcome of interest before and after the intervention (intervention at month 10) for each month

twoway (scatter yhat month) (line yhat month if month < 10, sort) (line yhat month if month >= 10, sort), xline(10)

3. However, I'm trying to make it work for my purposes and would appreciate help:
- I need to store the yhats for each month from each meglm model (reason: each meglm model takes days to run!)
- I want to adapt the above code to show the yhats for each month from stored estimates, while hiding the scatter plot, which looks too messy
- I need a way of storing the graphs and then editing them once they are stored (see the sketch below for one possibility)
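A rough sketch of one possible workflow (names are illustrative; it assumes the meglm fit has just finished and that the same data remain in memory when the estimates are reused):

Code:
* save the slow model so it never has to be re-run
estimates save meglm_run1, replace
* ...later, with the same data loaded:
estimates use meglm_run1
predict yhat, mu                       // predicted values from the stored fit
* line plot only (scatter layer dropped), split at the intervention month
twoway (line yhat month if month < 10, sort) ///
       (line yhat month if month >= 10, sort), xline(10) name(its1, replace)
graph save its1 "its1.gph", replace    // store the graph as-is
* graph use "its1.gph"                 // reopen later and edit in the Graph Editor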

Thanks for any advice!

Amy

discrepancies in estimates between sumdist and pshare

Hi friends,

I have income data for two populations (population id=1 and 2). I have collapsed my data into the form of a cross-tabulation because in my research I assume all people in the same cell have exactly the same income. My original data are pasted at the end of this post. I want to calculate the income share of each decile in each of the 2 populations separately. I have tried pshare (by Ben Jann) and sumdist (by Stephen Jenkins); both can be installed with "ssc install ...". However, their estimates are slightly different. I was wondering whether this is due to my incorrect usage of these commands.

For example, the income share of the top 10% in the population with id=1 is 25.35% by pshare but 25.31% by sumdist. I also do not understand why there is an additional row with "." in the first column of the sumdist output.

Thank you in advance!

Code:
. pshare estimate income [iw=freq], over(id) n(10) gini
(variance estimation not supported with iweights)

Percentile shares (proportion)    Number of obs   =         40

            1: id = 1
            2: id = 2

--------------------------------------------------------------
      income |      Coef.   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
1            |
        0-10 |   .0193008          .             .           .
       10-20 |    .034384          .             .           .
       20-30 |   .0535935          .             .           .
       30-40 |   .0679086          .             .           .
       40-50 |   .0808105          .             .           .
       50-60 |   .0935073          .             .           .
       60-70 |   .1067746          .             .           .
       70-80 |     .12858          .             .           .
       80-90 |   .1615961          .             .           .
      90-100 |   .2535447          .             .           .
-------------+------------------------------------------------
2            |
        0-10 |   .0844804          .             .           .
       10-20 |    .087861          .             .           .
       20-30 |   .0886734          .             .           .
       30-40 |   .0904544          .             .           .
       40-50 |   .0927363          .             .           .
       50-60 |   .0940319          .             .           .
       60-70 |   .0971719          .             .           .
       70-80 |   .1013188          .             .           .
       80-90 |   .1245144          .             .           .
      90-100 |   .1387574          .             .           .
--------------------------------------------------------------

-------------------------
             |      Gini
-------------+-----------
           1 |  .3525345
           2 |  .0837606
-------------------------



. sumdist income [aw=freq] if id==1, ngps(10)
Distributional summary statistics, 10 quantile groups

---------------------------------------------------------------------------
Quantile  |
group     |    Quantile  % of median     Share, %      L(p), %        GL(p)
----------+----------------------------------------------------------------
        1 |    15090.00        22.30         1.93         1.93      1510.64
        2 |    31181.00        46.07         3.44         5.38      4204.04
        3 |    44526.00        65.79         5.37        10.75      8401.91
        4 |    53645.00        79.27         6.78        17.52     13700.59
        5 |    67677.00       100.00         8.09        25.62     20027.01
        6 |    76289.00       112.73         9.36        34.98     27347.58
        7 |    85703.00       126.64        10.65        45.63     35676.55
        8 |   104176.00       153.93        12.87        58.51     45741.38
        9 |   131401.00       194.16        16.18        74.69     58393.76
       10 |                                 25.31       100.00     78183.37
        . |                                  0.00       100.00     78183.37
---------------------------------------------------------------------------

. sumdist income [aw=freq] if id==2, ngps(10)
Distributional summary statistics, 10 quantile groups

---------------------------------------------------------------------------
Quantile  |
group     |    Quantile  % of median     Share, %      L(p), %        GL(p)
----------+----------------------------------------------------------------
        1 |    66284.00        92.84        11.99        11.99      9144.18
        2 |    67528.00        94.59         7.39        19.37     14777.63
        3 |    67677.00        94.80         7.08        26.45     20178.05
        4 |    70314.00        98.49         8.77        35.22     26866.54
        5 |    71393.00       100.00        13.36        48.58     37056.61
        6 |    74037.00       103.70        10.45        59.03     45026.36
        7 |    74224.00       103.97         5.40        64.43     49145.43
        8 |    82858.00       116.06        11.07        75.50     57592.88
        9 |    97434.00       136.48        10.66        86.16     65721.22
       10 |                                 13.84       100.00     76281.04
        . |                                  0.00       100.00     76281.04
---------------------------------------------------------------------------
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(income freq id)
 58340  46 2
 63079  35 2
 66056 132 2
 66284 175 2
 67528 230 2
 67677 220 2
 67773  88 2
 68921  79 2
 70314 100 2
 70686 196 2
 70696  56 2
 71393 144 2
 71651 123 2
 72303  11 2
 74037 167 2
 74224 153 2
 75841 188 2
 82858 109 2
 97434 230 2
105867 275 2
 15090 276 1
 17279   0 1
 24071 166 1
 29035   0 1
 31181 110 1
 32539   0 1
 34466   0 1
 38506   0 1
 38660 122 1
 39778   0 1
 39786   0 1
 40480   0 1
 44162   0 1
 44190   0 1
 44421   0 1
 44526 154 1
 45027   0 1
 47497   0 1
 47751   0 1
 48988   0 1
 50961   0 1
 51042   0 1
 51800  78 1
 51840   0 1
 51950   0 1
 52751   0 1
 53645 197 1
 53694   0 1
 55029   0 1
 57285   0 1
 57428   0 1
 57832   0 1
 58306   0 1
 58340   0 1
 59029   0 1
 59230   0 1
 59411  34 1
 59767   0 1
 62126   0 1
 62644   0 1
 63079   0 1
 63122   0 1
 63521 230 1
 66056   0 1
 66284   0 1
 67528   0 1
 67677  12 1
 67773   0 1
 68921   0 1
 70314   0 1
 70686   0 1
 70696   0 1
 71393   0 1
 71651   0 1
 72303 219 1
 74037   0 1
 74161   0 1
 74224   0 1
 75841   0 1
 76289  57 1
 76468   0 1
 78953   0 1
 79146   0 1
 81072   0 1
 81112   0 1
 82224 174 1
 82370   0 1
 82858   0 1
 83180   0 1
 83960   0 1
 85703 101 1
 86941   0 1
 88456   0 1
 88713   0 1
 91106   0 1
 91921   0 1
 93882   0 1
 96068   0 1
 96173   0 1
 96513 131 1
 97434   0 1
 98657   0 1
 98719   0 1
102264   0 1
104176 145 1
104184   0 1
104435   0 1
105847   0 1
105867   0 1
105867   0 1
111956   0 1
112972   0 1
115492  87 1
116767   0 1
118917   0 1
120198   0 1
121313   0 1
121679   0 1
122757   0 1
130627   0 1
131401 189 1
133087   0 1
134584   0 1
135674   0 1
144051   0 1
145649   0 1
147470   0 1
150625   0 1
159560  45 1
205999 230 1
 15090   0 2
 17279   0 2
 24071   0 2
 29035   0 2
 31181   0 2
 32539   0 2
 34466   0 2
 38506   0 2
 38660   0 2
 39778   0 2
 39786   0 2
 40480   0 2
 44162   0 2
 44190   0 2
 44421   0 2
 44526   0 2
 45027   0 2
 47497   0 2
 47751   0 2
 48988   0 2
 50961   0 2
 51042   0 2
 51800   0 2
 51840   0 2
 51950   0 2
 52751   0 2
 53645   0 2
 53694   0 2
 55029   0 2
 57285   0 2
 57428   0 2
 57832   0 2
 58306   0 2
 59029   0 2
 59230   0 2
 59411   0 2
 59767   0 2
 62126   0 2
 62644   0 2
 63122   0 2
 63521   0 2
 74161   0 2
 76289   0 2
 76468   0 2
 78953   0 2
 79146   0 2
 81072   0 2
 81112   0 2
 82224   0 2
 82370   0 2
 83180   0 2
 83960   0 2
 85703   0 2
 86941   0 2
 88456   0 2
 88713   0 2
 91106   0 2
 91921   0 2
 93882   0 2
 96068   0 2
 96173   0 2
 96513   0 2
 98657   0 2
 98719   0 2
102264   0 2
104176   0 2
104184   0 2
104435   0 2
105847   0 2
105867   0 2
111956   0 2
112972   0 2
115492   0 2
116767   0 2
118917   0 2
120198   0 2
121313   0 2
121679   0 2
122757   0 2
130627   0 2
131401   0 2
133087   0 2
134584   0 2
135674   0 2
144051   0 2
145649   0 2
147470   0 2
150625   0 2
159560   0 2
205999   0 2
end

PPML Fixed Effects

Hi,

I have a question:

Given this gravity model estimated with the PPML estimator with fixed effects:

TRADE_ijt = exp[π_it + χ_jt + β1·ln(DIST_ij) + β2·CNTG_ij + β3·LANG_ij + β4·CLNY_ij + β5·RTA_ijt] × ε_ijt

π_it: the set of exporter-time fixed effects
χ_jt: the set of importer-time fixed effects
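For context, a minimal sketch of how this specification is often estimated in Stata with the community-contributed ppmlhdfe command (ssc install ppmlhdfe; variable names are illustrative):

Code:
* PPML with exporter-time and importer-time fixed effects absorbed
ppmlhdfe trade ln_dist cntg lang clny rta, absorb(exp#year imp#year) vce(cluster pair_id)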


Is it possible to estimate the same model using only the bilateral trade of one country?
For example, for Country A, where TRADE_ijt would consist of imports between Country A and each partner j and imports between each partner i and Country A.

If yes, how could I define the sets of exporter-time and importer-time fixed effects?

Thank you!







Poisson regression | time-from-event count data

Apologies in advance if this is a daft question from a healthcare professional with little/no experience of modelling count data over time...

I want to model two counts over time. The start date for each count is different and in the past (January start, February start, etc.). There is a fixed stop date (censoring) for all counts. All data derive from one source, so I guess a single-level model would be OK. I want to estimate the association between the two counts (i.e., does one rise as the other does). Is Poisson regression the right method for analysing this time-from-event count data? Is there a function (e.g., Cox "in reverse") for time-from-event count data?

Many thanks for your guidance

Independent variables change signs when interaction is included

Hey Stata experts, I am currently writing my master's thesis using Stata, and I have a question about the interpretation of my interactions. I want to run a logistic regression: my dependent variable is stock market participation (binary), my independent variables are five personality traits (continuous), and my interactions are quality of government (continuous) with each trait. I also have several controls. For two traits the interaction is significant and positive; however, the respective independent variables change sign, but only in the regression in which the respective interaction is included. In all other regressions with the other interactions, the sign remains the same. I have tried demeaned variables, probit, xtlogit, letting Stata calculate the interactions, and a non-normed meancountryeqi, but the results remain the same.

How is it possible to interpret these results?

This is what I regress:
logit smp bfi10_open bfi10_consc bfi10_extra bfi10_agree bfi10_neuro meancountrynormeqi CV, vce(robust)
logit smp bfi10_open bfi10_consc bfi10_extra bfi10_agree bfi10_neuro meancountrynormeqi oqog CV, vce(robust)
logit smp bfi10_open bfi10_consc bfi10_extra bfi10_agree bfi10_neuro meancountrynormeqi cqog CV, vce(robust)
logit smp bfi10_open bfi10_consc bfi10_extra bfi10_agree bfi10_neuro meancountrynormeqi eqog CV, vce(robust)
logit smp bfi10_open bfi10_consc bfi10_extra bfi10_agree bfi10_neuro meancountrynormeqi aqog CV, vce(robust)
logit smp bfi10_open bfi10_consc bfi10_extra bfi10_agree bfi10_neuro meancountrynormeqi nqog CV, vce(robust)

This is the Stata output:
Columns (1)-(6) correspond to the six regressions above.

                        (1)         (2)         (3)         (4)         (5)         (6)
Openness             -.054***     .449***    -.054***    -.051***    -.051***    -.053***
                     (.011)      (.060)      (.011)      (.011)      (.011)      (.011)
Conscientiousness    -.101***    -.103***    -.114       -.100***    -.100***    -.100***
                     (.014)      (.014)      (.070)      (.014)      (.014)      (.014)
Extraversion         -.024**     -.025**     -.024**     -.427***    -.026**     -.025**
                     (.012)      (.012)      (.012)      (.060)      (.012)      (.012)
Agreeableness         .063***     .059***     .063***     .059***    -.229***     .063***
                     (.014)      (.014)      (.014)      (.014)      (.072)      (.014)
Neuroticism          -.004       -.001       -.004       -.000       -.003        .075
                     (.011)      (.011)      (.011)      (.012)      (.012)      (.061)
QOG                   .047***     .071***     .047***     .027***     .031***     .050***
                     (.001)      (.003)      (.004)      (.003)      (.004)      (.002)
O*QOG                            -.007***
                                 (.001)
C*QOG                                         .000
                                             (.001)
E*QOG                                                     .006***
                                                         (.001)
A*QOG                                                                 .004***
                                                                     (.001)
N*QOG                                                                             -.001
                                                                                  (.001)
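A hedged note that may help with interpretation (a sketch only; it assumes meancountrynormeqi is the QOG measure and that oqog etc. are trait-by-QOG products): once an interaction is included, the trait's main coefficient is its effect when QOG equals zero, so a sign flip need not be contradictory. Letting Stata build the interaction with factor-variable notation and then looking at marginal effects over the range of QOG is one way to see this:

Code:
logit smp c.bfi10_open##c.meancountrynormeqi bfi10_consc bfi10_extra bfi10_agree ///
    bfi10_neuro CV, vce(robust)
* average marginal effect of openness, and its effect at illustrative QOG values
margins, dydx(bfi10_open)
margins, dydx(bfi10_open) at(meancountrynormeqi = (-1 0 1))   // values are illustrative only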
Best, Sophie

if &amp; else if syntax

Dear Stata users,

I wrote a very simple program to calculate an efficacy coefficient. The program is as follows. However, Stata only runs the second piece of the program (i.e. the second if block). Can anybody point out where the problem lies? Thank you.

Code:
program define efcoef

syntax varlist[, Low(integer 100) High(integer 100) help]

 tokenize `varlist'

 if ("`help'"!="help" & "`low'`high'"=="") {
  foreach v of varlist `varlist' {
   tempvar `v'min `v'max `v'mean
   egen ``v'min'=min(`v')
   egen ``v'max'=max(`v')
   egen ``v'mean'=mean(`v')
   gen efcoef_`v'=round((`v'-``v'min')/(``v'max'-``v'min')*40+60, 0.01)
   label var efcoef_`v' "efficacy coef: min-max of `v'"
  }
 }

 if ("`help'"!="help" & "`low'`high'"!="") {
  foreach v of varlist `varlist' {
   tempvar `v'min `v'max `v'mean
   egen ``v'min'=min(`v')
   egen ``v'max'=max(`v')
   egen ``v'mean'=mean(`v')
   gen efcoef_`v'=round((`v'-``v'min')/(``v'max'-``v'min')*40+60, 0.01)
   gen efcoef_`v'_2=round((`v'-`low')/(`high'-`low')*40+60, 0.01)
   label var efcoef_`v' "efficacy coef: min-max of `v'"
   label var efcoef_`v'_2 "efficacy coef: low-high of `v'"
  }
 }

 if ("`help'"=="help") {
  display "efcoef var will generate ecoef_var=(var-`v'min)/(`v'max-`v'min)*40+60"
  display "efcoef var, low(int) high(int) help"
 }

end
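For reference, the general shape of an if / else if chain in a Stata program (a generic sketch, separate from the program above):

Code:
if (condition1) {
    * first branch
}
else if (condition2) {
    * second branch
}
else {
    * fallback branch
}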

Unexpected error from XTPMG output

Hi all. I am getting an unexpected error message from xtpmg.

I am using Stata 16 for Mac

I have time-series panel data. When I run PMG and DFE, there is no problem, but when I try MG, I get the following:

Code:
xtpmg dlnmtob dexri, ///
lr(l.lnmtob exri) ///
ec(ECT) replace mg
invalid new variable name;
variable name ECT is in the list of predictors
r(110);

Please, can anyone help me understand why?

Thanks

Weighting data with AIPW and teffects

Hello everyone,

I am trying to understand how to use an IPW estimator.
I have 11 treatment variables in a panel dataset, and each can take a value of -1, 0 or 1.
Because of all of this, I chose to use the AIPW estimator with an ordered logit model.
I have learned a bit about how to use it and how it works, but I still have questions:
If I use, for example,
Code:
teffects aipw (outcome) (treatments covariates, logit)
is there then no need to use xtlogit followed by predict? My understanding is that I have to choose between these two methods; am I right?
How do I differentiate the treatments and the covariates in the code?
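For reference, a sketch of how the two parts of teffects aipw are laid out (y, x1, x2, and treat are placeholder names, shown here for a binary treatment):

Code:
*              outcome model          treatment model
teffects aipw (y x1 x2)              (treat x1 x2, logit)
* first parentheses: the outcome variable followed by the outcome-model covariates
* second parentheses: the treatment variable followed by the treatment-model covariates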

Thanks in advance for your assistance.

PS: I haven't started testing code yet, so I may have further questions.

calculating or identifying start/stop dates according to a rule with consecutive observations

Hi, this is a snapshot of my data set (I have used dataex, but I'm not sure whether the result came out correctly). I have an ID number and a start and stop date for receipt of social benefits. If the stop date is missing, it means the person is still receiving the benefit. My objective is to create/calculate new runs for each individual in which current runs that are 3 months or less apart are merged into one run. For example, for individuals 1, 9 and 100 I would like the result below.

Does anyone know how I can achieve that? (A sketch of one possible approach appears after the data example.)
ID start stop first_start last_stop
1 2012m2 2012m2 2012m2 2014m6
1 2012m4 2012m6 2012m2 2014m6
1 2012m8 2013m6 2012m2 2014m6
1 2013m9 2013m10 2012m2 2014m6
1 2013m12 2014m6 2012m2 2014m6
1 2018m1 . 2018m1 .
9 2016m11 . 2016m11 .
100 2013m3 2013m11 2013m3 2016m3
100 2014m1 2014m8 2013m3 2016m3
100 2014m10 2014m10 2013m3 2016m3
100 2014m12 2015m2 2013m3 2016m3
100 2015m4 2015m8 2013m3 2016m3
100 2015m10 2015m10 2013m3 2016m3
100 2015m12 2016m3 2013m3 2016m3
100 2016m8 2016m9 2016m8 2017m1
100 2016m12 2017m1 2016m8 2017m1
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id start stop)
  1 625 625
  1 627 629
  1 631 641
  1 644 645
  1 647 653
  1 696   .
  9 682   .
 13 695   .
 17 694 694
 25 662 675
 25 677 678
 25 682   .
 29 634 680
 33 643 661
 33 690   .
 37 657 704
 37 706 710
 41 679   .
 45 707   .
 49 683   .
 53 670   .
 57 696   .
 61 687 693
 61 696   .
 65 634 680
 69 636 649
 73 628 658
 73 676 690
 73 702   .
 81 637 705
 85 638 639
 85 652 703
 89 633 639
 89 641 665
 93 707   .
 97 694   .
100 638 646
100 648 655
100 657 657
100 659 661
100 663 667
100 669 669
100 671 674
100 679 680
100 683 684
end
format %tm start
format %tm stop
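A minimal sketch of one possible approach (untested on the full data; it assumes start and stop are monthly dates as above and that a gap of more than 3 months starts a new run):

Code:
bysort id (start): gen byte newrun = _n == 1 | (start - stop[_n-1] > 3)
bysort id (start): gen run = sum(newrun)
bysort id run (start): egen first_start = min(start)
bysort id run (start): gen last_stop = stop[_N]   // stays missing if the last spell is ongoing
format %tm first_start last_stop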

Pooled panel Heckman MLE vs. pooled panel Heckman Two-step

Dear Statalist,

I am running a pooled panel Heckman two-step estimation to find determinants of insurance purchases. A significant effect of the inverse Mills ratio (imr1) below indicates a selection problem and that Heckman is preferred over OLS.

Code:
*1st stage probit model
probit y_seen x18 i.x19 i.x20 round i.x3 x4 x5 x6 x7 i.x8 x9 i.x10##i.x11 i.x12 x13 x14 ///
i.x15##c.x16 x17 [pweight=pweight], vce(cluster HHID)
margins, dydx(*) post
Code:
Average marginal effects                        Number of obs     =        574
Model VCE    : Robust

Expression   : Pr(y_seen), predict()
dy/dx w.r.t. : x18 1.x19 1.x20 round 1.x3 x4 x5 x6 x7 1.x8 x9 1.x10 1.x11 1.x12 x13 x14 1.x15 x16 x17

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         x18 |   .0143955   .0015807     9.11   0.000     .0112974    .0174936
       1.x19 |  -.0449697   .0218515    -2.06   0.040    -.0877979   -.0021416
       1.x20 |  -.1017704   .0269623    -3.77   0.000    -.1546156   -.0489252
       round |  -.0479061   .0173747    -2.76   0.006    -.0819598   -.0138523
        1.x3 |  -.0209526   .0290927    -0.72   0.471    -.0779732    .0360681
          x4 |   .0000826   .0000152     5.43   0.000     .0000528    .0001124
          x5 |   .0042392   .0131803     0.32   0.748    -.0215938    .0300721
          x6 |   .0052349   .0060947     0.86   0.390    -.0067104    .0171802
          x7 |   .0043644   .0016943     2.58   0.010     .0010435    .0076852
        1.x8 |  -.0120274   .0366869    -0.33   0.743    -.0839324    .0598777
          x9 |   .0508862    .013231     3.85   0.000      .024954    .0768184
       1.x10 |   .0865615   .0383121     2.26   0.024     .0114712    .1616518
       1.x11 |    .010413   .0292698     0.36   0.722    -.0469548    .0677807
       1.x12 |  -.0040397   .0392662    -0.10   0.918    -.0810001    .0729207
         x13 |  -.0221218   .0120994    -1.83   0.067    -.0458363    .0015927
         x14 |   .0001126    .000437     0.26   0.797     -.000744    .0009691
       1.x15 |  -.0348978   .0388583    -0.90   0.369    -.1110588    .0412631
         x16 |   -.001754   .0004966    -3.53   0.000    -.0027274   -.0007806
         x17 |   .0290123   .0151913     1.91   0.056     -.000762    .0587867
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Code:
qui probit y_seen x18 i.x19 i.x20 round i.x3 x4 x5 x6 x7 i.x8 x9 i.x10##i.x11 i.x12 x13 x14 ///
i.x15##c.x16 x17 [pweight=pweight], vce(cluster HHID)
predict p1, xb //predicted value Z_i*gamma from the selection probit
replace p1=-p1 //switches sign to -Z_i*gamma
generate phi = (1/sqrt(2*_pi))*exp(-(p1^2/2)) //standard normal density
generate capphi = normal(p1) //standard normal cumulative distribution function
generate imr1 = phi/(1-capphi) //inverse Mills ratio

*2nd stage Heckman model: truncated and pooled OLS
reg y x1 x2 round i.x3 x4 x5 x6 x7 i.x8 x9 i.x10##i.x11 i.x12 x13 x14 i.x15##c.x16 x17 imr1 ///
[pweight=pweight], vce(cluster HHID)
Code:
Linear regression                               Number of obs     =        472
                                                F(21, 124)        =      64.07
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4332
                                                Root MSE          =     .58905

                                 (Std. Err. adjusted for 125 clusters in HHID)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.086289   .0920832    11.80   0.000     .9040309    1.268548
          x2 |  -1.285324   1.143381    -1.12   0.263    -3.548395    .9777475
       round |   -.103336   .0590195    -1.75   0.082    -.2201521    .0134801
        1.x3 |   -.083443   .0714037    -1.17   0.245     -.224771     .057885
          x4 |   -.000065   .0000324    -2.00   0.047    -.0001292   -8.20e-07
          x5 |  -.0466427    .023452    -1.99   0.049    -.0930609   -.0002246
          x6 |   .0381533   .0112903     3.38   0.001     .0158066       .0605
          x7 |   .0170921   .0034209     5.00   0.000     .0103212     .023863
        1.x8 |  -.0287251   .0753869    -0.38   0.704    -.1779369    .1204868
          x9 |  -.0255863   .0329273    -0.78   0.439    -.0907586     .039586
       1.x10 |  -.1244701   .1442971    -0.86   0.390    -.4100745    .1611344
       1.x11 |   .0221183   .1491671     0.15   0.882    -.2731251    .3173617
             |
     x10#x11 |
        1 1  |    .188353   .1639117     1.15   0.253    -.1360742    .5127803
             |
       1.x12 |   .2493369    .069622     3.58   0.000     .1115354    .3871384
         x13 |   .0115066   .0247547     0.46   0.643    -.0374899    .0605032
         x14 |    .001514   .0008878     1.71   0.091    -.0002433    .0032712
       1.x15 |    .155291   .2137375     0.73   0.469    -.2677555    .5783375
         x16 |   -.004325   .0031301    -1.38   0.170    -.0105205    .0018704
             |
   x15#c.x16 |
          1  |    .003758   .0035941     1.05   0.298    -.0033558    .0108718
             |
         x17 |   .0196864   .0373166     0.53   0.599    -.0541735    .0935464
        imr1 |   .2430458   .1113475     2.18   0.031     .0226579    .4634338
       _cons |   13.41441   12.27384     1.09   0.277    -10.87896    37.70777
------------------------------------------------------------------------------
When comparing the two-step procedure with the MLE Heckman, some questions now arise:
(1) Is the selection equation based only on the truncated sample with y>0? Why? This does not make any sense, as there is no variation in y!?
(2) Can this explain why the inverse Mills ratio (the selection effect) is not significantly different from zero here, as indicated by the Wald test that cannot reject the null hypothesis of rho=0?
Code:
heckman y x1 x2 round i.x3 x4 x5 x6 x7 i.x8 x9 i.x10##i.x11 i.x12 x13 x14 i.x15##c.x16 x17 /// 
[pweight=pweight], ///
select(y_seen=x18 i.x19 i.x20 round i.x3 x4 x5 x6 x7 i.x8 x9 i.x10##i.x11 i.x12 x13 x14 ///
i.x15##c.x16 x17) vce(cluster HHID)
Code:
Heckman selection model                         Number of obs     =        574
(regression model with sample selection)              Selected    =        472
                                                      Nonselected =        102

                                                Wald chi2(20)     =    1384.02
Log pseudolikelihood = -519.9681                Prob > chi2       =     0.0000

                                 (Std. Err. adjusted for 126 clusters in HHID)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
          x1 |   1.079878   .0870764    12.40   0.000     .9092117    1.250545
          x2 |  -1.146409   1.003011    -1.14   0.253    -3.112275    .8194557
       round |  -.1038978   .0547836    -1.90   0.058    -.2112717     .003476
        1.x3 |  -.0767749    .073642    -1.04   0.297    -.2211107    .0675608
          x4 |  -.0000586   .0000311    -1.89   0.059    -.0001196    2.31e-06
          x5 |  -.0532746   .0271382    -1.96   0.050    -.1064644   -.0000848
          x6 |   .0335714   .0143009     2.35   0.019     .0055421    .0616007
          x7 |   .0169975   .0032445     5.24   0.000     .0106384    .0233566
        1.x8 |  -.0426975   .0950499    -0.45   0.653    -.2289919     .143597
          x9 |  -.0154203   .0327672    -0.47   0.638    -.0796428    .0488022
       1.x10 |  -.0930904   .1506446    -0.62   0.537    -.3883484    .2021676
       1.x11 |   .0727406   .1849587     0.39   0.694    -.2897718    .4352531
             |
     x10#x11 |
        1 1  |    .165025   .1665951     0.99   0.322    -.1614954    .4915454
             |
       1.x12 |   .2292365   .0845575     2.71   0.007     .0635068    .3949662
         x13 |  -.0052651   .0451982    -0.12   0.907     -.093852    .0833218
         x14 |   .0015364   .0009399     1.63   0.102    -.0003058    .0033786
       1.x15 |   .1121501   .2073449     0.54   0.589    -.2942384    .5185386
         x16 |  -.0061802   .0047053    -1.31   0.189    -.0154025     .003042
             |
   x15#c.x16 |
          1  |   .0056411   .0053074     1.06   0.288    -.0047612    .0160434
             |
         x17 |   .0331053   .0396846     0.83   0.404    -.0446752    .1108857
       _cons |    11.9172   10.75137     1.11   0.268    -9.155095     32.9895
-------------+----------------------------------------------------------------
y_seen       |
         x18 |   .0954856    .023458     4.07   0.000     .0495088    .1414624
       1.x19 |  -.4510818   .1957512    -2.30   0.021     -.834747   -.0674166
       1.x20 |  -.5628401   .4688995    -1.20   0.230    -1.481866     .356186
       round |  -.3500378   .1178559    -2.97   0.003    -.5810312   -.1190445
        1.x3 |  -.0514181   .2962523    -0.17   0.862    -.6320619    .5292257
          x4 |   .0006573   .0001697     3.87   0.000     .0003247    .0009898
          x5 |  -.0264533   .1181423    -0.22   0.823    -.2580079    .2051012
          x6 |   .0606577   .0973045     0.62   0.533    -.1300557     .251371
          x7 |   .0344755    .012954     2.66   0.008     .0090862    .0598649
        1.x8 |  -.1359341   .3216742    -0.42   0.673     -.766404    .4945357
          x9 |   .4386742   .1225535     3.58   0.000     .1984737    .6788748
       1.x10 |   .1281608   .3533009     0.36   0.717    -.5642962    .8206178
       1.x11 |  -.1410168   .6800977    -0.21   0.836    -1.473984     1.19195
             |
     x10#x11 |
        1 1  |   .4539155    .889635     0.51   0.610    -1.289737    2.197568
             |
       1.x12 |  -.2686829   .5875363    -0.46   0.647    -1.420233     .882867
         x13 |  -.1141443    .230882    -0.49   0.621    -.5666646     .338376
         x14 |   .0013928   .0038212     0.36   0.715    -.0060966    .0088822
       1.x15 |  -.1661187   .6020005    -0.28   0.783    -1.346018    1.013781
         x16 |  -.0155677   .0053069    -2.93   0.003     -.025969   -.0051663
             |
   x15#c.x16 |
          1  |   .0027121   .0076237     0.36   0.722    -.0122302    .0176543
             |
         x17 |   .2812853   .1835776     1.53   0.125    -.0785201    .6410908
       _cons |  -7.998821   1.645493    -4.86   0.000    -11.22393   -4.773715
-------------+----------------------------------------------------------------
     /athrho |   1.003006   1.329345     0.75   0.451    -1.602462    3.608475
    /lnsigma |  -.5072924   .1124413    -4.51   0.000    -.7276732   -.2869115
-------------+----------------------------------------------------------------
         rho |   .7628539   .5557379                     -.9220383     .998533
       sigma |   .6021237   .0677035                      .4830316    .7505781
      lambda |   .4593324   .3766132                     -.2788158    1.197481
------------------------------------------------------------------------------
Wald test of indep. eqns. (rho = 0): chi2(1) =     0.57   Prob > chi2 = 0.4505
Code:
margins, dydx(*) expression(normal(predict(xbsel))) //for AMEs of selection equation
Code:
Average marginal effects                        Number of obs     =        472
Model VCE    : Robust

Expression   : normal(predict(xbsel))
dy/dx w.r.t. : x1 x2 round 1.x3 x4 x5 x6 x7 1.x8 x9 1.x10 1.x11 1.x12 x13 x14 1.x15 x16 x17 x18 1.x19 1.x20

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |          0  (omitted)
          x2 |          0  (omitted)
       round |  -.0344662   .0111654    -3.09   0.002      -.05635   -.0125823
        1.x3 |  -.0051114   .0298637    -0.17   0.864    -.0636432    .0534204
          x4 |   .0000647   .0000153     4.22   0.000     .0000347    .0000948
          x5 |  -.0026047   .0115287    -0.23   0.821    -.0252005    .0199911
          x6 |   .0059726   .0093806     0.64   0.524    -.0124131    .0243583
          x7 |   .0033946   .0012834     2.64   0.008     .0008791    .0059101
        1.x8 |  -.0128698   .0297024    -0.43   0.665    -.0710854    .0453457
          x9 |   .0431937   .0113826     3.79   0.000     .0208843    .0655031
       1.x10 |   .0406141   .0486084     0.84   0.403    -.0546566    .1358847
       1.x11 |   .0174241   .0212462     0.82   0.412    -.0242176    .0590658
       1.x12 |  -.0264907   .0569879    -0.46   0.642     -.138185    .0852036
         x13 |  -.0112391   .0229835    -0.49   0.625     -.056286    .0338078
         x14 |   .0001371    .000378     0.36   0.717    -.0006037     .000878
       1.x15 |   -.006072   .0457602    -0.13   0.894    -.0957605    .0836164
         x16 |  -.0013274   .0004467    -2.97   0.003    -.0022029   -.0004518
         x17 |   .0276965   .0172296     1.61   0.108    -.0060728    .0614658
         x18 |   .0094019   .0026424     3.56   0.000      .004223    .0145808
       1.x19 |  -.0441527   .0200581    -2.20   0.028    -.0834659   -.0048395
       1.x20 |  -.0544046   .0450362    -1.21   0.227     -.142674    .0338648
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
Or did I do anything else wrong?
Any help is highly appreciated!
Thanks

Taking Average by id

Hi Everyone,
I have a dataset looking like this:

Date id var1 var2 var3 var4...
2010 1 ... ... ... ...
2010 1 ... ... ... ...
2010 1 ... ... ... ...
2010 2 ... ... ... ...
2010 2 ... ... ... ...
2010 2 ... ... ... ...
2010 3 ... ... ... ...
2010 3 ... ... ... ...
2010 3 ... ... ... ...

I'd like to find the mean of each variable by id. That is, at the end, I want to achieve something like this:


Date id mean_var1 mean_var2 mean_var3 mean_var4...
2010 1 ... ... ... ...
2010 2 ... ... ... ...
2010 3 ... ... ... ...

It would be great if you can help. Thank you in advance.
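For reference, a minimal sketch of one common approach (assuming the variables are numeric and that Date and id together define the groups):

Code:
* one row per Date/id group, containing the group means
collapse (mean) var1 var2 var3 var4, by(Date id)
* or, to keep all rows and add the means alongside:
* egen mean_var1 = mean(var1), by(Date id)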
Best,

Margins and regression output

Dear Stata Users,

I am analyzing the probability that cohabiting couples will marry or dissolve according to certain characteristics.
I am using a multinomial logit, where marriage and cohabitation are considered competing risks.

I show the results that I get for couple employment (employ2) on marriage (omitting dissolution), using the output of the multinomial logit.
I controlled for a series of other characteristics that I omit here.

uniontype2 = occurrence of marriage (or dissolution)

svy: mlogit uniontype2 i.durationtype i.agesm2_f##i.mage_gap_cat i.wave5 i.employ2 ib0.evunionpr i.shared i.educ2_f i.homog i.tercilehh if marital==3&duratyrs3>0&duratyrs3<6&educ2_m<4&educ2_f<4&sampst==1&ivfioall==1&(minagentry>18&minagentry<37)&employ2<5, level(90) rrr base


employ2           |     Coef.        SE       t     P-value       90% Conf. Interval
Both employed     |   1 (base)
Male unemployed   |   .4910494   .2654903   -1.32    0.189      .2013042   1.197837
Female unemployed |   .7248931   .2145792   -1.09    0.278      .4448765   1.181159
Both unemployed   |   .7624074   .4288538   -0.48    0.630      .3014883   1.927985


None of the results is statistically significant.

If I compute the margins and test the differences as follows:

margins r.employ2, level(90) pr(out(1)) pr(out(2)) post atmeans*

I get that the following couple type (male not employed/female employed) has a significantly lower risk of marriage than those where both are employed (I omit dissolution):


                                        Contrast   Std. Err.    [90% Conf. Interval]

(Male unemployed vs Both employed)     -.0751986    .0345901    -.1322493   -.0181479
(Female unemployed vs Both employed)   -.037832     .0281888    -.0843248    .0086607
(Both unemployed vs Both employed)     -.0317464    .0541869    -.1211187    .057626


I am wondering why there is this difference. I thought that both should have the same level of significance.
Is there a reason why these results diverge?
Which output should I trust if I want to understand the relationship between being in a certain couple type and the probability of entering a marriage or going through a dissolution?


Thank you.
Best,
Lydia


*The result also holds when the margins are computed as observed rather than atmeans.



Is there any way to get -Look at these example(s)- after -unicode translate-?

Dear Forum,

I want to know whether it is possible to access the information displayed below the -assertion is false- message after the -unicode translate- command.
It seems that the -set output- command does not help, and there is no macro containing those results. (I am using Stata 15.1 on Windows.)
Below is an example of what I want to access, especially the -Look at these example(s)- part.
Code:
assertion is false
  9
          --------------------------------------------------------------------------------------------------------------
          Some elements of the file appear to be UTF-8 already.  Sometimes elements that need translating can look
          like UTF-8.  Look at these example(s):
              value-label contents "ȫöȣ "
          Do they look okay to you?
          If not, the file needs translating or retranslating with the transutf8 option.  Type
              . unicode   translate "2015 SSK survey (HRC151201).dta", transutf8
              . unicode retranslate "2015 SSK survey (HRC151201).dta", transutf8
          --------------------------------------------------------------------------------------------------------------
          File successfully translated

  File summary:
      all files successfully translated
I need this information because I want to embed -unicode translate- in my own command and do not want to print the full set of messages that -unicode translate- generates; it would be great if I could show only the might-be-wrong example(s).
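One possible workaround (only a rough sketch, untested): route the displayed output to a temporary log file and read back the lines of interest afterwards. This does not by itself suppress the on-screen messages, but it makes the text available for parsing inside a program.

Code:
tempfile outlog
quietly log using "`outlog'", text name(utrans)
unicode translate "2015 SSK survey (HRC151201).dta"
quietly log close utrans
* scan the captured log; the matching would need extending to also grab the example lines that follow
file open fh using "`outlog'", read text
file read fh line
while r(eof) == 0 {
    if strpos(`"`line'"', "Look at these example(s)") {
        display `"`macval(line)'"'
    }
    file read fh line
}
file close fh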

Thank you in advance

Which regression should I use?

Hello everybody,

I am stuck right now with the following problem. I have two hypotheses that I have to test in Stata, and I am not sure which regression I should use to come up with a proper answer. Maybe the Hausman test?

H1: X has a positive effect on Y.
H2: X positively moderates the relationship between Y and Z
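For what it is worth, a moderation hypothesis like H2 is usually tested with an interaction term (a minimal sketch; y, x, z stand for the variables in the hypotheses):

Code:
regress y c.x##c.z   // the coefficient on c.x#c.z is the test of the moderation in H2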

Thanks to everybody for a proper answer!

Best,
Max

Moderation - Panel Data Analysis

Dear Stata experts,

I am trying to test the influence of my dummy moderator (high-tech industry = 1, not high-tech = 0) on the relationship between R&D (lagged one year, lagrd) and return on sales (ros100), with panel data ranging from 2005 to 2010. As controls I included a dummy for whether the company is from the US, year dummies, and a dummy for marketing intensity. The Hausman test points to a fixed-effects model.

So far so good. After this I run into some difficulties. I am not sure about:

1. which model to use. Using the fixed-effects model will omit my US dummy; is it justified to just use the random-effects model instead?
2. whether my command to obtain the results is correct: xtreg ros100 i.du_high_tech##c.lagrd, re

And if it is correct, what should be interpreted as the overall result?

Code:
. xtreg ros100  i.du_high_tech##c.lagrd, re

Random-effects GLS regression                   Number of obs     =      1,620
Group variable: company_id                      Number of groups  =        324

R-sq:                                           Obs per group:
     within  = 0.6199                                         min =          5
     between = 0.9172                                         avg =        5.0
     overall = 0.7350                                         max =          5

                                                Wald chi2(3)      =    4482.85
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

--------------------------------------------------------------------------------------
              ros100 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
      1.du_high_tech |   .5100225   .0539294     9.46   0.000     .4043229    .6157222
               lagrd |   .0323437   .8364303     0.04   0.969     -1.60703    1.671717
                     |
du_high_tech#c.lagrd |
                  1  |  -3.476529   .8380226    -4.15   0.000    -5.119023   -1.834035
                     |
               _cons |   .0004879   .0430304     0.01   0.991    -.0838501     .084826
---------------------+----------------------------------------------------------------
             sigma_u |          0
             sigma_e |  .84690728
                 rho |          0   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------
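For reference, the interaction in such a model is often summarized with margins (a sketch, assuming the xtreg model above has just been fit):

Code:
* marginal effect of lagged R&D for each industry group
margins du_high_tech, dydx(lagrd)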
Thanks in advance for your help, it would be greatly appreciated!

Sjors

metan forest plot formatting

Hello,

As a novice Stata user I humbly seek insight from those more seasoned than myself with regard to a forest plot containing both subgroups and study names.

The plot displays the subgroup designation below the subgroups themselves (e.g., "Age" appears below the heading "age>65"), as seen in the plot below.

My code is as follows:
Code:
metan logES logll logul, lcols(categoryb study) by(category) title(Subgroup Survival) favours(CDK inhibitor # placebo) randomi eform nooverall texts(200)

1. Is there an approach to move the main subgroup designation above the subgroups?
2. For visualization, since my forest plot is a tad crowded, does code exist to add space between each line of text in the forest plot?

Thank you kindly,
Adil

[forest plot attachment not shown]

Order option with interactions using esttab

I am attempting to order the coefficients in my output using the "order" option; however, I cannot seem to control the position of my interaction terms. Does anyone have syntax advice?

Code:
eststo: areg outcome v1
eststo: areg outcome v2
eststo: areg outcome c.v1##c.v2

esttab using "$filepath\output.csv", ///
n se nobaselevels ar2 aic bic replace ///
order(v1 v2 c.v1##c.v2)
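A hedged note: with factor-variable notation, the stored coefficient is named with a single #, so order() may need that form (a sketch, untested):

Code:
esttab using "$filepath\output.csv", ///
n se nobaselevels ar2 aic bic replace ///
order(v1 v2 c.v1#c.v2)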

Generating a date variable from a numeric variable after a reshape

Hello.

I'm working with a dataset that looks like this:

company / numeric_date / v1 / v2 / .../ vk
A / 1 / .4 / .2 / ... / ..
A / 2 / .3 / .2 / ... / ..
A / .. / . / . / ... / ..
A / 60 / .6 / .3 / ... / ..
B / 1 / .4 / .2 / ... / ..
B / 2 / .4 / .2 / ... / ..
.. / ... / ... / .... / ... / ..
Z / 60 / .5 / .6 / ... / ..

As you can see, the dataset is a panel where I have Z companies (64) with information on k variables (5 variables) and a variable called 'numeric_date', which ranges from 1 to 60. This is the variable I need to use to generate a date variable with the following characteristics:

- the starting date is 01-01-2015, which corresponds to numeric_date==1
- the variable varies monthly, which means that a one-unit increase in numeric_date increases the date by one month: numeric_date==2 corresponds to 02-01-2015
- this means that the value for numeric_date==60 is 12-01-2019

This outcome is the result of a reshape I made, with the following do-file:

Code:
clear
import excel "/Users/nicolasmorales/Downloads/export para stata reshape-2.xlsx", sheet("V21") firstrow clear
duplicates report company
duplicates tag entidad, gen(drop)
drop if drop>0
duplicates report company
drop drop
reshape long a b c d e f, i(company) j(numeric_date)
rename a v21
rename b v22
rename c v30
rename d v461
rename e roa
rename f v43
*here is where I need to generate the date variable

I thought of doing the process manually, by generating the date variable and replacing the 60 values one by one, but I figured there must be a more efficient way to do this process.
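A minimal sketch of one way to do this with Stata's monthly date functions (assuming numeric_date runs from 1 to 60 and 1 corresponds to January 2015):

Code:
gen mdate = ym(2015, 1) + numeric_date - 1   // 1 -> 2015m1, 2 -> 2015m2, ..., 60 -> 2019m12
format mdate %tm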

Any suggestions are much appreciated!!


Thank you
