Channel: Statalist

Convert SAS PROC MIXED to Stata code

Hi all, I'm not sure this is the best forum for this question, but I don't know of a better one. I need to convert SAS PROC MIXED code to Stata. Here is the SAS code:

Code:
proc mixed data=data1 noclprint method=reml covtest;

class batch bor mol pop var1 var2 var3;

model dep = var1 var2 var3 / solution;

random intercept / sub=pop type=un S G;

random intercept / sub=mol type=un S G;

random intercept / sub=bor type=un S G;

run;


Here is the Stata code I have come up with; it doesn't give exactly the same results. I noticed that part of the SAS code specifies type=un, which refers to an unstructured (that is, arbitrary) covariance matrix. Is there an equivalent Stata option for that? Maybe that is why the results don't match, or maybe there is something else wrong with my Stata code; I'm not sure I got the random effects right.


Code:
mixed dep var1 var2 var3 || _all: R.bor || _all: R.mol || pop:, reml
Any advice much appreciated.
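
For reference, a hedged sketch of a closer translation (not a definitive answer): since each SAS random statement requests only a random intercept, the G matrix for each subject is 1x1 and type=un is irrelevant there, so the discrepancy is more likely in the fixed part. In the SAS model, var1-var3 are class (categorical) variables, so they should enter the Stata model as factor variables; the three crossed random intercepts can be written as:

Code:
mixed dep i.var1 i.var2 i.var3 || _all: R.pop || _all: R.mol || _all: R.bor, reml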

Inserting observations to fill gaps in a list created from survival data

Hi,

I am looking at the timing of an event. I have a dataset that looks like this (not actual data):

time begin fail
1 100 3
2 97 2
3 94 0
4 93 5
5 87 1
8 85 1
9 82 2
10 80 3
11 75 1

time = The number of days since the beginning of the follow-up period
begin = The number of observations that were still in the study at the beginning of this day
fail = the number of fails on this day.
Note: some observations exited for reasons other than failing, which is why begin != begin[_n-1] - fail[_n-1].

I got to here by using the following code:
stset time, failure(fail)
quietly sts list, saving(list, replace)
use list, clear
keep time fail begin

As you can see, Time=6 and Time=7 are missing. This is because there were no fails and no non-fail exits on these days.

I would like to insert the missing observations, which would look like this:

Time Begin Fail
6 85 0
7 85 0

What is the best way to go about doing this? Could I simply merge a list of numbers from min to max time values and then code in the missing values? Or is there a better way?

Eg:

merge 1:1 time using "[file - a list of numbers from the min time value to the max time value]", nogen
gsort -time
replace begin = begin[_n-1] if missing(begin)
replace fail = 0 if missing(fail)

To me this seems to solve my problem, but I have read that you should never create observations in Stata. I would appreciate feedback on whether I'm achieving what I think I'm achieving, or whether there is a better way of doing this.
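
For reference, a hedged alternative sketch that avoids the auxiliary merge file by using tsfill (assuming the variables are named time, begin and fail as above):

Code:
tsset time                                     // declare the day counter as the time variable
tsfill                                         // insert observations for the gap days (6 and 7)
replace fail = 0 if missing(fail)              // no failures on the inserted days
gsort -time                                    // carry begin back from the next observed day
replace begin = begin[_n-1] if missing(begin)
sort time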

Any advice is much appreciated!

A

Sequence by groups

Hi Statalist!

I have a dataset of the following structure:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float agefirm double(Year idfirm)
 4 2008  1
 4 2009  1
 4 2010  1
 4 2011  1
 4 2012  1
 4 2013  1
 4 2014  1
 4 2015  1
13 2004  2
13 2005  2
13 2006  2
13 2007  2
13 2008  2
13 2009  2
13 2010  2
13 2011  2
13 2012  2
13 2013  2
13 2014  2
13 2015  2
13 2004  3
13 2005  3
13 2006  3
13 2007  3
13 2008  3
13 2009  3
13 2010  3
13 2011  3
13 2012  3
13 2013  3
13 2014  3
13 2015  3
 9 2004  4
 9 2005  4
 9 2006  4
 9 2007  4
 9 2008  4
 9 2009  4
 9 2010  4
 9 2011  4
 9 2012  4
 9 2013  4
 9 2014  4
 9 2015  4
 0 2007  5
34 2004  6
34 2005  6
34 2006  6
34 2007  6
34 2008  6
34 2009  6
 8 2007  7
 8 2008  7
 8 2009  7
 8 2010  7
 8 2011  7
 8 2012  7
 8 2013  7
 8 2014  7
 8 2015  7
26 2004  8
26 2005  8
26 2006  8
26 2007  8
26 2008  8
26 2009  8
26 2010  8
26 2011  8
26 2012  8
26 2013  8
26 2014  8
26 2015  8
13 2004  9
13 2005  9
13 2006  9
13 2007  9
13 2008  9
13 2009  9
13 2010  9
13 2011  9
13 2012  9
13 2013  9
13 2014  9
13 2015  9
 1 2004 10
 6 2008 11
 6 2009 11
 6 2010 11
 6 2011 11
 6 2012 11
 6 2013 11
 6 2014 11
 3 2012 12
 3 2013 12
 3 2014 12
 3 2015 12
 4 2004 13
 4 2005 13
 4 2006 13
 4 2007 13
end
What I would like to do is have the age increase progressively from each firm's initial year onward (the initial year differs from firm to firm: for instance it is 2008 for the first firm, 2004 for the second, and so on), so that I end up with, for example:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float agefirm double(Year idfirm)
 4 2008  1
 5 2009  1
 6 2010  1
 7 2011  1
 8 2012  1
 9 2013  1
 10 2014  1
 11 2015  1
13 2004  2
14 2005  2
15 2006  2
16 2007  2
17 2008  2
18 2009  2
19 2010  2
20 2011  2
21 2012  2
22 2013  2
23 2014  2
24 2015  2
13 2004  3
14 2005  3
15 2006  3
16 2007  3
16 2008  3
17 2009  3
18 2010  3
19 2011  3
20 2012  3
21 2013  3
22 2014  3
23 2015  3
 9 2004  4
 10 2005  4
 11 2006  4
 12 2007  4
 13 2008  4
 14 2009  4
 15 2010  4
 16 2011  4
 17 2012  4
 18 2013  4
 19 2014  4
 20 2015  4
 0 2007  5
34 2004  6
35 2005  6
36 2006  6
37 2007  6
38 2008  6
39 2009  6
 ...
end
Many thanks!
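
For reference, a hedged sketch of one way to do this, assuming agefirm currently holds each firm's age in its first observed year:

Code:
bysort idfirm (Year): replace agefirm = agefirm[1] + (Year - Year[1])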

mimrgns – problem of including ‘.class’ file format

I would like to calculate average marginal effects after running a linear regression on multiply imputed data.
I access my data through remote access, which only allows me to submit do-files and ado-files. I found the mimrgns ado-file but realized that there are three files to include: (1) mimrgns.ado, (2) mimrgns_estimate.ado and (3) mimrgns_work.class.
As I can't use the '.class' file format through remote access, I wonder whether I can skip mimrgns_work.class and include only mimrgns.ado and mimrgns_estimate.ado in my do-file program.
The command seems to work without mimrgns_work.class. What does mimrgns_work.class do? When do I need it?

My relevant commands are

adopath ++${prog}
which mimrgns.ado
which mimrgns_estimate.ado
*which mimrgns_work.class <-- I cannot include this when using the remote access

mi estimate, saving(miestfile, replace) esample(esample): xtreg logwage i.man##c.unemployment i.education , fe robust
mimrgns using miestfile, esample(esample) dydx(unemployment) at(unemployment=(0(90)720) man=(0 1))

Do I have to check multiple linear regression assumptions before running panel regression models such as fixed-or random effects regression?

Dear research community,

I am currently doing research on the determinants of innovation activity across 30 OECD countries. In particular, I am using OECD Triadic Patent Families as the DV, which can be considered continuous because fractions are used to allocate the respective contributions when inventors of different nationalities hold a single patent. When running the appropriate tests (F test, Breusch-Pagan LM test and Hausman test), the results indicate that a fixed-effects regression model is most appropriate. However, I am wondering whether I have to check the usual multiple regression assumptions (linearity, homoskedasticity, approximately normally distributed residuals, etc.) beforehand.

Date local option

Hi all!

I am running Stata 13 and got stuck with the following code:
gen launch=date(local,"YM")

In particular, it seems that ssc install does not work here, "local" being an option of date. How can I get that line of code to run in Stata 13?

Many thanks!

Varying coefficients after re-arranging variable order following xtreg command

Hi Community,

I have a weird regression problem that seems to have not occurred before (to the best of my knowledge at least).

Two regressions of the same type, with identical variables and options, give different coefficients.

I am interested in the effect of different time periods (T) around an event (at T = 6). If I swap the quarter and year fixed effects, the coefficients change (including that of the focal variable T), as does corr(u_i, Xb). This happens using both the Windows and macOS versions of Stata. The data have been xtset with the identifier that is also the clustering variable as the panel variable and Date (essentially a combination of quarter and year) as the time variable.

Please see the outputs below:

Number one:

Code:
.                 xtreg ln_ARPU ib6.T i.quarter i.year if Classification <= 1,    fe    vce(cluster    num_ID_TransactionXopera
> tor)
note: 2015.year omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =      1,974
Group variable: num_ID_Tra~r                    Number of groups  =        169

R-sq:                                           Obs per group:
within  = 0.2073                                         min =          2
between = 0.0005                                         avg =       11.7
overall = 0.0014                                         max =         13

F(29,168)         =      14.35
corr(u_i, Xb)  = -0.0547                        Prob > F          =     0.0000

(Std. Err. adjusted for 169 clusters in num_ID_TransactionXoperator)

Robust
ln_ARPU       Coef.   Std. Err.      t    P>t     [95% Conf. Interval]

T
0     .1464512   .0222273     6.59   0.000     .1025704    .1903321
1     .1104406   .0241675     4.57   0.000     .0627294    .1581518
2     .0917803   .0210803     4.35   0.000     .0501638    .1333968
3     .0891851   .0188199     4.74   0.000      .052031    .1263391
4     .0392178   .0141845     2.76   0.006      .011215    .0672206
5      .004495   .0175504     0.26   0.798    -.0301528    .0391428
7    -.0257231   .0097921    -2.63   0.009    -.0450545   -.0063916
8    -.0262767   .0151852    -1.73   0.085    -.0562551    .0037016
9    -.0324765   .0168639    -1.93   0.056     -.065769    .0008159
10    -.0492936   .0192576    -2.56   0.011    -.0873117   -.0112756
11    -.0555349   .0206217    -2.69   0.008     -.096246   -.0148239
12      -.10436   .0215756    -4.84   0.000    -.1469541   -.0617659

quarter
2     .0323722   .0065651     4.93   0.000     .0194115     .045333
3     .0449597   .0074739     6.02   0.000     .0302048    .0597147
4     .0341042   .0080369     4.24   0.000      .018238    .0499705

year
2001    -.0241788   .0208703    -1.16   0.248    -.0653806     .017023
2002    -.0588143   .0368366    -1.60   0.112    -.1315366     .013908
2003     -.133443   .0604899    -2.21   0.029    -.2528613   -.0140247
2004    -.1148378   .0605792    -1.90   0.060    -.2344323    .0047568
2005    -.0774621   .0631883    -1.23   0.222    -.2022075    .0472834
2006    -.0303142   .0685139    -0.44   0.659    -.1655733    .1049449
2007    -.0331721   .0806189    -0.41   0.681    -.1923288    .1259846
2008      .119141   .0682179     1.75   0.083    -.0155338    .2538158
2009    -.0426798    .066156    -0.65   0.520    -.1732841    .0879245
2010    -.0884542   .0532357    -1.66   0.098    -.1935513    .0166428
2011    -.0255585   .0466674    -0.55   0.585    -.1176886    .0665716
2012     .0457716    .036172     1.27   0.207    -.0256386    .1171817
2013     .0259649   .0270016     0.96   0.338    -.0273413    .0792712
2014      .071395   .0185325     3.85   0.000     .0348085    .1079815
2015            0  (omitted)

_cons    2.473844   .0318103    77.77   0.000     2.411045    2.536644

sigma_u    1.049576
sigma_e   .16235026
rho   .97663265   (fraction of variance due to u_i)


.
end of do-file
Number two:

Code:
. xtreg ln_ARPU ib6.T i.year i.quarter if Classification <= 1, fe vce(cluster num_ID_TransactionXoperator)
note: 4.quarter omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =      1,974
Group variable: num_ID_Tra~r                    Number of groups  =        169

R-sq:                                           Obs per group:
     within  = 0.2073                                         min =          2
     between = 0.0872                                         avg =       11.7
     overall = 0.0814                                         max =         13

                                                F(29,168)         =      14.35
corr(u_i, Xb)  = 0.1252                         Prob > F          =     0.0000

          (Std. Err. adjusted for 169 clusters in num_ID_TransactionXoperator)
------------------------------------------------------------------------------
             |               Robust
     ln_ARPU |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           T |
          0  |   .0782427   .0265134     2.95   0.004     .0259004    .1305851
          1  |   .0536002   .0276805     1.94   0.054    -.0010463    .1082467
          2  |    .046308   .0226806     2.04   0.043     .0015323    .0910836
          3  |   .0550808    .017973     3.06   0.003     .0195989    .0905628
          4  |   .0164817   .0163601     1.01   0.315    -.0158163    .0487796
          5  |  -.0068731   .0169055    -0.41   0.685    -.0402476    .0265014
          7  |   -.014355   .0102733    -1.40   0.164    -.0346363    .0059264
          8  |  -.0035406    .017086    -0.21   0.836    -.0372715    .0301904
          9  |   .0016277   .0197344     0.08   0.934    -.0373316     .040587
         10  |  -.0038213   .0240786    -0.16   0.874    -.0513568    .0437143
         11  |   .0013055   .0269264     0.05   0.961    -.0518523    .0544632
         12  |  -.0361515   .0280236    -1.29   0.199    -.0914753    .0191723
             |
        year |
       2001  |  -.0696511    .023932    -2.91   0.004    -.1168973   -.0224049
       2002  |   -.149759   .0413828    -3.62   0.000    -.2314562   -.0680617
       2003  |  -.2698599   .0701849    -3.84   0.000    -.4084179    -.131302
       2004  |  -.2967271   .0775532    -3.83   0.000    -.4498315   -.1436226
       2005  |  -.3048237    .087558    -3.48   0.001    -.4776794   -.1319681
       2006  |  -.3031482   .0949074    -3.19   0.002    -.4905131   -.1157833
       2007  |  -.3514784   .1268056    -2.77   0.006    -.6018162   -.1011407
       2008  |  -.2446377   .1089912    -2.24   0.026    -.4598064   -.0294689
       2009  |  -.4519307   .1088836    -4.15   0.000    -.6668871   -.2369744
       2010  |  -.5431775   .1155882    -4.70   0.000    -.7713701    -.314985
       2011  |  -.5257541   .1243861    -4.23   0.000    -.7713153    -.280193
       2012  |  -.4998964    .131967    -3.79   0.000    -.7604237   -.2393691
       2013  |  -.5651754   .1427426    -3.96   0.000    -.8469756   -.2833751
       2014  |  -.5652176   .1517425    -3.72   0.000    -.8647854   -.2656498
       2015  |  -.6820849   .1607377    -4.24   0.000    -.9994108   -.3647591
             |
     quarter |
          2  |   .0210041   .0057406     3.66   0.000     .0096711    .0323372
          3  |   .0222236   .0068018     3.27   0.001     .0087956    .0356515
          4  |          0  (omitted)
             |
       _cons |   2.862927   .0925403    30.94   0.000     2.680236    3.045619
-------------+----------------------------------------------------------------
     sigma_u |  1.0101402
     sigma_e |  .16235026
         rho |  .97481936   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.
Does someone have an idea what could cause the issue?

Any help is highly appreciated! Thanks!

Best

Using twoway rcap to graph odds ratios

Hi Stata members,
I am trying to use the twoway rcap command to graph my regression coefficients over time using this command:

twoway rcap UL LL YEAR || connected IRR YEAR, ///
yscale(log) sort legend(off) yline(1) ytitle(Incidence Risk Ratios & 95% CI) ///
xlabel(2007(1)2014) title("State", box bexpand)



I am running into problems with the upper spikes: they appear to be confidence intervals but do not reflect my data. I am also trying to ensure that I graph the odds of the incidence rate ratios, as is recommended. Below is sample data. I am using Stata 14.
Year IRR LL UL
2007 1
2008 0.93 0.87 0.99
2009 0.74 0.53 1.04
2010 0.71 0.55 0.93
2011 0.74 0.62 0.89
2012 0.99 0.92 1.07
2013 0.82 0.72 0.94
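
For reference, a hedged, self-contained sketch that draws the intended graph from the sample values above (the 2007 reference year has no interval, so its spike is simply skipped):

Code:
clear
input YEAR IRR LL UL
2007 1.00  .    .
2008 0.93 0.87 0.99
2009 0.74 0.53 1.04
2010 0.71 0.55 0.93
2011 0.74 0.62 0.89
2012 0.99 0.92 1.07
2013 0.82 0.72 0.94
end
twoway (rcap UL LL YEAR) (connected IRR YEAR, sort), ///
    yscale(log) yline(1) legend(off) ///
    ytitle("Incidence Risk Ratios & 95% CI") xlabel(2007(1)2013) title("State", box bexpand)

If the spikes in your own graph still look off, listing the plotting variables (list YEAR IRR LL UL) for the affected years is a quick way to check whether the stored values are what you expect.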
I am grateful for any help that I receive.

Thanks
Julie

How to create a loop to make charts by group (country) using tsline?

Hi all,

I have quarterly panel data for 22 countries from 1990Q1 to 2017Q4. I would like to make charts using tsline, by country, for a given variable X, and I would like to construct a loop for this purpose.

Kindly advise on how to create a loop to create charts using tsline by country. Many thanks.
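
One hedged sketch (the country identifier -countryid- and the variable X are assumptions; the data are assumed to be xtset with a quarterly time variable):

Code:
levelsof countryid, local(countries)
foreach c of local countries {
    local cname : label (countryid) `c'            // value label if one is attached, else the number
    tsline X if countryid == `c', title("`cname'") name(g`c', replace)
    graph export "tsline_`c'.png", replace
}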

Cheers.

Parallel trends assumption test for a two way fixed effect model

Hi everyone,

I'm trying to test for the parallel trend assumption in the difference in difference. I'm trying to estimate this two way fixed effect model.

xtpcse lunemployment_rate uber i.uber##time_fe2-timefe97 CMA_fe2-CMA_fe34 time_fe2-time_fe97

My panel data are monthly. Uber is a dummy variable that is equal to one whenever Uber has entered a metropolitan area and zero otherwise; Uber did not enter some areas, so the dummy is zero for all months in those areas. I have metropolitan-area fixed effects and time fixed effects. The time fixed effects are generated from month and year using this code: "gen month_year = ym(YEAR, MONTH)
format month_year %tm" and then
"tab month_year, gen(time_fe)"

I'm trying to interact my Uber dummy variable with the time fixed effects in order to have leads and lags with which I can test the parallel trends assumption. Please see below a sample of my data. I have 97 periods per metropolitan area; I only included two of the fixed-effects dummies below.

I have two questions:
1) How can I include only months fixed effects if my data is a panel? (It gave me an error message when I tried to do so, I resorted then to a year and month fixed effects combined)

2) How do I generate the lead and lag interaction terms in order to test the parallel trends assumption for the difference in differences? (A sketch follows after the data excerpt below.)

Please let me know if you need any more information. I have been trying to generate this interaction term for the past 3 days. Any help is appreciated.
Thank you,
Ahmad

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float uber byte(time_fe2 time_fe3) float month_year byte MONTH int YEAR
0 0 0 605 6 2010
0 1 0 606 7 2010
0 0 1 607 8 2010
0 0 0 608 9 2010
0 0 0 609 10 2010
0 0 0 610 11 2010
0 0 0 611 12 2010
0 0 0 612 1 2011
0 0 0 613 2 2011
0 0 0 614 3 2011
0 0 0 615 4 2011
0 0 0 616 5 2011
0 0 0 617 6 2011
0 0 0 618 7 2011
0 0 0 619 8 2011
0 0 0 620 9 2011
0 0 0 621 10 2011
0 0 0 622 11 2011
0 0 0 623 12 2011
0 0 0 624 1 2012
0 0 0 625 2 2012
0 0 0 626 3 2012
0 0 0 627 4 2012
0 0 0 628 5 2012
0 0 0 629 6 2012
0 0 0 630 7 2012
0 0 0 631 8 2012
0 0 0 632 9 2012
0 0 0 633 10 2012
0 0 0 634 11 2012
0 0 0 635 12 2012
0 0 0 636 1 2013
0 0 0 637 2 2013
0 0 0 638 3 2013
0 0 0 639 4 2013
0 0 0 640 5 2013
0 0 0 641 6 2013
0 0 0 642 7 2013
0 0 0 643 8 2013
0 0 0 644 9 2013
0 0 0 645 10 2013
0 0 0 646 11 2013
0 0 0 647 12 2013
0 0 0 648 1 2014
0 0 0 649 2 2014
0 0 0 650 3 2014
0 0 0 651 4 2014
0 0 0 652 5 2014
0 0 0 653 6 2014
0 0 0 654 7 2014
0 0 0 655 8 2014
0 0 0 656 9 2014
0 0 0 657 10 2014
0 0 0 658 11 2014
0 0 0 659 12 2014
0 0 0 660 1 2015
0 0 0 661 2 2015
0 0 0 662 3 2015
0 0 0 663 4 2015
0 0 0 664 5 2015
0 0 0 665 6 2015
0 0 0 666 7 2015
0 0 0 667 8 2015
0 0 0 668 9 2015
0 0 0 669 10 2015
0 0 0 670 11 2015
0 0 0 671 12 2015
0 0 0 672 1 2016
0 0 0 673 2 2016
0 0 0 674 3 2016
0 0 0 675 4 2016
0 0 0 676 5 2016
0 0 0 677 6 2016
0 0 0 678 7 2016
0 0 0 679 8 2016
0 0 0 680 9 2016
0 0 0 681 10 2016
0 0 0 682 11 2016
0 0 0 683 12 2016
0 0 0 684 1 2017
0 0 0 685 2 2017
0 0 0 686 3 2017
0 0 0 687 4 2017
0 0 0 688 5 2017
0 0 0 689 6 2017
0 0 0 690 7 2017
0 0 0 691 8 2017
0 0 0 692 9 2017
0 0 0 693 10 2017
0 0 0 694 11 2017
0 0 0 695 12 2017
0 0 0 696 1 2018
0 0 0 697 2 2018
0 0 0 698 3 2018
0 0 0 699 4 2018
0 0 0 700 5 2018
0 0 0 701 6 2018
0 0 0 605 6 2010
0 1 0 606 7 2010
0 0 1 607 8 2010
end
format %tm month_year

Listed 100 out of 3298 observations
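
On question 2, a hedged sketch of one way to build lead and lag dummies around the Uber entry month (the metropolitan-area identifier, here called CMA, and the outcome name are taken from the model above and are otherwise assumptions):

Code:
* first month in which uber == 1 for each area; missing for never-treated areas
bysort CMA: egen entry = min(cond(uber == 1, month_year, .))
forvalues k = 1/6 {
    gen byte lead`k' = month_year == entry - `k'   // k months before entry (0 for never-treated areas)
    gen byte lag`k'  = month_year == entry + `k'   // k months after entry
}
gen byte lag0 = month_year == entry
* jointly insignificant leads are evidence consistent with parallel pre-trends
xtreg lunemployment_rate lead* lag* i.month_year, fe vce(cluster CMA)
test lead1 lead2 lead3 lead4 lead5 lead6

The window of six months on either side is arbitrary here; periods further from entry fall into the omitted reference group in this sketch.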




Principal Component Analysis

I am working on a paper to build a state export readiness index. For this purpose, I used principal component analysis to get a single index from my variables. However, the resulting index ranges from -2.5 to 2.5, and for an export readiness index negative values do not make sense. Is there a method to rescale the index to positive numbers?
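
A hedged sketch of a simple min-max rescaling of the first principal component score (assumed to be stored in a variable called pc1) to a 0-100 range:

Code:
summarize pc1, meanonly
gen index = (pc1 - r(min)) / (r(max) - r(min)) * 100

Whether such a monotonic rescaling is appropriate for an index is a substantive choice; it preserves the ranking and relative spacing of the states but changes the units.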

putpdf causes Stata to crash

Hi all,

I'm using putpdf and it is causing Stata to crash. I'm on a Windows machine and by "crash" I mean that Stata stops responding and I have to force the application to close. Stata does not supply an error message.

I unfortunately do not have a reproducible example because Stata seems to crash at different points every time I run, and I'm not sure exactly what is causing the issue (and I'm not able to share my actual data).

Stata appears to always crash when putpdf is writing to cells in an already existing table. Additionally, the do file uses forvalues loops. When I manually run through one iteration of this loop, the program does not crash and the PDF is successfully written.

As I said, Stata crashes at different lines each time I run it, but here's one example line that has caused it to crash:
Code:
  putpdf table tsheet`respnum'(`row',1) = ("SUCCESSFUL INTERVIEW INFORMATION"), bold colspan(2) bgcolor(gray)
`respnum' is the counter for my forvalues loop. When I comment out the forvalues loop and instead manually set
Code:
local respnum = 1
the do-file runs successfully.

Any ideas what might be causing this strange behavior? Anyone else experienced Stata crashing (rather than displaying an error) when using putpdf?

Thank you!!

Robust or Clustered Std Errors & Weighting issues with DHS data

Hi,

Firstly, I am unsure on whether to use robust or clustered standard errors for a regression analysis I am doing using DHS data. I always use robust, but in this case, I want to know which one is the better option and why (if possible)?

Secondly, I am doing a multiple-country analysis, but I didn't know that you had to assign weights when using DHS data. Now I don't know how to assign the weights after appending the countries' datasets. Is it necessary to do so? Also, if I don't use the weights, does it make much of a difference?
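
On the weighting question, a hedged sketch using the standard DHS recode variable names (v005 = sample weight x 1,000,000, v021 = PSU, v022 = strata); these names, and svysetting the appended file this way, are assumptions about your setup rather than official guidance for pooled analyses:

Code:
gen double wt = v005 / 1000000
svyset v021 [pweight = wt], strata(v022) singleunit(centered)
svy: logit outcome i.exposure   // hypothetical model

With a svyset design in place, svy: handles both the weights and the clustering by PSU, which is one common answer to the robust-versus-clustered question for DHS data.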

Overlay two marginsplots where one treats x as a factor variable and the other treats x as continuous

Dear Statalist. This is my first post.
Using Stata 15.1
I have one variable (x) with discrete values, representing bins of test scores. I wish to visually compare predicted values of y for values of this x-variable from two models, where one model treats x as a factor variable, and the other treats it as continuous, as I want to illustrate to what extent the association between x and y is primarily driven by the top or the bottom of the distribution of x.

My data is on a secure server, but I've recreated the problem with the auto dataset. Here, y is headroom and x is trunk.

Code:
sysuse auto, clear
drop if trunk > 17
reg headroom i.trunk weight length
margins trunk, atmeans post saving(file1, replace) // saving for combomarginsplot
estimates store dummies // storing for coefplot
marginsplot, recast(line) plotopts(color(blue%50)) ciopt(recast(rarea) color(blue%50))
This produces a nice graph of predicted values of headroom at different values of trunk, and at the means for other variables.

Making the plot where trunk is treated as continuous is also simple:

Code:
reg headroom trunk weight length
margins, at(trunk=(5(1)17)) atmeans post saving(file2, replace) // saving for combomarginsplot
estimates store continuous // storing for coefplot
marginsplot, recast(line) plotopts(color(red%50)) ciopt(recast(rarea) color(red%50))
But I cannot find a way to overlay these plots. Using coefplot I only get a plot for the continuous specification, and combomarginsplot produces an error:

Code:
. coefplot continuous dummies, at recast(line) ciopt(recast(rarea) )
(dummies: could not determine 'at')

. combomarginsplot file1 file2
Warning: statistics differ for trunk: file 1=mean, file 2=values;  using values
local: _at1 master: 
file 2 _u_at_vars don't match file 1
r(198);
A workaround for a similar problem is discussed here: https://www.statalist.org/forums/for...gle-regression
...but this relates to two different continuous x-variables.

I would very much appreciate help and suggestions for a workaround on this issue.

Best regards.
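
For reference, a hedged workaround sketch that bypasses the saved margins files entirely: it reruns the auto example above, pulls both sets of predictions out of r(table) after each margins call, and overlays them with twoway. The variable names created below (m1, xval1, etc.) are made up, and the row indexing relies on r(table) storing the estimate in row 1 and the confidence limits in rows 5 and 6.

Code:
sysuse auto, clear
drop if trunk > 17

* model 1: trunk as a factor variable
reg headroom i.trunk weight length
margins trunk, atmeans
matrix R1 = r(table)
levelsof trunk, local(lv)

* model 2: trunk as continuous
reg headroom trunk weight length
margins, at(trunk=(5(1)17)) atmeans
matrix R2 = r(table)

* park both sets of margins in plain variables, one row per margin
gen xval1 = .
gen m1 = .
gen lb1 = .
gen ub1 = .
local i = 0
foreach l of local lv {
    local ++i
    quietly replace xval1 = `l'        in `i'
    quietly replace m1    = R1[1, `i'] in `i'
    quietly replace lb1   = R1[5, `i'] in `i'
    quietly replace ub1   = R1[6, `i'] in `i'
}
gen xval2 = .
gen m2 = .
gen lb2 = .
gen ub2 = .
forvalues j = 1/13 {
    quietly replace xval2 = 4 + `j'    in `j'
    quietly replace m2    = R2[1, `j'] in `j'
    quietly replace lb2   = R2[5, `j'] in `j'
    quietly replace ub2   = R2[6, `j'] in `j'
}

twoway (rarea lb1 ub1 xval1, color(blue%50) sort) (line m1 xval1, lcolor(blue))  ///
       (rarea lb2 ub2 xval2, color(red%50) sort)  (line m2 xval2, lcolor(red)),  ///
       legend(order(2 "factor" 4 "continuous")) xtitle(trunk) ytitle(predicted headroom)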

entropyetc not byable for string variables?

Dear All, A data set is
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 stkcd str10 reptdt byte(age degree)
"000002" "2017-12-31" 53 4
"000002" "2017-12-31" 43 4
"000002" "2017-12-31" 52 4
"000002" "2017-12-31" 49 5
"000002" "2017-12-31" 55 4
"000002" "2017-12-31" 40 4
"000002" "2016-12-31" 52 4
"000002" "2016-12-31" 44 3
"000002" "2016-12-31" 55 4
"000002" "2016-12-31" 39 4
"000002" "2016-12-31" 42 4
"000002" "2016-12-31" 51 4
"000002" "2016-12-31" 54 4
end
I use entropyetc (from SSC) to obtain the relevant statistics, but I find that the by() option does not work with string variables. Please see below:
Code:
destring stkcd, gen(id)
gen ymd = date(reptdt, "YMD")
gen year = year(ymd)

// OK
entropyetc age, by(id year)

// Not OK
entropyetc age, by(stkcd reptdt)
The results are
Code:
. // OK
. entropyetc age, by(id year)

----------------------------------------------------------------------
    Group |  Shannon H      exp(H)     Simpson   1/Simpson     dissim.
----------+-----------------------------------------------------------
   2 2016 |      1.946       7.000       0.143       7.000       0.364
   2 2017 |      1.792       6.000       0.167       6.000       0.455
----------------------------------------------------------------------

. 
. // Not OK
. entropyetc age, by(stkcd reptdt)
no observations
r(2000);
Am I missing anything?
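
For reference, a hedged workaround sketch: map the string keys onto a numeric group identifier with egen group() and pass that to by():

Code:
egen long gid = group(stkcd reptdt), label
entropyetc age, by(gid)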

Omitting one main effect in linear regression

Good morning everyone,

I am currently studying how the Transgenerational Control Intentions (TCI) of family firms influence Entrepreneurial Orientation (EO). I have two moderators: one is the firm's financial condition and the other is the ratio of family managers in the family firm.

The dependent variable is EO (taking values from 0 to 15) and the independent variable is TCI, which is a dummy.

I want to study how being in a weak financial condition (dummy) moderates the EO-TCI relationship and how the presence of family managers influences it. To do so, I created two interaction terms:
  • TCI x Weak financial conditions
  • TCI x Ratio of family managers
When I run the linear regression, one for the first interaction term and another for the second, I obtain no significant result if I include both main effects and the interaction (y = A + B + A*B).
If I omit both main effects (y = A*B), I obtain a significant result.

What is more interesting is that if I omit just one main effect (y = A + A*B, where A is TCI), I obtain better results, which are more statistically significant.

I'm asking if doing so, i.e. eliminating one main effect, makes any sense.
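
For reference, a hedged sketch of the full specifications in factor-variable notation (the variable names EO, TCI, weakfin and famratio are made up), which keeps both main effects and the interaction and lets Stata build the product terms:

Code:
regress EO i.TCI##i.weakfin      // TCI, weak financial condition, and their interaction
regress EO i.TCI##c.famratio     // TCI, family-manager ratio, and their interaction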

Thanks,

Dario

Generate standardized values within subgroups

I have survey data across several countries, and it seems that the responses on an ordinal scale have different spreads in different countries.
Hence, instead of standardizing across the board, I would like to standardize by country.
Unfortunately, egen x = std(y), by(country) doesn't work, as std() cannot be combined with by().
Any suggestions?

Also, I have more than 700 variables. Is there a more efficient way than writing an egen call for every single variable?
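
A hedged sketch that standardizes every variable in a list within country (the names country and v1-v700 are placeholders):

Code:
foreach v of varlist v1-v700 {
    bysort country: egen double `v'_m  = mean(`v')
    bysort country: egen double `v'_sd = sd(`v')
    gen double z_`v' = (`v' - `v'_m) / `v'_sd
    drop `v'_m `v'_sd
}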

Thanks in advance for your help.

How to estimate a spatial SAR model through GMM

Hello everybody,

I have panel data and I would like to estimate a spatial SAR model through GMM (instead of using the standard ML method).

Now, the command spivreg does that, but only for cross-sectional datasets. For panel data I found the command "spregdpd ... run(xtdpdsys)", which estimates a spatial SAR via the Arellano-Bover/Blundell-Bond system GMM estimator; however, this is a dynamic specification and it includes, besides the spatially lagged dependent variable, also the (non-spatial) lagged dependent variable. I want to estimate a static (standard) SAR, i.e., with a spatially lagged dependent variable but no time-lagged dependent variable.

Do you know if there is a command for this, or some specific additional codes I have to use?

I know that the SAR model can be estimated with the GMM in R. Nonetheless, I am not a traitor, and I would never betray my beloved Stata program for R.

Many thanks.

Kodi

Within group matching - panel data

I have a biennial long-format individual-level panel dataset, reshaped from a wide household-level panel for individual analysis. However, individuals within households do not necessarily keep the same identification number over the years. For example:

hhid year member age gender
1 2009 1 88 1
1 2009 2 38 0
1 2009 3 36 1
1 2011 1 40 0
1 2011 2 37 1
1 2013 1 39 1
1 2013 2 41 0
1 2013 3 1 1


Member 1 from 2009 probably passed away and does not show up in the 2011 survey, so number 1 is taken by a 40-year-old male. Later, in 2013, number 1 is his 39-year-old wife, while he becomes number 2; they also have a daughter (number 3).

Is there any way to do within group matching?

I was thinking of matching by age and gender; however, the age difference between survey years is not always exactly 2 years because of the varying survey dates.

Any suggestions?

Thank you very much!

Getting parameters from looping maximize()

Hello,

I need to optimize a function that seems to have flat areas, so when I run optimize(), after some iterations the algorithm keeps returning the same value of the function but does not stop.

It seems to encounter a flat area. What I want is for the algorithm to stop once the value of the function does not change much from iteration to iteration, and to give me the values of the parameters (which I use to maximize the function) that correspond to this maximum.

I cannot figure out how to:

1. compare values of the function from different iterations
2. save the value of the parameters (so I can call them outside myeval).

y is a function of input data and 4 parameters (param[1]:param[4])

What I need would logically do something like this, but obviously it doesn't work:

mata

void myeval(todo, param, y, g, H) {


y=function(param[1],param[2],param[3],param[4])

if (abs(y_i-y_i-1))<eps {
p=param
break
}


}
S = optimize_init()
optimize_init_evaluator(S, &myeval())
optimize_init_params(S, (530,540,535,544))
r = optimize(S)
r
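
For reference, a hedged sketch of how optimize()'s built-in convergence controls can do this without any bookkeeping inside the evaluator: the value tolerance stops the algorithm when the objective barely changes, ignoring the gradient-based criterion helps on flat regions, and optimize() itself returns the parameter vector. The quadratic objective below is only a stand-in for the real function.

Code:
mata
void myeval(todo, param, y, g, H)
{
    // toy objective standing in for the real function of param[1]..param[4]
    y = -sum((param :- (531, 541, 536, 545)):^2)
}

S = optimize_init()
optimize_init_evaluator(S, &myeval())
optimize_init_params(S, (530, 540, 535, 544))
optimize_init_conv_vtol(S, 1e-7)          // stop when the objective changes very little
optimize_init_conv_ignorenrtol(S, "on")   // do not also require the gradient-based criterion
optimize_init_conv_maxiter(S, 300)        // hard cap on iterations
p = optimize(S)                           // optimize() returns the parameter vector
p
optimize_result_value(S)                  // objective value at the reported solution
end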


If something is not clear, please let me know.

Thank you in advance for your help,
Olga Meshcheriakova