Channel: Statalist
Viewing all 65147 articles

Assessing model using svy data

Dear Statalist community,

I am using a nationwide dataset with survey sampling to determine a specific surgical outcome during readmissions. I have read the Survey Data and Survival Analysis reference manuals, but I could not find how to obtain the ROC curve and Harrell's C statistic after a svy: stcox model. Also, how do you typically assess goodness of fit after svy: stcox?
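For concreteness, a minimal sketch of the kind of setup being asked about; every design and model variable name below (psu, sampwt, stratum, los, readmit, age, comorbidity) is a placeholder:

```stata
* hypothetical survey design and Cox model setup
svyset psu [pweight = sampwt], strata(stratum)
stset los, failure(readmit)
svy: stcox age i.comorbidity
```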

I would greatly appreciate it if someone could help me with this.

Jesus


Question on merging datasets with duplicates

Hello all, I have a question regarding merging files with duplicates. I suspect the solution is quite simple, but I have not been able to come up with one (nor have any of my coworkers).

I have two datasets I’d like to merge. Both of them contain the numeric variables stckcd and year, along with other variables. The observations in dataset 1 are uniquely identified by stckcd and year. The observations in dataset 2 are not.

I want to merge two datasets by stckcd and year so that if there is a duplicate observation in dataset 2, the corresponding observation for the other variables in dataset 1 is repeated.

Here’s a simple example.

Dataset 1:
stckcd year A
1 2000 1
1 2001 1
2 2000 2

Dataset 2:
stckcd year B
1 2000 w
1 2000 x
1 2001 y
2 2000 z

Here's what I'd like the merged datasets to look like:
stckcd year A B
1 2000 1 w
1 2000 1 x
1 2001 1 y
2 2000 2 z

My problem seems similar to the one described here: https://www.statalist.org/forums/for...the-duplicates, but I'm not entirely sure what that user wanted the final dataset to look like.

Apologies in advance if this question is not phrased clearly enough. I am new to Statalist.
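A possible sketch, given that dataset 1 is the uniquely identified one: make dataset 2 (the one with duplicates) the master and do an m:1 merge, which repeats dataset 1's variables across the duplicates. The file names here are placeholders.

```stata
* dataset 2 (duplicates allowed) as master; each of its rows picks up
* the matching A value from dataset 1, repeated across duplicates
use dataset2, clear
merge m:1 stckcd year using dataset1
```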

Missing value upon matrix multiplication

Dear Statalist,

I am trying to implement a routine based on the following recent article in the Stata Journal (I've attached it for your convenience):

Terza, J. V. (2017). Two-stage residual inclusion estimation: A practitioner's guide to Stata implementation. The Stata Journal, 17(4), 916-938.


I am following the code on p. 15, reproduced here:

Code:
/*step c*/ probit Xe Xo Wplus
/*step d*/ predict phiWalpha, p
           gen Xuhat = Y - phiWalpha
/*step e*/ mata: alphahat = st_matrix("e(b)")'
           mata: Valphahat = st_matrix("e(V)")
/*step f*/ glm Y Xe Xo Xuhat, family(gaussian) link(probit) vce(robust)
/*step g*/ mata: betahat = st_matrix("e(b)")'
           mata: Vbetahat = st_matrix("e(V)")
           mata: Bu = betahat[3]
/*step h*/ putmata Y Xe Xo Wplus Xuhat
           mata: X = Xe, Xo, Xuhat, J(rows(Xo),1,1)
           mata: W = Xo, Wplus, J(rows(Xo),1,1)
/*step i*/ mata: gradbeta = normalden(X*betahat):*X
           mata: gradalpha = -Bu:*normalden(X*betahat):*normalden(W*alphahat):*W
/*step j*/ mata: B1 = gradbeta'*gradbeta
           mata: B2 = gradbeta'*gradalpha
/*step k*/ mata: AVARBeta = invsym(B1)*B2*Valphahat*B2'*invsym(B1) + Vbetahat
/*step l*/ mata: ACSE = sqrt(diagonal(AVARBeta))
/*step m*/ mata: ACtstats = betahat:/ACSE
The goal of these steps is to obtain the asymptotically correct standard errors (ACSE) when implementing two-stage residual inclusion. I get stuck from step j onwards. The problem is that when I checked the contents of B1 and B2, each was a full matrix of missing values. As a result, the other objects (e.g., AVARBeta, ACSE, ACtstats) computed from step j onwards (i.e., steps k, l, m) all contain missing values.

Not sure if it helps, but when I looked into gradbeta and gradalpha, they contained a mix of missing and non-missing values.
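As a diagnostic sketch: a single missing cell in gradbeta makes the whole cross product gradbeta'*gradbeta missing, so one option (assuming the step-h and step-i matrices are in memory) is to keep only the rows that are complete in both gradient matrices before forming B1 and B2:

```stata
mata:
// flag rows with no missing entries in either gradient matrix
ok = (rowmissing(gradbeta) :== 0) :& (rowmissing(gradalpha) :== 0)
sum(ok)                      // number of complete rows
gradbeta  = select(gradbeta,  ok)
gradalpha = select(gradalpha, ok)
end
```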

I was wondering if you have any thoughts on this. Any help would be much appreciated.

Thank you very much.

Best,
GuiDeng




ppmlhdfe returns r(3203)

Dear All,

I am using the ppmlhdfe command with
Code:
ppmlhdfe sales post, a(timeFE itemFE) vce(cluster itemFE)
and I get:

Code:
 lsmr():  3203  <tmp>[0,0] found where colvector required
FixedEffects::_partial_out():     -  function returned error
solve_lse():     -  function returned error
relu_fix_separation():     -  function returned error
GLM::init_separation():     -  function returned error
 <istmt>:     -  function returned error
r(3203);
How can I get rid of this problem?

How to do a correlation between two variables for each unit of a panel?

Hi,

I have age-specific suicide rates (1992-2000) for Russia, grouped in 5-year age bands from 15-19 to 65-69. I also have Russian economic data (GDP, inflation, unemployment) for 1992 to 2000.

My data on Stata is described as a panel, for other regressions I have been doing on the dataset.

I want to find the correlation between the economic variables (GDP, Inflation, Unemployment) and the Suicide Rates for each age group. How would I do this?

My data looks like this:


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 AgeGroup int(Year SuicideRate) float(Unemployment GDPGrowthRate InflationRate)
"15to19" 1992  254  5.2     0     .
"15to19" 1993  323  5.9  -8.7 874.6
"15to19" 1994  354  8.1 -12.7 307.6
"15to19" 1995  366  9.4  -4.1 197.5
"15to19" 1996  351  9.7  -3.6  47.7
"15to19" 1997  347 11.8   1.4  14.8
"15to19" 1998  335 13.3  -5.3  27.2
"15to19" 1999  339   13   6.4  85.7
"15to19" 2000  363 10.6    10  20.8
"20to24" 1992  429  5.2     0     .
"20to24" 1993  534  5.9  -8.7 874.6
"20to24" 1994  649  8.1 -12.7 307.6
"20to24" 1995  725  9.4  -4.1 197.5
"20to24" 1996  734  9.7  -3.6  47.7
"20to24" 1997  724 11.8   1.4  14.8
"20to24" 1998  709 13.3  -5.3  27.2
"20to24" 1999  757   13   6.4  85.7
"20to24" 2000  796 10.6    10  20.8
"25to29" 1992  600  5.2     0     .
"25to29" 1993  747  5.9  -8.7 874.6
"25to29" 1994  863  8.1 -12.7 307.6
"25to29" 1995  847  9.4  -4.1 197.5
"25to29" 1996  828  9.7  -3.6  47.7
"25to29" 1997  767 11.8   1.4  14.8
"25to29" 1998  722 13.3  -5.3  27.2
"25to29" 1999  800   13   6.4  85.7
"25to29" 2000  867 10.6    10  20.8
"30to34" 1992  742  5.2     0     .
"30to34" 1993  901  5.9  -8.7 874.6
"30to34" 1994 1022  8.1 -12.7 307.6
"30to34" 1995  989  9.4  -4.1 197.5
"30to34" 1996  949  9.7  -3.6  47.7
"30to34" 1997  855 11.8   1.4  14.8
"30to34" 1998  808 13.3  -5.3  27.2
"30to34" 1999  837   13   6.4  85.7
"30to34" 2000  877 10.6    10  20.8
end
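With these variables, one possible sketch loops over the age groups with levelsof and runs correlate on each subset:

```stata
* pairwise correlations between the economic variables and the suicide
* rate, computed separately within each age group
levelsof AgeGroup, local(groups)
foreach g of local groups {
    display as text _n "AgeGroup = `g'"
    correlate SuicideRate Unemployment GDPGrowthRate InflationRate ///
        if AgeGroup == "`g'"
}
```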

color/line pattern for graph?

Dear All, I found the following code
Code:
sysuse auto, clear
regress price c.length##c.mpg 
est store regression   

foreach v of var length mpg {
  sum `v' if e(sample)
  local low_`v' = r(mean)-r(sd)
  local high_`v' = r(mean)+r(sd)
}

margins, at(mpg=(`low_mpg' `high_mpg') length=(`low_length' `high_length')) 
marginsplot, xlabel(13 " " `low_mpg' "Low IV" `high_mpg' "High IV" 30 " ") ///
   ytitle("Price") ylabel(2000(1000)9000, angle(0) nogrid) ///
   legend(position(3) col(1) stack) title("") noci
which produces the attached graph.

My questions are:
  1. How can I change the colors of the lines?
  2. How can I obtain, say, a dashed and/or dotted line?
  3. Instead of a circle, how can I obtain, say, a square symbol?

Thanks in advance!
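For questions 1-3, an untested sketch: marginsplot's plot#opts() options pass line color, line pattern, and marker symbol through to each plotted series, e.g.:

```stata
* recolor the two lines, use dash/dot patterns, and square/diamond markers
marginsplot, noci ///
    plot1opts(lcolor(navy)   lpattern(dash) msymbol(square))  ///
    plot2opts(lcolor(maroon) lpattern(dot)  msymbol(diamond))
```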

Excluding leading zeros when exporting output to WORD

Hi folks, can you recommend a command that can export regression output without the leading zeros on coefficients? For example, "0.75" should appear as ".75". I am using outreg2, but as far as I can tell from the manual it doesn't provide this option. Any pointers would be appreciated. Thanks!

Can I run a pseudo-Poisson maximum likelihood in panel data when not using a gravity model? What is the Stata command?

I want to run pseudo-Poisson maximum likelihood (PPML) in a panel-data framework, as my dependent variable has many zeroes. However, my challenge is that all the literature I have read on PPML seems to apply only to gravity-model-type estimation.
1. Is it possible to run PPML using panel data for a non-gravity type of model?
2. If it is possible, what is the Stata command to use?
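For what it's worth, PPML is simply a Poisson regression with robust standard errors, and nothing about it is specific to gravity models. A hedged sketch with placeholder names (y, x1, x2, id, year):

```stata
* pooled PPML: Poisson with robust standard errors
poisson y x1 x2, vce(robust)

* or, with unit and time fixed effects, the community-contributed
* ppmlhdfe (ssc install ppmlhdfe)
ppmlhdfe y x1 x2, absorb(id year) vce(cluster id)
```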

How can I get the return list of estat endogenous after 2SLS?

Hello, everyone. I want to store the results of estat endogenous in a local. But I find that I can't get the return list of estat endogenous: running "return list" after "estat endogenous" shows the stored results of ivregress 2sls, not those of estat endogenous. Could you tell me how to get the return list of estat endogenous?
As an example:

----------------------- copy starting from the next line -----------------------
Code:
sysuse auto, clear
ivregress 2sls price i.rep78 (foreign = weight turn trunk), r
estat endogenous
return list
------------------ copy up to and including the previous line ------------------

Calculating the expected return of Eurostoxx 50 with STATA (Event Study)

Hello everyone,
currently I am working on an event study with the Eurostoxx 50 index as the underlying market. As I have never worked with Stata before, I have no experience with the software, so I hope you can help me out.

My question is how to calculate the index's expected return / normal return (in order to calculate the abnormal returns). Is there a tutorial or perhaps even a written command for this purpose, or does anyone have a hint for me?
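One simple possibility is the constant-mean-return model: the normal return is the index's mean return over an estimation window, and the abnormal return is the deviation from it in the event window. All variable names below (index_ret, est_window, event_window) are placeholders:

```stata
* constant-mean-return model for the index's normal return
summarize index_ret if est_window == 1
generate normal_ret   = r(mean)
generate abnormal_ret = index_ret - normal_ret if event_window == 1
```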

Unfortunately I couldn't find anything, neither on Google nor with the forum's search function...

Thanking you in advance!

Greetings,
MM

Weighted adjacency matrix

Hi all, I am struggling with something similar to the original poster. I am using Stata 15 and have 432 observations. I have a column of unique actor IDs and columns for 300+ forums: 0 if the actor did not attend the forum and 1 if the actor attended. Hence, this is wide data. What I need to do is create a weighted adjacency matrix, actor by actor, with cells populated by the number of forums each pair of actors co-participate in. If anyone could help, I would be much indebted. Here is a sample of my data.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int Actorid byte(Forum2 Forum4 Forum6 Forum9 Forum17 Forum18 Forum19 Forum21)
 1 0 0 0 0 0 0 0 0
 3 0 0 0 0 0 0 0 0
 8 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0
23 0 1 0 0 0 0 0 0
26 0 0 0 0 0 0 0 0
29 0 0 0 0 0 0 0 0
36 0 0 0 0 0 0 0 0
38 0 0 0 0 0 0 0 0
42 0 0 0 0 0 0 0 0
46 0 0 0 0 0 0 0 0
60 0 0 0 1 0 0 0 0
65 0 0 0 0 0 0 0 0
71 0 1 0 0 0 0 0 0
79 0 0 0 0 0 0 0 0
end

Listed 15 out of 442 observations
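One possible sketch: load the forum indicators into a Mata matrix F; then F*F' is exactly the actor-by-actor matrix whose (i, j) cell counts the forums actors i and j co-participate in (the diagonal holds each actor's own forum count).

```stata
* forum indicators into one Mata matrix, then co-participation counts
putmata F = (Forum2 Forum4 Forum6 Forum9 Forum17 Forum18 Forum19 Forum21), replace
mata: W = F * F'
mata: st_matrix("W", W)   // weighted adjacency matrix back in Stata
```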

Thanks!

How to test whether difference in differences is statistically significant without regression?

Dear all,

I have two groups and have tested whether the mean of my variable of interest differs significantly between them in each of two periods (using the ttest command, i.e., ttest var if year==1, by(treated) and ttest var if year==2, by(treated)). However, I would also like to test whether the difference between those differences is itself statistically significant (i.e., to get a raw diff-in-diff estimate); is it possible to do this without a regression?

Many thanks!

-xtgls- vs -xtscc-?

I am analysing panel data with n=19 (panel variable, i.e., countries) and T=44 (time variable) to understand the drivers of equity flows to emerging markets. Before estimating the model, I ran the following diagnostic tests:

1. Using -xttest3-, I find heteroskedasticity.
2. Using -xtserial-, I find no autocorrelation.
3. Using -xttest2-, I find cross-sectional dependence.

Moving on, based on these, I computed a robust Hausman test using -xtoverid- after running -xtreg, re vce(robust)-. The robust Hausman test suggests using the FE model.

Now, I have to decide between -xtgls- & -xtscc-.

By using help xtscc, I find this:

-xtscc, fe- performs fixed-effects (within) regression with Driscoll-Kraay standard errors. These standard errors are robust to very general forms of cross-sectional ("spatial") and temporal dependence (provided that T is sufficiently large). If the residuals are assumed to be heteroscedastic only: use xtreg, fe robust.

On the other hand, -xtgls- fits panel-data linear models by using feasible generalized least squares. This command allows estimation in the presence of AR(1) autocorrelation within panels and cross-sectional correlation and heteroskedasticity across panels.

So, I have to choose between the following two commands:

1. xtscc depvar indepvars, fe
2. xtgls depvar indepvars, panels(correlated) corr(independent)

Which one should I choose?

Thanks.


Which model would you recommend for panel data?

Hi all,
This is for my university work, and I need your help.
I have panel data and want to know how 10 independent variables affect dependent variable #1 and dependent variable #2.
4 of the 10 independent variables are factor variables (distances between 2 points, which are constant over time).

Based on the tests I ran (1. Wald test; 2. Breusch-Pagan test; 3. Hausman test), I want to choose a fixed-effects model.
Could you advise me:
1. Am I right to use these tests to choose the model type? Which additional tests would you advise?
2. What is your opinion on using a fixed-effects model in this case?
3. Is it OK to use 4 factor variables in a fixed-effects model with only 10 independent variables in total?

Thank you very much!

Labels

Hello, I'm sure my problem is very basic, but I couldn't find an answer in past topics.

An example of what I want to do is

gen x=.
replace x=0 if y=("Married", "Engaged")

The thing is that y is a numeric variable with a value label attaching text to each number, and I want to work with the labels.

I can't work with the numeric values because for some observations "Married" corresponds to y==1 and for others to y==3.
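A sketch of one route: decode creates a string copy of y holding the label text, so the condition can be written against the labels rather than the shifting numeric codes:

```stata
* work with the label text instead of the numeric codes
decode y, gen(y_str)
gen x = .
replace x = 0 if inlist(y_str, "Married", "Engaged")
```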

Thanks!

José

mgarch dcc predicted correlations

Hello,

Does anyone know why mgarch dcc predicted correlations begin with unreasonable values? For example, the following code results in correlations that begin at about 0 and slowly make their way to "reasonable" levels.

Code:
. use http://www.stata-press.com/data/r15/stocks
(Data from Yahoo! Finance)

. quietly mgarch dcc (toyota nissan honda = , noconstant), arch(1) garch(1)

. predict H*, correlation
Is there a way to force Stata to begin with something more reasonable, for example the correlations computed over the entire sample?

Thank you,
Stan

Calculating the average risk taking ratio of similarly sized, successful other firms at T

Dear all,

I have an unbalanced panel dataset. I am trying to calculate the average risk-taking ratio of similarly sized, successful other firms at time T. Variable cumRisk5 is the average risk-taking ratio of the focal firm over the past 5 years. Perf_rank and fsize are quintile variables created from firm profit over the past 5 years (Perf_rank) and firm sales over the past 5 years (fsize).

If the size code of firm j is the same as focal firm i's, the two firms are considered 'similar'. If Perf_rank is 5 (top quintile), the firm is considered successful. For example, if the focal firm's size is 1 (smallest), I want to calculate the average risk-taking ratio of other successful firms with the same size code (1) and the highest Perf_rank (5) at T, i.e., the sum of cumRisk5 over those 'other' firms divided by their number at T.

I want to calculate this number for each observation. I tried computing it manually by dividing the dataset into 5 parts by size, but I would like to know whether there is a simpler, wiser way of creating this variable (e.g., a loop). I have been learning a lot from Statalist, and I would greatly appreciate any suggestions or comments.

Thank you in advance for your help!


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float firm int year float(cumRisk5 Perf_rank fsize)
 43 2011         1 4 3
 43 2012        .5 4 3
 43 2013  .3333333 4 3
 43 2014      1.25 4 4
 69 2011         1 4 4
 69 2012       1.5 4 4
 69 2013  .8333334 4 4
 79 2007         1 5 4
 79 2008        .5 5 4
 86 2004         1 2 3
 86 2005 1.1666667 2 5
 86 2006  .6666667 2 5
 86 2007 1.4722222 2 5
 86 2008  .8666667 3 5
 86 2011       1.2 3 5
 86 2012        .5 3 5
 86 2013  .3333333 3 5
 86 2014      1.25 3 5
 86 2015        .7 3 4
 89 2006         1 3 3
 89 2007        .5 3 4
 89 2008 1.3333334 3 4
 89 2009 1.0833334 3 4
 89 2010 1.0333333 3 4
 89 2011  .5277778 3 5
 89 2012  .3944445 3 5
 89 2013      1.15 3 5
 89 2014  .8166667 3 5
 89 2015  .4583333 3 5
103 2002  .3333333 1 3
103 2003       .25 1 2
103 2004        .2 1 2
103 2005         0 1 3
103 2006         0 1 3
119 2014         1 1 2
119 2015        .5 1 2
124 1997        .5 5 5
124 1998  .3333333 5 5
124 1999       .25 5 5
124 2000        .2 5 5
124 2001         0 5 5
124 2002         1 5 5
124 2003        .5 5 5
124 2004  .3333333 5 5
124 2005       .25 5 5
124 2006       1.2 5 5
124 2007        .5 5 5
124 2008  .3333333 5 5
124 2009       .25 5 5
124 2010        .2 5 5
124 2011         0 5 5
124 2012         0 5 5
124 2013         0 5 5
124 2014         0 5 5
124 2015         0 5 5
131 2009 1.3333334 2 2
131 2010       .75 3 2
131 2011 .53333336 3 3
131 2012       .25 3 3
131 2013        .2 3 3
138 2015         0 2 2
153 2009         1 4 4
153 2010        .5 4 5
153 2011  .3333333 4 5
153 2012       .25 4 5
153 2013        .2 4 5
153 2014         1 4 5
153 2015        .5 4 5
157 2004         1 3 4
157 2005        .5 3 4
157 2006  .3333333 3 4
157 2007       .25 3 4
157 2008        .2 3 4
168 2015         1 1 2
169 2012         1 5 5
169 2013       1.5 5 5
169 2014  .8333334 5 5
169 2015  .5833334 5 5
190 2008         1 4 4
195 2004         1 2 2
195 2005       1.5 2 2
195 2006  .8333334 2 2
195 2007  .5833334 2 2
195 2008       .45 2 2
198 2014         1 1 3
198 2015        .5 1 3
210 2012 1.3333334 2 3
210 2013       .75 2 3
210 2014 .53333336 2 3
210 2015       .25 2 3
212 2008         1 3 4
212 2009        .5 3 4
212 2010  .3333333 3 4
212 2011      1.25 3 4
212 2012        .7 4 4
236 2007      1.25 3 4
236 2008        .7 3 4
236 2009  .3333333 3 4
236 2010       .25 3 4
236 2011        .2 3 4
end
format %-ty year
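One loop-free sketch using the variables above: within each year-by-fsize cell, total the cumRisk5 of successful (Perf_rank == 5) firms and count them, then take a leave-one-out mean so a successful focal firm is excluded from its own peer average:

```stata
* sum and count of cumRisk5 over successful same-size firms in the same year
bysort year fsize: egen peer_sum = total(cond(Perf_rank == 5, cumRisk5, .))
bysort year fsize: egen peer_n   = total(Perf_rank == 5)

* leave-one-out peer average: drop the focal firm when it is itself successful
gen peer_risk = cond(Perf_rank == 5,              ///
    (peer_sum - cumRisk5) / (peer_n - 1),         ///
    peer_sum / peer_n)
```

When a cell has no successful peers, the division yields missing, which seems the right answer for an undefined peer average.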

Questions about how to combine PSM and DID

Hi. I am studying the impact of a law on employment rates. In order to remove selection bias, I am trying to implement DID after constructing a comparison group via PSM.

I assume that the law was enforced in 2010.

Please check if the following procedure is correct.

Step 1. The 2009 dependent variable (employment status) is merged horizontally into the 2010 data using the merge command (via the id variable).

Step 2. Get a matched sample using the 'psmatch2' command.

Step 3. I create an interaction variable and perform DID or DDD.

To be honest, I'm not sure how to merge the two years (2009 and 2010) of data for the DID.

Can you provide me with a Stata do-file that combines PSM and DID? (Although I know the diff command for kernel PSM, I want to use various types of matched DID.)

create a matrix from single-observation values

Hello,

Does anyone know if there is a simple way to create a matrix from single-observation values? For example, suppose I have a dataset that contains 1 observation (for more than 1 I will use a loop) and N(N-1)/2 + N variables, each of which contains a correlation between something and something else (these somethings have been previously computed, for example via mgarch dcc). I'd like to create an NxN upper-triangular matrix of these correlations.
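A sketch under the assumption that the correlations sit in variables named c_1_1, c_1_2, ..., c_N_N (hypothetical naming; substitute your own) and N is known in advance:

```stata
* fill an N x N upper-triangular matrix from single-observation variables
local N 3
matrix R = J(`N', `N', 0)
forvalues i = 1/`N' {
    forvalues j = `i'/`N' {
        matrix R[`i', `j'] = c_`i'_`j'[1]
    }
}
matrix list R
```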

Thank you,
Stan

Vertical and left justified added text to graph

Hi fellow Stata users!

I'm making a graph and would like some added text labels that are vertically oriented and left-justified (so they all look like they're "standing up" with even bottoms).

I'm not sure why this isn't working:

Code:
clear 
graph drop _all
set obs  1

local location_u1 -1
local location_1 -0.75
local location_u2 -0.5
local location_2 -0.25
local location_u3 0
local location_3 0.25
local location_u4 0.5
local location_4 0.75
local location_u5 1

gen location_u1 = `location_u1'
gen location_1 = `location_1'
gen location_u2 = `location_u2'
gen location_2 = `location_2'
gen location_u3 = `location_u3'
gen location_3 = `location_3'
gen location_u4 = `location_u4'
gen location_4 = `location_4'
gen location_u5 = `location_u5'


twoway scatter location_u1 location_u1 , msym(i) ///
    || scatter location_u5 location_u5 , msym(i) ///
        text(`location_u3' `location_1' "p <= 0.01", orientation(vertical) justification(left)) ///
        text(`location_u3' `location_2' "0.01 <= p < 0.05", orientation(vertical) justification(left)) ///
        text(`location_u3' `location_3' "0.05 <= p < 0.1", orientation(vertical) justification(left)) ///
        text(`location_u3' `location_4' "p > 0.1", orientation(vertical) justification(left)) ///
    legend(off) graphregion(col(white)) ///
    xlabel(`location_u1' `location_u5', notick nogrid labcolor(white)) xscale(alt lcolor(white)) ///
    ylabel(`location_u1' `location_u5', notick nogrid labcolor(white)) yscale(alt lcolor(white))
Thanks for any help!
Cheers,
Simon.