Channel: Statalist

difference between -stptime- and -strate-

Just an open question for general interest...

These commands appear to do very similar things but I'm sure there are good reasons for having both, so can anyone point to some suggested reading for the advantages/appropriateness of one over the other?

Thanks

New on SSC: -pwcorrf- module, a more powerful version of pwcorr (with within-variable correlation option)

Dear all

-pwcorrf- is now available on SSC. It has three advantages over the standard pwcorr command.
  1. It is much faster (often 10x or more)
  2. It can calculate within-variable correlations, e.g. correlations across panel units. Previously you would have to reshape the data first, which was not always possible (variable limit) and very slow.
  3. It returns the matrix r(T) which shows the number of observations used to calculate each pairwise correlation.

Demo
Code:
*** Correlation across variables
sysuse citytemp.dta, clear
pwcorrf heatdd cooldd tempjan tempjuly, showt

qui replace heatdd = . if runiform() < 0.3
qui replace tempjan = . if runiform() < 0.8
pwcorrf heatdd cooldd tempjan tempjuly, showt

*** Correlation within variables
sysuse xtline1.dta, clear
pwcorrf calories, reshape

qui reshape wide calories, i(day) j(person)
pwcorrf calories*
pwcorr calories*

*** Returns r(T)
pwcorrf calories*
return list

pwcorr calories*
return list
Todo list
  1. return r(P), a matrix with the significance of each pairwise correlation
  2. return r(Pd), the same matrix with Dunnett's test-based p-values. Note that, in my understanding, Bonferroni and Sidak corrections are not valid for pairwise correlations, as they assume the tests are independent (i.e. a different "control group" for each correlation), which is not the case here.

Comments, feedback, bug reports and so on are always welcome.
Jesse Wursten
KU Leuven

Bootstrap last estimates not found

Dear all,
I want to determine 95% confidence intervals using an optimal number of bootstrapped replications.
My code is:
bssize initial, tau(0.05) pdb(5)
shormob decil_2005 decil_2006, ge(-1 0 1 2) atkinson(1 2 3)
set seed 1
bootstrap "shormob decil_2005 decil_2006, ge(-1 0 1 2) atkinson(1 2 3)" _b, reps(768)
But Stata replies with "last estimates not found r(301)".
shormob is an installed command.
Could anyone explain to me what I did wrong in my code?
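(For anyone landing here with the same error, a hedged sketch of one thing to try: the quoted-command bootstrap syntax is from older Stata, and the modern prefix form is shown below. Also, if shormob is r-class rather than e-class, _b does not exist after it, which would explain r(301). The scalar name r(m1) below is purely hypothetical; run return list after shormob to see the real names.)

```stata
* If shormob is e-class (leaves _b behind), the modern prefix syntax is:
bootstrap _b, reps(768) seed(1): shormob decil_2005 decil_2006, ge(-1 0 1 2) atkinson(1 2 3)

* If shormob is r-class, name the r() results explicitly instead.
* r(m1) is a hypothetical name -- check -return list- for the real ones:
bootstrap m = r(m1), reps(768) seed(1): shormob decil_2005 decil_2006, ge(-1 0 1 2) atkinson(1 2 3)
```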

Two stage GMM with Newey-West standard errors for panel datasets

Hi everyone,
Unfortunately I'm having some trouble with my panel regression. I have 260 trading days for 26 different firms (a balanced dataset), and my first goal is to rebuild a regression model commonly used in the literature.
Here is what they do: they have one endogenous variable and exactly one instrument. Furthermore, they run a two-stage GMM estimator which is efficient in the presence of heteroskedasticity (Stock and Yogo, 2002), and apply heteroskedasticity- and autocorrelation-robust standard errors (Newey-West for panel datasets) based on five lags. Fixed effects are additionally used.
Now I'm trying to rebuild exactly that, but I'm not sure whether I got confused and whether I'm using the correct commands.
Here is my regression:
xtivreg2 y1 x1 (y2=x2), gmm2 robust bw(2) fe
Is that correct?

I would really appreciate your help.
Thanks in advance.
Nico

Stataforum Bug?

Is the Statalist homepage bugged for anyone else? I've been seeing "Diagnostic tests for survey logistic regression?" as last post in the general discussion for the last two days, even as newer posts are appearing.

Question: Predicted Probability Logit Estimation Stata

Hello,
I am a beginner in Stata, and I ran a logit regression with a binary dependent variable and binary independent variables.
To get my predicted probabilities, I used the command predict prob after running my logit regression. Now I do not know which specific command I should use to display the predicted probabilities. Using "tab prob" I only get a list of numbers that do not tell me anything specific.

If you need further information or a screenshot, please just let me know!

Thanks in advance, and please excuse my basic question.

Kevin

System GMM - Generalized inverse

Dear statalist members,

I am using the xtabond2 command for my present research. Unfortunately, I have been dealing with a problem for a long time now. As long as I do not collapse the instrument matrix I get the well-known warning:

Warning: Two-step estimated covariance matrix of moments is singular.
Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.

If I collapse the instrument matrix, the message is not reported, but then the H0 of the J-test is always rejected (p-value = 0.000).

I have two questions concerning this issue:

1. Although the warning "Number of instruments may be large relative to number of observations" is never reported when I do not collapse the instrument matrix, the covariance matrix is always singular. Why might the matrix be singular? As far as I understand, it is not a problem of too many instruments (there are fewer instruments than groups).

2. Even if the matrix is singular, Stata uses the generalized inverse. What is this generalized inverse and how is it calculated? I know using the generalized inverse is not as efficient as the regular approach, but are the standard errors still robust to heteroskedasticity and serial correlation?

I would really appreciate some comments and thank you very much in advance!

Best regards Jonathan

Replace an observation as missing after it reaches a certain value

Good afternoon,
I am trying to replace the values of a variable with missing values after it reaches a certain value. The variable is a dummy, and I need to recode it as missing after it takes the value 1. I have tried the "replace" command with various if clauses, but I have not reached any solution.
Do you have any hint?
Thank you
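(A hedged sketch of one common approach, assuming panel data with a hypothetical panel identifier id, a time variable t, and the dummy d: a running sum of the dummy marks everything from the first 1 onward, so observations strictly after that point can be set to missing.)

```stata
* seen is 0 before the first 1 and >= 1 from the first 1 onward
bysort id (t): gen byte seen = sum(d)
* set the dummy to missing strictly after it first reaches 1
bysort id (t): replace d = . if _n > 1 & seen[_n-1] >= 1
drop seen
```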

Panel Data Fixed Effects Threshold Regression - xthreg

I have panel data for 44 countries across 50 years with numerous variables and am aiming to find the threshold government debt levels beyond which economic growth falls.

When I apply the command tsset CountryName1 Year, it shows that I have a strongly balanced panel.
In order to run xthreg (fixed-effects threshold regression) I need balanced data, so I have applied the command xtbalance, range(1960 2010) as well as xtbalance, range(1960 2010) miss(_all) to make the unbalanced data balanced.

However, I still receive the red error "Panel threshold model need balanced panel, check you data!". After running xtbalance my data is balanced, so what is the problem? Please advise.

Furthermore, I have tried these commands on both Stata 13 and 14, so I assume xthreg does not require a newer version of Stata to work. Please advise if there are certain commands relating to xthreg that require newer versions of Stata.

Regards,
Shreyas

Example of covariance(pattern) matrices

I wondered if anyone had references to/examples of user-specified patterned covariance matrices I could read. Supposing I have a random coefficient model with three random effects, x1, x2, x3. Rather than specify covariance(unstructured), there may be reasons why I would like to fit a model where, e.g., I estimate the variances for each of the random effects & the covariance between x1,x2, but the covariances for x1,x3, and x2,x3 are fixed to zero. Can this be done in Stata?

Many thanks for any guidance available
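(For what it's worth, a hedged sketch of one trick often used with -mixed-: random-effects equations repeated at the same level are combined block-diagonally, so an unstructured block for x1 and x2 plus a separate equation for x3 fixes the x1,x3 and x2,x3 covariances at zero. The variables y, x1, x2, x3 and the grouping variable id are placeholders.)

```stata
* Block 1: var(x1), var(x2), cov(x1,x2) estimated (unstructured).
* Block 2: var(x3) and var(_cons), with zero covariance to block 1.
mixed y x1 x2 x3 || id: x1 x2, nocons covariance(unstructured) || id: x3
```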

Bar graph with confidence interval

[1] My dataset looks like this:

. list, sep(0)

+-----------------------------------------------------------+
| domain mean sd n hi lo |
|-----------------------------------------------------------|
1. | domain1 57.55772 22.99455 33 65.71124 49.40421 |
2. | domain2 58.58586 13.57925 33 63.40085 53.77087 |
3. | domain3 82.42424 15.52938 33 87.93072 76.91776 |
4. | domain4 81.14478 27.42623 33 90.8697 71.41987 |
5. | domain5 52.86195 19.84319 33 59.89804 45.82586 |
6. | domain6 57.91246 22.69592 33 65.96008 49.86483 |
7. | domain7 63.97306 17.57486 33 70.20484 57.74129 |
8. | domain8 57.55772 22.99455 33 65.71124 49.40421 |
9. | domain9 74.24242 22.08369 33 82.07296 66.41189 |
10. | domain10 53.53535 40.77325 33 67.99292 39.07779 |
11. | domain11 60.13468 16.94845 33 66.14434 54.12502 |
12. | domain12 83.16499 19.26929 33 89.99758 76.33239 |
+-----------------------------------------------------------+


[2] "twoway (bar mean domain) (rcap hi lo domain)" returns vertical bar graph with 95% CI.

My question is, how to create horizontal one?

thanks in advance!
Connor
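(A hedged sketch: I believe both -bar- and -rcap- accept a horizontal option — with horizontal, -rcap- reads its arguments as x1 x2 y — so the same command with horizontal added should produce the sideways version.)

```stata
* drop the valuelabel suboption if domain carries no value labels
twoway (bar mean domain, horizontal) ///
       (rcap hi lo domain, horizontal), ///
       ylabel(1(1)12, valuelabel angle(0)) legend(off)
```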


Help with Time Trend (Logit)

Hello, recently I have felt blocked since I can't come up with new ideas.
My general task is to analyze whether parental education has a declining or increasing influence on children's education over time, from 1940 to 1978.
Every variable used is binary, taking values 0 or 1. I now have to "reproduce" the results from the attached sheets.

The upper table is built from interaction terms of parental education and trends. My question is whether you can tell what is meant by "trend", how I can create this variable and then the interaction term, and with which type of regression I obtain the values in the table. I think I might use logit, as I did before.

Beyond that, I can interpret the values of table 4B (bottom of the page), but recreating them from the predicted conditional probabilities is another task, which I asked about in another topic. Maybe you have an idea after reading this, e.g. which commands display the values; I know that I have to use the predict command, but I do not know what to do further to see some values.

I thank you much in advance.
Best regards

Kevin (a desperate Stata beginner)

Creating dummy variables using the i.x command

I have data that looks like the following:


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(Transaction Buyer Seller Year)
  1 1 2 1990
 23 2 4 1991
432 3 1 1990
234 4 1 1990
 43 1 2 1991
 23 2 1 1992
432 3 2 1992
324 3 5 1993
432 4 6 1994
end

In the above, Transaction is the dollar amount of the transaction; Buyer and Seller indicate the IDs of the buyer and the seller involved in the transaction during a specific year, indicated by the Year variable. I wish to run regressions in which I regress the transaction amount on a full set of buyer and seller dummy variables.

If I run the command regress Transaction i.Buyer i.Seller if Year==1990, for instance, are dummy variables created only for those individuals that are in the dataset in 1990? Or are dummy variables created for all individuals, with coefficients identified only for those that are in the dataset in 1990?

Thanks!
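(One hedged way to check this empirically: factor variables should expand over the levels present in the estimation sample, and you can verify this by listing the coefficient vector after the restricted regression on the dataex example above.)

```stata
regress Transaction i.Buyer i.Seller if Year == 1990
matrix list e(b)   // only Buyer/Seller levels observed in 1990 get columns
```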

3-way variables interaction

Hi All,

I have a panel data in the following form:
<id> <city> <treated> <time> <after>

where id identifies the individuals in my panel; city is the location where the individual lives (non-time-varying); treated is a dummy indicating those individuals that are eventually treated (0: non-treated, 1: treated); time is a year-month variable; and after is a dummy (0: before, 1: after) indicating the period in which the treated units are under treatment.

With this data I am running a simple diff-in-diff, and I am including individual fixed effects and year-month fixed effects. Moreover, I'd like to include a treatment-city specific time trends (linear or quadratic) so my specification should be:

xtset id time
xtreg depvar c.treated c.after c.treated#c.after i.time i.city#c.treated#c.time, fe cluster(id)

However, I noticed that running the above specification, or the following (where the only difference is that I use i.treated) gives me different results:

xtreg depvar c.treated c.after c.treated#c.after i.time i.city#i.treated#c.time, fe cluster(id)

So, now I am not sure which one is the right way to define the 3-way interaction representing treatment-city specific time trends. Should I use i.treated or c.treated? I thought that there should not be any difference in using one or the other, and at least this is true when I use a 2-way interaction, i.e., c.treated#c.time or i.treated#c.time. But for some reasons, this is not true under a 3-way interaction, and I would like to understand why and if I am doing something wrong.

I hope someone can help me understand what's going on.

Thanks in advance!
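(One hedged diagnostic that may help here: -fvexpand- lists exactly which terms each factor-variable expression expands to, which makes the difference between the i.treated and c.treated versions visible — with i.treated a base level is omitted within each city cell, while c.treated enters as a single slope.)

```stata
fvexpand i.city#c.treated#c.time
display "`r(varlist)'"
fvexpand i.city#i.treated#c.time
display "`r(varlist)'"
```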








error r322 after margins command

Hello, I am getting error r322 after the margins command: "missing predicted values encountered within the estimation sample."

I see this has been reported before. However, my circumstances are as follows: I'm doing a logistic regression with the svy: prefix while selecting a subpopulation using the "if" qualifier. I then use margins, specifying the vce(unconditional) option. Everything works fine.

svy: logit Y i.Q4##c.Q5 if Q2<50
margins, at(Q5=(2 3 4 5 6 7 8) Q4=(0,1)) vce(unconditional)

However, if I instead use the subpop() option of the svy command, I get the r322 error.

gen spop=Q2<50
svy, subpop(spop): logit Y i.Q4##c.Q5
margins, at(Q5=(2 3 4 5 6 7 8) Q4=(0,1)) vce(unconditional)

However, if I do the above command without the vce(unconditional) option, it runs ok.

Can anyone help explain?

Thanks,
John L.

Help needed: How to forecast and store n-steps ahead forecasts in Stata?

Hello all,

Let me first apologize if someone already asked this question, I have tried to find an answer for my issue, but without any success.

I am trying to test various forecasting models with hourly (and/or daily) data. What I need, and do not know how to do, is some kind of loop that could automatically store (as additional variables or otherwise) the n-step-ahead forecasts. For example, if I am using the first lag, I would need up to 8-step-ahead forecasts for each hour.

There are two problems. First, regress/predict (OLS) does not allow dynamic forecasts. Second, I also could not find a way to control dynamic forecasts in arima. If I understood correctly, when someone uses
predict varfore, dyn(10)

after the 10th value (10th hour, 10th day, ...) the model predicts dynamic forecasts until the end of the data. I would need the model to predict, for each point in time, all 10 step-ahead forecasts and store them in 10 variables. Then for each real point in time, I would have the 1-step-ahead, 2-step-ahead, ..., 10-step-ahead forecasts.

A quick and precise (detailed) answer would be much appreciated!

Thank you.
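(A hedged sketch of one possible approach with -arima-, assuming a consecutive integer time variable t, a series y, and an AR(1) model purely as placeholders: loop over forecast origins, run a dynamic forecast from each origin, and store the h-step-ahead value at its target date.)

```stata
tsset t
arima y, ar(1)

* fh`h' at time t will hold the h-step-ahead forecast of y at t
forvalues h = 1/10 {
    generate double fh`h' = .
}

quietly summarize t
local first = r(min) + 20      // first forecast origin (arbitrary burn-in)
local last  = r(max) - 10      // last origin with 10 future points

forvalues origin = `first'/`last' {
    tempvar f
    quietly predict double `f', dynamic(`=`origin'+1')
    forvalues h = 1/10 {
        quietly replace fh`h' = `f' if t == `origin' + `h'
    }
    drop `f'
}
```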

Obtaining IRF after Rolling VAR

Dear Statalist,

I am trying to obtain Impulse Response tables and graphs after
Code:
rolling, window(22) : var var1 var2 var3, lags(1/4)


I believed it should be similar to

Code:
var var1 var2 var3, lags(1/4)

irf set myirf, replace

irf create myfile, step(8)

irf table oirf coirf, impulse(var1) response(var2) noci stderror


However I have difficulty when using the -rolling- prefix.

Could you please advise? Perhaps somebody knows of user-written commands for such an estimation?

Many thanks!!

Dropping observations of categorical variables

Hi, I have a set of categorical variables extracted from a survey dataset, and some of the responses are "don't know" or "dna", which I would like to drop. But I am getting error r(111). For example, I want to drop observations where the variable qhealth has the response "dna". I type the following code:

drop if qhealth==dna
But I get an error saying "dna not found" (r(111)).

I also tried:
drop if qhealth=="dna"
But I get an error saying "type mismatch".

Anyone can help me figure out what I'm doing wrong?

Thanks in advance!
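(A hedged guess at the cause: a "type mismatch" on the quoted comparison suggests qhealth is numeric with value labels rather than a string, in which case the label text has to be mapped back to its numeric code. A sketch:)

```stata
describe qhealth      // str# means string; otherwise numeric, possibly labeled
codebook qhealth      // shows the value label and its code/text mappings

* If qhealth is numeric with value labels, decode it and drop by the text:
decode qhealth, gen(qhealth_s)
drop if qhealth_s == "dna"
```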

Loop command

Hi everyone,

I have a question about extending the use of a loop (if that is possible). Let's say I'm using a database that has 21 variables for each observation (please find the sample below, created by dataex):

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double key float heart str5(DX1 DX2 DX3 DX4 DX5 DX6 DX7 DX8 DX9 DX10) int(CHRON1 CHRON2 CHRON3 CHRON4 CHRON5 CHRON6 CHRON7 CHRON8 CHRON9 CHRON10)
10146237 1 "41401" "41001" "4019"  "4240"  "2724"  "V1582" ""      ""      ""      ""      1 1 1 1 1 0 . . . .
10001801 1 "99672" "41001" "78551" "496"   "4271"  "41072" "25000" "9971"  "2720"  "2724"  0 1 0 1 1 1 1 0 1 1
10033631 1 "97081" "41001" "25000" "41401" "49320" "32723" "7904"  "2724"  "2449"  "4019"  0 1 1 1 1 1 0 1 1 1
10071307 1 "99672" "41001" "51881" "5750"  "4142"  "41401" "4019"  "2724"  "78659" "3051"  0 1 0 0 1 1 1 1 0 1
10276825 1 "0389"  "51881" "41001" "78552" "4271"  "5119"  "486"   "2761"  "5849"  "99592" 0 0 1 0 1 0 0 0 0 0
10287383 1 "0389"  "51881" "56983" "41001" "5070"  "486"   "34830" "78552" "78559" "56721" 0 0 0 1 0 0 1 0 0 0
10310377 1 "0389"  "41001" "51881" "78552" "486"   "34831" "4271"  "2639"  "5849"  "2762"  0 1 0 0 0 1 1 1 0 0
10245547 1 "51881" "41519" "V667"  "41001" "486"   "7994"  "73313" "49322" "4280"  "42731" 0 0 0 1 0 0 0 1 1 1
10304219 1 "51881" "99592" "41001" "78552" "4822"  "41189" "5990"  "4280"  "4168"  "1119"  0 0 1 0 0 1 0 1 1 0
10309903 1 "41041" "42823" "486"   "41001" "5601"  "496"   "4280"  "5949"  "2689"  "72400" 1 1 0 1 0 1 1 0 1 0
10033084 1 "1970"  "41001" "4271"  "V1052" "V4573" "9971"  "2724"  "V4364" "V153"  "40390" 1 1 1 0 0 0 1 1 0 1
10285418 1 "51881" "41001" "486"   "49121" "1629"  "1983"  "5849"  "9961"  "4239"  "2761"  0 1 0 1 1 1 0 0 0 0
10121271 1 "42821" "78551" "41001" "99681" "5849"  "00845" "78959" "V4283" "2930"  "29630" 1 0 1 1 0 0 0 1 0 1
10177348 1 "03849" "41001" "5849"  "9331"  "5300"  "3320"  "7837"  "27651" "5990"  "2859"  0 1 0 0 0 1 0 0 0 0
10071817 1 "5579"  "99592" "0380"  "41001" "40391" "5856"  "78552" "78551" "3441"  "5990"  1 0 0 1 1 1 0 0 1 0
10015097 1 "4111"  "41001" "42833" "5849"  "3360"  "2724"  "30000" "27800" "71690" "V1582" 1 1 1 0 1 1 1 1 1 0
20551022 1 "0389"  "5845"  "41001" "34831" "486"   "4271"  "78552" "78551" "2875"  "51881" 0 0 1 1 0 1 0 0 1 0
20411101 1 "0380"  "48232" "51881" "78552" "42841" "41001" "4162"  "6826"  "2762"  "70713" 0 0 0 0 1 1 1 0 0 1
20296199 1 "99666" "41001" "42741" "9971"  "V667"  "V8542" "78551" "4019"  "496"   "25000" 0 1 1 0 0 1 0 1 1 1
20009235 1 "97081" "41001" "42741" "51881" "5845"  "3481"  "72888" "2760"  "96500" "30560" 0 1 1 0 0 1 0 0 0 1
end

- first variable (named key in the database) is the de-identified code for the patient that is hospitalized
- 10 variables (named DX1-DX10 in the database) for the diagnoses (which is based on the ICD-9-CM codes). Some are acute conditions that happened during that hospitalization and some are chronic conditions like diabetes that the patient already had before being hospitalized.
- 10 variables (named CHRON1-10 in the database) indicating whether that diagnosis was a chronic condition (1) or happened during the same hospitalization as an acute condition (0). each CHRON(n) variable corresponds to the same number DX(n) variable. So to see if DX3 was acute or chronic, we look at CHRON3 to find out.

Previously, I wanted to know, for example, how many of the observations had a heart condition (let's say the ICD code for the heart condition is 41001), regardless of whether it happened as an acute or chronic condition. So I used a loop like the one below:

Code:
gen heart = 0

quietly forval j = 1/10 {
    replace heart = 1 if DX`j' == "41001"
}
and that worked perfectly fine. Now, I'd like to select only those observations which had an ACUTE heart attack. So, for example, if the heart code 41001 is present in DX4, I only want to set heart=1 if the corresponding CHRON4 is zero (meaning it was an acute condition, not chronic). Unfortunately, I don't know how to include that condition in my loop, if that is possible at all.

I greatly appreciate it if anyone can help me in this regard, either utilizing the loop command or suggesting another approach. Thanks!
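(For what it's worth, a hedged sketch of one direct extension of the loop above: since each CHRON`j' pairs with the same-numbered DX`j', the acute-only condition can go inside the same forvalues loop.)

```stata
* flag only acute (CHRON = 0) occurrences of diagnosis code 41001
generate heart_acute = 0
quietly forvalues j = 1/10 {
    replace heart_acute = 1 if DX`j' == "41001" & CHRON`j' == 0
}
```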

Generating and interpreting hazard ratios from Cox models using lincom and margins commands

Hi all -

There are some similar posts about this topic but none that specifically answer my question. Using Stata v14.1, I ran a Cox model which included an interaction term for two continuous variables. This term was significant. I followed up with a lincom command to get the hazard ratio for the interaction, but this was not significant. I am not sure why one is significant and the other is not. I'm assuming I must be doing something wrong with my code. I also tried using the margins command as suggested in another post, but this yielded an error message, "factor variables may not contain noninteger values". My variables do not have any noninteger values. Any help sorting this out would be much appreciated.

Code:
stcox EPAplusDHA f60solfb age bmi smoking educ f60alc tothormonestat texpwk ///
> f60enrgy f60fldeq f60calc redmeat dmarm hrtarm cadarm colorel c.EPAplusDHA#c.f60solfb
failure _d: colorectal == 1
analysis time _t: time
id: id

Iteration 0: log likelihood = -22630.355
Iteration 1: log likelihood = -22385.838
Iteration 2: log likelihood = -22385.147
Iteration 3: log likelihood = -22385.093
Iteration 4: log likelihood = -22385.092
Iteration 5: log likelihood = -22385.092
Refining estimates:
Iteration 0: log likelihood = -22385.092

Cox regression -- Breslow method for ties

No. of subjects = 134,017 Number of obs = 134,017
No. of failures = 1,952
Time at risk = 1569353.803
LR chi2(18) = 490.53
Log likelihood = -22385.092 Prob > chi2 = 0.0000

-----------------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
------------------------+----------------------------------------------------------------
EPAplusDHA | .5044889 .2055583 -1.68 0.093 .2269981 1.121195
f60solfb | .9652983 .0208153 -1.64 0.101 .9253512 1.00697
age | 1.063833 .0035574 18.50 0.000 1.056883 1.070828
bmi | 1.024562 .0043645 5.70 0.000 1.016044 1.033152
smoking | 1.103438 .0370171 2.93 0.003 1.033219 1.178428
educ | 1.021955 .0308403 0.72 0.472 .9632625 1.084225
f60alc | .9986251 .0022477 -0.61 0.541 .9942293 1.00304
tothormonestat | .7813583 .0367145 -5.25 0.000 .7126133 .8567351
texpwk | .9965047 .0018905 -1.85 0.065 .9928063 1.000217
f60enrgy | 1.000179 .0000817 2.19 0.028 1.000019 1.000339
f60fldeq | .9997964 .0001836 -1.11 0.267 .9994367 1.000156
f60calc | .9997787 .0000787 -2.81 0.005 .9996245 .999933
redmeat | .9957916 .0571086 -0.07 0.941 .8899222 1.114256
dmarm | 1.015401 .0315733 0.49 0.623 .9553661 1.079208
hrtarm | 1.013579 .0219306 0.62 0.533 .971494 1.057486
cadarm | .9951817 .0420857 -0.11 0.909 .9160212 1.081183
colorel | 1.055499 .0367989 1.55 0.121 .9857837 1.130145
|
c.EPAplusDHA#c.f60solfb | 1.133301 .068782 2.06 0.039 1.0062 1.276457
-----------------------------------------------------------------------------------------



Code:
lincom EPAplusDHA + f60solfb + c.EPAplusDHA#c.f60solfb, hr
( 1) EPAplusDHA + f60solfb + c.EPAplusDHA#c.f60solfb = 0

------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
(1) | .5518976 .199766 -1.64 0.101 .2714913 1.121918

------------------------------------------------------------------------------

Code:
margins EPAplusDHA#f60solfb
EPAplusDHA: factor variables may not contain noninteger values
r(452);


Sandi