Channel: Statalist

Help Stata Manual - econometric specification

Hi,

I am looking at the ARCH/GARCH specification available here:
http://www.stata.com/manuals14/tsarch.pdf

On pages 20-21, "Example 2: ARCH model with ARMA process", I do not understand why, in the conditional mean for yt, the intercept (-0.007) is subtracted from y(t-1). In other words, I do not know why Stata de-means y(t-1). Please help me with that.
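For reference, my reading of the manual (which may well be wrong) is that the ARMA terms of the conditional mean are parameterized in deviations from the mean, roughly

$$y_t = \mu + \rho\,(y_{t-1} - \mu) + \theta\,\varepsilon_{t-1} + \varepsilon_t,$$

so the estimated constant (here -0.007) is subtracted from the lagged value inside the AR term rather than entering only additively. Is that the right interpretation?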

For IV regression, which assumption is testable? And why?

I have some questions about IV.
I am confused about homogeneous treatment effects and heterogeneous treatment effects.

What is the command for IV regression? ivreg?
Also, I was required to write down the assumptions for homogeneous and heterogeneous treatment effects respectively.
I wrote: existence of a first stage and exclusion for homogeneous treatment effects; and existence of a first stage, exclusion, monotonicity, and independence for heterogeneous treatment effects.
My question is: which of these is actually testable? And why?

And how can it be tested using Stata code?
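For reference, a minimal sketch of the sort of commands I have in mind (the variable names y, x, z1, z2 are hypothetical):

Code:
ivregress 2sls y (x = z1 z2), robust
estat firststage   // first-stage statistics (instrument strength / existence of a first stage)
estat overid       // overidentification test; requires more instruments than endogenous regressors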


Thanks

How to combine two datasets that are organized differently

Hi,

What I am trying to do is add the "realGDP" of partner countries right next to the weight column (attached).
But as you can see, this data is arranged so that each country (in this case country 111) has its trading partners represented as shares of its total trade.

Here, r_ifs_code is the country, p_ifs_code is the partner country, and weight is the corresponding share of total trade.

I have another dataset (attached) with realGDP data for all the countries in the world.

So I need to somehow take the data from there and put it right next to the weight column, matching on "year" and "p_ifs_code".

Could you please help me with the code?
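For reference, the kind of merge I have in mind (a sketch; the file names gdp.dta and tradeweights.dta and the variable names country_code and realGDP are hypothetical):

Code:
* in the GDP dataset, rename the country identifier to match the partner code
use gdp.dta, clear
rename country_code p_ifs_code
tempfile gdp
save `gdp'

* in the trade-weights dataset, attach each partner's realGDP by year
use tradeweights.dta, clear
merge m:1 p_ifs_code year using `gdp', keep(master match) nogenerate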

Thanks

Mirror data

Hello,

Below is a data example that resembles my data. For convenience, the example shows only one year and a handful of values of ID1 and ID2, but this should be enough.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(ID1 ID2 time) byte x
100 101 1980 10
100 102 1980 20
100 103 1980 30
100 104 1980 40
100 105 1980  .
101 100 1980 50
101 102 1980 60
101 103 1980  .
101 104 1980  .
101 105 1980  .
102 100 1980 70
102 101 1980 80
102 103 1980  .
102 104 1980  .
102 105 1980  .
end
I would like to create a mirror variable m that takes the values of variable x as follows:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(ID1 ID2 time) byte(x m)
100 101 1980 10 50
100 102 1980 20 70
100 103 1980 30  .
100 104 1980 40  .
100 105 1980  .  .
101 100 1980 50 10
101 102 1980 60 80
101 103 1980  .  .
101 104 1980  .  .
101 105 1980  .  .
102 100 1980 70 20
102 101 1980 80 60
102 103 1980  .  .
102 104 1980  .  .
102 105 1980  .  .
end
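One approach I have been considering (a sketch, untested): save a copy of the data with ID1 and ID2 swapped and x renamed to m, then merge it back on:

Code:
preserve
keep ID1 ID2 time x
* swap the two identifiers and rename x to m in the copy
rename (ID1 ID2 x) (ID2 ID1 m)
tempfile mirror
save `mirror'
restore
merge 1:1 ID1 ID2 time using `mirror', keep(master match) nogenerate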
Your help would be appreciated.

Cheers.

Logit model will not predict values close to 1

I have run a logistic regression model
Code:
logit overweight noise age i.gender i.income_cat i.education i.ethnicity i.diet i.physact i.maternal_bmi if gest_age >=259 & gest_age<300
All tests of model assumptions give satisfactory results. However, a plot of predicted vs observed values looks like this:
[plot of predicted vs. observed values attached]

Any suggestions as to why the model does not seem to predict values any higher than it does?

Best, Kjell Weyde

Relative Measure of Learning Effect over Time

Dear All,

In a repeated-game framework, I am interested in studying whether variable y (which ranges from 0 to infinity) decreases over time (i.e. over repetitions of the game). I use an OLS model in which y is my dependent variable and "Time" is my only predictor. The problem is the following: I want to compare the learning effect between two groups. In group 1 I find a constant of around 13 and a significant coefficient of -1, so I conclude that learning takes place. In group 2 I find a constant of around 3 and a non-significant coefficient of around -0.10 (a pretty flat line). I am afraid that concluding that learning occurred in group 1 but not in group 2 may be inappropriate: group 2 players probably started playing well from the beginning of the game (i.e. from the first period). So the problem is that the two groups' trend lines start from different points on the y axis, which may call for a relative measure of learning.
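For reference, one pooled specification I have been considering (a sketch; y and time are the names used above, and group is a hypothetical indicator for the two groups):

Code:
* the interaction term contrasts the time slopes (learning rates) of the two groups
regress y c.time##i.group, vce(robust)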

Any suggestions to fix this issue?

Thanks a lot.
Simone





How to sum data vertically and present it by year

Hi,

So I have the following data. As you can see, the data are organized by r_ifs_code and by year.
I want to sum all the weight values within each year and present them as a single row per year.
year r_ifs_code weight
1980 111 0
1980 111 0
1980 111 0
1980 111 0
1980 111 0
1980 111 0
1980 111 0.026349
1980 111 0.025648
1981 111 0
1981 111 0
1981 111 0
1981 111 0.029414
1981 111 0
1981 111 0
1981 111 0
So I want the result to look like this:
1980 111 0.051997
1981 111 0.029414
Can someone help me with the code?
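For reference, one approach that might do this (a sketch; note that -collapse- replaces the data in memory, so -preserve- first if the detail rows are still needed):

Code:
preserve
collapse (sum) weight, by(year r_ifs_code)
list year r_ifs_code weight, noobs
restore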

Time-varying binary covariates in Cox regression

Dear Statalist,
I'm struggling with the subject of time-varying covariates in Cox regression. My data are in the format suggested by An Introduction to Survival Analysis Using Stata by Mario Cleves (Stata Press). I want to study the effect of binary covariates (e.g. hypertension and diabetes) on mortality. Attached is an example of the entries for one subject in my dataset:

Code:
clear
input str20 id double(date0 date1) float(status hypertension diabetes)
"thomas"     . 19614 0 . .
"thomas" 19614 19628 1 0 0
"thomas" 19628 19631 1 0 0
"thomas" 19631 19656 1 0 0
"thomas" 19656 19661 1 1 0
"thomas" 19661 19687 1 1 0
"thomas" 19687 19753 1 1 0
"thomas" 19753 19800 1 1 0
"thomas" 19800 19802 1 1 0
"thomas" 19802 19982 1 1 0
"thomas" 19982 19984 1 1 0
"thomas" 19984 20165 1 1 0
"thomas" 20165 20166 1 1 0
"thomas" 20166 20167 1 1 1
"thomas" 20167 20173 1 1 1
"thomas" 20173 20621 1 1 1
"thomas" 20621 20646 2 1 1
end
format %d date0
format %d date1
status==0 means "inclusion", ==1 "in active follow-up", ==2 "death"

Just for clarification, this is how I would stset the data:
Code:
stset date1, failure(status==2) id(id) origin(date0)
Now I would like to include hypertension and diabetes as time-varying covariates in the Cox regression. However, I couldn't find any good literature (or entries on Statalist or in other forums) on how to do this. Does anyone have any suggestions on how to approach this problem? Or any good introductory literature?
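For reference, my current understanding (which may be wrong) is that once the data are -stset- with multiple records per id, the interval-specific values of the covariates are picked up automatically, so the model would simply be:

Code:
* a sketch, assuming the -stset- above is correct
stcox hypertension diabetes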

Regards

Fabian

"too many variables specified" when running -reg- with a large number of dummies

Dear Stata users,

I am having trouble running the command “reg” for a model with thousands of dummies. I would really appreciate it if you could help me out. Thank you in advance for your help. More details below.

Task: generate school value-added measures with standard errors for an education project
Commands:
  1. code: reg posttest pretest i.year_sch [pw=weight]
  2. year_sch represents the school/year combination. The regression produces school value-added measures with standard errors.
Error message:
  1. “too many variables specified”
  2. There are 6,400 unique school/years in total in my sample. “reg” does work when I cut my sample and reduce the number of unique school/years to 5,000. So the problem seems clear: Stata does not allow “reg” to run with this many dummies.
Other possible commands:
  1. My current coding strategy involves two steps. First, I use “reg” to estimate the school value-added measures. However, “reg” computes fixed effects relative to some arbitrary holdout unit (e.g. one school/year), which can produce incorrect standard errors. Thus, in the second step, I use “contrast” to normalize the fixed effects to the grand mean and compute their standard errors.
  2. For the estimation stage, I considered other Stata commands but found none that worked. “areg” does not report standard errors for the fixed effects. More importantly, areg does not work with contrast, because school/year is absorbed rather than entered as a factor variable. xtreg does not work either, because it requires constant weights within each panel (school/year), which is not true in our case.
  3. For the normalization stage, I considered the command “felsdvregdm”. But it does not allow weights, which are important to our project.
Questions:
  1. Is my understanding of the error message correct, i.e. that “reg” simply cannot handle this many dummies?
  2. The online Stata documentation says that the maximum number of right-hand-side variables for Stata/MP is 10,998. In my sample there are only 6,400 unique school/years, so why doesn't “reg” work? FYI: http://www.stata.com/products/which-...-right-for-me/
  3. How can I fix the problem using “reg”?
  4. Or are there other commands I should consider?
Version of Stata: Stata/MP 14 (64-bit)
Num. of obs. in my sample: about 887,000
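One possibility I have not been able to rule out (a sketch, untested): in Stata 14 the number of right-hand-side variables is also limited by the matrix size setting, so perhaps raising it to the Stata/MP maximum before running “reg” would help:

Code:
* assumes the binding constraint is -matsize-, not a hard limit of -reg- itself
set matsize 11000
reg posttest pretest i.year_sch [pw=weight]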

Thank you again for your help!

Best,
Lihan Liu

Repeated Measures Question

Hello,

I was wondering what might be the best analysis to use if I have baseline, 6-month, 12-month, and 18-month binary HIV viral load data for each of my participants. I want to know which factors (e.g. gender, race, age, length of time since HIV diagnosis) are related to improved HIV outcomes (defined as a suppressed versus unsuppressed HIV viral load). Thanks for any help!
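For reference, one candidate I have been reading about (a sketch; all variable names here are hypothetical) is a mixed-effects logistic model with a random intercept per participant:

Code:
* repeated binary outcome (suppressed vs. unsuppressed) measured at months 0, 6, 12, 18
melogit suppressed i.gender i.race age years_since_dx i.month || id: , or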

"By"-graphs in the s1color scheme: how to change the background colour to white?

Hello,

I am doing scatterplots and bar charts with the by() option. I want to use s1color, as I have other graphs where I need colours to make them readable.

But I dislike the yellow background colour that by() produces. Any idea how to change that in Stata 12?

Here is an example:

Code:
sysuse auto, clear
scatter price mpg, by(foreign, note("")) msize(tiny) scheme(s1color)
I would also appreciate being able to change the colour of the dots, but that is less important. Thanks!
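For reference, one thing that might be worth trying (a sketch; I have not verified that it overrides the by-graph background in s1color on Stata 12) is to set the graph and plot regions and the marker colour explicitly:

Code:
sysuse auto, clear
scatter price mpg, by(foreign, note("")) msize(tiny) mcolor(navy) ///
    scheme(s1color) graphregion(color(white)) plotregion(color(white))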


[example graph attached]

Multiple imputation for logistic regression

I have not tried multiple imputation before, so I am sorry to trouble you with this simple question.

I am trying to develop a prediction rule from the logit coefficients.

My dichotomous outcome is tabxneg0 (temporal artery biopsy result: 0 negative, 1 positive). The pathology report always provides this outcome.
My predictors are age (continuous), crp (continuous), gender (dichotomous), and opa (continuous).

I have 67 subjects and am missing 6 crp values and 2 opa values.

I tried the following:

Code:
mi set mlong
mi register imputed crp opa
mi register regular tabxneg0 age gender
mi impute mvn crp opa, add(10) rseed(54321)



When I do mi estimate: logit ..... my coefficients are very different from those of the logit without the missing data.
In particular, the avgopa coefficient should be negative, but it becomes positive when the missing data are imputed.
What am I doing incorrectly?
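For reference, one variation I have been wondering about (a sketch; I am not sure it is the right fix) is to include the completely observed analysis variables as regressors in the imputation model, so that the imputed values are conditioned on the outcome and the other predictors:

Code:
mi impute mvn crp opa = tabxneg0 age gender, add(10) rseed(54321)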

Sorry to trouble you with this trivial question.

Different results using subpop and over

I am running some survey-weighted proportions and I found that I get different results depending on whether I use the subpop() option with svy or the over() option.

My data are poststratification weighted only.
I want the proportion of respondents who are risky drinkers (risky_drinker_binge==1) who answer 'Yes' to a question (b5a_15). Using subpop() I find the proportion is 0.02983; using over() I find it is 0.0308966.

I had thought that both ways of specifying the subpopulation should give the same result. Could there be something wrong with my weighting?



Here is an example of the output:


. svyset

pweight: <none>
VCE: linearized
Poststrata: bmark_grps
Postweight: pop
Single unit: scaled
Strata 1: region
SU 1: <observations>
FPC 1: <zero>



. labellist risky_drinker_binge
risky_drinker:
0 Not a risky drinker
1 Risky drinker
98 Don't know/Refused


. svy, subpop(if risky_drinker_binge==1 & last_month_drinker==1): prop b5a_15
(running proportion on estimation sample)

Survey: Proportion estimation

Number of strata = 16 Number of obs = 12086
Number of PSUs = 12086 Population size = 3189540
N. of poststrata = 40 Subpop. no. obs = 1849
Subpop. size = 502830.3
Design df = 12070

_prop_3: b5a_15 = Don't know

--------------------------------------------------------------
| Linearized
| Proportion Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
b5a_15 |
Yes | .02983 .0043656 .0223649 .0396856
No | .9679202 .0045089 .9578028 .9756735
_prop_3 | .0022498 .0011667 .0008134 .0062068
--------------------------------------------------------------




. svy, subpop(if last_month_drinker==1): prop b5a_15, over(risky_drinker_binge)
(running proportion on estimation sample)

Survey: Proportion estimation

Number of strata = 16 Number of obs = 11975
Number of PSUs = 11975 Population size = 3189540
N. of poststrata = 40 Subpop. no. obs = 7292
Subpop. size = 1925872
Design df = 11959

Yes: b5a_15 = Yes
No: b5a_15 = No
_prop_3: b5a_15 = Don't know

_subpop_1: risky_drinker_binge = Not a risky drinker
_subpop_2: risky_drinker_binge = Risky drinker
_subpop_3: risky_drinker_binge = Don't know/Refused

--------------------------------------------------------------
| Linearized
Over | Proportion Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
Yes |
_subpop_1 | .002948 .0007758 .0017594 .0049357
_subpop_2 | .0308966 .0045405 .0231357 .0411514
_subpop_3 | .0388163 .0378929 .0054856 .2281972
-------------+------------------------------------------------
No |
_subpop_1 | .9960936 .000896 .9938789 .9975091
_subpop_2 | .9668978 .0046728 .9564083 .9749294
_subpop_3 | .9183779 .054923 .7279609 .9793002
-------------+------------------------------------------------
_prop_3 |
_subpop_1 | .0009584 .0004504 .0003813 .0024064
_subpop_2 | .0022055 .0011437 .0007974 .006085
_subpop_3 | .0428059 .0415786 .0060813 .2463412
--------------------------------------------------------------
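For reference, one check I am planning to run (a sketch), since the numbers of observations above differ (12,086 with subpop() versus 11,975 with over()), which suggests the two approaches are not using the same estimation sample:

Code:
* see whether missing values of the over() variable account for the difference
tab risky_drinker_binge last_month_drinker, missing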


Converting the b_date variable in the CHNS into age

Hello guys,

I am so glad I found this forum, as I am new to working with Stata and I am struggling a lot right now. If anyone could help, I would very much appreciate it.

QUESTION:

I am trying to analyse data from the China Health and Nutrition Survey (CHNS). In the survey, the age of children was expressed like this:
[screenshot of the b_date variable attached]
I want to get the children's ages. The interview date is 891220, so I tried "if b_date<=890000, gen age=b_date-890000", but I can't get the right answer. How should I do this?
And I need to split age into age groups of 0-1 year, 1-2 years, and 2-4 years. I have no clue how to do that.
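For reference, the kind of thing I have been trying to piece together (a sketch; it assumes b_date is a number in YYMMDD form, e.g. 870315, and that all births are in the 1900s):

Code:
* build proper daily dates for birth and interview, then compute age in years
gen bdate   = mdy(mod(floor(b_date/100),100), mod(b_date,100), 1900 + floor(b_date/10000))
gen intdate = mdy(12, 20, 1989)                      // interview date 891220
gen age_years = (intdate - bdate)/365.25
* age groups: 1 = 0-1 year, 2 = 1-2 years, 3 = 2-4 years
gen age_group = cond(age_years < 1, 1, cond(age_years < 2, 2, cond(age_years < 4, 3, .)))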
Please help, I feel very helpless.
Thanks!

Getting alphas and betas plus t-stats from panel estimation of the Fama-French 3-factor model (large dataset)

Hello,
I have a panel dataset with daily stock returns for a large number of firms. I would like to estimate daily alphas (constants) and betas using the Fama-French 3-factor model with a rolling window of 250 days. As the -rolling- command in Stata takes ages, I use the -fastreg- command from Geertsema (see https://papers.ssrn.com/sol3/papers....act_id=2423171).
I have not managed to adjust the code so that it also stores standard errors or t-statistics after each regression.
Can anyone help me with how to do that in Stata? The code that I use is as follows:

Code:
** rolling-window betas **
gen mktrf_beta = .
gen smb_beta = .
gen hml_beta = .
gen _cons_beta = .
gen mktrf_beta_f = .
gen smb_beta_f = .
gen hml_beta_f = .
gen _cons_beta_f = .

local maxobs = 400000
local window = 250

timer clear
timer on 1
forvalues k = `window'/`maxobs' {
    local first = `k' - `window' + 1
    local last  = `k'
    * only estimate if the window covers a single permno
    if permno[`last'] == permno[`first'] {
        qui fastreg eret mktrf smb hml in `first'/`last'
        * save the coefficients in the last observation of the window
        foreach x in mktrf smb hml _cons {
            qui replace `x'_beta_f = _be[`x'] in `last'
        }
    }
}
timer off 1
timer list 1
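Since I am not sure what results fastreg leaves behind in memory, one fallback I have been considering (a sketch, slower) is to re-run the same loop with official -regress-, which makes _b[] and _se[] available after each regression:

Code:
* hypothetical holders for the standard errors
gen mktrf_se = .
gen smb_se   = .
gen hml_se   = .
gen _cons_se = .

* run in place of the loop above, so `window' and `maxobs' are already defined
forvalues k = `window'/`maxobs' {
    local first = `k' - `window' + 1
    local last  = `k'
    if permno[`last'] == permno[`first'] {
        qui regress eret mktrf smb hml in `first'/`last'
        foreach x in mktrf smb hml _cons {
            qui replace `x'_beta_f = _b[`x']  in `last'
            qui replace `x'_se     = _se[`x'] in `last'
        }
    }
}
* a t-statistic is then just _b/_se, e.g. gen mktrf_t = mktrf_beta_f / mktrf_se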

Thank you so much for your help.
Best,
Luce

xtdcce2: long-run and short-run estimates

Dear Stata members,

I have generated the following results using xtdcce2. How do I obtain the long-run estimates as well as the short-run estimates?

I hope for some help on this issue, please.

Thank you.
Regards,
Anita



. xtdcce2 lnfdipc lnpr xmgdp lnexch m2g gfcfg lnnrr,reportconstant crosssectional(lnfdipc lnpr x
> mgdp lnexch m2g gfcfg lnnrr) lr(lnpr xmgdp lnexch m2g gfcfg lnnrr) residuals(res1)


Dynamic Common Correlated Effects - Mean Group

Panel Variable (i): country Number of obs = 368
Time Variable (t): year Number of groups = 23
Obs per group (T) = 16

F( 161, 46)= 1.63
Prob > F = 0.03
R-squared = 0.92
Adj. R-squared = 0.92
Root MSE = 0.21

CD Statistic = -1.24
p-value = 0.2153
------------------------------------------------------------------------------------------
lnfdipc| Coef. Std. Err. z P>|z| [95% Conf. Interval]
------------------------+-----------------------------------------------------------------
Short Run Estimates: |
------------------------+-----------------------------------------------------------------
Mean Group Estimates: |
_cons| -253.451 231.295 -1.10 0.273 -706.7796 199.8786
------------------------+-----------------------------------------------------------------
Long Run Estimates: |
------------------------+-----------------------------------------------------------------
Mean Group Estimates: |
lnpr| -5.39846 2.10927 -2.56 0.010 -9.532545 -1.264376
xmgdp| .277525 .446056 0.62 0.534 -.5967289 1.151779
lnexch| 149.123 162.425 0.92 0.359 -169.2249 467.47
m2g| -.566815 .766276 -0.74 0.459 -2.068688 .9350574
gfcfgdp| .298534 1.07605 0.28 0.781 -1.810487 2.407555
lnnrr| 3.71483 4.00993 0.93 0.354 -4.144494 11.57415
------------------------------------------------------------------------------------------
Mean Group Variables: lnpr xmgdp lnexch m2g gfcfgdp lnnrr _cons
Cross Sectional Averaged Variables: lnfdipc lnpr xmgdp lnexch m2g gfcfgdp lnnrr
Long Run Variables: lnpr xmgdp lnexch m2g gfcfgdp lnnrr
Degrees of freedom per country:
in mean group estimation = 9
with cross-sectional averages = 2
Number of
cross sectional lags = 0
variables in mean group regression = 322
variables partialled out = 161
Long Run Variables are averages of the individual long run coefficents.





Converting Excel string dates into Stata dates

I have a string variable containing a date imported from Excel in the following format:

"MM/DD/YY"

However, MM and DD could be a single digit for single-digit months and days. That's one problem. The other problem is that the years span 1996-2015 but are reported in two-digit format (e.g. 98, 06) rather than as 1998 or 2006.

I would very much appreciate a solution to this problem.
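For reference, one approach that might handle both problems (a sketch; datestr is a hypothetical name for the string variable): the daily() function accepts single-digit months and days, and its optional third argument sets the largest year that two-digit years may map to, so 98 becomes 1998 and 06 becomes 2006 when the top year is 2015.

Code:
gen double eventdate = daily(datestr, "MDY", 2015)
format eventdate %td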

Using first-stage estimates as data for a second-stage estimation within the same dataset

Hi,

I am working on panel data with the following description:
Code:
xtdes
coid: 2, 5, ..., 376 n = 104
year: 1991, 1992, ..., 2015 T = 25
Delta(year) = 1 unit
Span(year) = 25 periods
(coid*year uniquely identifies each observation)

By using the following commands

Code:
bysort year: egen avgsales=mean(sales)
Code:
bysort year: egen avginc=mean(inc)
I get the year-wise averages for all 25 years.

Now, the following command shows me a sort of new dataset with 25 yearly observations from 1991 to 2015, each of which is the average across all 104 firms present in that year (I am calling this the first-stage estimation):

Code:
list year avgsales avginc if coid == 2
Can someone suggest a clever way to use those 25 yearly observations as a new sub-dataset (for the second-stage estimation) while staying in the same dataset, without exporting?

For my second-stage estimation: (a) I want the mean and standard deviation of those 25 yearly observations, and (b) I want to run
Code:
reg avginc avgsales
on those 25 yearly observations.
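For reference, one way I have been thinking about doing this without leaving the dataset (a sketch): temporarily collapse to the yearly level, run the second-stage commands, and then restore the firm-level data:

Code:
preserve
collapse (mean) avgsales=sales avginc=inc, by(year)
summarize avgsales avginc
regress avginc avgsales
restore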

Regards and Stay Blessed
Mubeen




Community innovation survey and weights

Dear all,
I am conducting research using Community Innovation Survey data, which are weighted according to the auxiliary variables "number of employees" and "number of firms" using the Deville and Särndal (1992) approach. Such weights are meant to make the sample representative of the corresponding population. None of the articles I have reviewed so far explicitly says whether they take such weights into account and, if so, how they treat them.

My working assumption is that they should be treated as sampling weights (i.e. declared as pweights). Am I correct, to the best of your knowledge?
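For reference, the declaration I have in mind (a sketch; wgt is a hypothetical name for the weight variable):

Code:
svyset _n [pweight=wgt]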

Thanks

Expand quarterly to monthly dataset

Hi,

I would like to expand a quarterly dataset to a monthly one by just duplicating the information. In other words, I would like to go from 1 to 3 observations per quarter (no interpolation, just copying the values). How can I do that in Stata? Thanks.

Here is a short sample of the data I have:
Code:
cqtr    beta1_t010    beta2_t010
1984q3    .6369938    -.1176437
1984q4    .6333363    -.138367
1985q1    .6199132    -.1459348
1985q2    .6048287    -.1301749
1985q3    .609488    -.1263284
1985q4    .6129703    -.1144919
1986q1    .6313893    -.1163448
1986q2    .6459937    -.1377332
1986q3    .654822    -.1255156
1986q4    .6732373    -.1230011
1987q1    .6801462    -.1404264
1987q2    .6723115    -.1214778
1987q3    .6611854    -.1264669
1987q4    .6521533    -.1395243
1988q1    .6319343    -.1173788
1988q2    .6238974    -.140165
1988q3    .6215904    -.1248284
1988q4    .6182418    -.1242707
1989q1    .6273249    -.1335198
1989q2    .6330974    -.1167396
1989q3    .6461476    -.1407138
1989q4    .6476254    -.1294639
1990q1    .6499776    -.1277468
1990q2    .6490695    -.1411913
1990q3    .6367927    -.1243939
1990q4    .6256983    -.1194668
1991q1    .6282375    -.1181863
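For reference, the kind of thing I have in mind (a sketch; it assumes cqtr is a Stata quarterly date variable formatted %tq):

Code:
* triplicate each quarterly row, then build a monthly date within each quarter
expand 3
bysort cqtr: gen month = mofd(dofq(cqtr)) + _n - 1
format month %tm
sort month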

Thanks for your help!