Hi, I am trying to estimate a model in which the error standard deviation is itself random. The model to be estimated is: Z_t = a*Z_(t-1) + b*Y_(t-1) + c*D_(t-1) + exp(sigma_t)*epsilon_t, where Z_t is a policy shock (deviation from mean), Y is the deviation from mean of log output, D is the change in the debt/output ratio, and epsilon_t is N(0,1). Please note that sigma_t is time-varying. How should I estimate this using Stata 15? I was trying to use bayes/bayesmh but with no success. I would appreciate any advice on how to proceed.
↧
Estimating models with stochastic volatility generally calls for Bayesian techniques involving MCMC.
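A minimal sketch of a possible starting point in bayesmh, assuming the series are named Z, Y, and D and the data are tsset (all names are placeholders). This fits only the mean equation with a constant error variance; bayesmh has no built-in latent log-volatility state, so the full time-varying sigma_t would need a purpose-written likelihood evaluator or specialised stochastic-volatility software.
Code:
* Bayesian AR regression with constant variance -- a starting point only
tsset time
bayesmh Z L.Z L.Y L.D, likelihood(normal({var})) ///
    prior({Z:}, normal(0, 100))                  ///
    prior({var}, igamma(0.01, 0.01)) rseed(12345)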
↧
Data preparation (event data analysis) - please help :)
Hi all,
I am so stuck with my data preparation, before even beginning the actual analysis, and really hope someone can help me.
I have a very large panel data set (already with some event-data variables included) on people's labour status and would like to drop the IDs that are not relevant for my analysis.
I am only interested in women (it was easy to drop all men) who had at least one period of maternity leave (ArbeitKinder = Kinder) and at least one period of full- or part-time work (I have already created the work variable: ArbeitKinder = Arbeit).
The data look somewhat like this (the example was posted as an attached image).
In this little example of my data I only want to keep ID "2674000", because she is the only woman with both maternity-leave and work periods (the spell durations and the activity start and end dates are included in the example).
I have two questions:
1) How can I drop all IDs that are not of interest and keep ALL periods of the IDs that are?
2) In a second step (after dropping those IDs) I want to keep only the women who worked AFTER the maternity leave (and to treat anyone who has not returned to work within 36 months as "not working again"). (A sketch of one approach to the first step follows at the end of this post.)
I am so grateful in advance for your help!
Best regards,
Susa
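A minimal sketch of one approach to question 1, assuming the panel identifier is called id and ArbeitKinder is a string variable (both names are taken from the post and may need adapting):
Code:
* flag women with at least one maternity-leave spell and at least one work spell
bysort id: egen any_kinder = max(ArbeitKinder == "Kinder")
bysort id: egen any_arbeit = max(ArbeitKinder == "Arbeit")
keep if any_kinder == 1 & any_arbeit == 1
drop any_kinder any_arbeit
Question 2 additionally needs the spell dates: for each woman, find the end of the maternity-leave spell and keep her only if an Arbeit spell begins within 36 months of that date.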
↧
↧
Using categorical independent variables in GSEM builder
My goal is to use and understand GSEM, both the builder and the syntax. I'm doing a path analysis to determine the mediation effects of a number of variables. My dependent variable is ordinal, as is the main independent variable, and there are three mediating variables that are either binary, continuous or ordinal. Using GSEM syntax, I write the model like this:
gsem (fi4<-i.sc2 i.health c.town i.retired, family(ordinal) link(logit)) (retired<-i.sc2, family(binomial) link(logit)) (town<-i.sc2) (health<-i.sc2, family(binomial) link(logit)) if sex==1 [pweight=static_weight_v3]
...where fi4 is the DV, sc2 is the 'main' IV and the other variables (health, town, retired) are mediators. Obviously, when I run this command the output reports coefficients for each level of the relationship between an IV and the DV, e.g. I get a coefficient for 1.sc2's effect on fi4, a separate one for 2.sc2 etc. However, using the GSEM builder, I can't work out how to add each categorical variable level to the plot separately, only the 'full' variable (i.e. sc2, not 1.sc2). To calculate the mediation indirect and direct effects, I need each separate path coefficient, not an overall coefficient for the ordinal variable. Am I misunderstanding something? Levels of categorical variables do not appear in the drop-down list from which I select the variable I wish to add, only the list of variables in the main data. Grateful for any help here, I realise it may be a silly question!
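If the goal is just the level-specific path coefficients rather than the builder diagram, one option is to read them off the gsem output and combine them with nlcom. A hypothetical sketch for the effect of level 2 of sc2 running through the binary mediator retired (the coefficient names follow Stata's equation:term convention and should be checked against the actual output; with logit links the product of coefficients is only a rough analogue of the linear-case indirect effect):
Code:
* illustrative only -- product of the two path coefficients for 2.sc2 via retired
nlcom _b[retired:2.sc2] * _b[fi4:1.retired]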
↧
poisson vs. ols for difference in differences with extremely small sample
Hi,
I'm trying to calculate a difference-in-differences estimate with a very small sample (10 clusters, 4 years of data, 40 observations total). I've tried 3 models:
Code:
reg     Count treatment treatxpost i.Year
poisson Count treatment treatxpost i.Year
nbreg   Count treatment treatxpost i.Year
The OLS estimate is an effect size 3 times that of the Poisson, with a p-value of 0.000; the count models show an effect size 3 times smaller, with p = 0.09.
This doesn't make sense to me, because I can look at the data in Excel and calculate the "difference-in-differences" by hand, and it equals the OLS estimate. Unfortunately I can't post a data sample because it's sensitive, but it is essentially all 0s and 1s plus a single much higher count in the treatment group in the post-treatment period (1 treatment cluster and 1 post-treatment period), so it should be simple to simulate.
I don't know what kind of correction the Poisson model could make to have this effect. It seems the Poisson model is no longer estimating a diff-in-diff. Any idea what is happening here? And how can I push back if Poisson is requested rather than OLS on the grounds that this is count data?
Edit: OK, I forgot that OLS and Poisson coefficients are interpreted differently. Converting the estimate yields an equivalent effect size. Does anyone know why the standard errors are so much larger in the Poisson model than in OLS?
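For what it is worth, poisson's default ML standard errors assume the conditional variance equals the mean, whereas reg's do not, so one hedged check is to re-estimate both models with the same robust variance estimator before comparing the standard errors (variable names as in the code above):
Code:
reg     Count treatment treatxpost i.Year, vce(robust)
poisson Count treatment treatxpost i.Year, vce(robust)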
↧
skewness and kurtosis
Hi,
In the case of high values of the skewness and kurtosis statistics, what should I do (winsorize?), and are the accepted values for skewness and kurtosis (0 and 3) or (3 and 10), respectively?
Thanks
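If winsorizing is the chosen remedy, a minimal sketch using the community-contributed winsor2 command (from SSC), with a hypothetical variable x and 1st/99th percentile cut-offs:
Code:
summarize x, detail              // reports skewness and kurtosis
ssc install winsor2              // community-contributed command
winsor2 x, cuts(1 99) suffix(_w)
summarize x_w, detail            // re-check after winsorizing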
↧
↧
Geodist between dataset and point
Hello,
I am currently working with a dataset containing the latitude and longitude of over 2,000 locations. I need to find the distance between each of these points and another given point, for which I also know the latitude and longitude. I know that I should use geodist, and I have tried it manually and it works, but I have no idea what code to use to get all the distances into a new variable. Please help.
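A minimal sketch, assuming the coordinate variables are called lat and lon and the fixed reference point is at latitude 52.52 and longitude 13.405 (all names and numbers are placeholders):
Code:
ssc install geodist                        // community-contributed command
geodist lat lon 52.52 13.405, gen(dist_km) // distance from every location to the point, in km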
↧
How to create a column that denotes whether or not a data row is a specific data row of interest
Hello, I am trying to figure out how to create a column that tells me whether or not a specific visit date is the visit of interest. The data examples below show part of what the data look like and then what I want. I have a column that gives the date of the visit of interest (presample_date) and another where all visits are listed (all_visits). I want a variable that says, for each row, whether that visit date is the visit of interest for that specific patient. In this specific instance there is one visit of interest per record, as shown, but code that would also work if there were more than one date of interest would be helpful!
Example:
input str16 record_id float (all_visits presample_date precollection_yes_no)
"Sub1" 20870 21457 .
"Sub1" 21457 21457 .
"Sub1" 21222 21457 .
"Sub1" 21345 21457 .
"Sub2" 20320 20320 .
"Sub2" 21555 20320 .
"Sub3" 20345 21123 .
"Sub3" 21333 21123 .
"Sub3" 21567 21123 .
"Sub3" 21222 21123 .
"Sub3" 21145 21123 .
"Sub3" 21123 21123 .
end
Example of what I want:
input str16 record_id float (all_visits presample_date precollection_yes_no)
"Sub1" 20870 21457 0
"Sub1" 21457 21457 1
"Sub1" 21222 21457 0
"Sub1" 21345 21457 0
"Sub2" 20320 20320 1
"Sub2" 21555 20320 0
"Sub3" 20345 21123 0
"Sub3" 21333 21123 0
"Sub3" 21567 21123 0
"Sub3" 21222 21123 0
"Sub3" 21145 21123 0
"Sub3" 21123 21123 1
end
*Where '1' is yes and '0' is no
Also, just to note: in my actual data the date variables are stored as int; I just could not figure out how to get that to run in my examples. If this changes anything about how the code should be written, it would be helpful to know!
Thank you for the help!
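A minimal sketch using the variable names from the example; the comparison is made row by row, so it also works if a record has more than one date of interest, and it does not matter whether the dates are stored as int or float:
Code:
replace precollection_yes_no = (all_visits == presample_date)
* or, if the variable does not exist yet:
* generate byte precollection_yes_no = (all_visits == presample_date)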
↧
Tabout with svy - balanced repeated replication variance estimation
Hi everyone,
I am analyzing a complex survey (Population Assessment of Tobacco & Health) and need to use Balanced Repeated Replication (BRR) for the variance estimation. I am generating hundreds of tables and would like to use an automated procedure (tabout or putexcel) to help populate them. However, when I run the tabout command that works for a colleague (whose dataset calls for Taylor series linearization), it only produces the point estimates, not the standard errors. When I use Taylor series instead it does produce the SEs, but they are wider than they should be (hence the need for BRR). Is there a way to use tabout with BRR? Here is a snippet of code:
svyset [pweight= R01_A_PWGT], brr(R01_A_PWGT1 - R01_A_PWGT100) vce(brr) mse fay(.3)
tabout both UM_W1_A_cigs_everysome_menthol_1 using "PATH_1.txt", ///
replace f(1p) c(row lb ub) layout(col) svy per npos(col) nlab(Observaciones) ///
h2(" Prevalence of Menthol use in PATH W1") ///
clab(SE) h1(nil) cibnone ci2col h3(|nonsmokers%|lb|ub|menthol%|lb|ub|nonmenthol%|lb|ub|unknown%|lb|ub|total|observations) cisep(" ")
Thanks in advance,
Jana
↧
Reshaping dataset
Hello,
I'm trying to reshape my data and I cannot find the right code for this. What I want is four rows corresponding to v05_itm (potatoes, onions, cauliflower, cabbage; the variable v05_idc holds the codes for the food names) and 25 columns corresponding to new_id. The body of the matrix should be filled with the values of the exp variable.
So basically it should be a 4-by-25 matrix. I don't think I can use the transpose function, as my original data has over 4 thousand observations, which cannot be transposed and then collapsed.
Any help is highly appreciated! A dataex example of my data follows below.
Cheers,
marta
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float new_id int v05_idc str25 v05_itm float exp
 1 51 "Potatoes"             240
 1 52 "Onions"                70
 1 53 "Cauliflower/Cabbage"  200
 1 54 "Tomatoes"             100
 2 51 "Potatoes"               .
 2 52 "Onions"                 .
 2 53 "Cauliflower/Cabbage"    .
 2 54 "Tomatoes"               .
 3 51 "Potatoes"               .
 3 52 "Onions"                 .
 3 53 "Cauliflower/Cabbage"    .
 3 54 "Tomatoes"             460
 4 51 "Potatoes"             295
 4 52 "Onions"                90
 4 53 "Cauliflower/Cabbage"    .
 4 54 "Tomatoes"              60
 5 51 "Potatoes"             360
 5 52 "Onions"                 .
 5 53 "Cauliflower/Cabbage"    .
 5 54 "Tomatoes"             360
 6 51 "Potatoes"               .
 6 52 "Onions"                 .
 6 53 "Cauliflower/Cabbage"    .
 6 54 "Tomatoes"               .
 7 51 "Potatoes"               .
 7 52 "Onions"                 .
 7 53 "Cauliflower/Cabbage"    .
 7 54 "Tomatoes"               .
 8 51 "Potatoes"             640
 8 52 "Onions"               220
 8 53 "Cauliflower/Cabbage"  680
 8 54 "Tomatoes"             330
 9 51 "Potatoes"             700
 9 52 "Onions"                 .
 9 53 "Cauliflower/Cabbage"    .
 9 54 "Tomatoes"             210
10 51 "Potatoes"               .
10 52 "Onions"                 .
10 53 "Cauliflower/Cabbage"    .
10 54 "Tomatoes"               .
11 51 "Potatoes"             600
11 52 "Onions"               320
11 53 "Cauliflower/Cabbage" 1200
11 54 "Tomatoes"             615
12 51 "Potatoes"               .
12 52 "Onions"                 .
12 53 "Cauliflower/Cabbage"    .
12 54 "Tomatoes"               .
13 51 "Potatoes"               .
13 52 "Onions"                 .
13 53 "Cauliflower/Cabbage"    .
13 54 "Tomatoes"               .
14 51 "Potatoes"             490
14 52 "Onions"               160
14 53 "Cauliflower/Cabbage"    .
14 54 "Tomatoes"             130
15 51 "Potatoes"             210
15 52 "Onions"                 .
15 53 "Cauliflower/Cabbage"    .
15 54 "Tomatoes"             185
16 51 "Potatoes"               .
16 52 "Onions"                 .
16 53 "Cauliflower/Cabbage"    .
16 54 "Tomatoes"               .
17 51 "Potatoes"               .
17 52 "Onions"                 .
17 53 "Cauliflower/Cabbage"    .
17 54 "Tomatoes"               .
18 51 "Potatoes"             320
18 52 "Onions"               120
18 53 "Cauliflower/Cabbage"    .
18 54 "Tomatoes"             320
19 51 "Potatoes"               .
19 52 "Onions"                 .
19 53 "Cauliflower/Cabbage"    .
19 54 "Tomatoes"               .
20 51 "Potatoes"               .
20 52 "Onions"                 .
20 53 "Cauliflower/Cabbage"    .
20 54 "Tomatoes"               .
21 51 "Potatoes"             520
21 52 "Onions"               200
21 53 "Cauliflower/Cabbage"    .
21 54 "Tomatoes"               .
22 51 "Potatoes"               .
22 52 "Onions"                 .
22 53 "Cauliflower/Cabbage"    .
22 54 "Tomatoes"               .
23 51 "Potatoes"               .
23 52 "Onions"                 .
23 53 "Cauliflower/Cabbage"    .
23 54 "Tomatoes"               .
24 51 "Potatoes"               .
24 52 "Onions"                 .
24 53 "Cauliflower/Cabbage"   90
24 54 "Tomatoes"              70
25 51 "Potatoes"             600
25 52 "Onions"               260
25 53 "Cauliflower/Cabbage"    .
25 54 "Tomatoes"               .
end
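A minimal sketch of one way to get the 4-by-25 layout with reshape wide, using the variable names from the dataex example; v05_idc can stay in i() because it is constant within each food item:
Code:
reshape wide exp, i(v05_idc v05_itm) j(new_id)
* rows are now the four food items and exp1-exp25 hold the values for each new_id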
↧
↧
Margin option and squaring the variable
Dear Researchers,
I am examining the impact of GDP on the fertility rate across countries. There are many other independent variables in the equation; GDP and the other independent variables are lagged one year. After running the regression, the sign of the GDP coefficient is negative. I turned to the literature and found that most estimated relationships between GDP and fertility are negative as well, and that, accordingly, authors have added squared GDP and found the relationship to be U-shaped. I used the same approach, squaring GDP, and found a U-shape as well. But after squaring GDP, the coefficients on both GDP and GDP^2 become very large, and, what surprised me even more, the constant becomes very large as well.
The regression is:
Fertility rate= lagLogGDP + LagX1 + LagX2 + LagX3
My questions are:
- Is there any way to establish that there is a U-shaped relationship between GDP and the fertility rate without using the squaring approach?
- How can I make use of margins here? That is, can I evaluate margins at some specific values to show that there is a U-shaped relationship, and then use graphs to see it? (A sketch follows below.)
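A minimal sketch of the margins approach, with hypothetical variable names (fertility, laglgdp, lagx1-lagx3) and illustrative evaluation points; the factor-variable notation lets margins take the squared term into account automatically:
Code:
regress fertility c.laglgdp##c.laglgdp lagx1 lagx2 lagx3
margins, at(laglgdp = (6(1)12))   // predicted fertility at chosen values of log GDP
marginsplot                       // plot the predictions to inspect the U-shape
* the community-contributed -utest- (ssc install utest) implements a formal U-shape test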
↧
spxtregress is arbitrarily dropping regressors
I'm trying to use Stata's built-in commands to estimate a spatial panel model (i.e. I'm using spxtregress, not xsmle). I've successfully spset and xtset my data and linked a shapefile, so I'm able to run regressions with spxtregress. However, Stata is dropping some of my variables, even when no spatial effects are included (i.e. no DV spatial AR1, no IV spatial AR1, and no spatial error). To illustrate, I've shown two regressions below, both of which include the same DV and single IV and do not include spatial effects. The only difference is that in the first model, I use the years 2009-2015 and in the second I use the years 2009-2014. Model 1 works and model 2 doesn't. Also, 2009-2014 works when a different single regressor is included. This can't be a multicollinearity issue because there's only one regressor. What is going on? Please help.
[The regression outputs were posted as attached images and are not reproduced here.]
↧
Zero LR test statistic for melogit
When using the following code, I obtain the output below. What I don't understand is why the LR test statistic comparing it with the ordinary logistic model is 0.
Code:
melogit goodprog i.screen i.dxagecat i.remoteness || sa3: , or
The variable goodprog is 0/1, with 34% of records = 1. The general data characteristics are consistent with the first example in the Stata melogit documentation.
Any suggestions would be much appreciated.
Thanks.
Mixed-effects logistic regression Number of obs = 3,074
Group variable: sa3 Number of groups = 80
Obs per group:
min = 8
avg = 38.4
max = 124
Integration method: mvaghermite Integration pts. = 7
Wald chi2(7) = 270.05
Log likelihood = -1826.6925 Prob > chi2 = 0.0000
------------------------------------------------------------------------------------
goodprog | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------------+----------------------------------------------------------------
1.screen | 3.380587 .288287 14.28 0.000 2.86025 3.995583
|
dxagecat |
2. 40-49 | 2.532121 .6644552 3.54 0.000 1.51398 4.234954
3. 50-59 | 2.312459 .6022525 3.22 0.001 1.387998 3.852649
4. 60-69 | 2.757566 .7171189 3.90 0.000 1.656405 4.590765
5. 70-79 | 3.076532 .8337443 4.15 0.000 1.808775 5.232849
|
remoteness |
2. Inner regional | .9092641 .1170525 -0.74 0.460 .7064999 1.170221
3. Major city | 1.221622 .1362389 1.79 0.073 .9817663 1.520076
|
_cons | .0964161 .025415 -8.87 0.000 .0575142 .1616308
-------------------+----------------------------------------------------------------
sa3 |
var(_cons)| 1.14e-33 3.75e-18 . .
------------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
Note: _cons estimates baseline odds (conditional on zero random effects).
LR test vs. logistic model: chi2(0) = 0.00 Prob > chi2 = .
Note: LR test is conservative and provided only for reference.
↧
Spatial model (xsmle) with spatial DV and IV lags and spatial errors?
Is there a way to estimate a model with (1) a spatially lagged dependent variable, (2) spatially lagged independent variables, and (3) spatial errors? Using xsmle, the sdm model incorporates 1 and 2 and the sac model incorporates 1 and 3, but I can’t seem to figure out how to include 1, 2, and 3 simultaneously. It seems there is a way using spxtregress but that command seems really buggy (separate post). If you have a guess, please let me know.
↧
↧
Bootstrap with xtabond2
Hello everyone,
I am a beginner researcher. I am now trying the xtabond2 command on my data. I see that almost every estimation command can be used with the bootstrap prefix, but xtabond2 cannot. Is there any other option similar to the bootstrap that I can use with xtabond2?
Thanks for your help!
↧
Omitted coefficients using reghdfe
Hello,
I am estimating regressions that include various fixed effects and control variables whose coefficients I am not interested in but want to control for. To do so, I am using the command reghdfe. My understanding of this command is that it accounts for the variables declared in the absorb() option but does not report their coefficients. I would therefore expect to obtain the same coefficients on the variables of interest when using the command reg, controlling for all the variables (those inside and outside absorb()), as when using reghdfe.
However, that is not the case: when estimating an equation with reg, I obtain a coefficient for each of the variables of interest, whereas reghdfe omits the coefficients of some of them. If I use a big dataset, the estimated coefficients of the non-omitted variables are the same as those obtained with reg. If the sample is small (such as the one below), the coefficients are quite different and Stata omits most of the variables of interest.
The following are examples of the estimations:
Code:
reg manager nonblack admit_exp20 admit20_nb admit_noexpbord20 admit20free_nb ///
    i.cpuma0010 i.birthyr trend_*, vce(cluster stateyear)

reghdfe manager nonblack admit_exp20 admit20_nb admit_noexpbord20 admit20free_nb, ///
    absorb(cpuma0010 birthyr statefip#c.birthyr) vce(cluster stateyear)
Here trend_* are statefip-by-birthyr (state by year of birth) specific time trends, so they should be equivalent to what Stata would generate with statefip#c.birthyr.
The following is an example of the dataset I used to generate the estimations above:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float(statefip birthyr manager nonblack admit20_nb admit_noexpbord20 admit20free_nb admit_exp20) int cpuma0010 float(stateyear trend_1 trend_2 trend_3 trend_4 trend_5 trend_6) 12 1968 0 0 0 0 0 0 255 445 0 1968 0 0 0 0 13 1966 0 0 0 1 0 1 262 491 0 0 1966 0 0 0 15 1965 0 1 0 . . 0 284 538 0 0 0 1965 0 0 15 1965 0 1 0 . . 0 288 538 0 0 0 1965 0 0 12 1968 0 1 0 0 0 0 253 445 0 1968 0 0 0 0 12 1964 0 1 0 1 1 0 204 441 0 1964 0 0 0 0 15 1967 0 1 0 . . 0 284 540 0 0 0 1967 0 0 12 1968 0 1 0 0 0 0 257 445 0 1968 0 0 0 0 15 1968 0 1 0 . . 0 285 541 0 0 0 1968 0 0 15 1967 0 1 0 . . 0 282 540 0 0 0 1967 0 0 12 1967 0 1 0 0 0 0 239 444 0 1967 0 0 0 0 17 1968 0 1 1 1 1 1 310 637 0 0 0 0 0 1968 17 1965 0 0 0 1 0 1 335 634 0 0 0 0 0 1965 15 1966 0 1 0 . . 0 283 539 0 0 0 1966 0 0 11 1966 0 1 0 1 1 0 201 395 1966 0 0 0 0 0 13 1967 0 1 1 1 1 1 273 492 0 0 1967 0 0 0 17 1965 0 0 0 1 0 1 337 634 0 0 0 0 0 1965 12 1967 0 1 0 0 0 0 239 444 0 1967 0 0 0 0 11 1964 0 1 0 1 1 0 201 393 1964 0 0 0 0 0 11 1964 0 0 0 1 0 0 202 393 1964 0 0 0 0 0 11 1965 0 0 0 1 0 0 201 394 1965 0 0 0 0 0 11 1964 1 0 0 1 0 0 202 393 1964 0 0 0 0 0 11 1966 0 1 0 1 1 0 201 395 1966 0 0 0 0 0 11 1968 0 1 0 1 1 0 200 397 1968 0 0 0 0 0 11 1966 0 1 0 1 1 0 200 395 1966 0 0 0 0 0 11 1966 0 1 0 1 1 0 201 395 1966 0 0 0 0 0 11 1964 0 0 0 1 0 0 201 393 1964 0 0 0 0 0 11 1966 1 1 0 1 1 0 201 395 1966 0 0 0 0 0 11 1964 0 0 0 1 0 0 202 393 1964 0 0 0 0 0 11 1967 0 0 0 1 0 0 202 396 1967 0 0 0 0 0 11 1968 0 1 0 1 1 0 201 397 1968 0 0 0 0 0 11 1965 0 0 0 1 0 0 202 394 1965 0 0 0 0 0 11 1966 0 1 0 1 1 0 202 395 1966 0 0 0 0 0 11 1964 0 0 0 1 0 0 202 393 1964 0 0 0 0 0 11 1967 0 1 0 1 1 0 201 396 1967 0 0 0 0 0 11 1966 0 0 0 1 0 0 202 395 1966 0 0 0 0 0 11 1964 0 0 0 1 0 0 202 393 1964 0 0 0 0 0 11 1965 0 0 0 1 0 0 202 394 1965 0 0 0 0 0 11 1968 0 1 0 1 1 0 201 397 1968 0 0 0 0 0 11 1967 0 1 0 1 1 0 201 396 1967 0 0 0 0 0 11 1968 0 0 0 1 0 0 202 397 1968 0 0 0 0 0 11 1967 1 0 0 1 0 0 201 396 1967 0 0 0 0 0 11 1964 0 0 0 1 0 0 201 393 1964 0 0 0 0 0 11 1968 1 1 0 1 1 0 200 397 1968 0 0 0 0 0 11 1965 0 1 0 1 1 0 200 394 1965 0 0 0 0 0 11 1966 0 0 0 1 0 0 201 395 1966 0 0 0 0 0 11 1968 1 0 0 1 0 0 201 397 1968 0 0 0 0 0 11 1968 0 1 0 1 1 0 201 397 1968 0 0 0 0 0 11 1965 0 0 0 1 0 0 202 394 1965 0 0 0 0 0 11 1965 0 1 0 1 1 0 201 394 1965 0 0 0 0 0 11 1966 0 1 0 1 1 0 200 395 1966 0 0 0 0 0 11 1964 0 0 0 1 0 0 201 393 1964 0 0 0 0 0 11 1966 0 0 0 1 0 0 201 395 1966 0 0 0 0 0 11 1964 0 0 0 1 0 0 201 393 1964 0 0 0 0 0 11 1968 0 0 0 1 0 0 201 397 1968 0 0 0 0 0 11 1965 0 0 0 1 0 0 201 394 1965 0 0 0 0 0 11 1965 0 0 0 1 0 0 201 394 1965 0 0 0 0 0 11 1965 0 0 0 1 0 0 202 394 1965 0 0 0 0 0 11 1968 0 1 0 1 1 0 201 397 1968 0 0 0 0 0 11 1968 0 0 0 1 0 0 201 397 1968 0 0 0 0 0 11 1967 0 0 0 1 0 0 202 396 1967 0 0 0 0 0 11 1966 0 0 0 1 0 0 202 395 1966 0 0 0 0 0 11 1968 1 1 0 1 1 0 201 397 1968 0 0 0 0 0 11 1966 0 0 0 1 0 0 202 395 1966 0 0 0 0 0 11 1965 0 1 0 1 1 0 201 394 1965 0 0 0 0 0 11 1965 0 0 0 1 0 0 201 394 1965 0 0 0 0 0 11 1966 0 0 0 1 0 0 202 395 1966 0 0 0 0 0 11 1966 0 1 0 1 1 0 201 395 1966 0 0 0 0 0 11 1968 1 1 0 1 1 0 200 397 1968 0 0 0 0 0 11 1964 0 0 0 1 0 0 202 393 1964 0 0 0 0 0 11 1967 0 1 0 1 1 0 201 396 1967 0 0 0 0 0 11 1964 0 0 0 1 0 0 201 393 1964 0 0 0 0 0 11 1965 0 0 0 1 0 0 201 394 1965 0 0 0 0 0 11 1964 0 0 0 1 0 0 201 393 1964 0 0 0 0 0 11 1965 0 0 0 1 0 0 202 394 1965 0 0 0 0 0 11 1966 1 1 0 1 1 0 200 395 1966 0 0 0 0 0 11 1966 0 1 0 1 1 0 200 395 1966 0 
0 0 0 0 11 1968 1 1 0 1 1 0 201 397 1968 0 0 0 0 0 11 1968 0 0 0 1 0 0 202 397 1968 0 0 0 0 0 11 1968 0 0 0 1 0 0 201 397 1968 0 0 0 0 0 11 1967 0 0 0 1 0 0 202 396 1967 0 0 0 0 0 11 1967 1 1 0 1 1 0 201 396 1967 0 0 0 0 0 11 1966 0 0 0 1 0 0 202 395 1966 0 0 0 0 0 11 1967 0 0 0 1 0 0 202 396 1967 0 0 0 0 0 11 1966 0 1 0 1 1 0 201 395 1966 0 0 0 0 0 11 1968 0 0 0 1 0 0 202 397 1968 0 0 0 0 0 11 1967 0 1 0 1 1 0 201 396 1967 0 0 0 0 0 11 1967 0 0 0 1 0 0 201 396 1967 0 0 0 0 0 11 1968 1 1 0 1 1 0 200 397 1968 0 0 0 0 0 11 1966 0 1 0 1 1 0 201 395 1966 0 0 0 0 0 11 1965 0 1 0 1 1 0 200 394 1965 0 0 0 0 0 11 1967 0 0 0 1 0 0 202 396 1967 0 0 0 0 0 11 1964 0 0 0 1 0 0 202 393 1964 0 0 0 0 0 11 1964 0 1 0 1 1 0 201 393 1964 0 0 0 0 0 11 1968 0 1 0 1 1 0 201 397 1968 0 0 0 0 0 11 1966 0 0 0 1 0 0 201 395 1966 0 0 0 0 0 11 1964 1 0 0 1 0 0 201 393 1964 0 0 0 0 0 11 1968 0 1 0 1 1 0 201 397 1968 0 0 0 0 0 11 1966 0 0 0 1 0 0 201 395 1966 0 0 0 0 0 11 1967 0 1 0 1 1 0 201 396 1967 0 0 0 0 0 end label values statefip statefip_lbl label def statefip_lbl 11 "District of Columbia", modify label def statefip_lbl 12 "Florida", modify label def statefip_lbl 13 "Georgia", modify label def statefip_lbl 15 "Hawaii", modify label def statefip_lbl 17 "Illinois", modify
The following are the results of such estimations:
Code:
. reg manager nonblack admit_exp20 admit20_nb admit_noexpbord20 admit20free_nb i.cpuma0010 i.birthyr tren > d_*, vce(cluster stateyear) note: 255.cpuma0010 omitted because of collinearity note: 257.cpuma0010 omitted because of collinearity note: 310.cpuma0010 omitted because of collinearity note: 337.cpuma0010 omitted because of collinearity note: trend_1 omitted because of collinearity note: trend_2 omitted because of collinearity note: trend_3 omitted because of collinearity note: trend_4 omitted because of collinearity note: trend_5 omitted because of collinearity Linear regression Number of obs = 94 F(2, 11) = . Prob > F = . R-squared = 0.1838 Root MSE = .3353 (Std. Err. adjusted for 12 clusters in stateyear) Robust manager Coef. Std. Err. t P>t [95% Conf. Interval] nonblack -1.02e-14 1.19e-14 -0.85 0.411 -3.63e-14 1.60e-14 admit_exp20 -.2653784 .1433278 -1.85 0.091 -.5808408 .050084 admit20_nb -.2726897 .0501327 -5.44 0.000 -.383031 -.1623483 admit_noexpbord20 .5479404 .1346726 4.07 0.002 .251528 .8443529 admit20free_nb -.0098723 .0563795 -0.18 0.864 -.1339628 .1142181 cpuma0010 201 -.2675532 .1452092 -1.84 0.092 -.5871564 .05205 202 -.3605872 .1864338 -1.93 0.079 -.7709252 .0497509 204 -.4246173 .1518185 -2.80 0.017 -.7587674 -.0904671 239 .0862023 .0371662 2.32 0.041 .0044 .1680046 253 7.72e-16 1.02e-15 0.75 0.466 -1.48e-15 3.02e-15 255 0 (omitted) 257 0 (omitted) 262 -.0786729 .0147373 -5.34 0.000 -.1111095 -.0462362 273 .0862023 .0371662 2.32 0.041 .0044 .1680046 310 0 (omitted) 335 -5.81e-15 7.51e-15 -0.77 0.456 -2.23e-14 1.07e-14 337 0 (omitted) birthyr 1965 -.1691112 .0222369 -7.60 0.000 -.2180542 -.1201682 1966 -.0904383 .0334711 -2.70 0.021 -.1641078 -.0167689 1967 .0272485 .0172123 1.58 0.142 -.0106356 .0651326 1968 .1134508 .0399003 2.84 0.016 .0256308 .2012708 trend_1 0 (omitted) trend_2 0 (omitted) trend_3 0 (omitted) trend_4 0 (omitted) trend_5 0 (omitted) _cons -.1134508 .0399003 -2.84 0.016 -.2012708 -.0256308 . reghdfe manager nonblack admit_exp20 admit20_nb admit_noexpbord20 admit20free_nb, absorb(cpuma0010 birt > hyr statefip#c.birthyr) vce(cluster stateyear) (dropped 9 singleton observations) note: admit_noexpbord20 is probably collinear with the fixed effects (all partialled-out values are close > to zero; tol = 1.0e-09) (MWFE estimator converged in 5 iterations) note: admit_exp20 omitted because of collinearity note: admit20_nb omitted because of collinearity note: admit_noexpbord20 omitted because of collinearity note: admit20free_nb omitted because of collinearity HDFE Linear regression Number of obs = 85 Absorbing 3 HDFE groups F( 1, 5) = 0.03 Statistics robust to heteroskedasticity Prob > F = 0.8692 R-squared = 0.1709 Adj R-squared = 0.0589 Within R-sq. = 0.0001 Number of clusters (stateyear) = 6 Root MSE = 0.3398 (Std. Err. adjusted for 6 clusters in stateyear) Robust manager Coef. Std. Err. t P>t [95% Conf. Interval] nonblack -.0098723 .0569517 -0.17 0.869 -.1562713 .1365266 admit_exp20 0 (omitted) admit20_nb 0 (omitted) admit_noexpbord20 0 (omitted) admit20free_nb 0 (omitted) _cons .14559 .0254608 5.72 0.002 .080141 .2110389 Absorbed degrees of freedom: Absorbed FE Categories - Redundant = Num. Coefs - cpuma0010 4 0 4 birthyr 5 1 4 statefip#c.birthyr 2 0 2 ? ? = number of redundant parameters may be higher
As you can see, when using reghdfe, Stata omits all of the variables of interest except one, whereas with reg I obtain a coefficient for each of them.
So, what I would like to understand is:
1) What are the differences between reg and reghdfe that are generating different coefficients?
2) Why does reghdfe omit variables, whereas reg estimates a coefficient for each variable?
3) How can I prevent Stata from omitting the variables of interest and instead make it omit only the coefficients of the variables I declared in absorb()?
I would appreciate any help!
↧
[putdocx] Using if command to remove blank line
Hello,
I am making a lot of news reports with the putdocx command in Stata.
I collect various news items from the branches of my company, and I have already distributed the Excel sheet that branch staff will fill in.
The form is like below:
Code:
clear
set obs 3
gen TITLE = ""
gen BODY1 = ""
gen BODY2 = ""
gen BODY3 = ""
replace TITLE = "Title1" in 1
replace TITLE = "Title2" in 2
replace TITLE = "Title3" in 3
replace BODY1 = "This is body1..." in 1
replace BODY1 = "This is body1..." in 2
replace BODY1 = "This is body1..." in 3
replace BODY2 = "This is body2..." in 1
replace BODY2 = "This is body2..." in 3
replace BODY3 = "This is body3..." in 3
The converting code is like below. Each news item can have 1, 2, or 3 bodies: a short item may be finished with just 1 body, while a longer item has 1 or 2 more bodies added.
Code:
putdocx clear
putdocx begin
putdocx paragraph, style(Title)
putdocx text ("Report of something")
levelsof TITLE, local(tlist)
foreach t of local tlist {
    preserve
    keep if TITLE == "`t'"
    putdocx paragraph, style(Heading1)
    putdocx text ("`t'")
    putdocx paragraph
    putdocx text (BODY1)
    putdocx paragraph
    putdocx text (BODY2)
    putdocx paragraph
    putdocx text (BODY3)
    restore
}
putdocx save "news_sample.docx", replace
The problem is the blank lines produced by my xlsx-to-docx converting code above.
When I open the resulting docx file, a blank line appears under each news item that has only 1 or 2 bodies.
I want to remove those blank lines, and I think an "if" condition can help, but I don't know how to use it here.
If a news item doesn't have a second or third body, my code doesn't need to run putdocx text (BODY2) or putdocx text (BODY3).
It is not a major error, but I want to produce a well-designed report.
Thank you in advance!
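A minimal sketch of one way to skip the empty bodies inside the existing loop, assuming (as in the example data) that an unused body is an empty string; after keep if only one observation remains, so BODY2[1] and BODY3[1] hold that item's text:
Code:
foreach t of local tlist {
    preserve
    keep if TITLE == "`t'"
    putdocx paragraph, style(Heading1)
    putdocx text ("`t'")
    putdocx paragraph
    putdocx text (BODY1)
    if BODY2[1] != "" {              // add the second body only when it exists
        putdocx paragraph
        putdocx text (BODY2)
    }
    if BODY3[1] != "" {              // add the third body only when it exists
        putdocx paragraph
        putdocx text (BODY3)
    }
    restore
}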
↧
Sample Selection Issue: Pooled Cross Sectional Data
Hi All,
I am using a micro-level dependent variable and country-level independent variables; just as an example, firm profitability and inflation. My problem is that I do not have panel data for the micro-level dependent variable, so I am using pooled cross-sectional data covering 39,000 firms in 47 countries. Because some countries reappear in the next survey wave with different firms, I have at most 73 country-level values for the independent variable. I have doubts about the feasibility of this study: can I go ahead with it? Any references?
Secondly, I have 2 firm-level proxies for the dependent variable and 3 country-level proxies for the independent variable, so I ran 6 regressions using an instrumental-variable approach. For some regressions the first-stage t-statistic is very high, around 100 to 500. Should I be concerned about such a high t-statistic? (Note: in the first stage both the endogenous regressor and the instrument are country-level, in other words both contain at most 73 observations.)
Third, is a high constant, with a value of 100 to 1,500, problematic?
Could experts please provide their opinions?
↧
↧
R200 error ologit categorical dependent and independent variables
Hi all,
I am really struggling to identify the predictors of a 5-category variable (Excellent to Poor health). I want to look at this by a range of categorical variables, e.g. age group, sex, ethnicity. I've tried this in Stata with the data stored as strings and I get the r(200) error. However, if I convert the variables to byte, Stata treats them as continuous. I've tried YouTube and online resources, but I still cannot get anything to work. I'd really appreciate some guidance.
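A minimal sketch of the usual workflow, with hypothetical variable names: encode turns the string variables into labelled numeric ones, and the i. factor-variable prefix tells ologit to treat the predictors as categorical rather than continuous:
Code:
encode health,    gen(health_n)      // check the ordering: encode codes alphabetically
encode agegroup,  gen(agegroup_n)
encode sex,       gen(sex_n)
encode ethnicity, gen(ethnicity_n)
ologit health_n i.agegroup_n i.sex_n i.ethnicity_n, or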
↧
P value for trend
I would be very grateful if someone could help me with the code for this. I'm reporting the incidence of stroke in patients with diabetes versus no diabetes in 5-year bands, and would like to calculate a p-value for trend over time if possible. I've tried the ptrend command but I'm confused by the rest of the syntax. My data look something like this:
Year Strokes in non-diabetics Strokes in diabetics
2002-2007 238/72243 343/4030
2007-2012 336/74945 282/4375
2012-2017 431/76741 169/4809
Thanks much in advance!
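One possible sketch for a trend test from grouped data like this, entering the diabetic counts by hand and fitting a binomial GLM with the period as a linear score (the z-test on period is then a test for linear trend; the non-diabetic series works the same way):
Code:
clear
input period strokes total
1 343 4030
2 282 4375
3 169 4809
end
glm strokes period, family(binomial total) link(logit)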
↧
Selected variables for Lasso
I'm fitting an inferential lasso model (dslogit) and have specified some variables of interest and some controls. The output says that 15 controls were selected, but it still shows only my 2 variables of interest. Where can I see which controls were picked?
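If I recall the lasso postestimation tools correctly (worth confirming in help lasso postestimation), the selected controls can be listed after dslogit with lassoinfo and lassocoef. A hypothetical sketch with y as the outcome and d1, d2 as the variables of interest:
Code:
dslogit y d1 d2, controls(x1-x100)
lassoinfo                                          // lists the lassos that were fitted
lassocoef (., for(y)) (., for(d1)) (., for(d2))    // variables selected in each lasso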
↧