Channel: Statalist
Viewing all 65064 articles

Fixed effects estimator and gmm

  • I am running a regression using panel data for 190 countries over the period 1960-2010, so the data set is very large. I think the FE estimator is appropriate for this estimation. The independent variable is endogenous, so I am using two instruments for it. The model is:

log y_it = a0 + a1*x_it + ... + e_it

I don't need pooled OLS estimates: the countries are heterogeneous, so those estimates would be biased upward. Since T is very large, the FE estimator is appropriate and gives good results.

1. Do I need to use GMM? Why would I need to run GMM?
2. How can I test for endogeneity in a 2SLS fixed-effects estimator?

I would be very grateful if anyone could help me with these questions.

drop few obs per year

I'm using Stata 13 on Windows 10.

I have a panel of quarterly data, but some years don't have all four quarters, and I'd like to drop any year that has fewer than four quarters.
Any idea how to do this?
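What I have in mind is something like this (just a sketch; id and qdate, a quarterly %tq date, are placeholder names for my panel and time variables):

Code:
gen year = yofd(dofq(qdate))
bysort id year: gen nq = _N  // number of quarters observed in that year
drop if nq < 4
drop nq year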

xtset help

Hi there guys,

I've been using Stata for about a week and I'm really struggling to xtset my data. I currently have approximately 3,400 observations (company data per month). For each company, there is monthly data from 31/01/1990 through 31/12/2015.

something like this:
Company Month ROA ROE ROIC LTD STD Size = log(TA) Age TAN
African Oxygen Ltd 31/01/1990 0.07540 0.16490 0.13920 0.14336 0.07598 13.88144 29.170 0.52453
African Oxygen Ltd 28/02/1990 0.07540 0.16490 0.13920 0.14336 0.07598 13.88144 29.170 0.52453
African Oxygen Ltd 31/03/1990 0.07540 0.16490 0.13920 0.14336 0.07598 13.88144 29.170 0.52453
I have already managed to convert Company to a numeric variable, but when I then try to xtset my data, I keep getting r(109) or r(111) errors. Am I correct in assuming the problem is that the months are not properly formatted as Stata dates?
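If it is a formatting issue, is the fix something like this? (A sketch, assuming Month is a string like "31/01/1990"; cid and mdate are names I made up.)

Code:
gen mdate = mofd(daily(Month, "DMY"))  // monthly date from the day string
format mdate %tm
egen cid = group(Company)
xtset cid mdate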

Many thanks for the help,

Neil.

SVAR with Block Exogeneity

Hi all,
I have a couple of questions. I'm currently trying to run a 10-variable SVAR in which the system of contemporaneous variables is over-identified (>45 restrictions), and I have imposed block exogeneity on that system. Would this be my "A" matrix when specifying the short-run restrictions in Stata? When I create the 10x10 matrix with my restrictions, define it as A, and run the SVAR, Stata freezes as soon as I try to generate the IRFs. Also, Stata 13 has an option for a model with exogenous variables: is that where I would impose the block exogeneity, by defining my exogenous variables there, or would I just define it through the A matrix?


Any help would be greatly appreciated! For reference I have uploaded my system of contemporaneous variables that I would like to use.

Thanks

Fixed Effects with ,fe and dummies

Hey there,
I have a very basic question.
I would like to run an estimation with country and time fixed effects.
Do I get correct AND identical results from the following two approaches?
A)
xtset id year, yearly
xtreg Dep.Var. Ind.Var., fe
B)
reg Dep.Var. Ind.Var DummiesForYears DummiesForCountries

(the panel data set covers more than 20 years, 90 countries, and several explanatory variables)
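To make the comparison concrete, this is what I mean (y, x, and a numeric id are placeholder names; year dummies are included in both so that each has time effects):

Code:
* A) within estimator plus year dummies
xtset id year
xtreg y x i.year, fe
* B) explicit dummies (LSDV); the coefficient on x should match A
reg y x i.year i.id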

I appreciate any comment; it is urgent.

Running Beta regression

Hi All,

I need to run the Capital Asset Pricing Model (CAPM) to get a beta for each company. I would like some help with a command so that I do not need to run it for each individual firm separately. Is there a rolling command in Stata?

> I have a database that has the following structure:
>
> Firmid date ret mktret
> A 012007 .1 .05
> A 022007 .05 .02
> A 032007 -.05 -.1
> (...)
> A 122015 .1 .1
> B 082007 .2 .1
> B 092007 .05 .2
> (...)
> B 122015 .1 .1
> (...)

I need to run a simple regression of ret on mktret over each firm's last 36 months of data, starting from 2010 (so 2007 through 2010 gives one beta value), and save the results in a separate file. That gives six beta values for each firm, one per year from 2010 to 2015. Could the R-squared of each regression be saved as well?
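What I was imagining is something along these lines (a sketch; it assumes a numeric firmid and a monthly %tm date mdate, so the data can be tsset as a panel, in which case -rolling- runs the windows separately within each firm):

Code:
tsset firmid mdate
rolling beta=_b[mktret] r2=e(r2), window(36) saving(betas, replace): regress ret mktret

The saved file betas.dta would then hold one beta and R-squared per firm per 36-month window.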

Regards,
Michael Bond

oaxaca command gives r(499) error when making estimation with large number of dummies

Hello
I'm trying to run a wage-gap decomposition with a relatively large set of dummy variables representing occupations. The oaxaca command ends with an r(499) error if I use even one fifth of the dummies available in the set. Here is an illustration:

If I use only the first 21 variables of the occupation dummy set, oaxaca performs fine:
oaxaca ln_wklyearn otpaid otunpaid age age_sq married prov1d2-prov1d10 parttime permjob firmsize2-firmsize4 uslhrs educlev2-educlev10 tenure soc_2-soc_22, by(female) pooled

But if I use, for instance, 121 dummies from the set, I get an error:
oaxaca ln_wklyearn otpaid otunpaid age age_sq married prov1d2-prov1d10 parttime permjob firmsize2-firmsize4 uslhrs educlev2-educlev10 tenure soc_2-soc_122, by(female) pooled
(model 2 has zero variance coefficients)
dropped coefficients or zero variances encountered
specify -noisily- to view model estimation output
specify -relax- to ignore
r(499);

I would appreciate it if somebody could help me solve this issue, or at least let me know whether it looks solvable within a short time (a week).
Thank you.

Add var_names to table with regression F-stats

Hi all, see the code below. I am trying to create a vector of dependent-variable names and then export the vector of F-statistics and dependent-variable names for many regressions to Excel using putexcel. I cannot figure out how to create the vector of variable names; the line
Code:
matrix names=nullmat(names) \ e(depvar)
is not correct, but I don't know why. Can you please help me troubleshoot? I am really confused! Thanks!
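I wonder whether the problem is that e(depvar) is a string (a macro) rather than a matrix, so it cannot be row-joined with \. What I am trying to achieve looks like this sketch, storing the names as row names instead (the auto data and the depvars price and mpg are just stand-ins; putexcel syntax as in Stata 14+):

Code:
sysuse auto, clear
matrix F = J(2, 1, .)
local rnames
local i 0
foreach y in price mpg {
    quietly regress `y' weight length
    local ++i
    matrix F[`i', 1] = e(F)          // F-statistic of this regression
    local rnames `rnames' `y'        // collect the depvar name
}
matrix rownames F = `rnames'
matrix colnames F = F_stat
putexcel set fstats.xlsx, replace
putexcel A1 = matrix(F), names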

[attached screenshot]


Understanding differences in marginsplot and predicted values

Dear statalist,

I'm running a MV logistic regression and using marginsplot to understand the relationship between aki2 (DV) and log_avl (IV) in an observational cohort. As a side note, log_avl was log-transformed after assessing its distribution with ladder plots; it is now normally distributed and has an understandable and statistically significant relationship with aki2.

To better understand the specific nature of the relationship between aki2 and log_avl in the context of the other covariates, I'm also generating scatterplots after using the predict command. There seems to be a discrepancy between the shapes of the marginsplot and the scatter plot. Please see the code and graphs below:


logistic aki2 c.log_avl##i.it_type i.agecat male race i.bmicat i.cci_cat Auto_CKD_Preop i.renal i.clavien_cat
margins, at(log_avl=(-2(1)6))
predict fitted2
quietly margins, at(log_avl=(-2(0.5)6)) saving(file2, replace)
marginsplot
graph addplot scatter fitted2 log_avl, msym(oh) msize(vsmall) mcolor(cranberry*0.8) xlabel(-2(1)6)
graph addplot qfitci fitted2 log_avl



[attached graph: marginsplot (blue) with overlaid scatterplot of predicted values (red) and quadratic fit (gray)]

The marginsplot is in blue. The scatterplot is in red and is created by using the predict command. The quadratic fit to the scatterplot is in gray. A quadratic fit was chosen after running the fp command to determine the optimal fit between pr(aki2) and log_avl.


When I assess the scatterplot, it seems the quadratic fit does a much better job of fitting the relationship. Is that because margins naturally follows the shape of a logistic curve, so the tail ends flatten out? Which of the two curves is more appropriate, then? Is marginsplot really giving me the correct relationship between these two variables in the regression?
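To check my understanding of the difference, I put together this toy comparison on the auto data (foreign, mpg, and rep78 are stand-ins for my variables): margins averages the predictions over the sample's covariate values at each fixed mpg, while predict uses each observation's own covariates.

Code:
sysuse auto, clear
logistic foreign c.mpg i.rep78
margins, at(mpg=(12(4)40))   // averages predictions over the sample's rep78 values
predict pr_own if e(sample)  // probability at each observation's own rep78
scatter pr_own mpg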



Thanks for any help!
Julien





Combining bar graphs

I am trying to combine bar graphs of different variables in one figure, but the y-axis is distorted across the variables. I have tried a number of different scaling options but can't get a common axis for bar graphs of different variables.
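For reference, this is the kind of thing I have been attempting, using graph combine's ycommon option (the auto data and the variables mpg and trunk are just stand-ins for mine):

Code:
sysuse auto, clear
graph bar (mean) mpg, over(foreign) name(g1, replace)
graph bar (mean) trunk, over(foreign) name(g2, replace)
graph combine g1 g2, ycommon  // force a common y-axis scale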

2SLS with Poisson first stage, and summation of predicted first-stage values

Hi there,

I realize questions like this come up a lot, but I couldn't find answers that suited exactly what's going on for me.

I've got an instrumental variables setup with two continuous instruments Z1 and Z2, and a few exogenous variables W, in the first stage. I'm instrumenting for a count variable, and in the interest of precision, want the first stage to be a Poisson regression.

Now, between the first stage and the second stage, I need to sum up the predicted values from the first stage, because my model in the second stage is at an aggregated level vis-a-vis the first. In particular, the first stage instruments for a sort of trade flow between each pair of states, so it's at the level of the source state, destination state, and year. In the second stage, I want to estimate the impact of the total flow into the destination state on an outcome variable.

What I've been trying so far, based on a combination of earlier statalist posts--especially this one--is something like the following:

Code:
xtset src_des_num year // sets panel, where panel variable is source-destination combination
xtpoisson flow `stage1_covars' i.year, fe vce(robust) // list of stage1_covars has been defined elsewhere and includes the two instruments
predict flow_hat // get the Poisson-estimated values

* Now I need to collapse to the destination state level
collapse (sum) flow_hat /// total flow into state
              (max) log_gdp_des log_pc_des officer_rate_dest /// these are constant within destination state and year
              (mean) log_gdp_src log_pc_src norm_score_source officer_rate_source, /// these need to be averaged over source states or they don't make sense in the second stage
              by(dest_state year) // final dataset is at destination state-year level

merge 1:1 dest_state year using "[outcome dataset]", nogen

* Set the new panel
egen dest_state_num = group(dest_state)
xtset dest_state_num year

* Make new variable list
local stage2_covars log_gdp_des log_pc_des officer_rate_dest log_gdp_src log_pc_src norm_score_source officer_rate_source

* Follow instructions from statalist post
ivregress 2sls log_homic_rate `stage2_covars' i.year i.dest_state (flow = flow_hat), vce(cluster dest_state)
At this point I get the following error:

Code:
flow_hat included in both endogenous and excluded exogenous variable lists
r(498);
What am I doing wrong? I'm pretty sure (I've read) that I can just use a linear first stage instead, and normal 2SLS will get the job done. I also remember reading somewhere (I can't find the link) that there is more precision/efficiency if you estimate the first stage in its "natural" non-linear way. (Again, the endogenous variable, trade flow, is a count variable.)

I really appreciate any and all help you could offer.

Best,
Isaac

Making multiple value weighted portfolios

Hello,

For my master's thesis I have to calculate multiple value-weighted portfolios.
As I am not that proficient with Stata, I have come to this forum for help.

I currently have a market value file and a portfolio allocation file, both with the same form:

The market value file looks as follows:
Date MVFirm A MVFirm B MVFirm C etc.
5/10/2002 MVA MVB MVC etc.
6/10/2002 MVA MVB MVC etc.
etc.
All market values are numbers.


The Portfolio file looks as follows:
Date PortFirm A PortFirm B PortFirm C etc.
5/10/2002 1 1 1 etc.
6/10/2002 2 1 3 etc.
etc.
All portfolio values are on a scale of 1 to 5 and can differ on a daily basis.

Now I need to make a new variable for every firm, calculated by dividing the market value of a single firm by the total market value of all firms that have the same portfolio number (e.g. 1).
It should be something in the neighbourhood of this (pseudocode):
foreach var of varlist {
    gen VWport`var' = MV`var' / (sum of MV`var' over firms with Port`var' == 1)
}

However, I cannot find anything close to this when I googled this topic, therefore, I have come here.
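Another route I considered is reshaping to long form first, where the sum within a portfolio becomes a simple egen. A sketch, assuming the two files are merged on Date and the columns are named MVFirmA, MVFirmB, ... and PortFirmA, PortFirmB, ... (suffixes guessed from the layout above):

Code:
reshape long MVFirm PortFirm, i(Date) j(firm) string
egen portMV = total(MVFirm), by(Date PortFirm)  // total MV of firms in the same portfolio that day
gen VWweight = MVFirm / portMV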

If you could help me out, that would be amazing.

Thanks in advance.

Generating post-estimation graphs and tables Predicted value of mean of response

test -- please ignore -- I stuffed it up! :-(

help with unit root testing!

Hello!

After reading several websites, I still can't be 100% sure when choosing between none, intercept only, trend only, and both (trend and intercept) for unit-root testing. So I would really appreciate any suggestions on choosing between the models mentioned above, especially when working with the following tests:

- ADF
- Phillips-Perron
- KPSS
- Zivot-Andrews

Looking forward to your help.

Regards,

Joel



compare half-life and elimination rate

I wonder whether there is a function implemented for comparing half-lives and elimination rates between two groups, where the half-life and elimination rate are estimated using pkexamine for a set of curves in each group.

Many thanks in advance.

Advice on generating graphs and tables following regression (nbreg)


I'm in transition from predominantly using SPSS to using Stata -- with all the learning that entails.

My biggest problem is that I can't figure out how I generate graphs and a table after running a negative binomial (nbreg) regression.

This is what I do in SPSS:

First, I run the negbin model and include this syntax to save the estimates I want:

SAVE STDPEARSONRESID STDDEVIANCERESID meanpred

This command saves:
- Predicted value of mean of response
- Standardized deviance residual, and
- Standardized Pearson residual

Second, I graph the predicted mean and deviance residual using this syntax:

GRAPH
/SCATTERPLOT(BIVAR)=MeanPredicted WITH StdDevianceResidual ... etc

[attached SPSS graph: predicted mean vs. standardized deviance residual]


Third, I graph the predicted mean with a predictor variable, using this syntax:

GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=example.predictor
MeanPredicted MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.

[attached SPSS graph: predicted mean vs. predictor]


Finally, I produce a table with the count and percent of observations outside +- two standard deviations, which looks like this:

PC_std.dev                            N      % of Total N
Outside +- 2 Standard Deviations     39      2.30%
Within +- 2 Standard Deviations    1673     97.70%
Total                              1712    100.00%

I've tried to figure out how to do this in Stata, but I've reached the point where I really would appreciate some help.
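So far the closest analogue I have found is glm with a negative binomial family, whose predict offers these residuals (y and x are placeholders for my outcome and predictor):

Code:
glm y x, family(nbinomial ml) link(log)
predict mu, mu                       // predicted mean of response
predict sdev, deviance standardized  // standardized deviance residual
predict spear, pearson standardized  // standardized Pearson residual
scatter sdev mu
gen byte outside = abs(sdev) > 2 if !missing(sdev)
tabulate outside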

Help with self-weighted strata samples using samplepps

Hi all,

I'm trying to pick primary units using PPS (probability proportional to size) sampling from each stratum. I would like the sample picked per stratum to be self-weighted, that is, in proportion to the cumulative size of all units in that stratum. I cannot figure out how to do this using 'samplepps'.

I was unable to find any such option built into 'samplepps', so I tried to generate the sample size per stratum using loops. But 'samplepps' rejects the looped macro used for the sample size per stratum. Is something wrong with my code, or is this a limitation of samplepps?

Are there any other ways to achieve the same results?

I have attached example code below:

Code:
    

    webuse lifeexp, clear

    set seed 3454

    egen strata = group(region lexp)
    bysort country: gen popn = runiform()*1000
    egen total_popn = sum(popn)
    egen strata_popn = total(popn), by(strata)
    gen strata_weight = strata_popn/total_popn

    cap drop h
    quietly levelsof strata, local(index)

    foreach i in `index' {
        quietly sum strata_weight if strata == `i'
        local h2 = r(mean) // strata_weight is constant within a stratum
        local j2 = round(`h2'*110) // because samplepps accepts only integers
        samplepps treatment_village2`i' if strata == `i', size(popn) n(`j2')
    }
However, this returns the error
Code:
option ncases() invalid

How to copy all variables and labels?

[attached screenshot]
As in the captured picture above, I want to copy variable names and labels to Word or Excel, but in Stata I can copy only one name at a time. What should I do to copy everything I want and paste it?
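One route I have seen mentioned is turning the variable list itself into a dataset and exporting it, something like this (assuming your Stata supports describe's replace option; varnames.xlsx is a made-up output filename):

Code:
describe, replace clear    // replaces the data in memory with one row per variable
keep name varlab
export excel using "varnames.xlsx", firstrow(variables) replace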
Thanks~

advice on bootstrap sampling to internally validate a logistic model

I am not familiar with this type of analysis, so I would greatly appreciate advice on the following.

I would like to develop and internally validate a model using a logistic regression analysis and a bootstrap sampling process.

Using the nlsw88 dataset installed with Stata as an example, I wrote the following code:

sysuse nlsw88, clear
cd "C:\Documents"
save nlsw88, replace

capture program drop mysim
program define mysim, rclass
use nlsw88, clear
bsample
merge m:1 idcode using nlsw88
* Fit logistic regression model on the bootstrap sample
logit union south grade if _merge == 3
matrix b = e(b)
* test the model on the subjects that were not sampled
lroc union if _merge == 2, nograph beta(b)
return scalar area=r(area)
end

simulate area=r(area), reps(10000): mysim
_pctile area, p(2.5 50 97.5)
ret list
* Gives the validation AUROC and accompanying 95% probability interval.

Does this way of going about the task make sense?

Thank you for your feedback.
Best wishes,
Miranda