Channel: Statalist

Elasticity and Marginal Effect of Tobit Regression

Hi! I have been using Stata 14 for about a week, piecing together information online, and feel that I am getting a decent handle on it for my purposes. After running a Tobit regression on my data set, I need to find the marginal effects (dy/dx) and elasticities (ey/ex). This is where I have not been able to find clear answers.

For the marginal effects, some have told me that simply running:

mfx

will work, while others have told me to run:

mfx, predict(e(0,.))

Likewise, for post-Tobit elasticities, I have been seeing both:
mfx, eyex
and
mfx, predict(e(0,.)) eyex

If anyone has any insight into what the "predict(e(0,.))" option does, and whether or why it may be necessary, I would be greatly appreciative. In addition, if anyone has any recommendations or cautions regarding these results, I would really appreciate hearing them.

Thanks in advance,

Bryan Gensits
Royal University of Bhutan: College of Natural Resources
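
A hedged sketch of what the predict() option changes, using Stata's shipped auto data rather than the poster's data (the variable wgt and the censoring limit of 17 are illustrative assumptions). Since Stata 11, margins supersedes mfx, so the sketch uses margins. With no predict() option, the effects refer to the linear prediction xb, that is, the latent variable. predict(e(17,.)) requests effects on E(y | y > 17), the expected outcome conditional on being uncensored, and predict(ystar(17,.)) requests effects on the expected censored outcome. For data censored at zero, e(0,.) plays the role of e(17,.).

```stata
* Illustration on the auto data; mpg is treated as left-censored at 17
sysuse auto, clear
generate wgt = weight/1000
tobit mpg wgt, ll(17)

* Effect on the latent variable (linear prediction, the default)
margins, dydx(wgt)

* Effect on E(mpg | mpg > 17), i.e. conditional on being uncensored
margins, dydx(wgt) predict(e(17,.))

* Effect on the expected censored outcome
margins, dydx(wgt) predict(ystar(17,.))

* Elasticity of the conditional expectation with respect to wgt
margins, eyex(wgt) predict(e(17,.))
```

Which quantity is appropriate depends on whether the question concerns the latent variable, the observed outcome, or the outcome among uncensored cases.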


Divide groups by size in panel data

Dear Statalist,

I am very new to Stata. I want to divide my panel data into two groups by firm size (small firms and large firms). Based on the median firm size in the current year (2015), I want two groups of IDs: one group includes all IDs with firm size in 2015 >= median, and the other includes the remaining IDs, with firm size < median in 2015. My thought is that I should create a dummy variable for the IDs (companies) and then run two regressions on the two subsamples, for large and small firms respectively. However, the following syntax did not work:

sort id
by id: gen newid=1 if year=2015 & firm size>=732047.9 => syntax did not work!

Can anyone help me correct the syntax or walk me through the process? Many thanks!

Trang
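
A minimal sketch of one way to do this, run on the grunfeld example data instead of the poster's data. Here company stands in for id, mvalue for firm size, and 1954 for the reference year 2015; all three are stand-ins, not the poster's variables. Two points about the original line: equality tests need == rather than =, and variable names cannot contain spaces.

```stata
webuse grunfeld, clear

* Median firm size in the reference year
summarize mvalue if year == 1954, detail

* Flag, in the reference year only, whether the firm is at or above the median
generate byte big_ref = mvalue >= r(p50) if year == 1954 & !missing(mvalue)

* Spread the reference-year flag to every year of the same firm
egen byte large = max(big_ref), by(company)

* Separate regressions for large and small firms
regress invest mvalue kstock if large == 1
regress invest mvalue kstock if large == 0
```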

Plotting age profiles of consumption

Dear All,
Please see attached an image, taken from Deaton (1997). Using a different dataset, how can one replicate the Fig 6.2 in the attached picture?
Thank you very much,
Dapel

Bootstrapping a relative difference with confidence bounds

Dear All,

I have used the search function on Statalist but have not found a solution to my problem.
I want to bootstrap a relative difference to calculate a mean relative difference with confidence bounds. I am assuming that nlcom can get me there, but have so far not been able to do it. Does anybody have any experience with this? Any suggestions would be more than helpful!

Thank you all.

Best,

Jasper
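
One possible route, sketched on the auto data since the post gives no variables: wrap the calculation in a small rclass program and hand it to the bootstrap prefix. The program name reldiff, the returned scalar rd, and the choice of price and foreign are all illustrative assumptions; the relative difference here is (mean of group 1 minus mean of group 0) divided by the mean of group 0.

```stata
sysuse auto, clear

capture program drop reldiff
program define reldiff, rclass
    * Mean of price in each group, then the relative difference
    summarize price if foreign == 1, meanonly
    local m1 = r(mean)
    summarize price if foreign == 0, meanonly
    return scalar rd = (`m1' - r(mean)) / r(mean)
end

* Resample within groups and recompute the relative difference each time
bootstrap rd = r(rd), reps(200) seed(12345) strata(foreign): reldiff

* Percentile and bias-corrected confidence intervals
estat bootstrap, percentile bc
```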

two part model with positive and negative outcome

Dear Statalist,
I am modelling the following problem: a business manager decides whether or not to report a particular number, then decides on the amount to report, which may be positive or negative. In general terms the model is: Amount_Number (positive or negative values) = Size Industry_leader Sales, if Decision_to_report = 1
I use the same independent variables for both the amount and decision parts.

I have been using the twopm command from Belotti et al. (Stata Journal, 2015) with firstpart(probit) and secondpart(glm), but I wonder whether there is a better approach. In the examples of two-part models I found, the outcome is always truncated at zero, e.g. health expenditure only takes positive values. In my case Amount_Number can take any value.

Thanks for your help
Helena

Command to generate child's age from a birth history data

How do I generate a child's age from birth history data?
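
The post gives no variable names, so the following is only a sketch on two made-up rows. It assumes the birth history holds a date of birth and a date of interview as strings (the names dob_s and doi_s are hypothetical) and computes age in completed months and years.

```stata
clear
input str10 dob_s str10 doi_s
"2010-03-15" "2015-06-01"
"2013-11-30" "2015-06-01"
end

* Convert the strings to Stata daily dates
generate dob = date(dob_s, "YMD")
generate doi = date(doi_s, "YMD")
format dob doi %td

* Completed months: calendar-month difference, minus one if the
* day of the month has not yet been reached
generate age_months = mofd(doi) - mofd(dob) - (day(doi) < day(dob))
generate age_years  = floor(age_months/12)

list dob doi age_months age_years
```

If the data store dates as century month codes instead, age in months is simply the interview code minus the birth code.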

Confidence interval for mean of predicted probabilities following a binary logistic regression

How does one compute a confidence interval for the mean of the predicted probabilities following a binary logistic regression? Do I, for example, use some kind of bootstrapping approach or the margins command?

To be more specific, I have a very large sample of individuals from the same country but from different geographic regions that differ widely in the number of observations in my sample. Using this sample, I run a binary logistic regression and predict the individual probabilities for the outcome. Then, I find the mean of the predicted probabilities for different geographic regions. The problem is how to calculate the confidence interval for the mean of the predicted probabilities for each geographic region.

In the setting above, I have sample data. Should the calculation of the confidence interval for the mean of predicted probabilities be different if I instead have data for all individuals in the country? (Some argue that it is still relevant to talk about sampling error in this situation since one could view the population as a sample from some kind of super population.)
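
A hedged sketch of the margins route, using the lbw example data with race standing in for geographic region (an assumption made only for illustration). margins with over() reports the average predicted probability within each group together with a delta-method confidence interval. By default the covariates are treated as fixed; vce(unconditional) also accounts for sampling variation in the covariates and requires a robust or clustered VCE at estimation.

```stata
webuse lbw, clear

* Logit with a robust VCE so that vce(unconditional) is allowed later
logit low age lwt smoke i.race, vce(robust)

* Mean predicted probability within each group, covariates treated as fixed
margins, over(race)

* Same means, with standard errors that treat the covariates as sampled
margins, over(race) vce(unconditional)
```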

Plotting and estimating difference across time

Say my question is: has the income gap between whites and blacks decreased over time?
I start with a simple analysis of the raw data. I can tabulate the mean wages for each group by year and compute the difference. I can also test whether this difference is significant using a t-test, each time limiting the test to the year analyzed.

The issues are:
1. Plotting this requires some data "destruction" with table, replace OR some sort of collapsing. I would have loved to avoid this and have a plot where the x axis is time and the y axis is the difference in wages between groups. Is there any way to achieve this?

2. t-testing each year separately cannot answer the question of whether the difference narrows or widens across time. Say in 1990 the difference is 1.8 and this is statistically significant. In 1991 the difference is 1.799 and it is also statistically significant. But is the difference between 1.8 and 1.799 statistically significant? Disjoint t-tests obviously cannot provide an answer.

Code example:

Code:
clear all
webuse nlswork, clear
drop if race == 3

*estimating the difference in means*
bysort year: ttest ln_wage, by(race)

*graphing the difference*
collapse ln_wage, by(race year)
reshape wide ln_wage, i(year) j(race)
rename ln_wage1 ln_wage_white
rename ln_wage2 ln_wage_black
gen diff = ln_wage_white - ln_wage_black

twoway line diff year, yline(0) ylabel(-0.2(0.1)0.2)
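
One way to address both issues without collapsing the data, offered as a sketch rather than the definitive analysis: fit a single regression with a race-by-year interaction and let margins and marginsplot produce the yearly gaps. Clustering on idcode is an assumption made here because nlswork is a panel of individuals. Note that dydx(race) reports black minus white, the opposite sign of diff in the code above.

```stata
webuse nlswork, clear
drop if race == 3

* Yearly gaps from one model; standard errors clustered on the person
regress ln_wage i.race##i.year, vce(cluster idcode)

* Gap (black minus white) in each year, then a plot with confidence intervals
margins year, dydx(race)
marginsplot, yline(0)

* Does the gap change linearly over time? Test the race-by-trend interaction
regress ln_wage i.race##c.year, vce(cluster idcode)
test 2.race#c.year
```

A joint test of all the interaction terms in the first model would instead ask whether the gap differs across years in any way, not only along a linear trend.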

Stock returns daily to weekly

Dear All

I need some help with stock returns. I would like to calculate weekly stock returns from daily returns data. However, I need values only for those weeks in which the stock was traded and for which I have data. For instance, the following is the daily returns data:

input long StockCode float(YEAR month day) double dailyreturn
1 2005 1 4 -.010622
1 2005 1 5 -.009202
1 2005 1 6 .009288
1 2005 1 7 -.001534
1 2005 1 10 .012289
1 2005 1 12 -.009105
1 2005 1 13 .006126
1 2005 1 14 -.010654
1 2005 1 17 -.038462
1 2005 1 18 -.0048
1 2005 1 19 -.009646
1 2005 1 20 -.024351
1 2005 1 21 .066556
1 2005 1 24 .00936
1 2005 1 25 -.02473
1 2005 1 26 -.001585
1 2005 1 27 -.025397
1 2005 1 28 .004886
1 2005 1 31 -.017828
1 2005 2 1 .00495
1 2005 2 2 .050903
1 2005 2 3 -.014063
1 2005 2 4 .045959
1 2005 2 16 -.001515
1 2005 2 17 -.004552
1 2005 2 18 .006098
1 2005 2 21 .016667
1 2005 2 22 .004471
1 2005 2 23 -.010386
1 2005 2 24 -.005997
1 2005 2 25 .001508
1 2005 2 28 -.024096
end


The format I require is below:

input byte StockCode str7 TradingWeek double return
1 "2005-09" .006061
1 "2005-08" 0
1 "2005-06" .069692
1 "2005-05" -.037441
1 "2005-04" -.013846
1 "2005-03" -.001536
1 "2005-02" -.01214
end


I would highly appreciate any help with the above.

Regards

Yahya
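
A sketch under two assumptions inferred from the desired output rather than stated in the post: weekly returns are compounded daily returns, and weeks start on Sunday with week 1 being the week that contains 1 January. On the first eight rows of the example this approximately reproduces the values shown for "2005-02" and "2005-03". Because collapse only creates groups that exist in the data, weeks without trading never appear.

```stata
clear
input long StockCode float(YEAR month day) double dailyreturn
1 2005 1 4 -.010622
1 2005 1 5 -.009202
1 2005 1 6 .009288
1 2005 1 7 -.001534
1 2005 1 10 .012289
1 2005 1 12 -.009105
1 2005 1 13 .006126
1 2005 1 14 -.010654
end

* Daily date and a week number counted from the week containing 1 January
generate date = mdy(month, day, YEAR)
format date %td
generate wk = floor((doy(date) + dow(mdy(1, 1, YEAR)) - 1)/7) + 1

* Compound within week: sum log gross returns, then convert back
generate double lnr = ln(1 + dailyreturn)
collapse (sum) lnr, by(StockCode YEAR wk)
generate double return = exp(lnr) - 1

* Label such as "2005-02", sorted newest week first as in the desired layout
generate str7 TradingWeek = string(YEAR) + "-" + string(wk, "%02.0f")
gsort StockCode -YEAR -wk
list StockCode TradingWeek return, noobs
```

Weeks that straddle a year boundary are split between the two years under this numbering, which may or may not be what is wanted.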

Technological problem about data

I have succeeded in calculating the quarterly unemployment rate for every year with the egen function. But I don't know how to convert the current data to the desired form, that is, the whole column.

Could you give some hints? I am stuck here.

Thank you!


Testing

TREAT FT_PT AGE
1 1 32
1 1 43
1 1 23
1 1 55
1 1 33
1 1 23
1 1 56
0 1 34
0 2 54
0 2 40

Testing the layout of dataset

Probit with variable that predicts failure perfectly

Dear Statalist,

I have a dataset with similar properties as the following:

TREAT GENDER AGE
1 1 32
1 1 43
1 1 23
1 1 55
1 1 33
1 1 23
1 1 56
0 1 34
0 2 54
0 2 40

I am running a probit with TREAT as the dependent variable. In this case, GENDER can take on 2 values - 1 or 2, but all the obs with GENDER=2 are untreated.

I tried running a probit followed by a predict
Code:
probit TREAT i.GENDER AGE
predict double score
summ score
However, no score is generated and the log is appended below

note: 1.GENDER != 1 predicts failure perfectly
1.GENDER dropped and 2 obs not used

note: 2.GENDER omitted because of collinearity
Iteration 0: log likelihood = -3.0141613
Iteration 1: log likelihood = -2.9598644
Iteration 2: log likelihood = -2.9592964
Iteration 3: log likelihood = -2.9592962

Probit regression Number of obs = 8
LR chi2(1) = 0.11
Prob > chi2 = 0.7405
Log likelihood = -2.9592962 Pseudo R2 = 0.0182

------------------------------------------------------------------------------
TREAT | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
2.GENDER | 0 (empty)
AGE | .0177287 .0551661 0.32 0.748 -.0903949 .1258524
_cons | .5144595 2.008838 0.26 0.798 -3.422791 4.45171
------------------------------------------------------------------------------

. predict double score
(option pr assumed; Pr(TREAT))
(10 missing values generated)

. summ score

Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
score | 0

By changing the reference category, the score can be generated for observations with GENDER = 1
Code:
probit TREAT ib2.GENDER AGE
Does anyone know why changing the base causes the output to differ in this case, and what the appropriate solution would be when working with a dataset like this?

Thanks in advance.
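
One possible workaround, sketched on the ten rows shown above: every observation with GENDER = 2 is untreated, so probit cannot estimate a finite coefficient for GENDER, and the model can instead be fitted on the GENDER = 1 observations alone. This sidesteps the base-category behaviour rather than explaining it. Whether to assign a score (for example 0) to the GENDER = 2 observations afterwards is a judgment call not settled here.

```stata
clear
input TREAT GENDER AGE
1 1 32
1 1 43
1 1 23
1 1 55
1 1 33
1 1 23
1 1 56
0 1 34
0 2 54
0 2 40
end

* Confirm that GENDER == 2 never occurs with TREAT == 1
tabulate GENDER TREAT

* Fit on the observations where the outcome varies, then predict in-sample
probit TREAT AGE if GENDER == 1
predict double score if e(sample)
summarize score
```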

Revised version of rct_minim available on SSC

Thanks to Kit Baum a revised version of rct_minim is now available from SSC.

rct_minim implements the Pocock and Simon (1) method of randomising treatments in a clinical trial, balancing for covariate patterns.

The new version (2.2.1) fixes a bug whereby a misplaced comment character ("*") in the code would prevent rct_minim using the current date and time to set the seed. If the user exited and reinvoked Stata between randomisations, the same (Stata default) seed would be generated and treatment allocation would not vary among subjects. This has been fixed.

The new version also explicitly sets the seed using the KISS32 random number generator rather than the MT64 generator introduced in Stata Release 14.

Finally an extra subject counter is displayed with the treatment allocated if the showdetail() option is specified.

My thanks to Ben Leiby and Nooreen Dabbish at TJU, Philadelphia for alerting me to the bug.

In Stata, type:
Code:
ssc describe rct_minim
(1) Pocock, S.J. and Simon, R. [1975]. Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics 31, 103-115.

conditional find in the data

I have two variables, x1 and x2.
How can I find the raw number of the data where x1 is missing while x2 is not missing?
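
A short sketch on made-up data covering both readings of the question: counting such observations and listing where they are. The variable obsno is introduced only to keep the original row position.

```stata
clear
input x1 x2
1 2
. 3
. .
4 .
. 5
end

* How many observations have x1 missing and x2 not missing
count if missing(x1) & !missing(x2)

* Which observations they are, with their row numbers
generate long obsno = _n
list obsno x1 x2 if missing(x1) & !missing(x2), noobs
```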

Transform panel data into a time series

Dear all,

I have panel data on countries and years. I need to make it look like a time series. I want to get a new id variable, e.g. 11, 12 (first country first year, first country second year). Could you please help me do that in Stata?

Regards,

Arshad
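
A minimal sketch on the grunfeld example data, with company standing in for country (an assumption for illustration). Pasting the two counters together as 11, 12, ... becomes ambiguous once either counter reaches 10 (111 could be country 1, year 11 or country 11, year 1), so the sketch pads the year counter to two digits and also shows a plain running id.

```stata
webuse grunfeld, clear

* Country counter 1, 2, ... and within-country year counter 1, 2, ...
egen long cid = group(company)
bysort cid (year): generate long t = _n

* Combined id: 101 = first country, first year; 102 = first country, second year
generate long newid = cid*100 + t

* Alternative: a simple running id over all country-year pairs
egen long seqid = group(cid year)

* Check that the combined id identifies each observation uniquely
isid newid
list company year cid t newid seqid in 1/3
```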

compare betas of two xi regressions (interaction terms/dummy variables) gives error

Dear all,

I would like to compare some betas from different regressions with dummy variables.

The first regression is: xi: regress TE5 i.ETF i.ETF|VOL i.ETF|SKEW i.ETF|KURT i.ETF|DIV i.ETF|PD
The second regression is: xi: regress TE5 i.Market i.Market|VOL i.Market|SKEW i.Market|KURT i.Market|DIV i.Market|PD

Then, I would like to compare the slope coefficient from _IETFxVOL8 and _IMarxVOL2.

I have tried the following commands:
xi: regress TE5 i.ETF i.ETF|VOL i.ETF|SKEW i.ETF|KURT i.ETF|DIV i.ETF|PD
estimate store _IETFxVOL8
xi: regress TE5 i.Market i.Market|VOL i.Market|SKEW i.Market|KURT i.Market|DIV i.Market|PD
estimate store _IMarxVOL2
suest _IETFxVOL8 _IMarxVOL2
ttest _IETFxVOL8 == _IMarxVOL2

But then Stata cannot find the variable _IETFxVOL8 anymore (it is empty)

Then, I have tried to do the following commands:
xi: regress TE5 i.ETF i.ETF|VOL i.ETF|SKEW i.ETF|KURT i.ETF|DIV i.ETF|PD
save "blabla.dta"
Then I run the second regression:
xi: regress TE5 i.Market i.Market|VOL i.Market|SKEW i.Market|KURT i.Market|DIV i.Market|PD

Then, I use the following command:
append using "blabla.dta"

Then, I try: ttest _IETFxVOL8 == _IMarxVOL2

But it gives the error r(2000).

Hence, _IETFxVOL8 is empty.

Which commands do I have to run so that _IETFxVOL8 is no longer empty?

Thank you in advance




Propensity Score Matching with Longitudinal Data

Dear Statalists,

I am trying to use the user-written -psmatch2- for a PSM analysis in a longitudinal data setting.

I have five years of data from different companies, resulting in a strongly balanced panel.
Each company is assigned a unique ID and is assigned either to the treated group (a dummy variable which equals 1 if a company is located in a certain country) or to the untreated group (=0).

I'd like to use the PSM to achieve a better fit between the treated and untreated group.
Is it possible to cluster the company ID in the PSM so that it regards the matching to the IDs and within the IDs the years available?
Otherwise it generates a matching between different years of different companies.

Thank you very much in advance,
Thilo



How to save slope/beta coefficients from one regression and use after second regression?

Hey all,

I have to compare a beta coefficient from one regression with interaction terms (factor-variables) to a beta coefficient from another regression with interaction terms. How can I save the beta coefficients from the first regression?

My do-file looks as follows:
clear all
set more off
import excel "\\studfiles.campus.uvt.nl\files\home\home07\u 1246 790\Master\THESIS\Data\TOTAL REGRESSION DATA.xlsx", sheet("Sheet1") firstrow

fvset base none ETF Market

/* Option 1: Per ETF */
regress TE5 i.ETF##c.VOL i.ETF##c.SKEW i.ETF##c.KURT i.ETF##c.DIV i.ETF##c.PD
save "\\studfiles.campus.uvt.nl\files\home\home07\u 1246 790\Master\THESIS\Performance Measurement\voorbeeld.dta"
regress TE5 i.Market##c.VOL i.Market##c.SKEW i.Market##c.KURT i.Market##c.DIV i.Market##c.PD
append using "\\studfiles.campus.uvt.nl\files\home\home07\u 1246 790\Master\THESIS\Performance Measurement\voorbeeld.dta"
test i28.ETF#c.PD == i2.Market#c.PD

And the error is as follows:
. test i28.ETF#c.PD == i2.Market#c.PD
variable ETF not found
r(111);


Is there a way I can save my slope coefficient from the first regression (i.e. regress TE5 i.ETF##c.VOL i.ETF##c.SKEW i.ETF##c.KURT i.ETF##c.DIV i.ETF##c.PD)?

Thank you in advance
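
A hedged sketch of one standard approach, shown on the nlsw88 example data because the thesis data are not available. save and append store and combine datasets, not regression results; estimates store keeps the results, and suest combines the stored models so that coefficients can be compared across them. The names by_race and by_grad and the variables used are illustrative, and the sketch assumes both regressions are fitted on the same dataset without a robust VCE, as suest requires.

```stata
sysuse nlsw88, clear

* First model: slope of tenure allowed to differ by race
regress wage i.race##c.tenure
estimates store by_race

* Second model: slope of tenure allowed to differ by college graduation
regress wage i.collgrad##c.tenure
estimates store by_grad

* Combine both sets of results with a joint covariance matrix
suest by_race by_grad

* After suest, regress coefficients live in equations named <name>_mean
test [by_race_mean]2.race#c.tenure = [by_grad_mean]1.collgrad#c.tenure

* Same comparison as a point estimate with a confidence interval
lincom [by_race_mean]2.race#c.tenure - [by_grad_mean]1.collgrad#c.tenure
```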


Bar graph for categorical variables

Hi,

I am struggling to plot a bar graph whose x-axis has three categories, one for each code of the variable g1a_1 (1 No, 2 Neutral, 3 Yes), split by group (0 control, 1 intervention). That gives a total of 6 vertical bars, with each adjacent pair of bars showing the "no", "neutral", or "yes" responses for control and intervention respectively. On the x-axis, the bars for control and intervention should slightly overlap each other (bargap(-30)). The y-axis shows the percentage. On top of each bar there should be the count for g1a_1 = 1, g1a_1 = 2, and g1a_1 = 3 in the control and intervention groups separately.

I checked the Stata manual but didn't find information for my case. I also found a user-written command, -fbar-, but it doesn't allow the -bargap()- option.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(g1a_1 g1b_1 g1c_1 g1d_1 g1e_1 group)
3 3 3 3 1 1
1 1 1 1 1 1
1 1 1 1 1 0
1 3 3 2 3 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 0
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 0
3 1 1 2 2 1
1 1 1 1 1 1
1 1 2 2 2 0
1 1 1 1 3 1
1 1 1 1 1 0
1 1 1 1 1 0
3 2 2 1 1 1
1 2 3 1 1 1
3 1 3 3 2 0
3 3 3 3 3 1
1 1 1 1 1 0
1 1 1 1 1 0
1 1 1 1 1 0
1 1 1 1 1 1
2 1 1 1 1 0
1 2 3 1 2 0
1 1 3 1 1 0
1 1 1 1 1 1
1 1 1 1 1 1
3 3 3 3 3 1
1 3 1 3 1 1
1 1 1 1 1 1
2 2 2 2 1 0
1 1 1 1 1 1
1 1 1 1 1 0
1 1 1 2 2 1
2 2 1 3 1 0
1 1 1 1 1 1
1 1 1 1 2 1
1 1 1 1 1 0
1 1 1 1 1 0
1 1 1 1 1 0
1 1 1 1 1 0
1 1 1 1 1 0
1 1 1 1 1 0
1 1 1 1 1 1
3 3 2 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 2 1 1 0
1 3 3 1 1 0
1 1 1 1 1 0
1 1 1 1 1 1
1 1 1 1 1 0
1 1 1 1 1 1
3 3 2 1 1 1
1 1 1 1 1 1
3 3 1 1 1 1
3 3 3 1 1 1
1 1 1 1 1 0
1 1 1 1 1 1
1 1 1 1 1 0
1 1 1 1 1 1
1 1 1 1 1 1
3 3 1 3 1 1
3 3 2 1 1 1
1 1 2 1 1 0
1 1 1 1 1 0
2 1 3 2 1 1
1 1 1 1 1 0
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 0
1 1 1 1 1 1
3 1 1 1 1 1
1 1 1 2 2 1
3 3 2 3 2 1
1 1 1 1 1 0
1 1 2 1 1 0
1 1 1 1 1 0
1 1 1 1 1 0
1 1 1 2 2 0
1 1 3 1 1 0
1 1 1 1 1 1
1 1 1 3 2 1
1 1 1 1 1 1
1 1 1 1 1 0
1 1 1 1 1 0
1 1 1 1 1 0
1 1 1 1 1 0
3 3 3 1 1 1
1 1 2 3 1 0
1 1 1 1 1 1
1 1 1 1 1 0
1 1 1 1 1 0
1 1 1 1 1 1
1 1 1 1 1 1
end
label values g1a_1 common_label
label values g1b_1 common_label
label values g1c_1 common_label
label values g1d_1 common_label
label values g1e_1 common_label
label values group group
label def group 0 "Control", modify
label def group 1 "Intervention", modify
Thanks in advance for your thoughts and sharing the code.

Regards,
Mengmeng
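
One possible approach, sketched on eight made-up rows rather than the posted data: reduce the data to counts with contract, compute within-group percentages, and draw the bars with twoway bar so that the bar positions, and hence the overlap, are under direct control. The variable names n, total, pct and x, the offsets of 0.15, and the bar width of 0.4 are illustrative choices; with these values adjacent bars overlap by 0.1 units.

```stata
clear
input g1a_1 group
1 0
1 0
2 0
3 0
1 1
3 1
3 1
2 1
end

* One row per response-by-group cell, with its count
contract g1a_1 group, freq(n)

* Percentage within each group
egen total = total(n), by(group)
generate pct = 100*n/total

* Shift control left and intervention right so the paired bars overlap
generate x = g1a_1 + cond(group == 1, 0.15, -0.15)

twoway (bar pct x if group == 0, barwidth(0.4))                    ///
       (bar pct x if group == 1, barwidth(0.4))                    ///
       (scatter pct x, msymbol(none) mlabel(n) mlabposition(12)),  ///
       xlabel(1 "No" 2 "Neutral" 3 "Yes") xtitle("")               ///
       ytitle("Percent") legend(order(1 "Control" 2 "Intervention"))
```

Since contract replaces the data in memory, wrapping the block in preserve and restore keeps the original dataset intact.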

Testing whether to include a squared term

Hi,

I am using a panel dataset.
vote is my dependent variable: 1 if the respondent voted in an annual leadership election, and 0 otherwise (so I am using nonlinear methods).
My independent variables include marital status, gender, age etc.

I then run my regression with only age and age^2 as control variables:

Code:
xtprobit vote c.age c.age#c.age, re vce(robust)
I then conduct the test to see whether age^2 should be included, because I suspect there may be a U-shaped or inverse U-shaped relationship with voting (e.g. very young and very old people may be more or less likely to vote than middle-aged people, in a non-linear relationship).

Code:
test age c.age#c.age

 ( 1)  [vote]age= 0
 ( 2)  [vote]c.age#c.age= 0

           chi2(  2) =    4.34
         Prob > chi2 =    0.1141
With this result, does this suggest that age^2 is insignificant, and that perhaps I should only include age?

I believe this is the appropriate test of the significance of the squared term, but please advise me if I'm mistaken.

Thank you
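
A sketch of the distinction, using the union example data in place of the poster's data (union stands in for vote, which is an assumption). The command test age c.age#c.age is a joint test that age and age^2 are both zero, that is, whether age matters at all. The test of the squared term alone names only the interaction, and it is equivalent to the z test reported for c.age#c.age in the coefficient table.

```stata
webuse union, clear
xtset idcode year

* Random-effects probit with age and age squared
xtprobit union c.age##c.age, re

* Is the squared term needed, given that age is in the model?
test c.age#c.age

* Joint test: does age matter at all (linear and squared terms together)?
test age c.age#c.age

* Age at which the fitted index turns, if a quadratic shape is retained
nlcom -_b[union:age]/(2*_b[union:c.age#c.age])
```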