Channel: Statalist
Viewing all 65514 articles

tab by order of frequency across multiple variables

Hi there
If I want to tab values by order of frequency for one variable, I simply do this:
tab var1, sort

But what if I want to tab across multiple variables var1 to var7?
One way might be to restructure my data from wide to long, so that I have var1 to var7 listed as observations.
Can anyone suggest a nice alternative that doesn't involve having to restructure my data?
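A minimal sketch of the stacking route, done temporarily with preserve/restore so that the dataset in memory is left exactly as it was. It assumes var1 to var7 are numeric and share the same coding; the toy data at the top are hypothetical and only stand in for the real variables.

```stata
* Toy data standing in for var1-var7 (hypothetical values)
clear
set obs 10
forvalues j = 1/7 {
    generate var`j' = mod(_n * `j', 4)
}

* Temporarily stack var1-var7 into a single variable, tabulate, then restore
preserve
stack var1-var7, into(value) clear
tabulate value, sort
restore
```

After restore, the original wide data are back in memory, so nothing is permanently restructured.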

Any thoughts much appreciated!

With thanks
Tim

Oil and Gold Prices Time Series

I'm doing a project on the relationship between oil and gold prices over the period 1980-2006, and I wanted to ask which tests I should run, for example Dickey-Fuller and Phillips-Perron tests for stationarity, ARIMA, etc.

My aim is to analyse the relationship between the two and see whether they are correlated, i.e., if one price goes up, what happens to the other?
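A sketch of one common workflow, assuming monthly data (324 months for 1980-2006) and hypothetical variable names oil and gold; the series are simulated random walks only so that the code runs. Correlating the levels of two nonstationary series can produce spurious correlation, which is why the sketch tests for unit roots first and then either tests for cointegration or works with first differences.

```stata
* Simulated monthly series standing in for oil and gold prices (hypothetical)
clear
set seed 1980
set obs 324                           // 27 years x 12 months
generate t = ym(1980, 1) + _n - 1
format t %tm
tsset t
generate oil  = 20  + sum(rnormal())  // random walks: nonstationary by construction
generate gold = 300 + sum(rnormal())

* Unit-root tests in levels (with trend) and in first differences
dfuller oil, lags(4) trend
pperron oil, lags(4) trend
dfuller D.oil, lags(4)
dfuller gold, lags(4) trend
dfuller D.gold, lags(4)

* If both series are I(1), test for cointegration (Johansen trace test)
vecrank oil gold, lags(2)

* Correlation of monthly changes rather than of levels
correlate D.oil D.gold
```

The lag lengths here are arbitrary placeholders; in practice they would be chosen with information criteria (for example varsoc).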

Any help would be much appreciated

thanks

Robust estimators for ANOVA?

I want to compare questionnaire scores across three groups of participants; however, the scores are positively skewed. I've been advised to use a robust estimator instead of transforming the data, but there doesn't seem to be an option for this with the oneway or anova commands. Does anybody have advice on how I might do this?
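A few options that work with official commands, sketched on simulated skewed data (the variable names score and group are hypothetical): rreg fits a robust regression that downweights outlying observations, regress with vce(robust) keeps the OLS estimates but uses heteroskedasticity-robust standard errors, and kwallis gives a rank-based test. Which of these is the "robust estimator" your adviser had in mind is worth checking with them.

```stata
* Simulated positively skewed scores for three groups (hypothetical)
clear
set seed 42
set obs 150
generate group = mod(_n, 3) + 1
generate score = exp(rnormal(1 + 0.2*group, 0.6))

* Classical one-way ANOVA, for comparison
oneway score group

* Robust regression (downweights outliers), then a joint test of the groups
rreg score i.group
testparm i.group

* OLS with heteroskedasticity-robust standard errors
regress score i.group, vce(robust)
testparm i.group

* Rank-based alternative (Kruskal-Wallis)
kwallis score, by(group)
```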

Many Thanks!

Problems importing XML files

Warning: derivative missing; try rescaling variable

Hi All,

I desperately need help; I have been trying to solve this problem for the past week. I am trying to compute marginal effects after a bivariate probit model (my data are panel data), and I get the following warning messages:


warning: derivative missing; try rescaling variable bill_amount
warning: derivative missing; try rescaling variable carbon_emm
warning: derivative missing; try rescaling variable _cons
warning: derivative missing; try rescaling variable carbon_emm
warning: derivative missing; try rescaling variable employed
warning: derivative missing; try rescaling variable carbon_emm

Marginal effects after biprobit
      y  = Pr(wtp_re20=1,wtp_re50=1) (predict)
         =  .00044697
------------------------------------------------------------------------------
variable |      dy/dx Std. Err.      z  P>|z|  [     95% C.I.   ]         X
---------+--------------------------------------------------------------------
bill_a~t |   2.59e-07         .      .      .         .         .   57.9548
carbon~m |   .0015463         .      .      .         .         .   .069837
re_kno~e*|   .0026403    .00122   2.16  0.031   .000248   .005033   .037862
avail_~p*|  -.0001679    .00023  -0.72  0.469  -.000623   .000287   .009511
medica~p*|   .0020233     .0016   1.26  0.207  -.001117   .005164   .008605
  gender*|     .00042     .0003   1.41  0.157  -.000162   .001002   .296196
     age |  -.0000394         .      .      .         .         .   35.4783
 hh_size |   2.23e-06         .      .      .         .         .       3.8
kids_u18*|   .0002573    .00022   1.15  0.248  -.000179   .000694   .568116
educ_y~s |  -.0000708         .      .      .         .         .   13.7174
loginc~e |  -.0001237         .      .      .         .         .   11.7852
employed*|  -.0001205     .0004  -0.30  0.761  -.000896   .000655   .681159
 student*|  -.0002307    .00032  -0.73  0.467  -.000852   .000391   .052174
selfem~d*|   .0003823    .00064   0.60  0.551  -.000876    .00164   .153623
 retired*|  -.0007175    .00029  -2.45  0.014  -.001292  -.000143   .024638
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1


What can be the solution to this?
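The layout of this output (the "Marginal effects after biprobit" header and the discrete-change footnote) looks like that of mfx, which margins superseded in Stata 11. Below is a minimal sketch, on simulated data, of following the warning's own advice: rescale the large-valued regressor (bill_amount divided by 100 here, an arbitrary choice) and compute the effects with margins. The data-generating values are hypothetical, and the sketch ignores the panel structure; an option such as vce(cluster id) on biprobit would be needed for that.

```stata
* Simulated data standing in for the posted variables (hypothetical values)
clear
set seed 101
set obs 2000
generate bill_amount = 20 + 80*runiform()
generate employed = runiform() < 0.6
generate e1 = rnormal()
generate e2 = 0.4*e1 + sqrt(1 - 0.16)*rnormal()   // correlated errors
generate wtp_re20 = (0.5 - 0.01*bill_amount + 0.3*employed + e1 > 0)
generate wtp_re50 = (0.2 - 0.01*bill_amount + 0.2*employed + e2 > 0)

* Rescale the large-valued regressor: effects are now per 100 units
generate bill_100 = bill_amount/100

biprobit (wtp_re20 = bill_100 i.employed) (wtp_re50 = bill_100 i.employed)

* Marginal effects on Pr(wtp_re20=1, wtp_re50=1); i.employed gives a discrete change
margins, dydx(*) predict(p11)
```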

Regards,
Phindile

Marginal effects after GMM estimation

Dear All,
How can one obtain Marginal Effects after GMM estimation?

Thanks,
Dapel

Data manipulation in Stata.

Dear All,

I have a sample data set (please see the attached Stata 14 file), with the first five observations being:


. list in 1/5

     +--------------------------------------+
     |  id         ymd  Changtyp        ceo |
     |--------------------------------------|
  1. |   1  1995-08-21         1   Name_1_4 |
  2. |   1  1995-08-21         2  Name_1_20 |
  3. |   1  1997-07-22         1  Name_1_20 |
  4. |   1  1997-07-22         2   Name_1_6 |
  5. |   1  2003-10-16         1   Name_1_6 |
     +--------------------------------------+

where id identifies the firm; ymd is the date on which the CEO left office (when Changtyp = 1) or assumed office (when Changtyp = 2); and ceo is the CEO's name (names I created, as the originals were in Chinese characters).
I'd like to create new variables (start and end) so that the data look like this:

     +--------------------------------------------------------------+
     |  id         ymd  Changtyp        ceo       start         end |
     |--------------------------------------------------------------|
  1. |   1  1995-08-21         1   Name_1_4           .  1995-08-21 |
  2. |   1  1995-08-21         2  Name_1_20  1995-08-21  1997-07-22 |
  3. |   1  1997-07-22         2   Name_1_6  1997-07-22  2003-10-16 |
  4. |   1  2003-10-16         2   Name_1_2  2003-10-16  2004-12-14 |
     |  ...                                                         |
     |   1  2012-11-21         2  Name_1_22  2012-10-16  2015-12-31 |  (say)
     |   2  1999-02-08         1  Name_2_13           .  1999-02-08 |
     |   2  1999-02-08         2  Name_2_13  1999-02-08  2001-02-15 |
     |   2  2001-02-15         2  Name_2_23  2001-02-15  2015-12-31 |  (say)
     +--------------------------------------------------------------+
and so on.

Any suggestion is highly appreciated!

How to find the right model of ARCH and GARCH

Dear All,

To measure volatility in the stock returns of the KSE 100 index, I am applying ARCH/GARCH models, since the returns meet the preconditions of volatility clustering and ARCH effects. I have tried GARCH(1,1) and GARCH(2,0) under all three error distributions, but the residuals still show white noise. How can I find a better-fitting GARCH model to predict volatility?
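A sketch of one way to compare candidate specifications: fit several models, store each one, compare AIC/BIC with estimates stats, and then check whether the standardized residuals of the chosen model, and their squares, still show autocorrelation (wntestq). The returns are simulated from a GARCH(1,1) process purely so the code runs; the variable name ret and the parameter values are hypothetical.

```stata
* Simulated GARCH(1,1) returns standing in for KSE 100 returns (hypothetical)
clear
set seed 100
set obs 2000
generate t = _n
tsset t
generate double z = rnormal()
generate double h = 1 in 1
* h_t = 0.05 + 0.1*r_{t-1}^2 + 0.85*h_{t-1}, with r_{t-1}^2 = h_{t-1}*z_{t-1}^2
replace h = 0.05 + (0.1*z[_n-1]^2 + 0.85)*h[_n-1] in 2/l
generate double ret = sqrt(h)*z

* Fit candidate specifications and store each one
arch ret, arch(1) garch(1)
estimates store g11
arch ret, arch(1/2) garch(1)
estimates store g21
arch ret, arch(1) garch(1) distribution(t)
estimates store g11t

* Compare by information criteria (lower AIC/BIC is better)
estimates stats g11 g21 g11t

* Diagnostics for a chosen model: no remaining autocorrelation
* in the standardized residuals or in their squares
estimates restore g11
predict double e, residuals
predict double hhat, variance
generate double zs = e/sqrt(hhat)
generate double zs2 = zs^2
wntestq zs
wntestq zs2
```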

Question regarding the proper use of dummy variables in panel data fixed effects regression

Hello,

I am not sure how to properly use dummy variables in my panel data regression:
I have 7 variables (A B C D E F G), where A is the dependent variable and the rest are independent variables.
The panel is defined by G (a group id taking the values 1 to 3) and a date variable (H).

I am running the following two commands, and I am not sure which one correctly specifies the interaction between the group dummies and the independent variable F:

Code:
set more off

eststo clear


eststo: xtreg A B C D E F c.F#i.G, fe cluster(G)
eststo: xtreg A B C D E c.F#i.G, fe cluster(G)


esttab using panel_data_dummy_variable_test.tex, label star(* 0.10 ** 0.05 *** 0.01) stats(r2 N) replace booktabs ///
   title(panel data dummy variable test\label{tab1})
The results I get are:

Code:
\begin{table}[htbp]\centering
\def\sym#1{\ifmmode^{#1}\else\(^{#1}\)\fi}
\caption{panel data dummy variable test\label{tab1}}
\begin{tabular}{l*{2}{c}}
\toprule
                    &\multicolumn{1}{c}{(1)}&\multicolumn{1}{c}{(2)}\\
                    &\multicolumn{1}{c}{A}&\multicolumn{1}{c}{A}\\
\midrule
B                   &      -0.576\sym{***}&      -0.576\sym{***}\\
                    &    (-10.45)         &    (-10.45)         \\
\addlinespace
C                   &     -0.0468\sym{***}&     -0.0468\sym{***}\\
                    &    (-18.13)         &    (-18.13)         \\
\addlinespace
D                   &      -2.214\sym{*}  &      -2.214\sym{*}  \\
                    &     (-3.84)         &     (-3.84)         \\
\addlinespace
E                   &     -0.0345         &     -0.0345         \\
                    &     (-1.92)         &     (-1.92)         \\
\addlinespace
F                   &     -0.0302\sym{***}&                     \\
                    &    (-28.65)         &                     \\
\addlinespace
G=1 $\times$ F      &           0         &     -0.0302\sym{***}\\
                    &         (.)         &    (-28.65)         \\
\addlinespace
G=2 $\times$ F      &      0.0109\sym{**} &     -0.0193\sym{**} \\
                    &      (4.80)         &     (-9.63)         \\
\addlinespace
G=3 $\times$ F      &      0.0132\sym{***}&     -0.0170\sym{***}\\
                    &      (9.97)         &    (-12.85)         \\
\addlinespace
Constant            &       0.211\sym{***}&       0.211\sym{***}\\
                    &     (10.94)         &     (10.94)         \\
\midrule
r2                  &       0.152         &       0.152         \\
N                   &        7197         &        7197         \\
\bottomrule
\multicolumn{3}{l}{\footnotesize \textit{t} statistics in parentheses}\\
\multicolumn{3}{l}{\footnotesize \sym{*} \(p<0.10\), \sym{**} \(p<0.05\), \sym{***} \(p<0.01\)}\\
\end{tabular}
\end{table}

G=1 × F is dropped from the first model, so I suspect the second model is correct? The results look very different for G=2 and G=3: in model 1 their coefficients are positive, while in model 2 they are negative. What is going on? What is the proper use of dummy variables?
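Comparing the two columns suggests they are the same model written two ways. In (1), the F row is the slope for G=1 and the interaction rows are differences from that slope; in (2), each interaction row is that group's own slope. Indeed, -0.0302 + 0.0109 = -0.0193 and -0.0302 + 0.0132 = -0.0170, which match column (2), and r2 and N are identical. A sketch on simulated data (the coefficients used to generate A are hypothetical) showing that margins recovers the group-specific slopes from parameterization (1):

```stata
* Simulated data standing in for the posted variables (hypothetical values)
clear
set seed 123
set obs 300
generate G = mod(_n, 3) + 1
generate F = rnormal()
generate A = 0.2 - 0.03*F + 0.011*F*(G == 2) + 0.013*F*(G == 3) + rnormal(0, 0.1)
xtset G

* Parameterization (1): F is the G=1 slope; interactions are differences from it
xtreg A F c.F#i.G, fe
* Group-specific slopes implied by (1); these equal the coefficients of (2)
margins G, dydx(F)

* Parameterization (2): one slope per group, no separate F term
xtreg A c.F#i.G, fe
```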

Thank you in advance

P.S. I uploaded a txt file with the variables and their values (I cannot attach it because it exceeds Statalist's maximum file size).
Please find it here : http://filebin.ca/2aYGTr8AuX04/variables.txt

multiple margins post or estpost margins

Hello Statalisters,

I'm investigating the effect of wealth on charitable donations. For this, I'm running a Heckman regression with donations, a yes/no donation dummy, wealth, and a number of control and selection variables.

Now I'd like to report margins or other effects, and I'd prefer to retrieve the margins via esttab/estout, first because it's simple and second because the output looks good.

As far as I understand, esttab retrieves the parameter values, summary statistics, etc., that Stata stores via eststo. margins does not usually store its results there (or anywhere permanently, as far as I know) unless one uses the post option or the estpost margins command.

The problem is that specifying these options seems to overwrite the results of my Heckman estimation, so I cannot run margins, post several times in a row, only once after the Heckman estimation. Regardless of which command I use (estpost margins or margins, post), when I try to run it a second time without re-estimating the Heckman model, I get the following error message:

margins cannot work with its own posted results
r(322);
My question: is there any way to run multiple consecutive margins commands, store the results, and retrieve them via esttab (or something equally convenient)?

This is my code:

Code:
* prepare locals
#delimit ;
/* store the lists as text: "local name = ..." would evaluate an expression */
local wealth c.wealth_norm##(c.wealth_norm#c.wealth_norm);
local control ib5.wealth_perc ib0.soc_eng_2011 ib0.child_dum ib4.age_gr ib4.gross_gr
                 ib1.sex ib3.casmin_comp ib1.migback ib3.hlf0261 ib0.unempl ib1.loc1989;
local income c.netinc_norm##(c.netinc_norm#c.netinc_norm);
local inherit c.inherit_norm##(c.inherit_norm#c.inherit_norm);
local selection ib0.married ib0.retired ib0.inherit_dum;

#delimit cr

        svy:heckman lgivings ib0.hvid `wealth' `income' `inherit' ///
                     `control' `weight', ///
                 select(givings_dum = ib0.hvid `wealth' `income' `inherit' ///
                        `control' `selection')
        eststo heckman

        * effect of employing people on donations *
        estpost margins hlf0261, pred(ycon)
        eststo marginhlfdon
        
*** The last command is not working, stata returns error-message as described ***

        * effect of employing people on probability to donate donations *
        estpost margins hlf0261, pred(psel)
        eststo marginhlfprob
My results are attached.

To get the results, I used estimates replay, which does not show the svy information, so here it is:

Number of Strata: 1
Number of PSUs: 2121

Many thanks in advance,

Caspar
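The error arises because, after margins, post (or estpost margins), the active estimation results are the margins results rather than the Heckman model. A sketch of one way around it: store the Heckman results once and restore them before each further margins call; the same restore step should apply before estpost margins. The data and variable names (y, sel, x, z, g) are simulated and hypothetical, and svy is omitted for brevity.

```stata
* Simulated data with a selection process (hypothetical variables)
clear
set seed 2016
set obs 1000
generate x = rnormal()
generate z = rnormal()
generate g = mod(_n, 4) + 1
generate u1 = rnormal()
generate u2 = 0.5*u1 + sqrt(0.75)*rnormal()
generate byte sel = (0.5 + x + z + u2 > 0)
generate y = 1 + x + 0.3*g + u1 if sel

* Fit once and store the Heckman results
heckman y x i.g, select(sel = x z i.g)
estimates store heckman

* First margins: posting replaces the active estimation results...
margins g, predict(ycond) post
estimates store m_ycond

* ...so restore the Heckman results before the next margins call
estimates restore heckman
margins g, predict(psel) post
estimates store m_psel

* Compare the stored margins side by side (official command)
estimates table m_ycond m_psel, se
* With estout (SSC) installed: esttab m_ycond m_psel
```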

Nonlinear Diff-in-Diff implemention in Stata (binary dependent variable)?

Hello,

I am running a difference-in-difference analysis using Stata. In this case, my dependent variable Y is binary (i.e. zero or one).

So far, I have only run a linear diff-in-diff, which has several drawbacks (the common-trend assumption is unlikely to hold, predictions can fall outside the unit interval, etc.).

Apparently, simply running a probit/logit regression does not identify the causal effect, the average treatment effect on the treated (ATET); see, for example, Lechner (2011) for a discussion.

I came across several suggestions, in particular Blundell and Costa Dias's (2006) approach to estimating the ATET in a diff-in-diff framework with a binary dependent variable.

Is anyone aware of a Stata implementation of Blundell and Costa Dias (2006)?

Further suggestions on non-linear diff-in-diffs in Stata are welcome.

Thanks in advance.
Ruediger

Mixed logit estimation using Arne Risa Hole's Stata command

Dear all,

My name is Babatope Akinyemi! I am not a Stata expert, and I treasure your invaluable contributions to other researchers' work on this forum.
I am currently estimating willingness to pay for attributes of community-based ecotourism in communities adjacent to national parks, among tourists visiting national parks in South Africa.
Here is a brief explanation of my research. My data were collected from tourists visiting the national parks using a discrete choice experiment questionnaire. I have four attributes: village accommodation (yes/no), craft market (yes/no), village tours (yes/no), and price ($0, $10, $20, $30, $40, $50). Each tourist was presented with seven choice sets, each with three alternatives (two improved alternatives and a status quo), so each tourist made seven choices (panel data). Following Arne Risa Hole's paper "Mixed logit modelling in Stata: An overview" (attached), I implemented a mixed logit to account for preference heterogeneity in tourists' choices.

However, I got stuck on page 14, as I do not know how to generate the matrix start = b[1,1..7],0,0,0,0,b[1,8],0,0,0,b[1,9],0,0,b[1,10],0,b[1,11] presented there. When I used this expression in my analysis, I got the error message "initial vector: matrix must be dimension 10".
I know Francesco Chirico encountered a similar issue when conducting a survival analysis in June 2014, and Stephen Jenkins's and Mike Lacy's comments and advice were very helpful. However, Francesco's description of how the problem was solved is not detailed enough to help with my analysis, which is why I am asking for help.
If I knew how to derive a start matrix of this kind with dimension 10 for my dataset, I should be able to continue with the analysis.
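With correlated random coefficients, the model estimates, for K random coefficients, K means plus the K(K+1)/2 elements of the lower-triangular Cholesky factor, in addition to any fixed coefficients. A length of 10 is consistent with one fixed coefficient (for example price) and three random ones: 1 + 3 + 3*4/2 = 10. Judging from its pattern, the page-14 expression puts the means first and then the Cholesky elements column by column, with the uncorrelated model's standard deviations on the diagonal and zeros elsewhere. A sketch under those assumptions, with a made-up b standing in for e(b) from the uncorrelated model; check the actual order with matrix list e(b) before adapting it.

```stata
* Hypothetical stand-in for e(b) from an uncorrelated model with
* 1 fixed and 3 random coefficients:
*   b[1,1]    = fixed coefficient (e.g. price)
*   b[1,2..4] = means of the 3 random coefficients
*   b[1,5..7] = standard deviations of the 3 random coefficients
matrix b = (-0.05, 0.8, 0.5, 0.3, 0.6, 0.4, 0.2)

* b[1,1..5] = the 4 means plus l11; then the remaining Cholesky elements
* column by column (l21, l31, l22, l32, l33): SDs on the diagonal, zeros off it
matrix start = b[1,1..5], 0, 0, b[1,6], 0, b[1,7]
matrix list start
display colsof(start)    // 4 means + 6 Cholesky elements = 10
```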

Many many thanks and again many thanks for your help.

Kind Regards:
Babatope Akinyemi

What would be the best way to analyze this data? I think I'm missing something really simple

Hey everyone,
I have a dataset with two kinds of individuals: those older than 85 and those younger than 50 (variable Old, coded 1 if old and 0 if not). For each patient, I have three measurements of the length of the same muscle (variables len1, len2, len3). An example of the data setup is below.

I want to 1) assess the variability in the lengths of each patient (so variability between len1 len2 len3 for each patient), and then 2) compare this variability between 'old' and 'not old' populations.

ID Old Len1 Len2 Len3
1 1 20.0 21.0 22.0
2 0 21.0 19.0 18.0
3 0 22.0 18.0 17.0
4 1 15.0 19.0 19.0
5 1 19.0 20.0 17.0

So essentially I want to compare the within-patient differences among len1, len2, and len3 in the old group with those in the not-old group.

I've tried reshaping to long format, and thought I would run
Code:
bysort ID: ttest len, by(old)
but it says "1 group found, 2 required".

Am I missing something?
Could someone advise a better way to do this please?
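The ttest fails because, within a single ID, old takes only one value. A sketch that keeps the data wide: summarize each patient's variability with the row standard deviation (egen rowsd()), then compare that summary between groups. The row SD is one choice of variability measure among several (the range or coefficient of variation would also work), and with this few patients the rank-based ranksum may be preferable.

```stata
* Example data from the post (variable names in lower case)
clear
input id old len1 len2 len3
1 1 20 21 22
2 0 21 19 18
3 0 22 18 17
4 1 15 19 19
5 1 19 20 17
end

* Step 1: within-patient variability across the three measurements
egen sd_len = rowsd(len1 len2 len3)

* Step 2: compare that variability between old and not-old patients
ttest sd_len, by(old)

* Rank-based alternative for small samples
ranksum sd_len, by(old)
```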

Thank you!
Mohammad

margins after cmp with complex data

Hi Statalisters,
I am using Stata 11.1 on a Mac to estimate marginal effects after an ordered probit with complex survey data, using the cmp command. To account for the survey design, I include the option
vce(unconditional) in margins to use the linearized variance matrix.
When I define the subpopulation with an if qualifier, the margins cannot be computed. Here is the code:

Code:
svyset conglome [pw=factor07], strata(estrato) 
cmp (educ2 = status sex age age2 father_miss mother_miss edu_father_nor edu_mother_nor), ind($cmp_oprobit) svy subpop(if age>=25) qui
margins, dydx(*) predict(outcome(#1)) subpop(if age>=25) vce(unconditional)
I get:
invalid subpop() option
However, when I define the subpopulation with an indicator variable, the margins are computed:

Code:
* age>=25 is true when age is missing, so treat missing ages as 0 explicitly
gen byte x = (age >= 25 & !missing(age))
cmp (educ2 = status sex age age2 father_miss mother_miss edu_father_nor edu_mother_nor), ind($cmp_oprobit) svy subpop(x) qui  
margins, dydx(*) predict(outcome(#1)) subpop(x) vce(unconditional) 
While the second approach gives results, I wonder whether I am doing something wrong in the first case. I very much appreciate your help. Celia P.

itsa command

Hello everyone,

I'm facing a problem regarding intervention analysis.

I'm studying the effect of two increases in cost sharing (two new policies) on the number of outpatient visits over the last 20 years.
So I used an intervention analysis with itsa in Stata: itsa outvisit, single trperiod(199. 201.) lag(1) fig posttrend.
My model was $Y_t = \beta_0 + \beta_1\,\mathrm{time}_t + \beta_2\,\mathrm{intervention1}_t + \beta_3\,\mathrm{timeafter1}_t + \beta_4\,\mathrm{intervention2}_t + \beta_5\,\mathrm{timeafter2}_t + \varepsilon_t$, with t = 199. to 201. Up to this point, there is no problem.

Next, I would like to estimate a model that gives a single average effect for both policies, i.e., one that sets β2 and β4 to a single coefficient β2: $Y_t = \beta_0 + \beta_1\,\mathrm{time}_t + \beta_2\,(\mathrm{intervention1}_t + \mathrm{intervention2}_t) + \beta_3\,(\mathrm{timeafter1}_t + \mathrm{timeafter2}_t) + \varepsilon_t$, with t = 199. to 201.

Is it possible to do that with itsa? Does anyone have an idea?
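One way to get the pooled model, outside itsa, is to build the segment variables by hand and fit both versions with newey, whose lag(1) option mirrors the lag(1) above (as far as I know, itsa's default estimator is newey). A joint Wald test of β2 = β4 and β3 = β5 in the unrestricted model then indicates whether pooling is supported. The annual series, policy years and effect sizes are simulated and hypothetical, and the coding of the time-since-intervention variables is one common choice, not necessarily itsa's.

```stata
* Simulated annual series standing in for outpatient visits (hypothetical)
clear
set seed 7
set obs 20
generate time = _n
tsset time
generate x1 = time >= 8               // hypothetical start of policy 1
generate x2 = time >= 14              // hypothetical start of policy 2
generate t1 = x1*(time - 8)           // time since policy 1
generate t2 = x2*(time - 14)          // time since policy 2
generate outvisit = 100 + 2*time - 5*x1 - t1 - 5*x2 - t2 + rnormal()

* Separate effects: beta2 (x1), beta3 (t1), beta4 (x2), beta5 (t2)
newey outvisit time x1 t1 x2 t2, lag(1)
test (x1 = x2) (t1 = t2)

* Pooled effects: one level change and one slope change shared by both policies
generate xsum = x1 + x2
generate tsum = t1 + t2
newey outvisit time xsum tsum, lag(1)
```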

Thank you very much

Peter

Sorting stocks into portfolios by historical volatility of returns

Dear Statalisters,

I am having some issues with finding the right command for sorting the data into groups (portfolios) by historical volatility (standard deviation) of returns. I am hoping someone will be able to help me.

I have the data on weekly prices ("price") of all the constituents of S&P 600 from 31 Dec 2004 - 29 Jan 2016. From these prices I calculated log returns ("return"). The time variable is called "date".

What I need to do:
1.) Calculate volatility of returns from 31 Dec 2004 - 30 December 2005 for each stock. Then move 1 month forward, so 28 Jan 2005 - 27 Jan 2006 (monthly rebalancing) and again calculate volatility of returns, and so on. So, what I need to do is calculate past 1-year volatility of weekly returns.
2.) Then I need to sort the stocks ("id") into 5 portfolios based on this historical volatility. The portfolios are rebalanced each month, so there will be different stocks belonging to each portfolio every period.

I have googled a lot of commands, but I am not quite sure which one to use. I am guessing I need to have a loop over all my stocks. I was thinking about foreach, but I am not sure whether it is the most appropriate considering that my data is in the long format (i.e., I only have one variable "return").

I thought about using the commands rolling, mvsumm, asrol, but I am not quite sure how I should write my loop.
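A sketch using only official commands, on simulated data: the trailing 52-week standard deviation is computed from running sums within each stock, and quintile portfolios are then formed with xtile at each formation date. Every 4th week stands in for your month-end rebalancing dates, which is an assumption; with real dates you would flag, say, the last week of each month instead. The sketch also assumes each stock has a gap-free weekly series; asrol, which you mention, can compute rolling statistics directly and may handle gaps more conveniently.

```stata
* Simulated weekly returns for 20 hypothetical stocks over 300 weeks
clear
set seed 600
set obs 20
generate id = _n
expand 300
bysort id: generate week = _n
generate double return = rnormal(0, 0.005*id)   // volatility rises with id
xtset id week

* Trailing 52-week standard deviation from running sums within each stock
bysort id (week): generate double s1 = sum(return)
by id: generate double s2 = sum(return^2)
by id: generate double w1 = s1 - cond(_n > 52, s1[_n-52], 0)
by id: generate double w2 = s2 - cond(_n > 52, s2[_n-52], 0)
by id: generate double vol52 = sqrt((w2 - w1^2/52)/51) if _n >= 52

* Quintile portfolios on vol52 at each formation date (every 4th week here)
generate byte port = .
levelsof week if mod(week, 4) == 0 & week >= 52, local(fdates)
foreach w of local fdates {
    xtile tmp = vol52 if week == `w', nquantiles(5)
    replace port = tmp if week == `w'
    drop tmp
}
```

Portfolio membership then holds from one formation date to the next; carrying port forward within each stock between formation dates is a separate step.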

Any help would be much appreciated!

Laura

Issue with merging two databases

Hello,

I have an issue merging two databases.

Database 1 contains information on public procurements contracts (e.g. €-value of contract, activity of contract, winning company, etc.)
Database 2 contains company information (company name, number of employees, turnover, balance sheet, etc.)
Naturally, I want to merge both databases based on the company name.
The problem lies in the fact that the company name is not always consistent in both databases.

Examples:

Database 1                        Database 2
A&M Motors                        A & M Motors
Architectuurbureau Filips         Architectuurbureau Filips BVBA
Aforest Belgium                   Aforest
Air Liquide Medical NV            Air Liquide Medical SA

Do you guys have any suggestions on how to deal with this?
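A sketch of a common first step: build a standardized key (lower case, a trailing legal-form suffix dropped, ampersands and spaces removed) in both databases and merge on that key. The suffix list (BVBA, NV, SA) is an assumption based on the examples; pairs such as Aforest Belgium and Aforest still will not match, so the remaining non-matches need fuzzy matching (for example the user-written reclink or matchit commands from SSC) or manual review.

```stata
* Company names from the post, in one variable for illustration
clear
input str40 name
"A&M Motors"
"A & M Motors"
"Architectuurbureau Filips"
"Architectuurbureau Filips BVBA"
"Aforest Belgium"
"Aforest"
"Air Liquide Medical NV"
"Air Liquide Medical SA"
end

* Standardized matching key
generate name_std = lower(trim(name))
* Drop a trailing legal-form suffix (the suffix list is an assumption)
replace name_std = regexr(name_std, " (bvba|nv|sa)$", "")
* Remove ampersands and all spaces
replace name_std = subinstr(name_std, "&", "", .)
replace name_std = subinstr(name_std, " ", "", .)
list name name_std, noobs

* In practice: create name_std in both files, then e.g.
* merge m:1 name_std using database2
```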

Kind regards,

Willem

Graph bar -> adding a "total" bar automatically, without generating a new category

Dear all, I would like to add a "total" bar to my bar chart.

There are two variables and five categories.

Without creating another category, I would like to obtain the total over all categories, so that I can compare the variables directly in my bar chart.

Is there any option I can add in my code?

Thank you for any hints!

graph bar (median) a_t w_t if sex==1,                                  ///
    over(erwst_paar, relabel(1 "a" 2 "b" 3 "c" 4 "d" 5 "e"))           ///
    ytitle("Stunden (Median)")                                         ///
    yscale(range(0/50))                                                ///
    ylabel(0(5)50, labsize(small))                                     ///
    legend(position(6) label(1 "actual time") label(2 "wanted time")   ///
           size(vsmall) cols(3) colfirst)

Recreating DHS under five mortality statistics

I'm trying to estimate under-five (U5) mortality rates for the DHS surveys in India (98, 06) and Kenya (98, 08), and then do a sub-population analysis of child mortality in slum areas. To start, I want to recreate the DHS estimates of country-level under-five mortality rates.

I'm using the code below in Stata and am getting close to the published DHS rates (within 1-2 points), but not exact matches. Can anyone help me see what I am missing?

Very appreciative of any suggestions.

I've written this code for the DHS BR (births recode) file, following http://siteresources.worldbank.org/INTPAH/Resources/Publications/459843-1195594469249/HealthEquityCh3.pdf and http://legacy.measuredhs.com/help/datasets/

In the code below, b3 is the child's date of birth (CMC) and v008 is the date of the survey (CMC), so v008 - b3 is the child's age in months. b5 indicates whether the child is alive, and b7 is the age at death. All ages are in months. v005 is the sampling weight suggested by DHS; I use it as a frequency weight because ltable does not accept pweights.

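* age in months at the survey date (for dead children, the age they would have reached)
gen hypage = v008 - b3
* survival time: current age if alive, age at death if dead
gen survivelength = hypage
replace survivelength = b7 if b5==0
gen dead = (b5==0)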

ltable survivelength dead [fw=v005] if hypage <60 , int(0,1,3,6,12,24,36,48,60) failure

cross-posted here: http://userforum.dhsprogram.com/inde...bc4b969af1a2bf

How to calculate duration

Hello,

I want to calculate the number of months with insurance.
The table below shows one person's records, including the start date and termination date of each spell.
I'd like to create the number of months with insurance in each year.
mem_effdate mem_termdate
1-Jun-10 31-May-11
1-Jun-11 30-Jun-11
1-Jul-11 31-May-12
1-Nov-11 30-Jun-12
1-Jun-12 31-May-13
1-Jul-12 30-Jun-13
1-Jun-13 31-May-15
1-Jul-13 30-Jun-14
1-Jul-14 31-Jan-43
1-Jun-15 31-May-16

For this case, the final data I want look like this:
insu_2010 insu_2011 insu_2012 insu_2013 insu_2014 insu_2015
6 12 12 12 12 12

The year 2043 (shown in red in my original post) can be interpreted as 2017.
Could you help me calculate the number of months covered in each year?

This is the first time I have used dataex; the attached data were saved for Stata version 11 or 12.
Let me know if something is wrong with the data.
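A sketch of one approach: expand each spell into one observation per calendar month, drop months covered by more than one overlapping spell, and count months per year. It counts every calendar month with at least one day of coverage; under that rule 2010 gets 7 (June to December) rather than the 6 shown above, so the rule would need adjusting if 6 is intended. The person identifier id is hypothetical, and 2043 is recoded to 2017 as described in the post.

```stata
* Spells from the post, read as strings
clear
input str10 effdate str10 termdate
"1-Jun-10" "31-May-11"
"1-Jun-11" "30-Jun-11"
"1-Jul-11" "31-May-12"
"1-Nov-11" "30-Jun-12"
"1-Jun-12" "31-May-13"
"1-Jul-12" "30-Jun-13"
"1-Jun-13" "31-May-15"
"1-Jul-13" "30-Jun-14"
"1-Jul-14" "31-Jan-43"
"1-Jun-15" "31-May-16"
end
generate id = 1                               // hypothetical person identifier
generate eff  = date(effdate,  "DMY", 2050)   // two-digit years read as <= 2050
generate term = date(termdate, "DMY", 2050)
format eff term %td
* Recode the placeholder year 2043 as 2017
replace term = mdy(month(term), day(term), 2017) if year(term) == 2043

* One observation per covered calendar month; overlapping spells collapse
bysort id: generate spell = _n
generate m0 = mofd(eff)
generate m1 = mofd(term)
expand m1 - m0 + 1
bysort id spell: generate month = m0 + _n - 1
format month %tm
duplicates drop id month, force

* Count covered months per calendar year and reshape to one row per person
generate year = year(dofm(month))
collapse (count) insu_ = month, by(id year)
reshape wide insu_, i(id) j(year)
list
```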

I deeply appreciate your help in advance,

Soyeon