tab by order of frequency across multiple variables

March 16, 2016, 9:34 am

Hi there
If I want to tab values by order of frequency for one variable, I simply do this:
tab var1, sort

But what if I want to tab across multiple variables var1 to var7?
One way might be to restructure my data from wide to long, so that I have var1 to var7 listed as observations.
Can anyone suggest a nice alternative that doesn't involve having to restructure my data?

Any thoughts much appreciated!

With thanks
Tim
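
One possible approach, sketched under assumptions: stack the variables temporarily inside preserve/restore, so the data in memory end up unchanged. The toy data below are hypothetical and use three variables instead of seven; stack expects the variables to be of compatible (here numeric) type.

```stata
* Hypothetical toy data standing in for var1-var7
clear
input var1 var2 var3
1 2 1
2 2 3
1 1 2
3 1 1
end

preserve
* Pile all variables into one long column; _stack records the source variable
stack var1 var2 var3, into(value) clear
* Frequencies across all stacked variables, most frequent first
tabulate value, sort
restore
* After restore, the original wide data are back in memory
```

With the real data the stack line would read stack var1-var7, into(value) clear.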

↧

Oil and Gold Prices Time Series

March 16, 2016, 10:06 am

I'm doing a project on the relationship between oil and gold prices over the period 1980-2006 and wanted to ask which tests I should run, for example Dickey-Fuller and Phillips-Perron tests for stationarity, ARIMA, etc.

My aim is to analyse any relationship between the two and see whether they are correlated, that is, what happens to one price when the other goes up.

Any help would be much appreciated

thanks
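
A minimal sketch of how such a workflow might look, not a recommendation of a specific model. The two series are simulated random walks standing in for (log) monthly prices, and the lag lengths are arbitrary assumptions that would need to be chosen from the real data.

```stata
* Hypothetical monthly data, 1980m1-2006m12 (324 observations)
clear
set seed 2016
set obs 324
gen t = ym(1980, 1) + _n - 1
format t %tm
tsset t
gen oil  = sum(rnormal())    // simulated random walk, placeholder for log oil price
gen gold = sum(rnormal())    // simulated random walk, placeholder for log gold price

* Unit-root tests in levels and in first differences
dfuller oil, lags(4) trend
pperron oil, trend
dfuller D.oil, lags(4)

* Lag-order selection, then Johansen test for cointegration between the two series
varsoc oil gold
vecrank oil gold, lags(2) trend(constant)

* Simple correlation of the month-to-month changes
correlate D.oil D.gold
```

If both series turn out to be I(1), the vecrank result would indicate whether a long-run relationship (and hence a VEC model) is worth pursuing; otherwise a VAR in differences is a common fallback.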

↧

Robust estimators for ANOVA?

March 16, 2016, 10:41 am

I want to compare questionnaire scores across three groups of participants, but the scores are positively skewed. I've been advised to use a robust estimator instead of transforming the data, but there doesn't seem to be an option for this with the oneway or anova command. Does anybody have advice on how I might do this?

Many Thanks!
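
Two commonly used alternatives, sketched on simulated data (the variable names score and group are hypothetical): a regression on group indicators with heteroskedasticity-robust standard errors, followed by a joint test of the group effects, and the rank-based Kruskal-Wallis test. Whether either matches the "robust estimator" the adviser had in mind is an assumption.

```stata
* Hypothetical skewed scores for three groups of 30
clear
set seed 1
set obs 90
gen group = mod(_n, 3) + 1
gen score = rgamma(2, 2) + (group == 3)

* One-way comparison with robust (sandwich) standard errors
regress score i.group, vce(robust)
testparm i.group             // joint test that all group means are equal

* Rank-based alternative that makes no normality assumption
kwallis score, by(group)
```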

↧

Problems importing XML files

March 16, 2016, 10:50 am

Could somebody please help me out? I'm trying to import a DDI (XML) file into Stata: http://catalog.ihsn.org/index.php/ca...tab=study-desc

When I try to import it, Stata says: unrecognizable XML doctype

Any ideas on what I could do to solve this problem? Thanks!

↧

Warning: derivative missing; try rescaling variable

March 16, 2016, 11:00 am

Hi All,

I desperately need help. I have been trying to sort out this problem for the past week. I am trying to compute marginal effects after a bivariate probit model on panel data, and I get the following error message:

warning: derivative missing; try rescaling variable bill_amount

warning: derivative missing; try rescaling variable carbon_emm

warning: derivative missing; try rescaling variable _cons

warning: derivative missing; try rescaling variable carbon_emm

warning: derivative missing; try rescaling variable employed

warning: derivative missing; try rescaling variable carbon_emm

Marginal effects after biprobit
      y  = Pr(wtp_re20=1,wtp_re50=1) (predict)
         =  .00044697
-----------------------------------------------------------------------------
variable |      dy/dx Std. Err.       z   P>|z|  [    95% C.I.    ]         X
---------+-------------------------------------------------------------------
bill_a~t |   2.59e-07         .       .       .         .         .   57.9548
carbon~m |   .0015463         .       .       .         .         .   .069837
re_kno~e*|   .0026403    .00122    2.16   0.031   .000248   .005033   .037862
avail_~p*|  -.0001679    .00023   -0.72   0.469  -.000623   .000287   .009511
medica~p*|   .0020233     .0016    1.26   0.207  -.001117   .005164   .008605
  gender*|     .00042     .0003    1.41   0.157  -.000162   .001002   .296196
     age |  -.0000394         .       .       .         .         .   35.4783
 hh_size |   2.23e-06         .       .       .         .         .       3.8
kids_u18*|   .0002573    .00022    1.15   0.248  -.000179   .000694   .568116
educ_y~s |  -.0000708         .       .       .         .         .   13.7174
loginc~e |  -.0001237         .       .       .         .         .   11.7852
employed*|  -.0001205     .0004   -0.30   0.761  -.000896   .000655   .681159
 student*|  -.0002307    .00032   -0.73   0.467  -.000852   .000391   .052174
selfem~d*|   .0003823    .00064    0.60   0.551  -.000876    .00164   .153623
 retired*|  -.0007175    .00029   -2.45   0.014  -.001292  -.000143   .024638
-----------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1

What can be the solution to this?

Regards,
Phindile

↧

Marginal effects after GMM estimation

March 16, 2016, 11:15 am

Dear All,
How can one obtain Marginal Effects after GMM estimation?

Thanks,
Dapel

↧

Data manipulation in Stata.

March 17, 2016, 2:36 am

Dear All,

I have a sample data set (please see the attached file, Stata 14) with the first five observations being:

. list in 1/5

+-------------------------------------------------------+
| id ymd Changtyp ceo |
|--------------------------------------------------------|
1. | 1 1995-08-21 1 Name_1_4 |
2. | 1 1995-08-21 2 Name_1_20 |
3. | 1 1997-07-22 1 Name_1_20 |
4. | 1 1997-07-22 2 Name_1_6 |
5. | 1 2003-10-16 1 Name_1_6 |
+-------------------------------------------------------+

where id denotes the firm, ymd is the date on which the CEO left office (Changtyp = 1) or assumed office (Changtyp = 2), and ceo is the name (I created these; the original names were in Chinese characters).
I'd like to have a new format (new variables) like:

     +-----------------------------------------------------------------------+
     | id   ymd          Changtyp   ceo         start        end             |
     |-----------------------------------------------------------------------|
  1. |  1   1995-08-21   1          Name_1_4    .            1995-08-21      |
  2. |  1   1995-08-21   2          Name_1_20   1995-08-21   1997-07-22      |
  3. |  1   1997-07-22   2          Name_1_6    1997-07-22   2003-10-16      |
  4. |  1   2003-10-16   2          Name_1_2    2003-10-16   2004-12-14      |
     |                                                                       |
  .  |  1   2012-11-21   2          Name_1_22   2012-10-16   2015-12-31 (say)|
  .  |  2   1999-02-08   1          Name_2_13   .            1999-02-08      |
  .  |  2   1999-02-08   2          Name_2_13   1999-02-08   2001-02-15      |
  .  |  2   2001-02-15   2          Name_2_23   2001-02-15   2015-12-31 (say)|
     +-----------------------------------------------------------------------+

and so on.

Any suggestion is highly appreciated!
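
A rough sketch of one way to build such spells, under several assumptions: ymd is a string in YYYY-MM-DD form, the rows to keep are each firm's first departure plus every appointment (Changtyp = 2), a tenure ends on the firm's next recorded date, and open-ended tenures are closed at 31 Dec 2015. The names d, first, startdate and enddate are invented for illustration.

```stata
* Toy data copied from the first five listed observations
clear
input id str10 ymd Changtyp str12 ceo
1 "1995-08-21" 1 "Name_1_4"
1 "1995-08-21" 2 "Name_1_20"
1 "1997-07-22" 1 "Name_1_20"
1 "1997-07-22" 2 "Name_1_6"
1 "2003-10-16" 1 "Name_1_6"
end

gen d = date(ymd, "YMD")                       // numeric daily date

* Within firm, order by date; a departure sorts before an appointment on the same day
bysort id (d Changtyp): gen first = _n == 1

* For appointments, the tenure ends at the firm's next recorded date
by id: gen enddate = d[_n+1] if Changtyp == 2
replace enddate = mdy(12, 31, 2015) if Changtyp == 2 & missing(enddate)

* The firm's first departure has an unknown start and ends on its own date
replace enddate = d if first & Changtyp == 1
gen startdate = d if Changtyp == 2
format d startdate enddate %td

keep if Changtyp == 2 | first
list id ymd Changtyp ceo startdate enddate, sepby(id)
```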

↧

How to find the right model of ARCH and GARCH

March 17, 2016, 3:03 am

Dear All,

To measure volatility in the stock returns of the KSE 100 index I am applying ARCH/GARCH models, since the series meets the preconditions of volatility clustering and ARCH effects. I have tried GARCH(1,1) and GARCH(2,0) under all three distributions, but the residuals still show white noise. How can I find a better-fitting GARCH model to predict volatility?

↧

Question regarding the proper use of dummy variables in panel data fixed effects regression

March 17, 2016, 3:53 am

Hello,

I am not sure how to properly use dummy variables in my panel data regression:
I have 7 variables (A B C D E F G), where A is the dependent variable and the rest are independent variables.
The panel data is set up with respect to G (which is a group id taking the values 1 to 3), and a date variable (H).

I am running the following two commands and I am not sure which one is the correct one for the interaction between the dummy variable and the independent variable:

Code:

set more off

eststo clear


eststo: xtreg A B C D E F c.F#i.G, fe cluster(G)
eststo: xtreg A B C D E c.F#i.G, fe cluster(G)


esttab using panel_data_dummy_variable_test.tex, label star(* 0.10 ** 0.05 *** 0.01) stats(r2 N) replace booktabs ///
   title(panel data dummy variable test\label{tab1})

The results I get are:

Code:

\begin{table}[htbp]\centering
\def\sym#1{\ifmmode^{#1}\else\(^{#1}\)\fi}
\caption{panel data dummy variable test\label{tab1}}
\begin{tabular}{l*{2}{c}}
\toprule
                    &\multicolumn{1}{c}{(1)}&\multicolumn{1}{c}{(2)}\\
                    &\multicolumn{1}{c}{A}&\multicolumn{1}{c}{A}\\
\midrule
B                   &      -0.576\sym{***}&      -0.576\sym{***}\\
                    &    (-10.45)         &    (-10.45)         \\
\addlinespace
C                   &     -0.0468\sym{***}&     -0.0468\sym{***}\\
                    &    (-18.13)         &    (-18.13)         \\
\addlinespace
D                   &      -2.214\sym{*}  &      -2.214\sym{*}  \\
                    &     (-3.84)         &     (-3.84)         \\
\addlinespace
E                   &     -0.0345         &     -0.0345         \\
                    &     (-1.92)         &     (-1.92)         \\
\addlinespace
F                   &     -0.0302\sym{***}&                     \\
                    &    (-28.65)         &                     \\
\addlinespace
G=1 $\times$ F      &           0         &     -0.0302\sym{***}\\
                    &         (.)         &    (-28.65)         \\
\addlinespace
G=2 $\times$ F      &      0.0109\sym{**} &     -0.0193\sym{**} \\
                    &      (4.80)         &     (-9.63)         \\
\addlinespace
G=3 $\times$ F      &      0.0132\sym{***}&     -0.0170\sym{***}\\
                    &      (9.97)         &    (-12.85)         \\
\addlinespace
Constant            &       0.211\sym{***}&       0.211\sym{***}\\
                    &     (10.94)         &     (10.94)         \\
\midrule
r2                  &       0.152         &       0.152         \\
N                   &        7197         &        7197         \\
\bottomrule
\multicolumn{3}{l}{\footnotesize \textit{t} statistics in parentheses}\\
\multicolumn{3}{l}{\footnotesize \sym{*} \(p<0.10\), \sym{**} \(p<0.05\), \sym{***} \(p<0.01\)}\\
\end{tabular}
\end{table}

G=1 $\times$ F gets dropped from the 1st model, so I suspect the 2nd model is the correct one? The results are very different for G=2 and G=3: in model 1 the coefficients are positive, while in model 2 they are negative. What is going on? What is the proper use of dummy variables?

Thank you in advance

P.S. I uploaded a txt file with the variables and the values (I cannot attach it because it exceeds the maximum file size of statalist).
Please find it here : http://filebin.ca/2aYGTr8AuX04/variables.txt
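
A small arithmetic check, using only the coefficients printed in the table above, suggests that the two columns are the same model written in two ways. In model 1 the coefficient on F is the slope for the base group G=1 and the interaction terms are differences from that slope; in model 2, with no main effect of F, each interaction term is the group's own slope.

```stata
* Group-specific slopes of F implied by model 1 (base slope + difference)
display "slope of F for G=2: " -0.0302 + 0.0109
display "slope of F for G=3: " -0.0302 + 0.0132
```

The sums are -0.0193 and -0.0170, which are the model 2 coefficients; the identical r2 and N point the same way. After fitting model 1, something like lincom F + 2.G#c.F should reproduce the G=2 slope with its standard error (an assumption about how the poster's variables are named in e(b)).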

↧

multiple margins post or estpost margins

March 17, 2016, 4:14 am

Hello Statalisters,

I'm investigating the effect of wealth on charitable donations. For this I am running a Heckman regression on donations, a dummy for donations (yes/no), wealth, and a number of control and selection variables.

Now I'd like to report margins or some other effects, and I'd prefer to retrieve the margins via esttab/estout, first because it's simple, second because it looks good.

As far as I understand, esttab retrieves the parameter values, summary statistics, etc. that Stata stores via eststo. margins usually does not store its results there (or anywhere permanently, as far as I know), unless one uses the post option or the estpost margins command.

The problem is that by specifying these options I seem to overwrite all the results from my Heckman estimation, so that I cannot use margins, post repeatedly but only once after the Heckman estimation. At least, regardless of the command I use (estpost margins or margins, post), when I want to use it a second time without re-estimating the Heckman model, I get the following error message:

margins cannot work with its own posted results
r(322);

Now my question: is there any way of using multiple, consecutive margins commands, storing the results, and retrieving them via esttab (or something equally convenient)?

This is my code:

Code:

* prepare locals
#delimit ;
local wealth c.wealth_norm##(c.wealth_norm#c.wealth_norm);
local control ib5.wealth_perc ib0.soc_eng_2011 ib0.child_dum ib4.age_gr ib4.gross_gr
                 ib1.sex ib3.casmin_comp ib1.migback ib3.hlf0261 ib0.unempl ib1.loc1989;
local income c.netinc_norm##(c.netinc_norm#c.netinc_norm);
local inherit c.inherit_norm##(c.inherit_norm#c.inherit_norm);
local selection ib0.married ib0.retired ib0.inherit_dum;

#delimit cr

        svy:heckman lgivings ib0.hvid `wealth' `income' `inherit' ///
                     `control' `weight', ///
                 select(givings_dum = ib0.hvid `wealth' `income' `inherit' ///
                        `control' `selection')
        eststo heckman

        * effect of employing people on donations *
        estpost margins hlf0261, pred(ycon)
        eststo marginhlfdon
        
*** The last command is not working, stata returns error-message as described ***

        * effect of employing people on probability to donate donations *
        estpost margins hlf0261, pred(psel)
        eststo marginhlfprob

My results.

To get the results, I used estimates replay, where no information on svy is given:

Number of Strata: 1
Number of PSUs: 2121

Many thanks in advance,

Caspar
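
A sketch of the usual workaround, shown on the auto dataset with regress rather than the poster's svy: heckman model: store the original estimates once, then bring them back with estimates restore before each further margins, post. Only official commands are used here; the stored names main, m_foreign and m_mpg are arbitrary.

```stata
sysuse auto, clear
regress price mpg weight i.foreign
estimates store main            // keep the model results under a name

* First set of margins; post overwrites e() with the margins results
margins foreign, post
estimates store m_foreign

* Bring the model back before asking for the next set of margins
estimates restore main
margins, dydx(mpg) post
estimates store m_mpg

* Both sets of margins are now stored and can be tabulated together
estimates table m_foreign m_mpg, se
```

In the code above, the analogous step would presumably be estimates restore heckman before the second estpost margins, after which esttab marginhlfdon marginhlfprob could pick up both stored sets.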

↧

Nonlinear Diff-in-Diff implemention in Stata (binary dependent variable)?

March 17, 2016, 5:26 am

Hello,

I am running a difference-in-difference analysis using Stata. In this case, my dependent variable Y is binary (i.e. zero or one).

So far, I have just run a linear diff-in-diff, which has several drawbacks (common trend assumption unlikely to hold, likely to predict outside the unit interval, etc.).

Apparently, just running a Probit/Logit regression does not identify the causal effect (average treatment effect on treated (ATET)) (see for example Lechner (2011) for a discussion).

I came across several suggestions, in particular Blundell and Costa Dias (2006)'s suggestion for estimating the ATET in a diff-in-diff framework where the dependent variable is binary.

Is anyone aware of a Stata implementation of Blundell and Costa Dias (2006)?

Further suggestions on non-linear diff-in-diffs in Stata are welcome.

Thanks in advance.
Ruediger

↧

Mixed logit estimation using Arne Risa Hole stata command

March 17, 2016, 6:43 am

Dear all,

My name is Babatope Akinyemi. I am not a Stata expert, and I treasure the invaluable contributions made on this forum to other researchers' work.
I am currently estimating tourists' willingness to pay for attributes of community-based ecotourism in communities adjacent to national parks in South Africa.
Here is a brief explanation of my research: my dataset was collected from tourists visiting the national parks using a discrete choice experiment questionnaire. I have four attributes: village accommodation (yes/no), craft market (yes/no), village tours (yes/no), and price ($0, $10, $20, $30, $40, $50). Each tourist was presented with seven choice sets, each with three alternatives (two improved alternatives and a status quo), so each tourist made seven choices (panel data). Following Arne Risa Hole's paper "Mixed logit modelling in Stata: an overview" (attached), I fitted a mixed logit to account for preference heterogeneity in tourists' choices.

However, I got stuck on page 14, not knowing how to generate the matrix start = b[1,1..7],0,0,0,0,b[1,8],0,0,0,b[1,9],0,0,b[1,10],0,b[1,11] presented there. I tried using this expression in my analysis and got the error message "initial vector: matrix must be dimension 10".
I know Francesco Chirico encountered a similar issue when conducting a survival analysis in June 2014, and the comments and advice from Stephen Jenkins and Mike Lacy were very helpful. Francesco's comments on how the problem was solved are not detailed enough to help with my analysis, which is why I am asking for help.
If I knew how to derive a matrix of this kind, start = b[1,1..7],0,0,0,0,b[1,8],0,0,0,b[1,9],0,0,b[1,10],0,b[1,11], with dimension 10 for my dataset, I should be able to continue with the analysis.

Many many thanks and again many thanks for your help.

Kind Regards:
Babatope Akinyemi
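
A sketch of how the start vector might be laid out, assuming (this is a guess from the description, not confirmed by the post) one fixed coefficient (price) and three random coefficients (the three yes/no attributes). The uncorrelated model then has 1 + 3 + 3 = 7 parameters, and the correlated model has 1 + 3 means + 3*4/2 = 6 Cholesky elements = 10, which matches the dimension in the error message. The Cholesky elements are assumed to be ordered l11 l21 l31 l22 l32 l33, as in the quoted expression; diagonal elements take the SD estimates and off-diagonal elements start at zero. The matrix b below is a made-up placeholder for e(b) from the uncorrelated model.

```stata
* Placeholder for: matrix b = e(b) after the uncorrelated mixlogit run
* (1 fixed coefficient, 3 means, 3 standard deviations)
matrix b = (-0.05, 0.8, 0.4, 0.6, 1.1, 0.9, 0.7)

* fixed + means + l11, then l21 l31 = 0, l22, l32 = 0, l33
matrix start = b[1,1..5], 0, 0, b[1,6], 0, b[1,7]

matrix list start
display colsof(start)          // number of starting values supplied
```

With a different split between fixed and random coefficients the same counting rule applies: r random coefficients need r(r+1)/2 Cholesky starting values.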

↧

What would be the best way to analyze this data? I think I'm missing something really simple

March 17, 2016, 7:31 am

Hey everyone,
I have a dataset of two kinds of individuals, older than 85 years and younger than 50 (variable 'Old', coded 1 if old, 0 if not). For each of these patients, I have 3 length measurements of the same muscle (variables len1, len2, len3). Please see an example of the data setup below.

I want to 1) assess the variability in the lengths of each patient (so variability between len1 len2 len3 for each patient), and then 2) compare this variability between 'old' and 'not old' populations.

ID	Old	Len1	Len2	Len3
1	1	20.0	21.0	22.0
2	0	21.0	19.0	18.0
3	0	22.0	18.0	17.0
4	1	15.0	19.0	19.0
5	1	19.0	20.0	17.0

So essentially I want to compare the spread among len1, len2 and len3 in old patients with the spread among len1, len2 and len3 in patients who are not old.

I've tried reshaping to long format, and thought I would run

Code:

bysort ID: ttest len, by(old)

but it says "1 group found, 2 required".

Am I missing something?
Could someone advise a better way to do this please?

Thank you!
Mohammad
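
The "1 group found" message arises because, within a single ID, Old takes only one value, so there is nothing for ttest to compare. One possible alternative, sketched on the five example rows and staying in wide format: compute each patient's standard deviation across the three lengths, then compare that summary between the two groups. Using the SD as the measure of variability is an assumption; with only three measurements per patient a range or coefficient of variation might serve equally well.

```stata
* The example rows from the post
clear
input ID Old Len1 Len2 Len3
1 1 20 21 22
2 0 21 19 18
3 0 22 18 17
4 1 15 19 19
5 1 19 20 17
end

* Step 1: within-patient variability across the three measurements
egen sd_len = rowsd(Len1 Len2 Len3)

* Step 2: compare that variability between old and not-old patients
ttest sd_len, by(Old)
ranksum sd_len, by(Old)      // rank-based alternative for small or skewed samples
```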

↧

margins after cmp with complex data

March 17, 2016, 7:35 am

Hi Statalisters,
I am using Stata 11.1 on a Mac to estimate marginal effects after an ordered probit with complex survey data and the cmp command. To account for the complex survey design, I included the option vce(unconditional) to use the linearized variance estimator.
When I define the subpopulation with an if condition, the margins cannot be computed. Here is the code:

Code:

svyset conglome [pw=factor07], strata(estrato) 
cmp (educ2 = status sex age age2 father_miss mother_miss edu_father_nor edu_mother_nor), ind($cmp_oprobit) svy subpop(if age>=25) qui
margins, dydx(*) predict(outcome(#1)) subpop(if age>=25) vce(unconditional)

I get :

invalid subpop() option

However, when I define the subpopulation with an indicator variable, the margins are computed:

Code:

gen x=1 if age>=25
replace x=0 if x==.
cmp (educ2 = status sex age age2 father_miss mother_miss edu_father_nor edu_mother_nor), ind($cmp_oprobit) svy subpop(x) qui  
margins, dydx(*) predict(outcome(#1)) subpop(x) vce(unconditional)

While the second way gives the result, I wonder if I am doing something wrong in the first case. I appreciate very much your help. Celia P.

↧

Itsa command

March 17, 2016, 7:53 am

Hello everyone,

I'm facing a problem regarding intervention analysis.

I'm studying the effect of the increase of two new cost-sharing policies on the number of outpatient visits in the last 20 years.
So I ran an intervention analysis with itsa in Stata: itsa outvisit, single trperiod(199. 201.) lag(1) fig posttrend
My model was: Y_t = β0 + β1*time_t + β2*intervention1_t + β3*(time after intervention1)_t + β4*intervention2_t + β5*(time after intervention2)_t + e_t, with t = 199. to 201. So far, no problem.

But next, I would like to fit a model that estimates a single average effect across both policies, that is, one that replaces β2 and β4 with a single coefficient β2: Y_t = β0 + β1*time_t + β2*(intervention1 and intervention2)_t + β3*(time after (intervention1 + intervention2))_t + e_t, with t = 199. to 201.

Is it possible to do that with itsa? Does anyone have an idea?

Thank you very much

Peter

↧

Sorting stocks into portfolios by historical volatility of returns

March 17, 2016, 7:57 am

Dear Statalisters,

I am having some issues with finding the right command for sorting the data into groups (portfolios) by historical volatility (standard deviation) of returns. I am hoping someone will be able to help me.

I have the data on weekly prices ("price") of all the constituents of S&P 600 from 31 Dec 2004 - 29 Jan 2016. From these prices I calculated log returns ("return"). The time variable is called "date".

What I need to do:
1.) Calculate volatility of returns from 31 Dec 2004 - 30 December 2005 for each stock. Then move 1 month forward, so 28 Jan 2005 - 27 Jan 2006 (monthly rebalancing) and again calculate volatility of returns, and so on. So, what I need to do is calculate past 1-year volatility of weekly returns.
2.) Then I need to sort the stocks ("id") into 5 portfolios based on this historical volatility. The portfolios are rebalanced each month, so there will be different stocks belonging to each portfolio every period.

I have googled a lot of commands, but I am not quite sure which one to use. I am guessing I need to have a loop over all my stocks. I was thinking about foreach, but I am not sure whether it is the most appropriate considering that my data is in the long format (i.e., I only have one variable "return").

I thought about using the commands rolling, mvsumm, asrol, but I am not quite sure how I should write my loop.

Any help would be much appreciated!

Laura
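
A rough sketch using only official commands, on simulated data: rolling handles each panel separately once the data are xtset, so no explicit loop over stocks is needed for the volatility step. The week counter, the 52-week window and the "every fourth week" rebalancing rule are simplifying assumptions standing in for real calendar dates; rolling can be slow on large panels.

```stata
* Hypothetical panel: 10 stocks, 80 weekly returns each
clear
set seed 12345
set obs 10
gen id = _n
expand 80
bysort id: gen week = _n
gen return = rnormal(0, 0.01 + 0.002*id)
xtset id week

* Trailing 52-week standard deviation of returns, computed stock by stock;
* the result replaces the data in memory with id, start, end and vol
rolling vol = r(sd), window(52) clear: summarize return

* Keep every fourth window end as a stand-in for monthly rebalancing
keep if mod(end, 4) == 0

* Sort stocks into 5 volatility portfolios separately at each rebalancing date
gen portfolio = .
levelsof end, local(dates)
foreach d of local dates {
    xtile q = vol if end == `d', nquantiles(5)
    replace portfolio = q if end == `d'
    drop q
}
list id end vol portfolio in 1/10
```

The resulting file could then be merged back onto the returns by id and date to track each portfolio over the following month.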

↧

Issue with merging two databases

March 17, 2016, 8:48 am

Hello,

I have an issue merging two databases.

Database 1 contains information on public procurements contracts (e.g. €-value of contract, activity of contract, winning company, etc.)
Database 2 contains company information (company name, number of employees, turnover, balance sheet, etc.)
Evidently, I want to merge both databases based on the name of the company.
The problem lies in the fact that the company name is not always consistent in both databases.

Examples

Database1                    Database2
A&M Motors                   A & M Motors
Architectuurbureau Filips    Architectuurbureau Filips BVBA
Aforest Belgium              Aforest
Air Liquide Medical NV       Air Liquide Medical SA

Do you guys have any suggestions on how to deal with this?

Kind regards,

Willem
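
A first step that often helps, sketched with official string functions only: build a cleaned matching key in both files (upper case, legal-form suffix removed, blanks removed) and merge on that key instead of the raw name. The list of suffixes is an assumption based on the examples; names such as "Aforest Belgium" versus "Aforest" would still not match and would need manual review or a community-contributed fuzzy-matching command.

```stata
* Toy list of names taken from the examples in the post
clear
input str40 company
"A&M Motors"
"A & M Motors"
"Architectuurbureau Filips"
"Architectuurbureau Filips BVBA"
"Air Liquide Medical NV"
"Air Liquide Medical SA"
end

* Upper case, collapse repeated blanks, trim the ends
gen key = trim(itrim(upper(company)))

* Drop a trailing legal form (assumed list: BVBA, NV, SA)
replace key = regexr(key, " (BVBA|NV|SA)$", "")

* Remove remaining blanks so "A & M" and "A&M" agree
replace key = subinstr(key, " ", "", .)

list company key, sepby(key)
```

The same steps would be run on the company-name variable in each database before a merge m:1 key using the second file.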

↧

Graph Bar -> adding a "total" bar automatically, without generating a new category

March 17, 2016, 9:09 am

Dear all, I would like to add a "total" bar within my bar chart

There are two variables - and five different categories.

Now I would like to obtain, without creating another category, the total over all categories, so that I can compare the variables directly in my bar chart.

Is there any option I can add in my code?

Thank you for any hints!

* /// continues the command on the next line so it runs from a do-file
graph bar (median) a_t w_t if sex==1, ///
    over(erwst_paar, relabel(1 "a" 2 "b" 3 "c" 4 "d" 5 "e")) ///
    ytitle("Stunden (Median)") ///
    yscale(range(0/50)) ///
    ylabel(0(5)50, labsize(small)) ///
    legend(position(6) label(1 "actual time") label(2 "wanted time") ///
        size(vsmall) cols(3) colfirst)

↧

Recreating DHS under five mortality statistics

March 17, 2016, 9:11 am

I'm trying to estimate U5 mortality rates for DHS surveys in India (98, 06) and Kenya (98, 08), and then do a sub-population analysis of child mortality in slum areas. To start, I want to recreate the DHS estimates of country-level under-five mortality rates.

I'm using this code in Stata and am getting close, within 1-2 people, of the published DHS rate, but not exact matches. Can anyone help me see what I am missing?

Very appreciative of any suggestions.

I've created this code for the BR (births recode) DHS file following http://siteresources.worldbank.org/INTPAH/Resources/Publications/459843-1195594469249/HealthEquityCh3.pdf and http://legacy.measuredhs.com/help/datasets/

In the code below, b3 is the child's date of birth (CMC), and v008 is the date of the survey (CMC). b5 indicates whether the child is alive, and b7 is the age at death. All ages are in months. v005 is the proxy for the sampling weight suggested by DHS, since pweights aren't available in ltable.

gen hypage=(v008-b3)
gen survivelength=.
replace survivelength=hypage
replace survivelength=b7 if b5==0
gen dead=(b5==0)

ltable survivelength dead [fw=v005] if hypage <60 , int(0,1,3,6,12,24,36,48,60) failure

cross-posted here: http://userforum.dhsprogram.com/inde...bc4b969af1a2bf

↧

How to calculate duration

March 17, 2016, 9:12 am

Hello,

I want to calculate # of months with insurance.
The table below is for one person's records including start date and termination date.
I'd like to create # of months with insurance each year.

mem_effdate	mem_termdate
1-Jun-10	31-May-11
1-Jun-11	30-Jun-11
1-Jul-11	31-May-12
1-Nov-11	30-Jun-12
1-Jun-12	31-May-13
1-Jul-12	30-Jun-13
1-Jun-13	31-May-15
1-Jul-13	30-Jun-14
1-Jul-14	31-Jan-43
1-Jun-15	31-May-16

For this case, the final data that I want to have looks like below:

insu_2010	insu_2011	insu_2012	insu_2013	insu_2014	insu_2015
6	12	12	12	12	12

The termination year 2043 (in 31-Jan-43) can be interpreted as 2017.
Could you help me calculate the # of months in each year?

This is the first time I have used dataex; the example was saved for Stata version 11 or 12.
Let me know if something is wrong with the data.

I deeply appreciate your help in advance,

Soyeon
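
One possible approach, sketched for the single person shown: expand every spell into one row per covered calendar month, drop months that appear in more than one spell, and count months by year. The two-digit years are read as 20xx, and the far-future termination date is simply capped at December 2015, which is an assumption. Note that this convention counts every calendar month touched by a spell, so June to December 2010 comes out as 7 months rather than the 6 shown above; the rule would need adjusting if a different convention is intended.

```stata
* The spells from the post, entered as strings
clear
input str10 s_eff str10 s_term
"1-Jun-10" "31-May-11"
"1-Jun-11" "30-Jun-11"
"1-Jul-11" "31-May-12"
"1-Nov-11" "30-Jun-12"
"1-Jun-12" "31-May-13"
"1-Jul-12" "30-Jun-13"
"1-Jun-13" "31-May-15"
"1-Jul-13" "30-Jun-14"
"1-Jul-14" "31-Jan-43"
"1-Jun-15" "31-May-16"
end

* Monthly dates of spell start and end (two-digit years read as 20xx)
gen m0 = mofd(date(s_eff,  "DM20Y"))
gen m1 = mofd(date(s_term, "DM20Y"))
replace m1 = min(m1, ym(2015, 12))     // cap at the end of the reporting window

* One row per spell-month
gen spell = _n
expand m1 - m0 + 1
bysort spell: gen month = m0 + _n - 1
format month %tm

* Overlapping spells should count a month only once
bysort month: keep if _n == 1

* Months insured per calendar year
gen year = yofd(dofm(month))
keep if inrange(year, 2010, 2015)
contract year, freq(months)
list
```

With several people in the data, a person identifier would be added to the bysort and contract steps, and reshape wide could turn the result into insu_2010 ... insu_2015 columns.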

↧