Channel: Statalist

Question about inclusion of the reference group

Hey all,

The question I have is not so much Stata-specific, but I did not know where else to ask this kind of empirical question (if you have suggestions about where to find such a forum, feel free to share them!).

I'm working on an assignment and I have used a binary logistic regression model. Now I'm writing down the model for estimation, but I was wondering if I should include reference categories in the description of my regression equation. For instance, I have innovation (m), a categorical variable (1 = less than 25% (reference group); 2 = 25-50%; 3 = 75-90%; 4 = more than 90%), as one of my independent variables, but I'm not sure whether I should include the reference group in describing my model. Likewise, I have an independent variable age (k) with 5 age categories (1 = age 18-25 (reference group); 2 = 25-45; 3 = 45-65; 4 = older than 65).

Right now, I've written down the following in my chapter about the empirical model: "m is the index of four categories of innovation (m = 1, 2, 3, 4) and k is the index of five categories of age (k = 1, 2, 3, 4, 5)". These two represent option A, so to speak. I would like to know whether this is correct or whether, because the first innovation and age categories are my reference groups, I should write something like this: "m is the index of three categories of innovation (m = 2, 3, 4) and k is the index of four categories of age (k = 2, 3, 4, 5)". These represent option B.

Now my question is: should I include my reference group in the description of my model (like in option A) or should I remove it (like in option B) for both innovation and age?

Hopefully my small problem is clear and someone can help me, thanks in advance!

Tim

Merging panel data with time specific indicators

Hello,

I have two datasets that need to be merged.
The first dataset is panel data consisting of a unique firm identifier (ID), a date variable (fiscal year end, e.g. 31march2010) and a panel variable, year (2009 for 31march2010, respectively). Furthermore, each row contains many variables about the firms in the dataset and a second identifier, name. This second identifier enables me to merge with different datasets that use the same identifier. The problem with this identifier is that it changes when firms change names and is reused over time.


The second dataset (a linking table) contains date variables recording the first day a name was assigned to a company and the last day that name was used. The next row then shows the new name for the next time period. This dataset also contains a unique firm identifier, firmid, which is used by my third dataset.

Now I am looking for code that merges firmid onto ID based on the variable name, whenever date lies in the interval between the first and last day a firm used that name.

panel Data
ID date year name
123 31march2009 2008 ZZAGFR
123 31march2010 2009 ZZAGFR
123 31march2011 2010 REDFTZ

Linking table
date name first day last day firm id
01may2005 ZZAGFR 01may2005 25april2009 5558877
26april2009 REDFTZ 26april2009 . 5558877
and so on for
thousands of firms

The required commands should yield the following desired dataset.
ID date year name firm id
123 31march2009 2008 ZZAGFR 5558877
123 31march2010 2009 ZZAGFR 5558877
123 31march2011 2010 REDFTZ 5558877
With this dataset I am able to merge with the third dataset, which is my overall goal.

I would be really happy if someone is able to help me.
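For what it's worth, this kind of interval matching is often handled with -rangejoin- from SSC (which requires -rangestat-, also from SSC), rather than -merge-. A minimal sketch, assuming the panel is in memory, the linking table is saved as linking.dta, the dates are numeric Stata dates, and the interval and identifier variables are named first_day, last_day and firmid (these names are placeholders):
Code:
* ssc install rangestat
* ssc install rangejoin
use panel, clear
* attach linking-table rows whose [first_day, last_day] interval contains date,
* matching on the shared variable name
rangejoin date first_day last_day using linking, by(name)
Unmatched panel rows are kept with a missing firmid, so they are easy to inspect afterwards.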

AR(2) tests in dynamic panel data regressions

Hi,

I am estimating a dynamic panel regression with xtabond2, where y = lagged y + control variables.
I ran the regressions and then changed the y variable as a robustness check. The AR(2) test remained the same.
Then, for additional robustness, I took a subsample in which I removed the 2007-2009 financial crisis years. The AR(2) test gave a very different result.
I am wondering whether it is normal to end up with the same AR(2) when I change the dependent variable, or is it a sign that I made a programming mistake?
Also, is it normal that the AR(2) statistic drops from 70.3 to 12.9 if I omit three years from the sample?

Making a time-band in dummy variables

Hi, I am relatively new to Stata. I have a question about how to make a time band for a dummy variable. I have monthly data on currency crises, and the crisis indicator is a dummy variable (equal to 1 if a crisis occurs in that month). However, I want to set the value to 1 from t-6 until t+6 around the crisis month, to remove the ambiguity about the start and end of the crisis period. Can someone help me with this issue?

Many thanks in advance!
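One possible approach (a sketch, not a tested solution): after declaring the time structure, flag any month that lies within six months of a crisis month using lead and lag operators. This assumes a single monthly series with a date variable mdate and the original dummy named crisis (both placeholder names); for a panel, xtset the panel identifier and mdate instead.
Code:
tsset mdate, monthly
gen byte crisis_band = crisis
forvalues k = 1/6 {
    replace crisis_band = 1 if L`k'.crisis == 1 | F`k'.crisis == 1
}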

Common Support Graphs in Kmatch (PSM)

Dear all,
I work with Ben Jann's kmatch ado and have a question concerning the common support graphs. Basically, I do not understand what this graph is supposed to show in contrast to the ones I know from other ados or researchers. Consider the testing code below:
Code:
ssc install kmatch
ssc install moremata
ssc install kdens

version 15
sysuse nlsw88, clear
kmatch ps union c.ttl_exp i.south c.grade (wage), gen(diag)

kmatch cdensity, name(regular, replace)        //regular Command in kmatch
twoway (kdensity _KM_ps if diag == 0) (kdensity _KM_ps if diag == 1), name(myown, replace)    //own version
graph combine regular myown
I am used to common support diagnostics like this here: https://www.statalist.org/forums/fil...43&type=medium
You see there are two groups, treatment and control. I do not understand why in kmatch the groups are called total, unmatched and matched. I created my own version, called "myown". As you can see, it clearly deviates from "regular". Basically, I do not understand in what sense "regular" is a common support graph and how it should be interpreted.

Three-stage procedure with binary endogenous independent variable using 2SLS

Dear statalist,

I am trying to use probit to get more efficient estimates while avoiding the forbidden regression under 2SLS. I have an original regression with an endogenous binary independent variable and a continuous dependent variable (Y). I follow Adams et al. (2009), using a three-stage procedure https://www.sciencedirect.com/scienc...27539808000388, also described in Wooldridge (2010):

1. Use probit to regress the endogenous variable on the instrument(s) and exogenous variables
Code:
probit X1 Xi-Xn Z i.year i.ffi, vce(robust)   // X1 is the endogenous dummy, Xi-Xn are the exogenous variables and Z is the instrument
Code:
predict shat, pr
2. Use the predicted values from the previous step in an OLS first stage together with the exogenous (but not the instrumental) variables
Code:
regress Y X1 Xi-Xn shat i.year i.ffi, robust
3. Do the second stage as usual
Code:
ivregress 2sls Y X1 Xi-Xn (X1 = shat), vce(robust) first

My question is whether I have correctly implemented step 3 in Stata. Do I need to use ivregress here, or is there a more suitable command?
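For reference, one common reading of this procedure (e.g., the treatment in Wooldridge 2010) uses the probit fitted probability only as the excluded instrument and lets ivregress run both 2SLS stages internally. A minimal sketch with the placeholder names from the post (Xi-Xn stands for the list of exogenous variables):
Code:
probit X1 Xi-Xn Z i.year i.ffi, vce(robust)
predict shat, pr
* shat is the excluded instrument; the endogenous X1 appears only inside the parentheses
ivregress 2sls Y Xi-Xn i.year i.ffi (X1 = shat), vce(robust) first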

Descriptive statistics in Word

Good morning,
I would like to transfer the following table from Stata to Word for my thesis; however, I do not know the command to do so.
Code:
. tabstat riskavers, statistics(mean sd) by(wave)

Summary for variables: riskavers
     by categories of: wave 

     wave |      mean        sd
----------+--------------------
        1 |  1.748762  .5516792
        2 |  1.719628  .5518228
        3 |  1.739416  .5440446
        4 |  1.634218  .5705484
        5 |  1.653533  .5531243
        6 |  1.713028  .5553211
----------+--------------------
    Total |  1.695483  .5571508
-------------------------------
Thank you so much
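One route that is often suggested for this kind of table is the estout package from SSC, which can re-post the tabstat results and write them to an RTF file that Word opens directly. A minimal sketch, assuming the riskavers and wave variables shown above:
Code:
ssc install estout
estpost tabstat riskavers, statistics(mean sd) by(wave)
esttab using riskavers_by_wave.rtf, cells("mean sd") replace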

Labeling x-axis of grouped bar graph

Hi,

I'm having trouble labelling a bar graph in Stata. The current output looks like this:
[attached bar graph]

Produced by the following code:
Code:
//Label variables
lab var CAAR_1 "1"
lab var CAAR_2 "2"
lab var CAAR_3 "3"
lab var CAAR_4 "4"
lab var CAAR_5 "5"

graph bar CAAR_1 CAAR_2 CAAR_3 CAAR_4 CAAR_5, over(group) ascategory ///
title("Abnormal returns per group") ytitle("Size-adjusted return (%)")
So, the data are grouped by three categories, which I have labelled, and these are displayed correctly at the bottom. The bars represent different years, which is why I have labelled them 1 to 5.
However, Stata still displays "mean of CAAR_1", etc., rather than the label "1" I have given that variable.

I have checked the manual and these forums and tried almost all the options. One suggested solution I have found is to use tabplot (ssc install), as it will always number the bars. But it seems to me a simpler solution should exist, and all the options I try affect either the y-axis label or the label of over().
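One workaround that is frequently suggested: with ascategory the yvars form a category axis, so their labels can be overridden through yvaroptions(relabel()). A sketch based on the code above (untested here):
Code:
graph bar CAAR_1 CAAR_2 CAAR_3 CAAR_4 CAAR_5, over(group) ascategory ///
    yvaroptions(relabel(1 "1" 2 "2" 3 "3" 4 "4" 5 "5")) ///
    title("Abnormal returns per group") ytitle("Size-adjusted return (%)")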

(How to) set page margins with -putdocx-?

Dear all,

I'm almost done with my first -putdocx- document (yes!) but I was wondering whether it is possible to set the page margins of the Word document from within -putdocx-? This should be a -putdocx begin- option, as is the case for -putpdf-, but -putdocx begin- does not support the -margin- option and I can't find a workaround... I have tried changing the default margins of the "Normal.dot" template inside Word, but -putdocx- overrides them... Any idea?

Thanks in advance!
Laurent

Confidence intervals for weighted means: bootstrap, delta-method, or something else?

Dear all,

Hopefully, you can help me calculate confidence intervals for weighted means that use post-stratification weights.

I am working with cross-sectional individual-level survey data in Stata 15 on Windows.

I have calculated post-stratification weights using Nicholas Winter's -survwgt rake-. Now, I would like to calculate weighted means by age and sex based on these post-stratification weights, including confidence intervals. Actually, I have calculated several sets of post-stratification weights (let's call them poststratificationweight_1 and poststratificationweight_2) and would like to compare means based on them. Hence, I would prefer a method that does not require the data to be collapsed.

To calculate the weighted means, I simply used the _gwtmean package (ssc install _gwtmean) and typed:

Code:
egen x_mean_1 = wtmean(x), by(sex age) weight(poststratificationweight_1)
egen x_mean_2 = wtmean(x), by(sex age) weight(poststratificationweight_2)
Yet I do not know how to calculate the corresponding confidence intervals. Would it be possible to bootstrap even though I want to use weights? If so, how could I do that in Stata? I have also read that one could use the delta method; I read up on it, but I don't know how to apply it to my specific problem.

Thank you for your help!

Best,
Stephanie
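One way this is often approached is to declare the weight to svyset and let svy: mean produce the confidence intervals, rather than bootstrapping by hand. A sketch in Stata 15 syntax, under the (approximating) assumption that the raked weight can be treated as a probability weight:
Code:
svyset [pweight = poststratificationweight_1]
svy: mean x, over(sex age)

svyset [pweight = poststratificationweight_2]
svy: mean x, over(sex age)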

Difficulties understanding some smoothing code using the scalar command

Hi,

I'm still new to Stata and am having some trouble following the code below. I've included my interpretation in a step-by-step manner underneath and hope that you can let me know whether I've understood it properly.

Code:
gen smooth = 0
quietly summarize mass, detail    // -detail- is required for r(p1) and r(p99) to be returned
scalar p1 = r(p1)
scalar p99 = r(p99)
scalar interval = 10
forvalues i = `=scalar(p1)'(`=scalar(interval)')`=scalar(p99)' {
    scalar begin = `i'
    scalar end = `i' + `=scalar(interval)'
    if `i' == `=scalar(p1)' {
        replace smooth = `i' if mass <= `=scalar(end)'
    }
    else {
        replace smooth = `i' if mass > `=scalar(begin)' & mass <= `=scalar(end)'
    }
}
From what I can gather, this loop generates begin and end for every i at an absolute interval of 10, between the 1st and 99th percentiles of mass.
At each iteration of the loop, Stata replaces smooth with i if mass falls within the current interval (i - i+10).
So for example if i=50 and mass=53, smooth would be replaced by i=50. This also holds for mass=60, right?
Effectively this means that at i=50, observations for mass=51 to mass=60 will receive smooth=50, correct?
This is close to, but not exactly the same as, rounding down to the nearest 10, yes?
Is there a statistical term for this type of smoothing (e.g., smoothing by rounding down)?

Thanks, I look forward to your feedback.

Obtaining Survival Point-estimates using stcox and streg

Hello,

I am using Stata 14.1. I have a survival analysis and I am trying to use a Cox proportional hazards model to estimate the time to death after transplant for rld_tx_type = 1, 2 and 3, in a model with multiple covariates. Using stcox I can obtain hazard ratios, but I would like to obtain adjusted point estimates for survival at specific time points such as 365, 1095 and 1825 days.

Using streg with a Weibull parametric survival model followed by the margins command, I can obtain point estimates for survival at 365 days.

I have read numerous similar threads on this topic that discuss the inability to obtain baseline hazards from stcox, because Cox regression does not estimate the baseline hazard. Is this why I am unable to obtain point estimates using stcox?


*Code


stset gtime, failure(outcome)

streg i.rld_tx_type gender_group ecmo_trr ventilator_trr i.blood_type i.year i.Ethnic bmi_analysis pTLC_ratio_copd end_match_las age_don gender_mismatch ischtime end_o2 i.lung_preference, vce(robust) dist(weibull)

streg, coeflegend

margins, expression(exp(-exp(predict(xb))*1825^exp(_b[ln_p:_cons]))) at(rld_tx_type=(1 2 3))



*Output

. stset gtime, failure(outcome)

failure event: outcome != 0 & outcome < .
obs. time interval: (0, gtime]
exit on or before: failure

------------------------------------------------------------------------------
5055 total observations
13 observations end on or before enter()
------------------------------------------------------------------------------
5042 observations remaining, representing
2212 failures in single-record/single-failure data
6507620 total analysis time at risk and under observation
at risk from t = 0
earliest observed entry t = 0
last observed exit t = 4412

.
. streg i.rld_tx_type gender_group ecmo_trr ventilator_trr i.blood_type i.year i.Ethnic bmi_analysis pTLC_ratio_copd end_match_las age_don
> gender_mismatch ischtime end_o2 i.lung_preference, vce(robust) dist(weibull)

failure _d: outcome
analysis time _t: gtime

Fitting constant-only model:

Iteration 0: log pseudolikelihood = -5568.0139
Iteration 1: log pseudolikelihood = -5556.817
Iteration 2: log pseudolikelihood = -5556.8074
Iteration 3: log pseudolikelihood = -5556.8074

Fitting full model:

Iteration 0: log pseudolikelihood = -5556.8074
Iteration 1: log pseudolikelihood = -5507.4874
Iteration 2: log pseudolikelihood = -5506.2359
Iteration 3: log pseudolikelihood = -5506.2337
Iteration 4: log pseudolikelihood = -5506.2337

Weibull regression -- log relative-hazard form

No. of subjects = 4,866 Number of obs = 4,866
No. of failures = 2,117
Time at risk = 6193954
Wald chi2(36) = 105.86
Log pseudolikelihood = -5506.2337 Prob > chi2 = 0.0000

---------------------------------------------------------------------------------
| Robust
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
rld_tx_type |
2 | .8249158 .0641938 -2.47 0.013 .7082234 .9608354
3 | .7857402 .062952 -3.01 0.003 .6715562 .9193388
|
gender_group | 1.094763 .0534039 1.86 0.063 .9949418 1.2046
ecmo_trr | 1.587534 .9326214 0.79 0.431 .501963 5.020819
ventilator_trr | 1.325516 .1665057 2.24 0.025 1.036241 1.695545
|
blood_type |
2 | .9575172 .0441988 -0.94 0.347 .8746923 1.048185
3 | .9455973 .0703752 -0.75 0.452 .8172527 1.094098
4 | .9209833 .0991832 -0.76 0.445 .7457333 1.137418
|
year_tx |
2 | .8792394 .1061766 -1.07 0.287 .6939312 1.114032
3 | .9375805 .1130025 -0.53 0.593 .7403157 1.187409
4 | .8853886 .1082717 -1.00 0.320 .6966953 1.125188
5 | .817168 .0988432 -1.67 0.095 .6446908 1.035789
6 | .7776473 .0963679 -2.03 0.042 .6099578 .991438
7 | .8771987 .1082622 -1.06 0.288 .6887226 1.117253
8 | .7459922 .0964708 -2.27 0.023 .5789726 .961193
9 | .7436866 .1000223 -2.20 0.028 .5713568 .9679937
10 | .6911968 .1026914 -2.49 0.013 .5165809 .9248367
11 | .944747 .1380387 -0.39 0.697 .7094878 1.258016
12 | .7908836 .1392182 -1.33 0.183 .5601143 1.116731
13 | .4194482 .2500005 -1.46 0.145 .1304197 1.349004
|
Ethnic |
2 | 1.044576 .0912809 0.50 0.618 .8801515 1.239718
4 | 1.11143 .2007295 0.58 0.559 .7801019 1.583482
5 | .9110327 .2200833 -0.39 0.700 .5674197 1.462728
|
bmi_analysis | 1.010092 .0055142 1.84 0.066 .9993424 1.020958
pTLC_ratio_copd | .8998795 .0441061 -2.15 0.031 .8174554 .9906143
end_match_las | .9960659 .0054117 -0.73 0.468 .9855155 1.006729
age_don | 1.000985 .0015336 0.64 0.521 .9979834 1.003995
gender_mismatch | 1.16013 .0621598 2.77 0.006 1.044477 1.288588
ischtime | .9901301 .0149002 -0.66 0.510 .9613527 1.019769
end_o2 | 1.025314 .0115466 2.22 0.026 1.002931 1.048197
|
lung_preference |
2 | .910501 .1291987 -0.66 0.509 .6894396 1.202443
3 | .9074591 .0921592 -0.96 0.339 .7436717 1.107319
4 | .847192 .0941981 -1.49 0.136 .6812992 1.053479
5 | 1.095501 .109549 0.91 0.362 .9005204 1.332699
6 | .9505574 .1139962 -0.42 0.672 .7514455 1.202428
7 | .9216207 .1356492 -0.55 0.579 .6906654 1.229806
|
_cons | .0008421 .0002809 -21.23 0.000 .000438 .001619
----------------+----------------------------------------------------------------
/ln_p | -.0939497 .0231057 -4.07 0.000 -.1392361 -.0486633
----------------+----------------------------------------------------------------
p | .9103286 .0210338 .8700226 .9525018
1/p | 1.098504 .0253817 1.049867 1.149395
---------------------------------------------------------------------------------

.
. streg, coeflegend

Weibull regression -- log relative-hazard form

No. of subjects = 4,866 Number of obs = 4,866
No. of failures = 2,117
Time at risk = 6193954
Wald chi2(36) = 105.86
Log pseudolikelihood = -5506.2337 Prob > chi2 = 0.0000

---------------------------------------------------------------------------------
_t | Coef. Legend
----------------+----------------------------------------------------------------
rld_tx_type |
2 | -.1924739 _b[_t:2.rld_tx_type]
3 | -.2411291 _b[_t:3.rld_tx_type]
|
gender_group | .0905383 _b[_t:gender_group]
ecmo_trr | .4621821 _b[_t:ecmo_trr]
ventilator_trr | .2818021 _b[_t:ventilator_trr]
|
blood_type |
2 | -.0434116 _b[_t:2.blood_type]
3 | -.0559385 _b[_t:3.blood_type]
4 | -.0823133 _b[_t:4.blood_type]
|
year_tx |
2 | -.1286981 _b[_t:2.year_tx]
3 | -.0644526 _b[_t:3.year_tx]
4 | -.1217286 _b[_t:4.year_tx]
5 | -.2019106 _b[_t:5.year_tx]
6 | -.2514822 _b[_t:6.year_tx]
7 | -.1310217 _b[_t:7.year_tx]
8 | -.2930401 _b[_t:8.year_tx]
9 | -.2961355 _b[_t:9.year_tx]
10 | -.3693308 _b[_t:10.year_tx]
11 | -.0568382 _b[_t:11.year_tx]
12 | -.2346045 _b[_t:12.year_tx]
13 | -.8688152 _b[_t:13.year_tx]
|
Ethnic |
2 | .0436113 _b[_t:2.Ethnic]
4 | .1056479 _b[_t:4.Ethnic]
5 | -.0931765 _b[_t:5.Ethnic]
|
bmi_analysis | .0100419 _b[_t:bmi_analysis]
pTLC_ratio_copd | -.1054945 _b[_t:pTLC_ratio_copd]
end_match_las | -.0039418 _b[_t:end_match_las]
age_don | .0009842 _b[_t:age_don]
gender_mismatch | .1485317 _b[_t:gender_mismatch]
ischtime | -.0099189 _b[_t:ischtime]
end_o2 | .0249992 _b[_t:end_o2]
|
lung_preference |
2 | -.0937603 _b[_t:2.lung_preference]
3 | -.0971068 _b[_t:3.lung_preference]
4 | -.165828 _b[_t:4.lung_preference]
5 | .091212 _b[_t:5.lung_preference]
6 | -.0507067 _b[_t:6.lung_preference]
7 | -.0816215 _b[_t:7.lung_preference]
|
_cons | -7.079553 _b[_t:_cons]
----------------+----------------------------------------------------------------
/ln_p | -.0939497 _b[ln_p:_cons]
----------------+----------------------------------------------------------------
p | .9103286
1/p | 1.098504
---------------------------------------------------------------------------------

.
. margins, expression(exp(-exp(predict(xb))*1825^exp(_b[ln_p:_cons]))) at(rld_tx_type=(1 2 3))

Predictive margins Number of obs = 4,866
Model VCE : Robust

Expression : exp(-exp(predict(xb))*1825^exp(_b[ln_p:_cons]))

1._at : rld_tx_type = 1

2._at : rld_tx_type = 2

3._at : rld_tx_type = 3

------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_at |
1 | .4857455 .0222814 21.80 0.000 .4420748 .5294162
2 | .5503463 .0220442 24.97 0.000 .5071405 .593552
3 | .5659892 .0123369 45.88 0.000 .5418094 .590169
------------------------------------------------------------------------------

Thanks,

Luke

Calculate standard deviation of daily stock returns per firm-year in panel?

Hi Statalisters,

I could use some help calculating the annualized standard deviation of daily stock returns (total risk) for my dataset. I am fairly new to Stata and don't really know how one would code this.

I have a panel of CRSP daily stock return data from 2006 - 2017 for 3822 unique firms (permco), approx. 7 million observations.
My time variable = date (business date). See -dataex- below.

I have already calculated the log daily stock return (lret) using:
Code:
bysort permco (date): gen lret = (ln(prc)-ln(prc[_n-1]))
What I want is to have one observation per firm-year containing the annualized sd of daily stock returns.
So for permco 7 (APPLE COMPUTER INC) I would have the annualized sd of daily stock returns for each of 2006, 2007, 2008, 2009, 2010, ..., 2017,
and for the next firm, permco 33 (MOLSON COORS BREWING CO) <-- not in dataex, I would also have the annualized sd for 2006, 2007, 2008, etc.

My assumption is that there are 252 trading days in a year. Of course this varies; some years have 252 trading days, some have 250.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long permco str32 comnam str8 date float(lret mktrf smb hml rf)
7 "APPLE COMPUTER INC" "20060103"            .  1.59 -.26  .16 .02
7 "APPLE COMPUTER INC" "20060104"  .0029388375   .52  .43 -.14 .02
7 "APPLE COMPUTER INC" "20060105"  -.007900998  -.08  .16 -.21 .02
7 "APPLE COMPUTER INC" "20060106"    .02548593   .93  .09 -.33 .02
7 "APPLE COMPUTER INC" "20060109" -.0032819195   .36  .52 -.32 .02
7 "APPLE COMPUTER INC" "20060110"    .06132821   .09  .54  .01 .02
7 "APPLE COMPUTER INC" "20060111"    .03690636   .31 -.13 -.11 .02
7 "APPLE COMPUTER INC" "20060112"  .0046494706  -.64  .16  .08 .02
7 "APPLE COMPUTER INC" "20060113"    .01529215   .15  .21  .13 .02
7 "APPLE COMPUTER INC" "20060117"  -.010332434  -.42  -.2  .16 .02
7 "APPLE COMPUTER INC" "20060118"   -.02655657  -.47   .2   .1 .02
7 "APPLE COMPUTER INC" "20060119"   -.04278741   .75  .94 -.14 .02
7 "APPLE COMPUTER INC" "20060120"   -.03798717 -1.58  .54  .44 .02
7 "APPLE COMPUTER INC" "20060123"    .02056539   .28  .19  .34 .02
7 "APPLE COMPUTER INC" "20060124"   -.02120953   .35  .84 -.09 .02
7 "APPLE COMPUTER INC" "20060125"   -.02449542  -.24  .19 -.04 .02
7 "APPLE COMPUTER INC" "20060126"    -.0255251   .77   .6 -.01 .02
7 "APPLE COMPUTER INC" "20060127"  -.004156324   .71  -.1 -.17 .02
7 "APPLE COMPUTER INC" "20060130"    .04040543   .22 -.04  .12 .02
7 "APPLE COMPUTER INC" "20060131"   .006777013  -.18  .56  .11 .02
7 "APPLE COMPUTER INC" "20060201" -.0011926586   .08  .17  -.1 .02
7 "APPLE COMPUTER INC" "20060202"   -.04501845  -.85 -.16 -.03 .02
7 "APPLE COMPUTER INC" "20060203" -.0035151634  -.44   .2  .29 .02
7 "APPLE COMPUTER INC" "20060206"   -.06537858   .18  .25  .58 .02
7 "APPLE COMPUTER INC" "20060207"  .0044476786  -1.1 -.61 -.19 .02
7 "APPLE COMPUTER INC" "20060208"   .017741088   .68 -.28 -.16 .02
7 "APPLE COMPUTER INC" "20060209"   -.05773135  -.19 -.02  .04 .02
7 "APPLE COMPUTER INC" "20060210"   .035691082   .04  -.6  .05 .02
7 "APPLE COMPUTER INC" "20060213"   -.03939304  -.55 -.72  .25 .02
7 "APPLE COMPUTER INC" "20060214"    .04435766   .95  .21 -.08 .02
7 "APPLE COMPUTER INC" "20060215"   .023016464    .3  .27 -.47 .02
7 "APPLE COMPUTER INC" "20060216"   .019315265   .79  .23  .05 .02
7 "APPLE COMPUTER INC" "20060217" -.0039755665  -.06  .16   .2 .02
7 "APPLE COMPUTER INC" "20060221"  -.017364275  -.36 -.11  .31 .02
7 "APPLE COMPUTER INC" "20060222"   .031911507   .63 -.04  .04 .02
7 "APPLE COMPUTER INC" "20060223"   .006011066  -.34   .2 -.24 .02
7 "APPLE COMPUTER INC" "20060224"  -.004050015   .28  .39  .18 .02
7 "APPLE COMPUTER INC" "20060227"  -.006598848   .34  .06  -.4 .02
7 "APPLE COMPUTER INC" "20060228"  -.035851274  -.96 -.07  .29 .02
7 "APPLE COMPUTER INC" "20060301"   .008866991   .91  .56 -.08 .02
7 "APPLE COMPUTER INC" "20060302"   .007353535  -.04  .03  .02 .02
7 "APPLE COMPUTER INC" "20060303"   -.02752667   -.1 -.04 -.05 .02
7 "APPLE COMPUTER INC" "20060306"   -.03363677  -.82 -.26 -.14 .02
7 "APPLE COMPUTER INC" "20060307"   .012595875  -.49 -1.1  .19 .02
7 "APPLE COMPUTER INC" "20060308"   -.00985071   .03  -.2 -.05 .02
7 "APPLE COMPUTER INC" "20060309"   -.02670123  -.48  .04  .11 .02
7 "APPLE COMPUTER INC" "20060310"    -.0116427   .68  .26 -.01 .02
7 "APPLE COMPUTER INC" "20060313"    .03864843    .3  .08  .18 .02
7 "APPLE COMPUTER INC" "20060314"   .024662895   .99 -.04   .2 .02
7 "APPLE COMPUTER INC" "20060315"  -.016323783   .51  .35  .08 .02
end

I would really appreciate your help with this!

Kind regards,

Shaquille Wijngaarde
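For what it's worth, a minimal sketch of one way to get there, assuming lret exists as generated above, date is the str8 string shown in the dataex listing, and the usual sqrt(252) annualization convention is acceptable:
Code:
gen int year = real(substr(date, 1, 4))          // calendar year from "20060103"
bysort permco year: egen sd_daily = sd(lret)     // firm-year SD of daily log returns
gen ann_sd = sd_daily * sqrt(252)                // annualize with the 252-trading-day convention
bysort permco year (date): keep if _n == _N      // keep one observation per firm-year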

Passing arguments to a function within another function

Hello,

Is there a simple way to pass an *arbitrary* number of arguments to a function called within another function?

For example, I want to write a very silly function silly_f that takes a pointer to another function and the arguments (an arbitrary number) that are to be passed to that other function.

All silly_f does is execute the other function. The thing is, I don't know in advance how many arguments the other function takes.

(For the R users out there, I'm thinking of something similar to R's ...)

I'd like to do something along these lines (warning: illegal Mata code ahead!)

Code:
mata

real scalar myf(real scalar a, real scalar b) {
c = a + b
return(c)
}

real scalar myf2(real scalar a, real scalar b, real scalar c) {
d = a + b + c
return(d)
}

real scalar silly_f(pointer scalar f, ...) {
    (*f)(...)
}

silly_f(&myf(), 2, 4)
silly_f(&myf2(), 2, 4, 5)

end
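Mata has no direct analogue of R's ..., but one workaround is to collect the arguments in a vector and have the wrapper dispatch on its length. A sketch (an illustration only, reusing the myf()/myf2() examples above):
Code:
mata
real scalar silly_f2(pointer(real scalar function) scalar f, real rowvector a)
{
    // dispatch on the number of supplied arguments
    if (cols(a) == 2) return((*f)(a[1], a[2]))
    else if (cols(a) == 3) return((*f)(a[1], a[2], a[3]))
    else _error(3498, "unsupported number of arguments")
}

silly_f2(&myf(), (2, 4))
silly_f2(&myf2(), (2, 4, 5))
end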

Creating a spikeplot with labeling specific values

Hi,
I would like to get a spikeplot just like in the attached figure.
I need some of the spikes labelled with their value on the x axis (like 26, 36, 42, ...).
I would also be grateful for the entire code to create a spikeplot like that; I could only do it with the graph editor.
Thanks for any help.
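In case it helps, a sketch of one way to build such a plot without the graph editor, assuming the raw values are in a variable x and the values to be labelled are known in advance (26, 36 and 42 here are just the examples from the post):
Code:
preserve
contract x, freq(freq)                                   // frequency of each distinct value
twoway (spike freq x) ///
       (scatter freq x if inlist(x, 26, 36, 42), ///
            msymbol(none) mlabel(x) mlabposition(12)), ///
       ytitle("Frequency") legend(off)
restore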

Ambiguous Abbreviation Error with test command

Hi, All:

After running a logistic regression, I wanted to do a Wald test for my variable female. Stata gives me an r(111) error and says that the variable female is not found, but it does exist, because I am able to tabulate it. Can someone please help me figure out why my "test" command is not working?

logit binarycitations i.female i.mmale i.nopub1 i.fellow i.workadmn

Iteration 0: log likelihood = -174.02793
Iteration 1: log likelihood = -150.89347
Iteration 2: log likelihood = -149.39611
Iteration 3: log likelihood = -149.38601
Iteration 4: log likelihood = -149.38601

Logistic regression Number of obs = 297
LR chi2(5) = 49.28
Prob > chi2 = 0.0000
Log likelihood = -149.38601 Pseudo R2 = 0.1416

-------------------------------------------------------------------------------
binarycitat~s | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
female |
1Female | -.7626615 .3380316 -2.26 0.024 -1.425191 -.1001317
|
mmale |
1MalMent | -.7277077 1.108126 -0.66 0.511 -2.899595 1.44418
|
nopub1 |
1NoPubs | -1.405057 .4389459 -3.20 0.001 -2.265375 -.5447388
|
fellow |
1Fellow | 1.318628 .292187 4.51 0.000 .7459525 1.891304
|
workadmn |
1Admin | .6988945 .5506152 1.27 0.204 -.3802914 1.77808
_cons | -.4831004 1.109789 -0.44 0.663 -2.658247 1.692046
-------------------------------------------------------------------------------

.
end of do-file

. do "C:\Users\rjohn123\AppData\Local\Temp\STD23a8_0000 00.tmp"

. test female
female not found
r(111);

end of do-file

r(111);

. tab female

    Female? |      Freq.     Percent        Cum.
------------+-----------------------------------
      0Male |        195       65.66       65.66
    1Female |        102       34.34      100.00
------------+-----------------------------------
      Total |        297      100.00
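In case it is useful to later readers: with factor-variable notation the estimated coefficient is stored under the name 1.female rather than female, so the Wald test is usually requested as one of the following (a sketch):
Code:
test 1.female          // test the single factor-level coefficient
testparm i.female      // jointly test all coefficients of the factor variable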




bysort and summary statistics

Dear All-
I am trying to create a new variable ('reference_median') populated with the median value of the number of staff (var: 'fte') for each level of health care worker (var: 'cadre_cat'), type of support (var: 'support_type'), and clinic size category (var: cop18tier). I understand that this is some combination of a bysort command along the lines of: bysort cadre_cat support_type (cop18tier): egen ref_fte=median(fte).

The other wrinkle is that I want to populate all of the other quartiles (var: quartile_txcurr) with the median values that I get for quartile_txcurr=4 (quartile 4 is the best performing quartile and I want to use their median number of staff as a reference for the other quartiles).

I can't seem to get the right combination---I have attached a sub-set of my dataset here.


I would very much appreciate any quick help/guidance anyone may have.

Best-Patrick
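A minimal sketch of one way to do this, assuming the variable names given above and that the quartile-4 median should be spread to every observation in the same cadre/support/clinic-size cell:
Code:
egen reference_median = median(cond(quartile_txcurr == 4, fte, .)), ///
    by(cadre_cat support_type cop18tier)
The cond() expression feeds fte to median() only where quartile_txcurr == 4, while by() writes the resulting median to all quartiles within each cell.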

Oaxaca Binder decomposition query

Hi all,

I am using the Blinder-Oaxaca decomposition method to study spousal decision-making outcomes for different groups (wives with children versus wives with no children). My dataset is a repeated cross-section with two survey rounds. I ran a threefold decomposition using the code below (my outcome variable is a decision-making score, the independent variables are various demographic predictors, and the groups I'm comparing are women who have children versus women who have no children):

Code:
oaxaca M1 deduc2 deduc3 deduc4 dreleduc2 dreleduc3 dses1 dses2 dses3 dses4 dsondum1, by (birthstat) detail


Which returned the below output:

Code:
Blinder-Oaxaca decomposition                    Number of obs     =      4,560

           1: birthstat = 0
           2: birthstat = 1

------------------------------------------------------------------------------
          M1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Differential |
Prediction_1 |  -.2463795   .0761519    -3.24   0.001    -.3956345   -.0971245
Prediction_2 |   .0185322   .0219223     0.85   0.398    -.0244347    .0614991
  Difference |  -.2649117   .0792446    -3.34   0.001    -.4202281   -.1095952
-------------+----------------------------------------------------------------
Endowments   |
      deduc2 |   .0002665    .005405     0.05   0.961    -.0103271    .0108602
      deduc3 |   .0004218   .0014309     0.29   0.768    -.0023828    .0032263
      deduc4 |   .0011656   .0084264     0.14   0.890    -.0153499     .017681
   dreleduc2 |   .0015314   .0021107     0.73   0.468    -.0026055    .0056683
   dreleduc3 |   .0047169   .0035385     1.33   0.183    -.0022185    .0116522
       dses1 |   .0070985   .0068327     1.04   0.299    -.0062932    .0204903
       dses2 |   .0101704   .0062187     1.64   0.102    -.0020181    .0223589
       dses3 |  -.0097885   .0058672    -1.67   0.095    -.0212881     .001711
       dses4 |   -.001965   .0026364    -0.75   0.456    -.0071323    .0032022
    dsondum1 |  -.1796086   .0486933    -3.69   0.000    -.2750457   -.0841715
       Total |  -.1659911   .0495968    -3.35   0.001    -.2631991   -.0687831
-------------+----------------------------------------------------------------
Coefficients |
      deduc2 |  -.1537282   .1858926    -0.83   0.408    -.5180711    .2106147
      deduc3 |  -.0243597   .0096307    -2.53   0.011    -.0432357   -.0054838
      deduc4 |   .0247731   .0669242     0.37   0.711    -.1063959    .1559421
   dreleduc2 |     .08938   .0493094     1.81   0.070    -.0072647    .1860247
   dreleduc3 |    .072015   .0559359     1.29   0.198    -.0376173    .1816474
       dses1 |  -.0126002   .0334791    -0.38   0.707    -.0782179    .0530176
       dses2 |   .0858618   .0497683     1.73   0.084    -.0116823    .1834059
       dses3 |   .0455632   .0556434     0.82   0.413    -.0634959    .1546222
       dses4 |   .0754199   .0555338     1.36   0.174    -.0334242    .1842641
    dsondum1 |  -.1796086   .0486933    -3.69   0.000    -.2750457   -.0841715
       _cons |  -.3289041   .3380435    -0.97   0.331    -.9914571     .333649
       Total |  -.3061878    .080242    -3.82   0.000    -.4634593   -.1489163
-------------+----------------------------------------------------------------
Interaction  |
      deduc2 |    .016801   .0213559     0.79   0.431    -.0250558    .0586578
      deduc3 |  -.0040486   .0131121    -0.31   0.757    -.0297478    .0216507
      deduc4 |   .0108323   .0294287     0.37   0.713    -.0468469    .0685115
   dreleduc2 |   .0081574   .0099035     0.82   0.410    -.0112531    .0275678
   dreleduc3 |  -.0117695   .0111562    -1.05   0.291    -.0336352    .0100963
       dses1 |   .0021806    .006144     0.35   0.723    -.0098614    .0142225
       dses2 |  -.0179886   .0143918    -1.25   0.211     -.046196    .0102189
       dses3 |   .0128592   .0165353     0.78   0.437    -.0195494    .0452677
       dses4 |   .0106348   .0112055     0.95   0.343    -.0113276    .0325973
    dsondum1 |   .1796086   .0486933     3.69   0.000     .0841715    .2750457
       Total |   .2072672   .0580637     3.57   0.000     .0934645    .3210699
------------------------------------------------------------------------------

.


From my understanding I have interpreted as:

The mean of the decisions score is -0.24 for women with no children and 0.02 for women with children, yielding a gap in women’s contribution to decisions of -0.26. The decrease of -0.16 indicates that differences in endowments account for just over half of the gap. The total for endowments is the total explained portion by my predictors and the total for coefficients is total unexplained portion.

My question is threefold:
  1. If the unexplained portion (-0.30) is smaller than the difference (-0.26) does that mean that the unexplained portion is explaining less than the total observed gap (in spousal decision making)?
  2. What does it mean if the unexplained portion is larger than the explained portion?
  3. How would one interpret the results below “endowments” and “coefficients”, for instance how are the results for deduc4 (having a university degree where reference group is no education at all) under endowments different from the results under coefficients?

I have used the following literature to help aid my understanding of Oaxaca:

Jann, B., 2008. The Blinder-Oaxaca decomposition for linear regression models. The Stata Journal, 8(4), pp.453-479.

O’Donnell, O., Van Doorslaer, E., Wagstaff, A. and Lindelow, M., 2008. Explaining differences between groups: Oaxaca decomposition. Analysing health equity using household survey data. Inst Learn Resourc Ser, pp.147-157.

Thank you.

mathematical formula of mixed models for repeated measures

Dear Statalist users,
I have a question about the mathematical formula of linear mixed models when we have pre-/-post test data.
The data I work with come from a randomized controlled trial, where subjects were assigned to a control or treatment group and took pre- and post-test surveys. To analyze the effect of the treatment on the dependent variable (Y), I used the command 'mixed' on long-shaped data.

Code:
 mixed Y Time##Treatment covariates || id:
Now I am trying to convert it into a formula, yet I am not sure whether the formula below captures the random effect. Unfortunately, it is in linear (unsubscripted) format; I could not figure out how to paste the subscripted version:

Y_ij = β_0 + β_1 G_i + β_2 T_j + β_3 G_i T_j + γ X_i + ε_ij

where
Yij denotes the response variable score for subject i at time j;
β0 is the mean response;
Gi refers to group—control (0) or treatment (1);
Tj refers to time—pre-test (0) or post-test (1);
Xi is the vector of control variables, all measured at time 0 (pre-test survey).
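For reference, a random-intercept formulation corresponding to || id: would typically add a subject-level term to the fixed part above (a sketch, reusing the same symbols plus a random intercept u_i):

Y_ij = β_0 + β_1 G_i + β_2 T_j + β_3 G_i T_j + γ X_i + u_i + ε_ij,  with u_i ~ N(0, σ²_u) and ε_ij ~ N(0, σ²_ε).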


Thanks much in advance.
Regards,
Sule

GARCH model

Dear all, I need your help please. I am working with daily stock returns in a GARCH model. In order to control the lags, I created a business calendar. But now I want to forecast the volatility for 30 days; however, when I run the command tsappend, add(30), a message appears: the time variable may not be missing.

Could you help me?

Thanks a lot.
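One thing that is sometimes worth checking in this situation (a sketch, assuming the business-calendar date variable is called bdate, a placeholder name): tsappend refuses to run when any observation has a missing time variable, so inspecting and cleaning those observations first may resolve the message.
Code:
count if missing(bdate)      // how many observations have a missing date?
drop if missing(bdate)       // or repair them instead of dropping
tsset bdate
tsappend, add(30)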
