Estimating beta with rolling standard deviation and rolling correlation

Hello everybody,

For my master's thesis I am replicating a scientific paper. I was given the following task:

We estimate pre-ranking betas from rolling regressions of excess returns on market excess returns. Whenever possible, we use daily data, rather than monthly data, as the accuracy of covariance estimation improves with the sample frequency (Merton, 1980).

We estimate volatilities and correlations separately for two reasons. First, we use a one-year rolling standard deviation for volatilities and a five-year horizon for the correlation. Second, we use one-day log returns to estimate volatilities and overlapping three-day log returns for the correlation to control for non-synchronous trading (which affects only correlations).
The calculations are based on daily returns. My approach looked like this (Jahr = year):
Code:
rangestat (sd) r_l, interval(Jahr -1 0) by(business)

rangestat (sd) r_il_totm, interval(Jahr -1 0) by(business)

* the *_sum variables are the overlapping three-day log returns
rangestat (corr) r_l_sum r_il_totm_sum, interval(Jahr -4 0) by(business)
by business: gen b_ = corr_x*(sd_/sd_totm)
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id date2 Jahr Monat r_l r_il_totm) double(r_l_sum r_il_totm_sum sd_ sd_totm r_il_totm_sd) float b_
 1 0 1989 12           .            .                    0                    0                   .                   .                   .        .
 2 1 1990  1   .01092663   .016545914  .010926631279289722  .016545914113521576 .012027065923998433 .009804939291130532 .009824321080103091 .9354857
 3 2 1990  1  .010813636 -.0006053161  .021740267053246498  .015940597979351878 .012027065923998433 .009804939291130532 .009824321080103091 .9354857
 4 3 1990  1  .007646538  -.007902635  .029386804904788733  .008037962717935443 .012027065923998433 .009804939291130532 .009824321080103091 .9354857
 5 4 1990  1 -.016907455   -.00959534 .0015527182258665562 -.018103292444720864 .012027065923998433 .009804939291130532 .009824321080103091 .9354857
 6 5 1990  1  .026013814   .003943066   .01675289636477828 -.013554910197854042 .012027065923998433 .009804939291130532 .009824321080103091 .9354857
 7 6 1990  1           0  -.011328274  .009106358513236046 -.016980549320578575 .012027065923998433 .009804939291130532 .009824321080103091 .9354857
 8 7 1990  1 -.006059999  -.007035573  .019953815266489983 -.014420781284570694 .012027065923998433 .009804939291130532 .009824321080103091 .9354857
 9 8 1990  1  .009078786  .0031941284  .003018787130713463 -.015169719001278281 .012027065923998433 .009804939291130532 .009824321080103091 .9354857
10 9 1990  1 -.033692833  -.024339866 -.030674045905470848 -.028181310510262847 .012027065923998433 .009804939291130532 .009824321080103091 .9354857
end
format %tbNYSE date2
The SD, correlation, and beta for each business change only once a year. For the paper I need to rank the betas of the stocks every month and sort them into portfolios, which does not make much sense if a stock's beta changes its value only once a year. My supervisor told me I have to calculate betas "daily". Do you have any idea what I am doing wrong or overlooking? A possible adjustment is sketched below.
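A minimal sketch of one possible fix (not the paper's exact procedure): run the rangestat windows over the daily date variable date2 rather than over Jahr, so that the estimates update every trading day. The window lengths of 252 and 1260 trading days as stand-ins for one and five years are my own assumptions, and the variable names in the last line simply reuse the (possibly renamed) output names from the code above.

Code:
* daily rolling one-year SDs and a five-year correlation (window lengths are assumptions)
rangestat (sd) r_l, interval(date2 -251 0) by(business)
rangestat (sd) r_il_totm, interval(date2 -251 0) by(business)
rangestat (corr) r_l_sum r_il_totm_sum, interval(date2 -1259 0) by(business)
* combine into a beta; corr_x, sd_, and sd_totm follow the naming used in the original
* code above -- adjust if your rangestat output variables are named differently
gen double b_daily = corr_x*(sd_/sd_totm)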

Thank you in advance!

Flagging event in time series data

I am currently using Stata 16 with a long dataset that contains multiple observations per person in a time-series format. I specifically want to know how many people returned to the clinic for testing after receiving a low ferritin result. Not every person was tested at each visit, hence the missing values of Low Ferritin at some visits. Here is an example dataset together with the variable (Return Flag) I would like to create:
ID   Visit Num   Low Ferritin   Return Flag
 1        1
 1        2
 1        3
 1        4
 1        5           1
 1        6           1              1
 1        7           0
 2        1
 2        2           1
 3        1           1
 3        2           1              1
 3        3           1
 3        4
 4        1
 4        2
 4        3
 4        4
 4        5
 4        6
 4        7           0
 4        8           0
 4        9           1
 4       10           0              1
 5        1           0
 5        2           0
 5        3           0
 5        4           0
So, for ID 1, the person returned to the clinic and was tested again, giving a return flag of 1, whereas for ID 2, the person did not return to the clinic and no flag would be given.
I am guessing a loop may be the route to take, but the variation in the data from ID to ID leaves me a little stumped about how to make this work.
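Here is a minimal sketch that does not need an explicit loop, assuming the variables are named id, visit_num, and low_ferritin (names adapted from the table above); it flags the first tested visit that follows the first low result, as in the example.

Code:
bysort id (visit_num): gen byte seen_low = sum(low_ferritin == 1) > 0   // 1 from the first low result onward
by id: gen byte retested = !missing(low_ferritin) & seen_low[_n-1] == 1 // tested again after an earlier low result
by id: gen byte return_flag = retested & sum(retested) == 1             // keep only the first such visit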
I appreciate any suggestions on how to solve this problem.
Thank you so much!

Help needed with creating loop

Hello

I have attached a do-file containing code which, I appreciate, needs to be corrected in order to achieve my desired end. From a pre-existing file, which I call 'datafile.dta', I wish to create a random allocation variable 'random_V' which randomly assigns treatments 1 and 0 to the patients listed (one per row) in datafile.dta. I wish this random allocation to change across 10 replications. I have attempted to make a start on this using the code prior to 'equal variance t-test' in the do-file.

I also wish to perform all of the calculations I have already listed under 'equal variance t-test', 'unequal variance t-test' and 'testing for equality of variances', but separately for each of the 10 replications. The code

Code:
keep PATIENTNUM DALIVED random_V replicate
append using mynewdata2_10
save mynewdata2_10, replace
in this do-file is intended to request that each of the listed variables 'PATIENTNUM', 'DALIVED' and 'random_V', plus the replicate number, be stored in the file mynewdata2_10. The variables 'PATIENTNUM' and 'DALIVED' already exist in the original file 'datafile' but, of course, the values of 'random_V' do not and will need to be regenerated across the 10 replications.

Towards the bottom of the do-file I also specify that "I wish to create a file 'mynewtest2_10' which contains all of the variables listed below with one row for each of the ten iterations". However, I need help in achieving this, please.

I would be most grateful for assistance with editing my do-file accordingly. I suspect that I need to use a loop. However, I also sense that the code is not ready for a loop, as I have carried some lines of code over from a context where bootstrapping was appropriate and I am unclear what changes to make.
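Without sight of the do-file I can only offer a minimal sketch of the general structure, assuming datafile.dta holds one row per patient with PATIENTNUM and DALIVED; it redraws random_V in each replication, appends the patient-level data to mynewdata2_10, and posts one row of test results per replication to mynewtest2_10 (the stored statistics are just examples).

Code:
set seed 12345                                   // example seed, for reproducibility
capture erase mynewdata2_10.dta                  // start the accumulating file afresh
tempname results
postfile `results' replicate t_equal t_unequal var_F using mynewtest2_10, replace
forvalues r = 1/10 {
    use datafile, clear
    gen byte random_V = runiform() < 0.5         // fresh random 0/1 allocation each replication
    gen int replicate = `r'
    quietly ttest DALIVED, by(random_V)          // equal-variance t-test
    local t_eq = r(t)
    quietly ttest DALIVED, by(random_V) unequal  // unequal-variance t-test
    local t_uneq = r(t)
    quietly sdtest DALIVED, by(random_V)         // test for equality of variances
    post `results' (`r') (`t_eq') (`t_uneq') (r(F))
    keep PATIENTNUM DALIVED random_V replicate   // patient-level data for this replication
    capture append using mynewdata2_10
    save mynewdata2_10, replace
}
postclose `results'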

Many thanks in advance for your assistance.

Best wishes
Margaret

Linear Lasso Regressions and Stata's "lasso linear" Command

Hi,

I have a methodological question concerning lasso regressions and the lasso linear command in Stata.

I have a dataset on daily investment flows of firms and a huge collection of dummy variables which constitute daily signals upon which the firms potentially invest.
There are more than one million observations and more than 2000 dummy variables (D_*) and a set of a few further controls (C_*).

I want to find out which of the dummy variables are most relevant to explain the dependent variable (FLOW).

To do so, I run a lasso linear regression of FLOW on D_*, with C_* as variables that are always included. Because of the long computation time over the whole sample, I first ran the command on a subsample of 10,000 randomly drawn observations.


Code:
lasso linear FLOW (C_*) D_* if random_sample == 1
I obtained:

Code:
Lasso linear model                          No. of obs        =     10,000
                                            No. of covariates =      2,179
Selection: Cross-validation                 No. of CV folds   =         10

--------------------------------------------------------------------------
         |                                No. of      Out-of-      CV mean
         |                               nonzero       sample   prediction
      ID |     Description      lambda     coef.    R-squared        error
---------+----------------------------------------------------------------
       1 |    first lambda     612.519        32       0.1412     1.13e+08
       6 |   lambda before    384.6798        34       0.1427     1.13e+08
     * 7 | selected lambda    350.5059        35       0.1427     1.13e+08
       8 |    lambda after    319.3679        35       0.1426     1.13e+08
      12 |     last lambda    220.1279        57       0.1407     1.14e+08
--------------------------------------------------------------------------
From a conceptual point of view, only explanatory variables with a positive effect on the dependent variable are of interest (i.e., they can be thought of as positive stimuli to invest). Explanatory variables with a clearly negative or a negligible effect on the dependent variable are not of interest. However, in my lasso specification, variables with large negative coefficients are of course also selected if they exist (and they do, as I found out after checking the variables from the selected-lambda model).

Therefore, my question: is it possible to run lasso in such a way that it sets to zero the coefficients that are close to zero or negative?
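As far as I know the built-in lasso has no sign constraint, so the best I can offer is a minimal sketch of inspecting the signs of the selected coefficients after the fit and screening out the negative ones by hand; lassocoef is the documented postestimation command, while the screening step is my own suggestion rather than a lasso option.

Code:
lasso linear FLOW (C_*) D_* if random_sample == 1
lassocoef, display(coef, penalized)   // selected variables with their penalized coefficients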

Thanks

What does "Set the active IRF file" mean?

Dear all,
To calculate impulse-response functions I type

irf create irf, set(myname)

I am not sure what set means in this context. Stata's explanation is "Set the active IRF file", but I cannot quite understand it. Does it create an IRF file which is assigned the name "myname"?
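That is essentially my understanding: set(myname) declares myname.irf as the active IRF file, creating it if it does not exist, and subsequent results are stored in whichever file is active. A minimal sketch with a built-in dataset:

Code:
webuse lutkepohl2, clear
var dln_inv dln_inc dln_consump, lags(1/2)
irf create order1, set(myname)   // creates/activates myname.irf and stores results named order1 in it
irf describe                     // lists what the active file, myname.irf, now contains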

Thank you

Importing fixed width text data

Hello,
I am importing fixed-width text data in Stata 14.2 using, for example, the following command:
Code:
infix cid 1-34 using "R75250L04.TXT"
There is one line for each observation, yet infix reads one observation fewer than expected.
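One way to narrow this down is to compare the number of observations infix reads with the raw number of lines in the file; a minimal sketch (a final line without an end-of-line terminator is a common culprit, but that is only a guess):

Code:
infix cid 1-34 using "R75250L04.TXT", clear
count                                  // observations read by infix
mata: rows(cat("R75250L04.TXT"))       // raw number of lines in the text file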
Please help.
Thanks

Marginal effects after ivprobit are the same as the ivprobit regression coefficients

Dear Statalist Users,

I am hoping for some insight into the issue that I am facing.

I am using the ivprobit command in Stata, and the coefficients of the regression are exactly the same as the marginal effects.

https://www.stata.com/statalist/arch.../msg00405.html - this thread contains the issue I am facing.

I am using the following commands:

Code:
ivprobit gs age foreign group tfp (xliq=l.xlq_mean), vce(cluster id)
margins, dydx(_all) post

Could someone give me some insight into why the marginal effects are identical to the ivprobit coefficients? I am using Stata 15.
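As far as I understand it (and as the linked thread suggests), the default prediction after ivprobit is the linear index xb, so average derivatives of that index simply reproduce the coefficients. A minimal sketch of requesting effects on the probability scale instead, assuming the pr prediction is available in your release:

Code:
ivprobit gs age foreign group tfp (xliq=l.xlq_mean), vce(cluster id)
margins, dydx(*) predict(pr)   // average marginal effects on Pr(gs=1) rather than on xb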

Significance levels - explanation below table

Hi everyone,

After using the command reghdfe for my regressions I use esttab:
Code:
esttab using table.rtf, label cells("b(star fmt(3))" "se(par fmt(3))") indicate(`r(indicate_fe)') fonttbl(\f0\fnil Arial; ) stats(r2 N, labels(R2 "N")) starl(* 0.10 ** 0.05 *** 0.010)
The table looks nice. However, the legend for the significance levels (*** p<0.01, ** p<0.05, * p<0.1) is not printed below the table. I think it has something to do with cells().

Does anyone of you know why the explanation of the significance levels is not shown below the table?
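I cannot say why the note is dropped, but a simple workaround is to write the legend yourself as a table note with addnotes(), repeating the thresholds set in starlevels(); a minimal sketch based on the command above:

Code:
esttab using table.rtf, label cells("b(star fmt(3))" "se(par fmt(3))") ///
    indicate(`r(indicate_fe)') fonttbl(\f0\fnil Arial; ) ///
    stats(r2 N, labels(R2 "N")) starlevels(* 0.10 ** 0.05 *** 0.010) ///
    addnotes("* p<0.10, ** p<0.05, *** p<0.01")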

I am looking forward to your responses.
Jane

Svy for stratified experimental data

This is almost more of a conceptual question, but I am wondering whether one should use the svy prefix with specified strata when analysing data from a stratified randomized controlled trial using regression techniques.
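Not an answer to the conceptual question, but for concreteness, a minimal sketch of what the design-based version would look like, assuming a stratification variable named stratum and treating each observation as its own PSU (both the variable names and the setup are hypothetical):

Code:
svyset _n, strata(stratum)          // no clustering; strata fixed by the trial design
svy: regress outcome i.treatment    // outcome and treatment are placeholder names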

Statistical test for comparing marginal probabilities for 2 sub-groups from different regressions

I have a probably quite simple question that I do not know how to implement in Stata.

The background is that I estimate xtprobit separately for the employment probabilities of migrants from different areas of origin (7 categories). To simplify the model for discussion, let's say I control for education level (low, middle, high), gender, and age. For example, for migrants from origin 2:
Code:
xtprobit employment i.edu i.female age if area_origin==2, i(id)
margins, predict(pu0) at(age=28 edu=1)

And so on for each area of origin.

I want to compare the marginal employment probabilities for migrants from each area of origin, at each education level and for each gender, with one specific threshold: the probability for migrants from the EU (area_origin==1) with a low education level (edu==1). That is, I would like a statistical test of whether their employment probabilities are significantly higher than this threshold.

But I have difficulty implementing this in practice: not only do these probabilities come from different regressions, but I am also comparing them with one specific group that has a specific characteristic (low education level).

Could anyone help in sharing your insights on this? Many, many thanks indeed for your generous help!
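This may not be exactly what the design requires, but one route that avoids comparing estimates across separate regressions is to pool the areas of origin in a single model with interactions, so that all the margins come from one estimation and can be contrasted directly; a minimal sketch with the variable names from the post, restricted here to the low-education comparison:

Code:
xtprobit employment i.area_origin##(i.edu i.female) c.age, i(id)
margins area_origin, predict(pu0) at(age=28 edu=1)
* contrasts of each area of origin against the EU reference (area_origin==1)
margins r.area_origin, predict(pu0) at(age=28 edu=1)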

statistical analysis on two-eye dataset

Hi everyone, I am very new to Stata and am trying to run an analysis in which both eyes of each individual contribute to the dataset. My dataset is as follows:



Since each subject contributes both eyes (my professor wants me to use both eyes in the analysis), and I want to find out whether the distribution of gender differs between myopes and non-myopes, I read online that I can use an adjusted chi-square test: the chi-square test for R×C contingency tables with clustered data by Jung et al. (2003).

I am not sure if this is the right test to use, but I tried to run it using the clchi2 command, and this is what I typed:
Code:
clchi2 gender childmyopia, cluster(ID2)

But the error message was: gender not grouped within ID2

Can anyone tell me what went wrong, or whether there are alternative ways for me to do this?

Or is GEE the only way to do this?
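I do not know clchi2's internals, but the message reads as if gender is not constant within ID2; a minimal sketch to check that before anything else:

Code:
bysort ID2 (gender): gen byte gender_varies = gender[1] != gender[_N]
tab gender_varies      // any 1s mean gender takes more than one value within an ID2 cluster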

Testing robustness

Hello,

I'm writing my thesis on the effect of nationality diversity on firm performance. I'm using panel data and estimating a fixed-effects OLS model. My supervisor asked me whether I had tested the robustness of my models.
1. What does she exactly mean?
2. How can I test this?

Thanks!

xtset and xtreg with all coefficients zero

Hi everyone,

I am new to using panel data on stata so I had a few questions.

I am using the India birth recode module of the Demographic and Health Surveys (DHS). The file effectively acts as an unbalanced panel in that a mother has multiple entries, one for each of her children. Basically, she enters the data at the time of marriage, has an entry for each child, and exits at the survey interview date. This unbalanced panel thus gives the retrospective birth history of each woman.

I wanted to use xtset to declare the dataset as a panel, with the person id (newid) as the panel variable and the year of birth of each child (yobc) as the time variable. But I kept getting the error that there are repeated time values, which is natural since a mother can have multiple children and multiple births in the same year. So I followed the advice given here and simply did

Code:
xtset newid
And this worked. However, whenever I run the following command for a difference-in-differences model:


Code:
xtreg childhealth policy treatment did `controls0'
I find that all coefficients are zero. Here policy is 1 for the years in which the policy was in place and 0 otherwise; treatment is a 0/1 dummy for the cohort of mothers affected by the policy; and did is policy*treatment. I am not entirely sure why all the coefficients came out as zero, or whether this was the right step forward.

Code:
reg childhealth policy treatment did `controls0'
However, this gives me results, i.e. coefficients that are not zero and that make intuitive sense. Please let me know how to proceed on this front. Thank you.
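Without the data I cannot say what is driving the zeros, but two quick checks may help: how much of each regressor's variation is within rather than between mothers, and the fact that xtreg without the fe or re option defaults to random effects; a minimal sketch of the first check:

Code:
xtsum childhealth policy treatment did   // between versus within variation for each variable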

Best,
Lori

reghdfe: saving fixed effect + its standard error

Hi everyone,


I am estimating a triple difference-in-differences model in Stata to analyse the effects of a policy that was implemented in a staggered way across a country's provinces over multiple years. The three differences are:
  • time: pre/post
  • regions: different regions are treated at different points in time
  • subgroup of population: say I want to estimate the effect of the policy on the population with a characteristic C vs. the rest of the population.
Using year and municipality fixed effects (the regional variation is at a “higher” level, i.e. each region in which the policy is implemented contains multiple municipalities), I run the following specification:


[equation attachment not reproduced: triple-interaction specification of y on C × Treated_mun, the pairwise interactions, the main effects, and municipality and year fixed effects]


where C_i indicates whether individual i has characteristic C, \phi_j is a set of municipality fixed effects, and \phi_t are year fixed effects. Since Treated_mun_{jt} is a time-varying dummy for whether municipality j is treated in year t, the first line is the triple interaction. The second line contains the pairwise interactions and the third line the main effects.


The coefficients that I am really interested in are \beta (triple interaction) and \delta (main effect of characteristic C), since I want to see by how much the gap (\delta) closes with the implementation of the policy (\beta).


The problem: since I have millions of observations and a large number of fixed effects, it takes a relatively long time to estimate this with the usual reg command in Stata. So I turned to reghdfe, which seems well suited to this estimation. However, no matter how I specify the command, I cannot get it to show me the \delta coefficient. I have tried the following:



1) Leaving the C main effect outside of the absorb() option:

Code:
 reghdfe y 1.C#1.Treated_mun 1.C , absorb(i.year##i.municipality i.year#i.C i.municipality#i.C) vce(cl municipality)
The issue with this specification is that the coefficient on C is dropped due to collinearity, which makes sense since it is included in the absorbed FEs. I would like reghdfe to instead drop the C main effect that is included in the absorb() option and show the one that is outside it. (Side question: reghdfe seems to be insensitive to # versus ## when specifying interactions, i.e. even if I specify an interaction between two variables with # it still includes both main effects, whereas this is not the case with the regular reg command. Is that correct? Is there any way of specifying only the interaction without the main effects in reghdfe?)


2) Including the C main effect in the absorb() option and using the save FE feature:

Code:
 reghdfe y 1.C#1.Treated_mun, absorb( C_FE=1.C i.year##i.municipality i.year#i.C i.municipality#i.C) vce(cl municipality)

Here the issue is that I don't quite understand how the FE is saved. As C is a binary variable, I was expecting a single value for the individuals with characteristic C. Instead, reghdfe saves two values: one for individuals with characteristic C and one for individuals without it. They do not add up to, or otherwise correspond to, the coefficient I get when using the reg command on the same data. How do I interpret the saved FE?
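On the interpretation question only, and as a sketch rather than a fix for the full specification: fixed effects recovered by reghdfe are identified only up to normalisations, so with C absorbed on its own (I drop the C interactions here purely for illustration) the quantity comparable to \delta is the difference between the two saved values, not either value by itself.

Code:
reghdfe y 1.C#1.Treated_mun, absorb(C_FE=1.C i.year i.municipality) vce(cluster municipality)
tabstat C_FE, by(C) statistics(mean)   // the gap between the two group means is what maps to the
                                       // C effect, up to the normalisation reghdfe chooses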


Does anybody know how I can solve this issue? It does not matter whether it is by fixing the first or the second reghdfe specification, or by any other way of speeding up the estimation significantly.


Thanks a lot!

Best,

Laurenz

Duplicates - keep first and sometimes second...

Hello

I am working with a large dataset in Stata with 40,000 stroke patients admitted to a hospital, and I could really use some help regarding duplicates.

Some patients in my dataset were admitted several times during the study period, and I want to keep only first visits. However, it appears that some patients were registered twice during the same hospital stay with the same stroke, i.e. they were actually registered twice during the first visit (due to a transfer between hospitals). In that case I want to keep both the first and the second entry for this patient, because it is in fact the same stroke that is being treated.

For each observation I have: a personal identification number “pnr”; a generated variable “visit_n” (1 for the first visit, 2 for the second, 3 for the third, and so on); and the admission date “acutedate” (days after 1 January 1960).

I want to keep visit 1 for all patients and delete all visits with a value greater than 1, except that visit 2 should be kept if it falls within 2 days of visit 1.
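A minimal sketch using the variable names above; it keeps every first visit and keeps a second visit only when its admission date is within 2 days of the first:

Code:
bysort pnr (visit_n): gen double first_date = acutedate[1]            // admission date of visit 1
keep if visit_n == 1 | (visit_n == 2 & acutedate - first_date <= 2)
drop first_date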

I hope this is understandable, and that you can help me, thanks in advance!

Kind regards
Sine Mette Buus

esttab: Can't Export Continuous Variable's Coefficient when it's also in Another Interaction Term & Exporting with Multiple Specifications

Dear Statalist Users,

I am having trouble exporting my estimation results for my paper. I am using Stata 14.0 MP. It seems that esttab cannot export a continuous variable's coefficient when the variable appears in multiple interaction terms and several specifications are exported together. I tried googling it but could not find any similar questions.

I use the 1978 Automobile Data to replicate the problem:

The Code
Code:
use https://www.stata-press.com/data/r16/auto
qui reg price c.mpg##rep78 gear_ratio
eststo test1
qui reg price c.mpg##rep78 c.mpg##foreign gear_ratio
eststo test2
qui reg price c.mpg##rep78 c.gear_ratio##foreign
eststo test3
qui reg price c.mpg##rep78 c.mpg#foreign foreign gear_ratio
eststo test4
local test test1 test2 test3 test4
esttab `test' using test.csv, replace

The Result
Code:
 
                     (1)           (2)           (3)           (4)
                   price         price         price         price
mpg               -204.1             0        -380.6        -324.2
                 (-0.35)           (.)       (-0.69)       (-0.57)
.......
The other variables' results are normal; the only problem is the missing coefficient of mpg in test2. I guess the issue is that mpg appears in two ## interaction terms in test2. In test4, mpg is also in two interaction terms, but the code is “c.mpg##rep78 c.mpg#foreign foreign” instead of “c.mpg##rep78 c.mpg##foreign”, so that is probably a workaround. Also, when I export test2 on its own it works, so the issue is also related to exporting multiple specifications together. In my original data esttab can export a categorical variable's coefficient in the same situation, so the variable being continuous is also part of the issue.
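For what it is worth, a minimal sketch of the workaround just described, with the duplicated mpg main effect in test2 treated as the likely culprit (my reading, not a confirmed diagnosis): enter the second interaction with # and add the foreign main effect separately, so that mpg enters the model only once.

Code:
use https://www.stata-press.com/data/r16/auto, clear
eststo clear
qui reg price c.mpg##rep78 gear_ratio
eststo test1
qui reg price c.mpg##rep78 c.mpg#i.foreign i.foreign gear_ratio   // mpg main effect appears only once
eststo test2b
esttab test1 test2b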

Does anyone know what the problem is? Is there an easy way to solve it? Any help would be greatly appreciated.


Ivreg 2sls and fixed effects

Dear all,
I would like to use instrument2 as an instrument for sp_city_10t. I estimate the regression below, and the output lists all the year and other fixed-effect dummies as additional instruments for sp_city_10t even though I only specify instrument2. What might be the reason?


Code:
ivregress 2sls `var' i.year i.il_kodu (sp_city_10t=instrument2), robust

Instrumental variables (2SLS) regression        Number of obs =   247799
                                                Wald chi2(90) = 10271.28
                                                Prob > chi2   =   0.0000
                                                R-squared     =   0.0151
                                                Root MSE      =   .08672

------------------------------------------------------------------------------
| Robust
relsales | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sp_city_10t | -.0000711 .0000434 -1.64 0.102 -.0001562 .0000141
|
year |
2007 | .001895 .0009753 1.94 0.052 -.0000165 .0038064
2008 | -.000938 .0009446 -0.99 0.321 -.0027894 .0009134
2009 | .0075193 .0010504 7.16 0.000 .0054605 .0095781
2010 | -.0008531 .0009168 -0.93 0.352 -.00265 .0009438
2011 | -.0078511 .0008569 -9.16 0.000 -.0095307 -.0061715
2012 | -.0101156 .0008275 -12.22 0.000 -.0117376 -.0084937
2013 | -.0128642 .0008114 -15.85 0.000 -.0144545 -.0112739
2014 | -.0109149 .0010367 -10.53 0.000 -.0129468 -.0088831
2015 | -.0115138 .0011046 -10.42 0.000 -.0136789 -.0093487
|
il_kodu |
2 | -.0130217 .0027672 -4.71 0.000 -.0184453 -.0075981

81 | .0042069 .0035961 1.17 0.242 -.0028413 .0112551
|
_cons | .0451145 .0013347 33.80 0.000 .0424985 .0477304
------------------------------------------------------------------------------
Instrumented: sp_city_10t
Instruments: 2007.year 2008.year 2009.year 2010.year 2011.year 2012.year
2013.year 2014.year 2015.year 2.il_kodu 3.il_kodu 4.il_kodu
5.il_kodu 6.il_kodu 7.il_kodu 8.il_kodu 9.il_kodu 10.il_kodu
11.il_kodu 12.il_kodu 13.il_kodu 14.il_kodu 15.il_kodu
16.il_kodu 17.il_kodu 18.il_kodu 19.il_kodu 20.il_kodu
21.il_kodu 22.il_kodu 23.il_kodu 24.il_kodu 25.il_kodu
26.il_kodu 27.il_kodu 28.il_kodu 29.il_kodu 30.il_kodu
31.il_kodu 32.il_kodu 33.il_kodu 34.il_kodu 35.il_kodu
36.il_kodu 37.il_kodu 38.il_kodu 39.il_kodu 40.il_kodu
41.il_kodu 42.il_kodu 43.il_kodu 44.il_kodu 45.il_kodu
46.il_kodu 47.il_kodu 48.il_kodu 49.il_kodu 50.il_kodu
51.il_kodu 52.il_kodu 53.il_kodu 54.il_kodu 55.il_kodu
56.il_kodu 57.il_kodu 58.il_kodu 59.il_kodu 60.il_kodu
61.il_kodu 62.il_kodu 63.il_kodu 64.il_kodu 65.il_kodu
66.il_kodu 67.il_kodu 68.il_kodu 69.il_kodu 70.il_kodu
71.il_kodu 72.il_kodu 73.il_kodu 74.il_kodu 75.il_kodu
76.il_kodu 77.il_kodu 78.il_kodu 79.il_kodu 80.il_kodu
81.il_kodu instrument2
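For context (my understanding of 2SLS mechanics, not something specific to this output): every included exogenous regressor serves as an instrument for itself, so the year and il_kodu dummies appear in the instrument list by construction whenever they are included in the second stage. A minimal sketch for inspecting the first stage directly:

Code:
ivregress 2sls relsales i.year i.il_kodu (sp_city_10t=instrument2), robust
estat firststage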

[new on SSC] ineqord: module to calculate indices of inequality and polarization for ordinal data

With thanks as ever to Kit Baum, ineqord is now available on SSC. Its functionality is described below. For references to literature, see the help file.



TITLE
'INEQORD': module to calculate indices of inequality and polarization for ordinal data

DESCRIPTION/AUTHOR(S)

ineqord calculates indices of inequality and polarization for
ordinal data recorded in the response variable: the
Allison-Foster index, the normalized Average Jump index, multiple
Apouey indices (parameters 0.5, 1, and 2), multiple Abul
Naga-Yalcin indices (parameters (a,b) = (1,1), (2,1), (1,2),
(4,1) and (1,4)), multiple Cowell-Flachaire indices (for
peer-inclusive downward and upward-looking status; parameter
alpha = 0, 0.25, 0.5, 0.75 and, optionally, another alpha value
between 0 and 1), the Jenkins index, and also the standard
deviation. Optionally, ineqord also derives estimates of
cumulative distribution functions, survivor functions, and
Generalized Lorenz curves. These can be used to describe ordinal
data distributions and to undertake dominance checks of
differences between distributions.

KW: inequality
KW: polarization
KW: indices
KW: survivor functions
KW: Generalized Lorenz curves

Requires: Stata version 14

Distribution-Date: 20191214
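A minimal usage sketch (I have not run this; the variable name is hypothetical and the call simply follows the single-response-variable pattern described above, so see the help file for the full syntax and options):

Code:
ssc install ineqord
ineqord selfrated_health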

removing the first character of a string variable if it is zero

Hi everyone,
How can I remove the first character(s) of a value when they are zeros? For example, I would like to convert 0003 to 3 and 019 to 19. My variable of interest is a string variable.
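A minimal sketch, with myvar as a placeholder name for the string variable:

Code:
gen newvar = ustrregexra(myvar, "^0+", "")    // strip any run of leading zeros, keeping a string
* alternatively, if a numeric copy is acceptable:
* destring myvar, generate(myvar_num)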
Thanks,
Nader

Post estimation with endogenous switching regressions

Dear Statalist,

I am using an endogenous switching regression model to understand how selection affects a program's outcome. I use the movestay command by Lokshin and Sajaia (2004). It works fine, though I get convergence issues with very large regression coefficients.

I now want to do some post-estimation calculations in order to compute the ATT and ATU. I use their exact code from the Stata Journal, which is "predicted fam, yc1 _1". However, Stata tells me the option yc1 _1 is not allowed.

What could be the issue?
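Only a guess about the mechanics rather than the model: the published call may have been intended as predict (rather than "predicted") with the option written without a space; a minimal sketch, assuming yc1_1 is indeed one of movestay's predict options:

Code:
* run directly after the movestay estimation
predict fam, yc1_1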