Channel: Statalist

Importing a Calendar to Stata

I have a dataset of hospitalization procedures by date over a 10-year period. I would like to know whether those procedures were performed on business days or not.
With the -dow()- function, I can tell whether a procedure occurred during the week or on a weekend. This is a good start. However, I would also like to know whether it fell on a Brazilian holiday. How could I do so? Is there a way to import the Brazilian calendar with all holidays (national and state) for this period, so that Stata can tell me whether a given procedure was performed on a business day or not?

Thank you!
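One possible approach, sketched with invented names (a file holidays.dta holding one Stata daily date per Brazilian holiday, which would have to be built from an external list, and a procedure-date variable procdate): merge the holiday file into the procedure data and combine the result with dow().

```stata
* Untested sketch; procedures.dta, holidays.dta and procdate are
* illustrative names, not from the post. holidays.dta contains a
* single variable, procdate, with one row per holiday date.
use procedures, clear
merge m:1 procdate using holidays, keep(master match)
gen byte holiday = (_merge == 3)               // date found in holiday file
gen byte weekday = inrange(dow(procdate), 1, 5) // Monday-Friday
gen byte busday  = weekday & !holiday           // business day flag
drop _merge
```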

egen

Dear Statalisters,
Below I present a few observations from a panel data set. The data are organized by firm and year. Within each firm-year, a firm can have multiple patents, and each patent has a classification id (patentclassid); e.g. firmid 1000 has 3 patents, 2 of which are in the same class (patentclassid 200).

I want to generate a mean value: the average number of patents per firm for each year and patent class.

e.g. in year 1990, for patentclassid 200, firm 1000 has 2 patents in this class and firm 1001 has 1 patent in this class, so the mean is (2+1)/2 = 1.5.



Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(firmid patentclassid year)
1000 200 1990
1000 200 1990
1000 201 1990
1001 200 1990
1001 201 1990
1001 201 1990
1002 201 1990
1002 202 1990
end

I thought of using

bysort year patentclassid: egen

but I am not sure what should follow egen.

Thank you,
Rochelle
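One way to finish that bysort ... egen line, sketched against the dataex example above: first count each firm's patents within year and class, then average that count over firms, using one tagged row per firm so that firms with more patents are not overweighted.

```stata
* Count of patents per firm within year and class
bysort year patentclassid firmid: gen n_pat = _N
* Tag one row per firm-year-class so each firm enters the mean once
bysort year patentclassid firmid: gen byte tag = (_n == 1)
* Mean number of patents per firm in each year-class cell
bysort year patentclassid: egen mean_pat = mean(cond(tag, n_pat, .))
```

With the example data this gives mean_pat = 1.5 for patentclassid 200 in 1990, matching the hand calculation.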


Time-varying Cox PH model with episode splitting – specifying vce(cluster clustervar) using a different variable from id() in stset

Dear Statalisters,

I have a dataset in which I am studying mortality risk in a cohort using a Cox PH model. The main exposure of interest is time-varying, and I have used episode splitting, so a given subject may have multiple lines of data.

I wish to account for correlation within treatment centers, and I cannot use shared frailty since subjects have delayed entry as a result of episode splitting. I therefore want to use vce(cluster center).

My concern with doing this relates to how standard errors are handled in Cox PH models with episode splitting. I understand that the robust variance estimator treats the subjects themselves as clusters. If I specify vce(cluster center), I worry that this may override the default clustering in episode-split Cox PH models, which is based on the id() variable specified in stset.

I have two questions:

1) Is my supposition correct, or does the Cox PH model as implemented in Stata handle both multiple episodes within id(), and also clustering by a different variable?
2) If the answer above is “Yes”, the manual refers to the analysis of multiple-failure data (does this also apply to episode splitting?). The manual refers to the need to stset a pseudo-ID establishing the time from the last failure as the onset of risk, if one specifies a different clustervar from id(). Is this what I need to do, and if so, can someone provide an example of how this should be done?

Regards,

MM
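For what it's worth, a minimal sketch of the setup being asked about (variable names invented, not from the post): stcox accepts vce(cluster clustervar) on episode-split data that were stset with id(), and clustering at a level coarser than the subject (here, center) also absorbs within-subject correlation, since all episodes of a subject belong to one center.

```stata
* Untested sketch; patid, start, stop, died, exposure, age and center
* are illustrative names.
stset stop, enter(start) failure(died) id(patid)
stcox exposure age, vce(cluster center)
```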

Diff-in-Diff regression with panel data using weights from psmatch2: How to use weights with xtreg re?

Dear all,

I am using the current Version Stata 14 on Windows.
First, I want to provide a short explanation of what my analysis is:
I have an unbalanced panel of firm data for the years 2000-2014. I investigate the consequences of successions in family firms on firm performance, using a difference-in-differences estimation approach on a matched sample. My initial sample has around 1,600 firms, of which 235 experienced a succession in one year. To create a matched sample with control firms similar to the treated firms, I use propensity score matching via the Stata psmatch2 command. I consider firms that experienced a succession in one year as treated and firms that never experienced a succession as untreated.
After the matching procedure I run a diff-in-diff panel regression (using xtreg, re) to evaluate whether the post-succession performance of firms with a succession differs from that of firms without one. As performance measures I look at several different outcomes (from survey answers or balance-sheet information), such as the expected development of business, the expected development of employment, credit allocation, capital expenditures, debt, cash flow, ROA, etc.

So in my first step I run a logit regression and obtain pscores. For the logit regression I collapse my dataset to the firm level and estimate the logit in the cross-section. I regress the treatment dummy (succession yes or no) on several firm characteristics, such as firm age, firm age squared, legal form, and industry and employment-size dummies.
Here is the code for that step:

* collapse data to firm level
collapse succession_yes state industry year_of_incorporation legal_form employment employment_size l_employment firm_age firm_age_cat state_business exp_business exp_employment orders diff_finan credit_alloc debt capex total_assets size_assets total_equity tangible_assets cash_flow cash_cash_equivalent roa sales operating_revenue gross_profit_loss, by(IDNUM_ZAEHLER)

*logit
logit succession_yes firm_age firm_age_2 i.r_legal_form i.r_employment_size i.industry
est store model1
predict pscore1


In the next step I apply the matching algorithm using psmatch2. For my baseline I use nearest-neighbor matching (1-to-1) without replacement imposing a caliper of 0.05 and common support option. I had to modify the matching procedure because of the following problems I encountered:
1) I looped over all years to guarantee that treatment and controls are taken always from same year
2) before matching I need to exclude firms that are treated in a year other than i, so that those can't be used as controls in year i (because later in the diff-in-diff I look at performance in the following years after treatment)
3) I need to exclude firms that were used as controls in year i (so they can't be used again as controls in other years)
4) I re-run the matching for every outcome as some of the outcomes have a lot worse data availability (many missing) and I wanted each match to create a sample as big as possible

Here is the code:
* loop over possible outcomes

foreach o in $outcomes_survey $outcomes_bs {


*go to folder
cd "${root}/${succession}/results/analysis/1NN-caliper0-05/`o'"


* loop over all years to guarantee that treatment and controls are taken always from same year

* replace outcome here
capture drop outcome
gen outcome = `o'
label variable outcome "`o'"
*1 nearest neighbor without replacement, caliper 0.05
capture drop ident treated control pscore treated2 support weight2 id_2 nn n1 pdif
capture drop _pscore _treated _support _weight _id _n1 _nn _pdif _outcome
foreach var in ident treated control pscore treated2 support weight2 id_2 nn n1 pdif {
gen `var' = .
}
local start = 2000
local end = 2014
forvalue i = `start'(1)`end' {
qui count if year == `i' & succession == 1 & pscore1 != .
local decideon = 0
local decideon = r(N)
if `decideon' > 0 {
capture drop _pscore _treated _weight _id _n1 _nn _pdif
set seed 123456
*DEALING WITH TREATED
*before matching I need to somehow exclude firms that are treated in a year other than i, so that those can't be used as controls in year i
*tagging firms treated in year other than i
sort IDNUM_ZAEHLER year
bysort IDNUM_ZAEHLER (year): gen treatnot`i'=1 if succession==1 & year!=`i'
count if treatnot`i'==1
bysort IDNUM_ZAEHLER: carryforward treatnot`i', gen(treatnot`i'2)
gsort IDNUM_ZAEHLER - year
bysort IDNUM_ZAEHLER: carryforward treatnot`i'2, gen(treatnot`i'final)
cap drop treatnot`i' treatnot`i'2
xtsum treatnot`i'final
sort IDNUM_ZAEHLER year
*save dataset containing firms treated in year other than i
preserve
by IDNUM_ZAEHLER (year): keep if treatnot`i'final==1
save data/treatnot`i'dataset.dta, replace
restore
*drop firms treated in year other than i
sort IDNUM_ZAEHLER year
by IDNUM_ZAEHLER (year): drop if treatnot`i'final==1
*MATCH
capture psmatch2 succession if year == `i' & pscore1 != .,out(`o') p(pscore1) neighbor(1) common caliper(.05) noreplacement
capture replace year_dummy = 1 if _treated!=. & year == `i'
capture replace ident = 1 if _weight != . & year == `i'
capture replace treated = 1 if _treated == 1 & _support == 1 & year == `i'
capture replace control = 1 if _treated == 0 & _support == 1 & year == `i'
capture replace pscore = _pscore if year == `i'
capture replace treated2 = _treated if year == `i'
capture replace support = _support if year == `i'
capture replace weight2 = _weight if year == `i'
capture replace id_2 = _id if year == `i'
capture replace n1 = _n1 if year == `i'
capture replace nn = _nn if year == `i'
capture replace pdif = _pdif if year == `i'
qui count if succession == 1 & year == `i'
di r(N) " treated firms exist in year = `i' "
qui count if _treated == 1 & year == `i'
di r(N) " treated firms are identified by the command in year = `i' "
qui count if _treated == 1 & _support == 0 & year == `i'
di r(N) " treated firms were off support in year = `i' "
*drop variable treatnot i
cap drop treatnot`i'final
*append dataset containing firms treated in year other than i
merge 1:1 IDNUM_ZAEHLER year using data/treatnot`i'dataset.dta
drop _merge
drop treatnot*final
*DEALING WITH CONTROLS
**drop firms that were used as controls in year i (so they can't be used again as controls in other years)
*tag controls
sort IDNUM_ZAEHLER year
bysort IDNUM_ZAEHLER (year): gen control`i'=1 if _treated == 0 & _weight == 1 & year == `i'
count if control`i'==1
bysort IDNUM_ZAEHLER: carryforward control`i', gen(control`i'2)
gsort IDNUM_ZAEHLER - year
bysort IDNUM_ZAEHLER: carryforward control`i'2, gen(control`i'final)
cap drop control`i' control`i'2
xtsum control`i'final
*problem now, as all control firms are dropped, we need to save them and add back in the end
preserve
sort IDNUM_ZAEHLER year
by IDNUM_ZAEHLER (year): keep if control`i'final!=.
if `i' == `start' {
save data/controldataset.dta, replace
}
else {
append using data/controldataset.dta
}
save data/controldataset.dta, replace
restore
*drop controls in i
sort IDNUM_ZAEHLER year
by IDNUM_ZAEHLER (year): drop if control`i'final!=.
cap drop control`i'final
}
}
*merge back controls
merge 1:1 IDNUM_ZAEHLER year using data/controldataset.dta
drop _merge
drop control*final
}


After that I looked at the quality of the match (balancing properties and graph pscore density). I will not post this part here.

As my last step I now want to run the difference-in-differences estimation using the matched sample given by the psmatch2 routine.
For the estimation I want to regress my outcomes (firm performance) on a dummy indicating succession (yes/no) and a dummy indicating the years post-succession (post = 1 in years after the succession, 0 otherwise); the treatment effect is then the interaction of the succession and post variables. As further controls I include the firm characteristics I used in the logit regression when calculating the pscores.

In order to run this regression I first need to define the post variable for the matched control firms. For that I use the year of succession for treated firms to compute the counterfactual year also for the matched control group.

* generate post_c with a fake succession event for control group
gen post_c=1 if ident==1 & treated2==0 & weight2==1
* post_c for all years after fake succession
sort IDNUM_ZAEHLER year
forvalues i = 1/15 {
bysort IDNUM_ZAEHLER: replace post_c=1 if ident[_n-`i']==1 & treated2[_n-`i']==0 & weight2[_n-`i']==1
}


The next problem I encountered was that the weight2 variable is only non-missing in the year of succession, but whole firms should be included; otherwise I can't look at the development of performance after succession. So I created a variable that flags the whole firm ID.

* extend weight variable to whole idnum instead of just one year
sort IDNUM_ZAEHLER year
cap drop inmatch
bysort IDNUM_ZAEHLER (year): gen inmatch=1 if weight2 == 1
count if inmatch==1
cap drop inmatch2
bysort IDNUM_ZAEHLER: carryforward inmatch, gen(inmatch2)
gsort IDNUM_ZAEHLER - year
cap drop inmatchfinal
bysort IDNUM_ZAEHLER: carryforward inmatch2, gen(inmatchfinal)
cap drop inmatch inmatch2
xtsum inmatchfinal
sort IDNUM_ZAEHLER year


So now I can finally run my diff-in-diff estimation using the weights from psmatch2, which I extended to cover whole firms:

I first run pooled OLS:
* DiD treatment effect
xi: reg outcome succession_yes##post firm_age firm_age_2 i.legal_form i.employment_size i.industry i.year [aw=inmatchfinal], cluster(IDNUM_ZAEHLER)
estimates store didatt1`v'

But to account for my panel data I actually want to run panel OLS using random effects.

xi: xtreg outcome succession_yes##post firm_age firm_age_2 i.legal_form i.employment_size i.industry i.year if inmatchfinal!=., re rob
estimates store didatt2`v'

My problem here is that no aweights are allowed with panel OLS RE.
Since my weight with the 1-to-1 matching is always 1, it should not matter, and I can just run xtreg, re on all non-missing observations.
But as robustness tests I run different matching algorithms (2NN, 5NN, radius and caliper). With those matching techniques the weights differ by firm and are smaller than 1. As far as I understand how the diff-in-diff should be run on the matched sample, I would also have to use the weights in the xtreg, re regression for my panel data. But weights are not allowed in Stata's xtreg, re. I read that the population-averaged xtreg is supposed to be similar to xtreg, re, so I tried to run xtreg, pa rob instead and include the weights as pweights. But this does not work either, because the weights are not constant within panels.
So how can I run a panel random-effects OLS regression (diff-in-diff) including the weights from matching?



I hope my procedure and estimations are clear. Your help is greatly appreciated.

I have the following questions:
- Is the Stata code how I perform the matching correct given my research question and data structure?
- Is my understanding of the matching procedure and how I apply it to the diff-in-diff estimation correct? To run the regression on the matched sample, is it enough to use the weights from psmatch2, or do I need to account differently for the pairs created by the match? Because as it stands, I just run the regression on a smaller sample than the full sample, but I do not account for which controls are matched to which treated firms, correct? Or do the weights take care of that?
- And especially important for matching algorithms other than 1 NN: How can I run a panel OLS with XTREG RE including weights??


Thank you in advance,
Marina
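One possible route for the weighting question, offered as a hedged sketch rather than a definitive answer: xtreg, re indeed accepts no weights, but a random-intercept model fit with mixed estimates the same kind of model, and mixed does allow pweights, so the psmatch2 weights (extended to whole firms, as above) could be supplied there. Variable names follow the post; matchweight stands in for the extended weight variable.

```stata
* Untested sketch: random-intercept model in the spirit of xtreg, re,
* with sampling weights carried over from the matching step. pweights
* in mixed imply robust standard errors.
mixed outcome i.succession_yes##i.post firm_age firm_age_2       ///
      i.legal_form i.employment_size i.industry i.year           ///
      [pw = matchweight] || IDNUM_ZAEHLER:
```

Note that factor-variable operators such as ## do not need the xi: prefix.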






What is the best way to plot regression weights, including interaction from SEM analysis?

I have developed an SEM model (a latent growth model, to be precise) that regresses a latent variable (the latent intercept) on several dichotomous predictors. Trouble is, I also test for interactions, which makes it all the more difficult for the reader to understand.

(I did the analysis with software other than Stata (Mplus), which has limited graphical capabilities. In Stata I would need to use both -sem- and -gsem- (gsem because of categorical indicators in some analyses), at least that's what I believe. Another option is R/lavaan, but I prefer Stata over R.)

So, within Stata, how should I get plots for regression weights estimated in an SEM model (-sem- and -gsem-)?

I have three main effects: a, b, and c. All three variables are dichotomous; a and b are membership in religious groups, c is gender. (The nonreligious are scored zero on both a and b.) I then add two dichotomous variables representing interaction effects between religious affiliation and gender (a*c and b*c, both dichotomous).

Thus:

Code:
latent intercept <- religousgroup1 religiousgroup2 gender religiousgroup1female religiousgroup2female
The results are bound to be confusing for many people unless I use plots. I found Ben Jann's presentation of -coefplot- interesting. Before I start digging into this on my own: I wonder which package/approach I should consider first while educating myself on how to plot regression weights obtained with -sem- and -gsem-.
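A minimal sketch of the coefplot route (coefplot is Ben Jann's package on SSC; the model line below is schematic, with invented variable names, and a real latent growth model would additionally constrain the loadings):

```stata
* Untested sketch; install coefplot once with: ssc install coefplot
sem (Icept -> y1 y2 y3) ///
    (Icept <- relig1 relig2 female relig1Xfem relig2Xfem)
* Plot only the structural coefficients on the latent intercept;
* the keep() pattern may need adjusting to the equation names that
* sem actually reports (see the coefficient table or e(b)).
coefplot, keep(Icept:*) xline(0)
```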



Creating difference from the average of the previous period without using tsset (lags)

Hey,

Currently I am trying to create differences between Rate and the Mean_A of the previous period. So for the 2006m2 observations I want to create the difference 4 - 4.25 = -0.25, for the next one 4.4 - 4.25 = 0.15, et cetera. Since I have multiple observations per period and do not want to collapse them (by creating means), it is not possible to use the
Code:
tsset
and
Code:
l.
commands, since then you get
Code:
. tsset lastrateadj
repeated time values in sample
r(451);
Does anyone have a suggestion on how to create the differences in this situation?

Code:
Date        Rate    Mean_A
2006m1    4.3    4.25
2006m1    3.9    4.25
2006m1    4.8    4.25
2006m2    4       2.39
2006m2    4.4    2.29
Kind regards,

Danny
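One tsset-free sketch using the posted variable names: since Mean_A is meant to be constant within Date (the two 2006m2 rows in the snippet differ, perhaps a typo), reduce to one row per month, lag it there, and merge the lagged value back. This assumes Date is a real %tm monthly date; if it is a string, convert it first, e.g. gen mdate = monthly(Date, "YM").

```stata
* Untested sketch
preserve
keep Date Mean_A
duplicates drop                 // one row per month
tsset Date
gen lag_Mean_A = L.Mean_A       // previous month's mean
keep Date lag_Mean_A
tempfile lagged
save `lagged'
restore
merge m:1 Date using `lagged', nogenerate
gen diff = Rate - lag_Mean_A    // 2006m2: 4 - 4.25 = -.25
```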


ml program question: right syntax to call parameters of an equation

I want to display and use a component of an equation in my ml program, but I couldn't find the right syntax to call it. lnsigu2 is modeled with a constant and x1, but the last three lines below all display the same number. For example, display `lnsigu2:_cons' does not display the lnsigu2 equation's constant during the maximization process, and display _b[lnsigu2:_cons] does not work during maximization either. So what is the right syntax to display the constant of the lnsigu2 equation throughout the maximization process?


Code:
program mlprog
        args todo b lnf

        tempvar xb lnsigu2 lnsigw2
        mleval `xb' = `b', eq(1)
        mleval `lnsigu2' = `b', eq(2)
        mleval `lnsigw2' = `b', eq(3)

.....

display `lnsigu2'
display `lnsigu2:_cons'
display `lnsigu2:x1'

.....

end
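A sketch of one way to pull out a single coefficient during maximization: inside the evaluator, `b' holds the full coefficient row vector, so an element can be addressed by its equation:name column.

```stata
* Untested sketch, to be placed inside the evaluator after args
tempname bsig
scalar `bsig' = el(`b', 1, colnumb(`b', "lnsigu2:_cons"))
display "lnsigu2:_cons = " `bsig'
scalar `bsig' = el(`b', 1, colnumb(`b', "lnsigu2:x1"))
display "lnsigu2:x1 = " `bsig'
```

The original lines behave the way they do because `lnsigu2' is a tempvar holding the evaluated linear prediction, so display shows its value in the first observation rather than a coefficient.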





Export summary statistics using tabstat and estpost (esttab)

Hello,

After reading through the forum I found that esttab often causes issues. I hope you can help me with my problem.
I am trying to create a table in Stata and export it to Word. In Stata this works fine, but as soon as I export it, all the data are lost.

My data contain survey results. I have data from two groups (marked by either 1 or 2).

This is what my table looks like in Stata:

group | e(Y1_5) e(Y6_10) e(totalY) e(totalX)
-------------+----------------------------------------------------------------
1 | 2.886792 3.924528 6.811321 1.867925
2 | 3.041667 3.8125 6.854167 1.708333

Code:
 
global list1 var1 var2 var3 var4
global format1 rtf

estpost tabstat $list1, by(group) stat(mean)
esttab . using Table1.$format1, replace label cells("mean(fmt(%12.0fc)")
1. I would like to have Stata use my labels instead of variable names.
2. I would like to export this to Word. So far I have used the code above, but that did not work (i.e. I got an empty table).

This is what I get in Word:
(1)
mean
Observations 101

Thank you so much in advance!
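For what it's worth, two things in the posted code stand out: the cells() string is missing a closing parenthesis (cells("mean(fmt(%12.0fc)") should be cells("mean(fmt(%12.0fc))")), and the label option requires the variables to carry labels. A hedged sketch of a version that usually exports correctly to Word:

```stata
* Untested sketch, using the post's globals
estpost tabstat $list1, by(group) stat(mean)
esttab using Table1.$format1, replace cells("mean(fmt(%12.2fc))") ///
    unstack label nonumber nonote
```

unstack puts the groups side by side as columns with the variables as rows.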

Deciles of variable from entire dataset based on breakpoints of variable from part of the dataset

Dear community,


My name is Batuhan and I am new to Stata. I have a problem that I have been dealing with for a long time and could not yet resolve. I hope you can help me.

I have monthly stock return data from NYSE, NASDAQ and AMEX (stock exchanges). Based on my data, I have calculated the Momentum (MOM) and now I need to categorize my MOM data in deciles based on NYSE breakpoints. The MOM deciles need to be refreshed monthly.

That means that first, I have to compute the MOM deciles and the breakpoints based only on my NYSE data. I then have to use the calculated breakpoints to compute the MOM deciles for the entire database (not only the NYSE).

My problem is that I need to refresh my MOM deciles monthly. What I tried to do is:

I have computed the MOM deciles for my NYSE data only, for each month: egen NYSE_decile = xtile(cumul), by(month_id) nq(10)
My problem is that I need the breakpoints (not the deciles 1, 2, ..., 10) in order to compute the MOM deciles for the entire database: egen MOM_Decile = xtile(cumul), by(month_id) nq(10) cutpoints(NYSE_decile).
I know that, in order to get the breakpoints, I need pctile, not xtile. But when I use egen NYSE_decile = pctile(cumul), by(month_id) nq(10) genp(percent), I am told that nq() and genp() are not allowed.

To sum it up, what could I do in order to compute my MOM deciles for the whole dataset (AMEX, NYSE, NASDAQ stocks) based on the NYSE-breakpoints?

It would be a huge help if you could help me out.


Best,

Batuhan
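One common pattern, sketched with an invented name for what the post does not show (an indicator nyse equal to 1 for NYSE stocks): compute the nine NYSE breakpoints month by month with _pctile, then classify the full sample against them.

```stata
* Untested sketch
gen MOM_decile = .
levelsof month_id, local(months)
foreach m of local months {
    * nine NYSE-only breakpoints for month `m'
    _pctile cumul if month_id == `m' & nyse == 1, nq(10)
    forvalues d = 1/9 {
        local p`d' = r(r`d')
    }
    * classify every stock (all exchanges) in month `m'
    replace MOM_decile = 1 if month_id == `m' & cumul <= `p1'
    forvalues d = 2/9 {
        local k = `d' - 1
        replace MOM_decile = `d' if month_id == `m' ///
            & cumul > `p`k'' & cumul <= `p`d''
    }
    replace MOM_decile = 10 if month_id == `m' ///
        & cumul > `p9' & !missing(cumul)
}
```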

Deleting by ID

Hey guys, first of all, I'm German, so please excuse my English. :D
I have one big problem.
I have a dataset based on gyms. Every gym has an ID (e.g. 2001 / 2002 / 2003, etc.).
For every gym, 2-10 people answered some questions and were then sorted by their gyms.
E.g. Person 1 - 2001
Person 2 - 2003
Person 3 - 2001

Now I should delete all gyms in which fewer than 5 people answered the questions. But there is no variable that shows how many people answered the questions. Do you have any idea how I can delete them?

Greetings !
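Assuming one row per respondent and a gym identifier (gymid below is an invented name; substitute the actual ID variable), the respondent count per gym is simply _N within the gym, so no extra variable is needed:

```stata
* Untested sketch: drop every gym with fewer than 5 respondents
bysort gymid: drop if _N < 5
```

A safer variant first inspects the counts before dropping: bysort gymid: gen n = _N, then tab n.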

Outreg2 Summary Stats: How to treat zeros as missing

Dear Users,

I work with survey data and there are two types of questions. The first type is basic demographics, which were asked of everyone; the second type is entrepreneurship-related questions, which were asked only of relevant individuals. When I use the command below to get summary stats:
bysort female: outreg2 using data.doc , replace sum(log) eqdrop(min max)

I get the wrong number of observations.

More specifically, I would like to get the number of observations for the 1s, not for the 0s. Any suggestions?

Many thanks for your time and help.


age male female businessowner
32 0 1 1
39 0 1 1
40 0 1 1
48 0 1 0
24 0 1 0
33 0 1 0
35 0 1 1
54 0 1 0
36 0 1 1
22 0 1 0
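One hedged workaround: create copies of the conditional variables in which 0 is recoded to missing, so the reported N counts only the 1s. businessowner is taken from the data excerpt; the rest of the command follows the post.

```stata
* Untested sketch
gen businessowner_yes = businessowner if businessowner == 1
bysort female: outreg2 using data.doc, replace sum(log) ///
    eqdrop(min max) keep(age businessowner_yes)
```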




Count distinct observations per row for several variables

Dear Users,

I have a dataset that contains many variables, but in this instance 84 variables of interest: cal_MeX, cal_MiX, and cal_DisX, where X = 11-17, 21-27, 31-37, 41-47 (in dataset order, cal_Me11-cal_Dis47). Each participant (row) has an observation for each of these variables (although there are some missing values within each row).

Each variable represents a depth measurement (integer) at a separate site, and the integers range from -3 to 6.

I am trying to write syntax to count the number of observations that ==2. I am hoping to generate a variable (nsites2) that when tabulated will give a frequency distribution of the number of observations that ==2 per row e.g.

tab nsites2 // would hopefully give:

==2 Freq.
0 x
1 y
2 z

and so on, with x, y, z indicating how many observations ==2 there are per row in the dataset.

I have tried egen, anycount(); egen, count(); and egen, rowtotal() (each followed by one of the if qualifiers below), but it appears that a cumulative total/sum results, not a count of the observations.

Also, I am having some trouble with how to indicate the range of variables of interest. I have tried:

if (cal_Me11-cal_Dis47) ==2

and also

if inlist(2, cal_Me11, cal_Me12, cal_Me13, cal_Me14, cal_Me15, cal_Me16, cal_Me17, cal_Me21, cal_Me22, cal_Me23, cal_Me24, cal_Me25, cal_Me26, cal_Me27, cal_Me31, cal_Me32, cal_Me33, cal_Me34, cal_Me35, cal_Me36, cal_Me37, cal_Me41, cal_Me42, cal_Me43, cal_Me44, cal_Me45, cal_Me46, cal_Me47, cal_Mi11, cal_Mi12, cal_Mi13, cal_Mi14, cal_Mi15, cal_Mi16, cal_Mi17, cal_Mi21, cal_Mi22, cal_Mi23, cal_Mi24, cal_Mi25, cal_Mi26, cal_Mi27, cal_Mi31, cal_Mi32, cal_Mi33, cal_Mi34, cal_Mi35, cal_Mi36, cal_Mi37, cal_Mi41, cal_Mi42, cal_Mi43, cal_Mi44, cal_Mi45, cal_Mi46, cal_Mi47, cal_Dis11, cal_Dis12, cal_Dis13, cal_Dis14, cal_Dis15, cal_Dis16, cal_Dis17, cal_Dis21, cal_Dis22, cal_Dis23, cal_Dis24, cal_Dis25, cal_Dis26, cal_Dis27, cal_Dis31, cal_Dis32, cal_Dis33, cal_Dis34, cal_Dis35, cal_Dis36, cal_Dis37, cal_Dis41, cal_Dis42, cal_Dis43, cal_Dis44, cal_Dis45, cal_Dis46, cal_Dis47)

but these each give different results (or invalid syntax or an r(198) error); therefore, I am not sure which syntax to use.

I apologise if this question has been asked before (I have searched for hours and could not find a solution that fits my particular problem with such a long list of variables of interest).
Please let me know if I need to provide more information.

Thank you for your time and help, it is much appreciated.
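egen's anycount() function does exactly this row-wise count; it takes the target value in values() and a variable range (the hyphenated range relies on cal_Me11 through cal_Dis47 being contiguous in dataset order, as the post's own shorthand suggests):

```stata
* Count, per row, how many of the 84 variables equal 2
egen nsites2 = anycount(cal_Me11-cal_Dis47), values(2)
tab nsites2
```

values() also accepts a numlist, so values(2 3) would count observations equal to 2 or 3.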

WTP econometric models

Hi statalist users,

I'm trying to estimate WTP values for three different types of public services (water supply, aqueduct and waste management). My first thought is to use a multinomial logit model (no ordering, and no correlation assumptions), but I have questions:

1. Is this model right for this kind of setting, or should I estimate a different model for each service? (I ask because all three services are paid on one bill, i.e. a sum of prices or values.)
2. Can I use different independent variables for each public service equation in the multinomial logit?
3. How can I estimate the mean WTP for each one?

Thanks for any kind of help that you can provide me,

All the best.

Julíán


Using assert with missing observations

Dear all

Suppose I have a dataset with 5 variables. var1 and var2 have 100 observations, while var3, var4 and var5 look like this:

Code:
var3  var4   var5
54     12      .
56     15    167
89     17    190
34     18    198
.      .       .
.      .       .
.      .       .
.      .       .
I am trying to use assert to check whether a condition is satisfied, but I do not get the desired result. I have tried both with !missing() and without:

Code:
assert var4 < var4[_n+1] & var5 > var5[_n+1]

assert var4 < !mi(var4[_n+1]) & !mi(var5) > !mi(var5[_n+1])
The assertion ought to be true, but I suspect the missing values are interfering.

How can I get the right answer in the above example?
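In Stata, a missing numeric value compares as larger than any number, so var4 < var4[_n+1] fails on the last row with data. Restricting each assertion to rows where both sides exist, while keeping the posted conditions as stated, looks like this:

```stata
* Untested sketch: assert only where current and next values both exist
assert var4 < var4[_n+1] if !missing(var4, var4[_n+1])
assert var5 > var5[_n+1] if !missing(var5, var5[_n+1])
```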

Saving regression results

How do I save and reuse regression results? I want to run an OLS regression of the kind:

reg X1 L.X1 L.X2

and then use the results this gives on X1 for the following specification:

reg X2 X1 L.X2

How do I save these results so that I can plug them into the second regression?
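Two common readings of "use the results", sketched here under the assumption that the data are already tsset (needed for the L. operator):

```stata
* Untested sketch. If the fitted values of X1 are what is needed:
reg X1 L.X1 L.X2
predict double X1hat, xb
reg X2 X1hat L.X2

* If individual coefficients are needed instead, they are in _b[]
* and can be held in scalars or stored as a named result set:
reg X1 L.X1 L.X2
scalar b_lx1 = _b[L.X1]
estimates store first_stage
```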

Generate R-squares from regressing PCA factors onto variables

Hi everyone,
I generated 8 factors using pca. I want to generate R-squared values from regressing each generated factor (f1, f2, ..., f8) on each of the 10 variables. My aim is to finally obtain a histogram with the R-squared values on the y-axis and the 10 variables on the x-axis. Kindly find attached the 10 variables used to generate the factors (the file includes 2 of the 8 generated factors). Thank you in advance.
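A hedged sketch of one reading of this (names invented: f1 stands for one predicted factor and var1-var10 for the ten source variables): run a simple regression per variable, collect e(r2) into a matrix, and plot it; coefplot (SSC) can plot a matrix column directly. Wrapping another loop over f1-f8 would cover all factors.

```stata
* Untested sketch; ssc install coefplot
local vars var1 var2 var3 var4 var5 var6 var7 var8 var9 var10
matrix R2 = J(10, 1, .)
local i = 0
foreach v of local vars {
    local ++i
    quietly regress `v' f1      // simple R2 for this variable-factor pair
    matrix R2[`i', 1] = e(r2)
}
matrix rownames R2 = `vars'
coefplot matrix(R2[,1]), vertical ytitle("R-squared")
```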

Cumulative count of distinct ids by group_id over time periods

Dear Statalist.
I am trying to generate a variable that for each org_id cumulatively over time periods counts the number of distinct employee_ids that meet a criterion >= 1. The dataset structure looks like this:
Code:
period  org_id  employee_id  criterion  wanted
     1       1            1          1       2
     1       1            2          2       2
     1       1            3          0       2
     1       2            4          0       1
     1       2            5          1       1
     1       2            6          0       1
     2       1            7          1       3
     2       1            1          1       3
     2       4            8          0       0
     2       4            9          0       0
     3       2            4          0       1
     3       2            5          1       1
     3       2           10          0       1
Please note that employee_id == 1 is not counted again in period == 2, since this distinct employee_id was counted in period == 1. Rather, the value for this distinct employee_id is carried over from period == 1.

Any and all input on this problem would be highly appreciated.

Regards,
Erik Aadland
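A hedged sketch that reproduces the wanted column from the posted rows: tag the first period in which each employee qualifies within an org, sum those tags into a per-(org, period) count of newly qualifying employees, and take the running sum over periods within each org.

```stata
* Untested sketch, using the post's variable names
gen byte meets = criterion >= 1 & !missing(criterion)
* first qualifying row of each employee within an org
bysort org_id employee_id (period): gen byte newq = meets & sum(meets) == 1
* number of newly qualifying employees per org and period
bysort org_id period: egen nnew = total(newq)
* running cumulative count, adding each period's new qualifiers once
bysort org_id period (employee_id): gen byte first = (_n == 1)
bysort org_id (period employee_id): gen wanted2 = sum(cond(first, nnew, 0))
```

On the example data this yields 2 for org 1 in period 1, 3 in period 2, 1 for org 2 throughout, and 0 for org 4, matching the wanted column.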

test

Hi Samuel,

The data are structured like this:

Code:
use http://www.stata-press.com/data/r8/stanford, clear
gen surgeon=round(runiform()*10)
stset stime, failure(died) id(id)
stsplit posttran, after(wait) at(0)
replace posttran = posttran + 1
....after which I would like to build a Cox PH hazards model, but cluster for surgeon.

So I am assuming that cluster by surgeon will override the clustering by id, which is necessary for correct implementation of the Cox PH model with multiple episodes per patient.



Construct equally weighted decile portfolios by ranking stocks on beta

I have a dataset of about 1,324 companies with weekly returns over the last 16 years.
My data looks like this:

http://screencast.com/t/fHRRNl08bS

What I would like to do is: at the end of each month, construct equally weighted decile portfolios by ranking stocks on the past three-year volatility/beta of weekly returns.

And ultimately get a table like:

http://screencast.com/t/kxnHXgEs

I have collapsed the date with:
Code:
gen dm = mofd(date)
format dm %tm
collapse stock return beta vol, by(dm id)
I was thinking of creating a variable for each portfolio and giving a dummy to each id that is in the portfolio in that month, but I am not sure whether that is the correct way, or how to do it.

Could someone help me with this?
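Given the collapsed firm-month data in the post, a hedged sketch (egen's xtile() function comes from the egenmore package on SSC, and beta is assumed to already hold the trailing three-year estimate for each firm-month):

```stata
* Untested sketch; ssc install egenmore for egen ... xtile()
egen port = xtile(beta), by(dm) nq(10)      // decile rank within month
* equally weighted portfolio return per month and decile
collapse (mean) port_ret = return, by(dm port)
```

A table like the screenshot is then a reshape of port_ret wide on port.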

GMNL postestimation using margins

Hello,

I would like to know how to estimate marginal effects after a GMNL model (Fiebig et al., 2010 / Gu et al., 2013), more precisely after an S-GMNL model.

The goal of the first model is to estimate the role of observed and unobserved heterogeneity in individual choice among tap water, filtered water or bottled water. The second model includes residuals estimated in an ologit model; this allows me to control for the problem of endogenous attitudinal variables, here satisfaction with tap water quality (Terza et al. 2008). My problem is that I don't know how to estimate marginal effects or how to predict probabilities.

Below you'll find more information about my two S-GMNL models:

* Changes in the database to run GMNL

gen gid = _n
expand 3
sort gid
label list choix
list choice age gender gid in 1/9, sepby(gid) nolab

by gid: gen temp = _n

gen eau = (temp==choice)

drop choice
rename temp choice
label define choice_r 1 "Bottled water" 2 "Filtered Water" 3 "Tap water"
label value choice choice_r

list choice eau age gender in 1/9, sepby(gid)

bysort gid: gen alter = _n

xi i.alter,noomit
gen alter_1=_Ialter_1 /* ASC */
gen alter_2=_Ialter_2 /* ASC */
gen alter_3=_Ialter_3 /* ASC */
xi i.r_satis_2,noomit
gen r_sat_1=_Ir_satis_2_1 /* Individual is not satisfied concerning tap water */
gen r_sat_2=_Ir_satis_2_2 /* Individual is satisfied concerning tap water */
gen r_sat_3=_Ir_satis_2_3 /* Individual is very satisfied concerning tap water */

forvalues i = 1/3 {
gen alterXage_`i' = alter_`i'*age
}
forvalues i = 1/3 {
forvalue j = 1/3 {
gen alterXr_sat_`i'`j' = alter_`i'*r_sat_`j'
}
}

/* Residuals estimated with ologit model */
forvalues i = 1/3 {
forvalue j = 1/3 {
gen alterXrs_`i'`j' = alter_`i'*rs`j'
}
}



matrix scale=0,0,1,1,1,1,1,1

xi:gmnl eau alter_1 alter_2 alterXage_1 alterXage_2 alterXr_sat_12 alterXr_sat_13 alterXr_sat_22 alterXr_sat_23, group(gid) id(id) het(i.gregion) scale(scale) diff

matrix scale=0,0,1,1,1,1,1,1,1,1,1,1

xi:gmnl eau alter_1 alter_2 alterXage_1 alterXage_2 alterXr_sat_12 alterXr_sat_13 alterXr_sat_22 alterXr_sat_23 alterXrs_12 alterXrs_13 alterXrs_22 alterXrs_23, group(gid) id(id) het(i.gregion) scale(scale) diff



Any and all input on this problem would be highly appreciated.

Regards,
Paul
