Channel: Statalist

FEs with xtivreg2

Hi,

I am using a panel data set that is unique on municipality and year and using the following code:

Code:
tsset origmun year, yearly
xi: ivreg2 y (x z)  ,  cluster(origmun) fe
where origmun is the municipality ID. I want to include a variable measured at the municipality level and add fixed effects at a higher administrative (department) level, but when I try to do this the municipality-level variable is omitted. Is xtivreg2 not the right command for this?

Thanks

Stata 16: ERM for panel data

Hi all Statalisters

I'm eagerly updating myself on the new features of Stata 16 and it looks really great!

I'm interested in extended regression models (ERM) for panel data as I'm working on panel data analyses with a binary endogenous treatment. I think the instrumental variable (IV) design may be suitable. However, IV requires important assumptions (AIR '96) so I'm also interested in alternatives in case some conditions are violated.

From what I gather, ERM differs from IV (or encompasses IV and other approaches), so I'm working my way through the Stata ERM Reference Manual 16 to find an explanation of the statistics behind ERM other than IV for my purpose. I'm specifically interested in the details of how ERM handles endogenous treatment assignment.

For example, briefly, the IV approach requires some form of plausibly exogenous variation that predicts treatment and has no path to the outcome other than through treatment, and we often use the 2SLS estimator to obtain our estimate. I can't find an equivalent explanation for ERM, and I'm curious to know whether exogenous variation is a criterion for ERM or whether non-random assignment is handled some other way. I can't seem to find details on this, which may be because I am misunderstanding ERMs as something separate. I would highly appreciate input on this and references on the topic.

How to create the average value of a variable in the respective quarters over the past three years for each firm?

Here is part of my dataset:
gvkey(firm_id) fyear(year) qarter asset_growth(variable value)
001004 1995 1
001004 1995 2
001004 1995 3
001004 1995 4
001004 1996 1
001004 1996 2
001004 1996 3
001004 1996 4
001004 1997 1
001004 1997 2
001004 1997 3
001004 1997 4
001004 1998 1
001004 1998 2
001004 1998 3
001004 1998 4
......... ...... .
........ ....... .
001009 1995 1
001009 1995 2
001009 1995 3
001009 1995 4
001009 1996 1
001009 1996 2
001009 1996 3
001009 1996 4
001009 1997 1
001009 1997 2
001009 1997 3
001009 1997 4
001009 1998 1
001009 1998 2
001009 1998 3
001009 1998 4
......... ...... .
........ ....... .
What I need to do is use Stata to calculate, for each firm, the mean of asset_growth in the respective quarter over the past three years. However, I don't know how to do it. Can anyone help me with this problem? Thank you so much in advance!
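One way this could be approached, as a minimal sketch only: treat each firm-quarter combination as its own yearly panel and average the three preceding years with lag operators. It assumes one row per firm, year, and quarter, a numeric fyear, and that "past three years" excludes the current year. The variable names follow the post (including qarter as spelled there); fq and avg3 are hypothetical names introduced for illustration.

```stata
* hypothetical helper: one panel identifier per firm-quarter combination
egen long fq = group(gvkey qarter)
* declare a yearly panel within each firm-quarter
xtset fq fyear
* mean of the same quarter in the three preceding years
* (missing whenever any of the three lags is missing)
generate avg3 = (L1.asset_growth + L2.asset_growth + L3.asset_growth) / 3
```

If the current year should be part of the window, the lags would shift to the current value plus L1 and L2.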

How to drop duplicate ID observation series with different variable values

Hi all,

I have a panel dataset of firms and stock returns between 2005 and 2015. However, for some observations (sorted by FirmID and Date) I have duplicates with differing stock prices.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(DailyObservation FirmID) long Date double ClosingPrice float dup_obs
675 23 16439 7.46 1
  1 23 16439 2.95 2
  2 23 16440 2.96 1
676 23 16440 7.59 2
  3 23 16441 2.95 1
677 23 16441 7.58 2
  4 23 16442 2.96 1
678 23 16442 7.75 2
679 23 16443 7.76 1
  5 23 16443 3.25 2
680 23 16446 7.84 0
681 23 16447    8 1
  6 23 16447 2.95 2
682 23 16448  7.8 0
683 23 16449 7.72 1
  7 23 16449 2.95 2
684 23 16450 7.84 0
685 23 16453 7.93 1
  8 23 16453 2.95 2
  9 23 16454 2.95 1
686 23 16454 7.83 2
 10 23 16455 2.95 1
687 23 16455 7.85 2
 11 23 16456 2.95 1
688 23 16456 7.79 2
 12 23 16457 2.95 1
689 23 16457 7.78 2
 13 23 16460 3.02 1
690 23 16460 7.68 2
691 23 16461 7.68 1
 14 23 16461 2.95 2
692 23 16462 7.77 1
 15 23 16462 2.95 2
 16 23 16463 2.95 1
693 23 16463 7.85 2
694 23 16464 7.84 1
 17 23 16464 2.95 2
695 23 16467  7.9 1
 18 23 16467 2.95 2
696 23 16468 8.09 1
 19 23 16468 2.93 2
697 23 16469 8.07 1
 20 23 16469  3.1 2
 21 23 16470    3 1
698 23 16470 7.96 2
 22 23 16471 3.01 1
699 23 16471 8.02 2
700 23 16474 8.28 0
701 23 16475 8.36 1
 23 23 16475 2.94 2
 24 23 16476 3.31 1
702 23 16476 8.85 2
 25 23 16477 3.35 1
703 23 16477 8.76 2
704 23 16478 8.78 1
 26 23 16478  3.3 2
 27 23 16481 3.03 1
705 23 16481 8.59 2
 28 23 16482 3.29 1
706 23 16482 8.71 2
 29 23 16483 3.02 1
707 23 16483 8.74 2
 30 23 16484 3.03 1
708 23 16484  8.8 2
709 23 16485 8.57 1
 31 23 16485  3.2 2
710 23 16488 8.51 1
 32 23 16488 3.02 2
 33 23 16489 3.02 1
711 23 16489 8.36 2
 34 23 16490 3.02 1
712 23 16490 8.26 2
713 23 16491 8.23 0
 35 23 16492 3.02 1
714 23 16492 8.38 2
715 23 16495 8.46 0
716 23 16496 8.27 1
 36 23 16496 3.02 2
717 23 16497 8.39 1
 37 23 16497 3.03 2
718 23 16498 8.23 1
 38 23 16498 3.02 2
 39 23 16499 3.05 1
719 23 16499 8.31 2
 40 23 16502 3.02 1
720 23 16502 8.41 2
 41 23 16503 3.02 1
721 23 16503 8.51 2
 42 23 16504 2.93 1
722 23 16504 8.52 2
 43 23 16505 2.95 1
723 23 16505 8.42 2
 44 23 16506  3.2 1
724 23 16506 8.62 2
725 23 16509  8.4 1
 45 23 16509  3.2 2
726 23 16510 8.04 1
 46 23 16510  3.3 2
727 23 16511 7.96 0
 47 23 16512 3.35 1
end
format %d Date
dup_obs was derived using a user-written ado-file and the code:
Code:
dup FirmID Date
I would like to drop one duplicate time series set, either the series with the lower closing prices or the series that does not start with DailyObservation #1. I am having trouble coming up with the code to drop one of the series. I have tried:
Code:
dup FirmID Date, drop
However, this drops all observations with dup_obs not equal to 0, so parts of both duplicate series are dropped.
Using code such as
Code:
drop if dup_obs!=0 & DailyObservation>2865
also does not help: because the panel is unbalanced, not all duplicate series reach such a high observation number.
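A possible sketch for the first option mentioned above (dropping the series with the lower closing prices), under the assumption that within every FirmID and Date pair the unwanted row is always the one with the lower ClosingPrice. That holds in the example data shown, but it is an assumption about the full dataset.

```stata
* Assumption: the lower-priced row of each FirmID-Date pair is the one to drop.
* Sort prices within each firm-date and keep only the last (highest) row;
* dates without duplicates have a single row and are kept unchanged.
bysort FirmID Date (ClosingPrice): keep if _n == _N
```

Note that missing values sort last in Stata, so a row with a missing ClosingPrice would be the one kept; such cases would need separate handling.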

Regarding the question of the spmatrix import command

Different markers with rcapsym

Hi,

I would like to have two different markers for the starting points and the end points. For example, the 07 starting score should be a hollow circle and the 18 end score should be a solid circle. The code that I have right now only gives me solid circles at both ends:

twoway (rcapsym v07 v18 indic, lwidth(thick) msize(medlarge) msymbol(o)) if ccode == 679, ylabel(-4(1)4, labsize(huge)) ymtick(, labsize(vlarge)) xlabel(, labsize(huge)) xmtick(, labsize(vlarge)) title(Yemen, size(huge)) xsize(1.5) ysize(4)


I have tried different ways of adding more marker specifications to msymbol, but it does not work:

twoway (rcapsym v07 v18 indic, lwidth(thick) msize(medlarge) msymbol(o oh)) if ccode == 679, ylabel(-4(1)4, labsize(huge)) ymtick(, labsize(vlarge)) xlabel(, labsize(huge)) xmtick(, labsize(vlarge)) title(Yemen, size(huge)) xsize(1.5) ysize(4)


Is there even a way to do this?

Thank you,

Steffi
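One possible workaround, offered as a sketch rather than a tested answer: as far as I know, rcapsym applies a single marker style to both ends, so the range can instead be drawn with rspike and the two endpoints overlaid as separate scatter plots, each with its own symbol. The variable names and the if condition are taken from the post; the label and size options are left out for brevity.

```stata
* range line only, then one scatter layer per endpoint
twoway (rspike v07 v18 indic, lwidth(thick))              ///
       (scatter v07 indic, msymbol(Oh) msize(medlarge))   ///
       (scatter v18 indic, msymbol(O)  msize(medlarge))   ///
       if ccode == 679, legend(off) title(Yemen)
```

Here Oh is the hollow circle for the 07 score and O the solid circle for the 18 score. The three layers may get different default colors, so a common mcolor() and lcolor() might be wanted.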



mixed models xtreg xtmixed

Hi All,

I have a question about xtreg and xtmixed.

If I enter the following command I get this result.

xtset interven2

xtreg difftot i.gender2


Random-effects GLS regression                   Number of obs      =        85
Group variable: interven2                       Number of groups   =         3

R-sq:  within  = 0.0015                         Obs per group: min =        13
       between = 0.1840                                        avg =      28.3
       overall = 0.0007                                        max =        55

                                                Wald chi2(2)       =      0.06
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.9722

------------------------------------------------------------------------------
     difftot |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
     gender2 |
      Female |   .0294211    .151446     0.19   0.846    -.2674076    .3262497
     Non-bin |  -.0499999   .4162483    -0.12   0.904    -.8658316    .7658317
             |
       _cons |       .281   .0724595     3.88   0.000     .1389819    .4230181
-------------+----------------------------------------------------------------
     sigma_u |          0
     sigma_e |  .55706407
         rho |          0   (fraction of variance due to u_i)
------------------------------------------------------------------------------


However if I request mle with xtreg I get the following result.

xtreg difftot i.gender2, mle

Fitting constant-only model:
Iteration 0: log likelihood = -72.359585
Iteration 1: log likelihood = -72.330535
Iteration 2: log likelihood = -72.33045

Fitting full model:
Iteration 0: log likelihood = -72.427795
Iteration 1: log likelihood = -72.067824
Iteration 2: log likelihood = -72.017598
Iteration 3: log likelihood = -72.013335
Iteration 4: log likelihood = -72.013311

Random-effects ML regression                    Number of obs      =        85
Group variable: interven2                       Number of groups   =         3

Random effects u_i ~ Gaussian                   Obs per group: min =        13
                                                               avg =      28.3
                                                               max =        55

                                                LR chi2(2)         =      0.63
Log likelihood  = -72.013311                    Prob > chi2        =    0.7282

------------------------------------------------------------------------------
     difftot |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
     gender2 |
      Female |  -.0463684   .1571585    -0.30   0.768    -.3543934    .2616566
     Non-bin |  -.3585661   .4490142    -0.80   0.425    -1.238618    .5214855
             |
       _cons |   .3761473   .1436128     2.62   0.009     .0946713    .6576232
-------------+----------------------------------------------------------------
    /sigma_u |   .1827952   .1268277                      .0469224    .7121132
    /sigma_e |   .5517788      .0434                      .4729487    .6437482
         rho |   .0988951   .1260109                      .0033984    .5520696
------------------------------------------------------------------------------
Likelihood-ratio test of sigma_u=0: chibar2(01)= 1.44 Prob>=chibar2 = 0.115



Why should the coefficients and the sigma_u estimate change?
The same thing happens with xtmixed, as the results just below show.

It seems to have to do with the constant, which I notice is mentioned in the first line of the previous output ("Fitting constant-only model"), but I don't understand how.

Also, the results for xtmixed seem to be the opposite of the results from xtreg: when I request no constant with xtmixed, the results are identical to those from the xtreg run that does not fit a constant-only model (the first model above).

An explanation of why this is happening would be much appreciated.

xtmixed difftot i.gender2 || interven2:, mle

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0: log likelihood = -72.013312
Iteration 1: log likelihood = -72.013311

Computing standard errors:

Mixed-effects ML regression                     Number of obs      =        85
Group variable: interven2                       Number of groups   =         3

                                                Obs per group: min =        13
                                                               avg =      28.3
                                                               max =        55

                                                Wald chi2(2)       =      0.77
Log likelihood = -72.013311                     Prob > chi2        =    0.6805

------------------------------------------------------------------------------
     difftot |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
     gender2 |
      Female |  -.0463684   .1537466    -0.30   0.763    -.3477063    .2549695
     Non-bin |  -.3585661   .4175263    -0.86   0.390    -1.176903    .4597703
             |
       _cons |   .3761473   .1364022     2.76   0.006     .1088039    .6434906
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
interven2: Identity          |
                   sd(_cons) |   .1827952   .1268308      .0469209    .7121365
-----------------------------+------------------------------------------------
                sd(Residual) |   .5517788      .0434      .4729487    .6437482
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 1.44 Prob >= chibar2 = 0.1151


xtmixed difftot i.gender2 || interven2:, mle noconstant

Note: all random-effects equations are empty; model is linear regression

Mixed-effects ML regression                     Number of obs      =        85

                                                Wald chi2(2)       =      0.06
Log likelihood = -72.733388                     Prob > chi2        =    0.9712

------------------------------------------------------------------------------
     difftot |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
-------------+----------------------------------------------------------------
     gender2 |
      Female |   .0294211   .1487494     0.20   0.843    -.2621224    .3209645
     Non-bin |  -.0499999   .4088367    -0.12   0.903    -.8513052    .7513053
             |
       _cons |       .281   .0711693     3.95   0.000     .1415107    .4204894
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
                sd(Residual) |   .5693547   .0436675      .4898902    .6617091
------------------------------------------------------------------------------


Thanks in advance,

Don

Constrained systems of sfcross regressions!

Hi everyone,

I'm trying to do something which I'm not sure is possible with Stata.

Basically, I'm trying to replicate a paper on rent control (Caudill 1993) which compared prices of controlled rental units to the inefficient output of a production function (where inputs are the hedonic characteristics of the units) and prices of uncontrolled units as the inefficient output of a cost function; the "frontier" level would be the equilibrium price in absence of controls. The comparison stems from the fact that controlled rents should be lower than the hypothetical equilibrium price in absence of controls (lower than the "frontier") while uncontrolled rents should be higher.

My dataset contains rent and housing characteristics on a sample of housing units, and contains both controlled and uncontrolled units.
So I need to estimate a stochastic frontier production function and a stochastic frontier cost function simultaneously on different groups of my dataset (but the Y variable and the X variables are the same!), imposing coefficients of the two models to be equal. The dependent variable Y is the rent and the independent variables X1 X2 X3 X4 are characteristics of the units. Units are controlled if C=1 and uncontrolled if C=0.
So to estimate the separate frontiers for the two groups (the production SF for controlled units and cost SF for uncontrolled) I'd do:

Code:
sfcross Y X1 X2 X3 X4 if C==1

sfcross Y X1 X2 X3 X4 if C==0, cost
Is there a way to estimate the two frontiers simultaneously, imposing the beta coefficients for the two models to be equal?

Or, as a second-best option, is there a way to estimate the first model first, and then estimate the second imposing the coefficients of the first one?

Hope it is clear enough.
Aurora




PCA vs MCA for index with survey weights

Dear All,

I am trying to calculate a wealth index using Principal Component Analysis. The quintiles derived from it are to be used as proxy for household socioeconomic status in a regression later.

I am using household income separately as an explanatory variable--my own reasoning being wealth and income are different ideas and it is also in accordance with the literature I am following.

Now, the variables for the index are all binary except for one ordinal categorical variable.

It is suggested that if only ordinal and nominal categorical data is used, multiple correspondence analysis is the apparent method.

However, I am using survey data, and while the user-written command -pca- accommodates aweights, the other user-written command -mca- does not.

What would be the best approach under these circumstances? Would going forward with -pca- be appropriate?

Observations in descriptive and multivariate analyses

Hi, I am writing to ask about the number of observations in the descriptive and multivariate analyses in my paper. Do I need to use the same observations in all of the analyses? For example, if I have 2000 observations in the regression, do I need to add if e(sample) when I run the descriptive analyses?
Thank you in advance!
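For reference, restricting descriptives to the estimation sample usually looks like the sketch below. The variable names y, x1, and x2 are hypothetical placeholders, not taken from the post.

```stata
* fit the model first; e(sample) then flags the observations it actually used
regress y x1 x2
* descriptive statistics restricted to that same estimation sample
summarize y x1 x2 if e(sample)
```

Because e(sample) is replaced by each new estimation command, the summarize step has to come before any other model is fitted, or the flag can be stored in a variable with generate.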

Summarizing special character of string variable*

Is there any command with which I can summarize the special characters (e.g. *,-.) in a string variable?

2x2 AB/BA crossover trial analysis resources/advice

Hi all.

I would like to conduct the analysis of a trial comparing two drugs (A and B) on reducing pain associated with a procedure.

Subjects are randomly exposed to either drug A or B then undergo a procedure and asked to score the pain associated with that procedure 0-10. After a washout period, they are exposed to drug B or A (the one they were not initially exposed to) and then undergo the same procedure, again being asked to score their pain 0-10. Subjects are also asked to score their satisfaction with each drug, again on score 0-10. Subjects are also asked to state their overall preference (drug A or B).

Although I can find resources on crossover trials (such as the textbook by Senn 2002, as well as some online explanations of crossover trials), I am unable to find a decent resource with regard to the analysis of one in Stata.

Using "help pkcross" provides some information, but is somewhat brief.

Can anyone help? Indeed, can pkcross be used for experiments such as this or is it more designed for pharmacokinetic experiments as the name suggests?

Regards

Ereturn Scalar on MI Estimate?

Hi All,

I currently am trying to perform a bootstrap on mi estimate: logistic. However, when I run my code, I get the following error:

Bootstrap replications (20)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
xxxxxxxxxxxxxxxxxxxx
insufficient observations to compute bootstrap standard errors
no results will be saved


My code is as follows:

Code:
mi set flong
mi register imputed asd age sex bmi smoke numlevels dosym
    
set seed 54321
    
    capture program drop myboot
    program myboot, eclass properties(mi)
        mi impute chained (logit) asd sex smoke (pmm) bmi dosym (ologit) numlevels, augment force add(60) 
        mi estimate: logistic asd age sex bmi smoke numlevels dosym
            
        ereturn scalar b_a = el(e(b_mi),1,1)
        ereturn scalar b_b = el(e(b_mi),1,2)
        ereturn scalar b_c = el(e(b_mi),1,3)
        ereturn scalar b_d = el(e(b_mi),1,4)
        ereturn scalar b_e = el(e(b_mi),1,5)
        ereturn scalar b_f = el(e(b_mi),1,6)
        ereturn scalar b_g = el(e(b_mi),1,7)
        
    end

bootstrap b_age=e(b_a) b_sex=e(b_b) b_bmi=e(b_c) b_smoke=e(b_d) b_lvls=e(b_e) b_dosym=e(b_f) b_int=e(b_g), reps(20) nodrop :myboot
I tried to look into why this might be happening and believe it is due to the ereturn function not recognizing my mi estimates as an eclass program for the following bootstrap command.

Any advice on how to address this or what else may be the issue? Thanks for your help in advance!

Mixed-effects logistic regression for panel data

Hello, I am using mixed-effects logistic regression for panel data in Stata 15, and I was wondering if my commands are correct.

My DV is a binary variable, and each respondent was surveyed once a year for five years. So, each respondent has five repeated measures. In the data, the respondent identifier is the variable ID. I also have time-varying covariates IV1, IV2, and time-invariant covariates IV3 and IV4, and a time variable Year. I want to use a mixed-effects logistic regression as follows:

Code:
melogit DV IV1 IV2 IV3 IV4 Year || ID: Year, cov(un)
But, this is kind of like a growth curve model except for the binary DV. I was wondering if my command is correct, and what is the difference between a growth curve model and a mixed-effects logistic model for panel data. Thank you very much!

Can Stata 16 frames be appended?

Fellow Statalisters (especially StataCorp)

Many thanks to StataCorp for Stata 16. The frames are a specially welcome addition, and I have been playing with these over the weekend so far. I definitely hope to frameify my resultsset-generating programs to make resultsframes.

One immediate query. Is it possible to append frames (as we append datasets using the append command)? I would find such a possibility very useful for the Stata 16 version of the parmby module of my parmest package. I have looked in the help for frames and for append, but have so far found nothing about appending frames. (Am I just not looking in the right place?)

Best wishes

Roger
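As far as I can tell, Stata 16 as released has no dedicated command for appending one frame to another. A possible workaround, sketched here under that assumption, is to save the second frame to a temporary file and append that file to the current frame. The frame name results2 and the macro name stash are hypothetical.

```stata
* hypothetical setup: frame results2 holds the rows to add to the current frame
tempfile stash
* write the contents of results2 to a temporary dataset
frame results2: save `stash'
* append that dataset to the data in the current frame
append using `stash'
```

This goes through disk, so it gives up some of the speed advantage of frames, but it relies only on save, append, and the frame prefix.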

Blog entry on quickly setting up Python with Stata 16

Combining local macros in Stata

I am trying to create a loop that copies specific files (which have a specific identifier at the end of the filename) from a list of folders to another folder using Windows shell commands. However, I am having trouble working with the local macros in Stata. I have tried troubleshooting this with a variety of different ways but get various issues.

1) When I run the commands below:
local folder "folder"
local version "version10"
di "C:\...\`folder'\*`version'.*"

Stata shows this:
C:\...`folder'\*version10.*

It is unclear to me why `folder' does not return the value assigned to the local macro, and also why the backslash somehow disappears.

2) I tried to get around this by combining the local macros as follows:
local folder "folder"
local version "version10"
local path = "C:\..." + "`folder'" + "\*" + "`version'" + ".*"
di "`path'"

This works and returns the following path:
C:\...\folder\*version10.*

However, when I tried to create a loop to do this for all the folders as follows:
local names "a" "b" "c" (These would be the folder names)
local version "version10"
foreach folder of local names {
local path = "C:\..." + "`folder'" + "\*" + "`version'" + ".*"
di "`path'"
}

I get a "too few quotes" error, even though this is the same exact command that I ran previously that returns the correct path.

I would really appreciate any help with this.
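A likely explanation for the first symptom is that Stata treats a backslash placed directly before a left quote as an escape that suppresses macro expansion, which would account for both the unexpanded `folder' and the vanished backslash. The sketch below sidesteps this by using forward slashes, which Stata accepts in Windows paths. It also simplifies the folder list to plain unquoted names and keeps the wildcard pattern in its own macro so that the do-file source never contains a slash directly followed by a star, which could be read as the start of a comment. The macro names dir and pattern are hypothetical, and the elided C:/... path is left as in the post.

```stata
local names a b c
local version "version10"
foreach folder of local names {
    * forward slash before the macro reference, so it is still expanded
    local dir "C:/.../`folder'"
    * wildcard pattern built separately from the directory part
    local pattern "*`version'.*"
    display "`dir'/`pattern'"
}
```

Whether the Windows shell command itself accepts forward slashes is a separate question that I have not checked; if it does not, the slashes would need converting back before the string is passed to the shell.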

Roommates' max expenditure with parents' educations

Dear All, I was asked this question here. The data set is
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(id exp) long roomnumber byte(feduc meduc)
53 1700 105111 4 5
43 1800 105111 8 3
57 1500 105211 5 6
56 2000 105211 5 6
60 2100 105211 3 3
58 1321 105211 4 4
63 2500 105211 7 7
59  900 105212 6 5
62 1200 105212 6 3
72 1200 105212 5 7
end
Firstly, each room (`roomnumber') has a varying number of roommates (with different `id'). We want to obtain a new variable for each roommate, say `max_exp', which is the maximum of `exp' within the `roomnumber', excluding the roommate's own value. I have done this with asrol (ssc install asrol):
Code:
bys roomnumber: asrol exp, stat(max) xf(focal)
with result as
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(id exp) long roomnumber byte(feduc meduc) double max_exp
53 1700 105111 4 5 1800
43 1800 105111 8 3 1700
57 1500 105211 5 6 2500
56 2000 105211 5 6 2500
60 2100 105211 3 3 2500
58 1321 105211 4 4 2500
63 2500 105211 7 7 2100
59  900 105212 6 5 1200
62 1200 105212 6 3 1200
72 1200 105212 5 7 1200
end
Secondly, I want to generate two additional variables, say `feduc1' and `meduc1', which are the `feduc' and `meduc' of the roommate with the maximum expenditure, taking the average of `feduc' and `meduc' if there are ties in the maximum expenditure. Any suggestions? Thanks.

Panel Data with same values/data for independent variables across multiple countries

I am analyzing the impact of US monetary policy variables on capital flows to emerging markets and intend to use panel data analysis. I have data on capital flows for 20 countries individually from 2000-2018 and US monetary policy variables for the same period.

What I don't understand is how to handle the fact that the US monetary policy variables (inflation, industrial production, spread), which are the independent variables, would take the same values for every country in each year from 2000 to 2018.

Is it correct to use panel regression in this case?

Python means no more calling RDBMSs to get the UTC offset

Updated my -tslog.ado-. See log file attached.