not estimable margins after OLS on panel data

March 3, 2017, 8:08 am

≫ Next: How to account for large number of FE when using suest command

≪ Previous: The least tedious method for calculating many autocorrelations?

Dear Statalisters,

I have a problem in computing the margins of an OLS regression with a panel data that I have not managed to solve by looking at previous posts - I apologies in case I overlooked similar issues that had already been discussed.

Background
I have a panel data of crime incidence (from now on, "main_rate") observed for each of the 600 municipalities of a country across 13 years and I evaluated successfully the impact of a reform that occurred in the 6th years. Now I would like to explore some heterogeneous effects, and in particular how the magnitude of the treatment coefficient changes with the size of each of the police districts the country is divided into. By "size" I mean the number of municipalities composing the police district.

The dummy "Treat" is equal to 1 for those municipalities were the reform was implemented and the dummy d turns on for the years during which the reform was enforced.

Code:

treatment=Treat*d

(using, instead, the interaction form ## would not change what follows)

The distribution of the size of the police (variable's name "sizeZP") district based on the value of the treatment variable is the following.

Code:

           |       treatment
    sizeZP |         0          1 |     Total
-----------+----------------------+----------
         1 |       504        120 |       624
         2 |       858        208 |     1,066
         3 |     1,446        192 |     1,638
         4 |       996        512 |     1,508
         5 |       915        320 |     1,235
         6 |       306        240 |       546
         7 |        91          0 |        91
         8 |        80        128 |       208
         9 |       135        216 |       351
        10 |       230        160 |       390
-----------+----------------------+----------
     Total |     5,561      2,096 |     7,657

Due to the fact that there are some imbalances in the distribution (in particular, police districts with 7 municipalities are only in the control group) and to other rather conceptual reasons, I decided to regroup the police districts in this way (however, I don't believe this is actually the source of my problem. Even by using "sizeZP" I would encounter the problem described below):

Code:

gen sizePD=1 if sizeZP==1
replace sizePD=2 if sizeZP&gt;1 &amp; sizeZP &lt;=4
replace sizePD=3 if sizeZP&gt;4 &amp; sizeZP&lt;=7
replace sizePD=4 if sizeZP&gt;7

The interaction term
I then proceed computing the interaction term

Code:

xtreg main_rate Treat d treatment##i.sizePD $controls i.year, fe vce(cluster INS)

This are the results I get:

Code:

Fixed-effects (within) regression               Number of obs      =      7655
Group variable: INS                             Number of groups   =       589

R-sq:  within  = 0.0943                         Obs per group: min =        11
       between = 0.2051                                        avg =      13.0
       overall = 0.1403                                        max =        13

                                                F(21,588)          =     17.38
corr(u_i, Xb)  = -0.6185                        Prob &gt; F           =    0.0000

                                      (Std. Err. adjusted for 589 clusters in INS)
----------------------------------------------------------------------------------
                 |               Robust
       main_rate |      Coef.   Std. Err.      t    P&gt;|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
             WAL |          0  (omitted)
               d |  -.0441572   .0209879    -2.10   0.036    -.0853775   -.0029369
     1.treatment |  -.1234831   .0265136    -4.66   0.000     -.175556   -.0714102
                 |
          sizePD |
              2  |          0  (omitted)
              3  |          0  (omitted)
              4  |          0  (omitted)
                 |
treatment#sizePD |
            1 2  |   .0595935   .0275074     2.17   0.031     .0055688    .1136182
            1 3  |   .0610467   .0297444     2.05   0.041     .0026285    .1194648
            1 4  |   .0920392   .0321078     2.87   0.004     .0289793    .1550992
                 |
             pop |  -2.26e-06   1.91e-06    -1.18   0.237    -6.02e-06    1.49e-06
         density |  -.0000497   .0000186    -2.67   0.008    -.0000863   -.0000132
     meanyxdecla |  -3.00e-06   3.94e-06    -0.76   0.446    -.0000107    4.73e-06
           unemp |   .0016622   .0049068     0.34   0.735    -.0079748    .0112993
         edu_low |   .0080593   .0069817     1.15   0.249    -.0056527    .0217713
                 |
            year |
           2001  |  -.0124893   .0083754    -1.49   0.136    -.0289387      .00396
           2002  |   .0212187   .0107612     1.97   0.049     .0000836    .0423537
           2003  |  -.0054469   .0142556    -0.38   0.703    -.0334449    .0225512
           2004  |  -.0272049   .0168966    -1.61   0.108      -.06039    .0059802
           2005  |          0  (omitted)
           2006  |  -.0055406   .0066912    -0.83   0.408    -.0186822    .0076009
           2007  |   .0142893   .0107971     1.32   0.186    -.0069164    .0354949
           2008  |   .0312251   .0146257     2.13   0.033     .0025001      .05995
           2009  |   .0184093   .0173059     1.06   0.288    -.0155796    .0523983
           2010  |   .0089712    .019733     0.45   0.650    -.0297845     .047727
           2011  |   .0410702   .0246737     1.66   0.097    -.0073891    .0895295
           2012  |   .0293179   .0309947     0.95   0.345    -.0315559    .0901918
                 |
           _cons |   3.700322   .2984158    12.40   0.000     3.114231    4.286412
-----------------+----------------------------------------------------------------
         sigma_u |  .51188845
         sigma_e |  .14790359
             rho |  .92294798   (fraction of variance due to u_i)
----------------------------------------------------------------------------------

The margins... and the problem
In computing margins, I tried several combinations of

Code:

margins treatment##sizePD

and obtained this:

Code:

. margins treatment##sizePD

Predictive margins                                Number of obs   =       7655
Model VCE    : Robust

Expression   : Linear prediction, predict()

----------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P&gt;|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
       treatment |
              0  |          .  (not estimable)
              1  |          .  (not estimable)
                 |
          sizePD |
              1  |          .  (not estimable)
              2  |          .  (not estimable)
              3  |          .  (not estimable)
              4  |          .  (not estimable)
                 |
treatment#sizePD |
            0 1  |          .  (not estimable)
            0 2  |          .  (not estimable)
            0 3  |          .  (not estimable)
            0 4  |          .  (not estimable)
            1 1  |          .  (not estimable)
            1 2  |          .  (not estimable)
            1 3  |          .  (not estimable)
            1 4  |          .  (not estimable)
----------------------------------------------------------------------------------

I have the feeling I am missing something really basic detail, but I have been digging into it so much that I don't manage to step back and find a solution anymore.

Does anyone of you have a solution to this oddity? If you need more information about the type of data, please do not hesitate to ask me below.

Thank you in advance!

Andrea

↧

How to account for large number of FE when using suest command

March 3, 2017, 8:12 am

≫ Next: Multiple commands within a loop (clonevar, recode, label define...)

≪ Previous: not estimable margins after OLS on panel data

I am attempting to use suest with 10-20 linear equations, each of which have a large number of fixed effects and, not surprisingly, running into matsize problems. I believe that one viable alternative would be to first apply a "within transformation" and then estimate the models on the transformed X and Y without the FE. I have done this in the past, and realize that in order to conduct inference, one must adjust for degrees of freedom. In past work, I have simply manually adjusted s.e., multipling by a factor of { (NT-K)/ [ N(T-1)-K] } ^ 0.5.

But, I'm not entirely clear what, if anything, in addition, I should do in the context of suest.

I *suspect* that I should manually adjust the covariance matrix after each equation, and if/when these adjusted covariance matrices are fed into suest, all will be good. But this might not be correct. Also, I am not particularly facile at Mata or other matrix commands in Stata so I'm not sure how to do this manual adjustment.

Any suggestions appreciated.

↧

Multiple commands within a loop (clonevar, recode, label define...)

March 3, 2017, 8:36 am

≫ Next: Use local in global

≪ Previous: How to account for large number of FE when using suest command

Hello!

Here is my setup:

foreach var of varlist homeschlx schoicex-ssamsc seadplcx serepeat ///
sesusout-seexpel snetcrs sinstfee fssportx-fscounslr fsnotesx-fsphonchx ///
fhplace fostory2x-fohistx folibrayx-fosprtevx hdlearnx-hddeviepx hdspcled ///
hdlearn-hdfrnds cenglprg p1scint p1wrmtl p1hispan p1enrl p2guard p2scint ///
p2wrmtl p2hispan p2enrl p2lkwrk hwelftan-hsecn8 hvintrnt {
clonevar `var' = clone_`var'
recode clone_`var' = (1==2) (2==1)
label define clone_`var' 1 "No" 2 "Yes"
label value clone_`var' clone_`var'
}

I'm attempting to clone, recode, and create a new label for the cloned variables all within one loop. How close is my setup and what am I missing (I'm pretty sure a lot). I'm new to understanding how to loop with multiple commands. There is much to learn, but thank you for your help!

↧

Use local in global

March 3, 2017, 8:40 am

≫ Next: stata course

≪ Previous: Multiple commands within a loop (clonevar, recode, label define...)

Dear statalists,

I am not entirely sure whether "use local in global" is an appropriate description for my problem but here is what I would like to do:

I am running several regressions for different dependent variables. The dependent variables differ in the year they refer to.

Code:

        foreach year of numlist 1950(10)1990 {

            global control_dist1_`year'     weighted_dist_y`year'
            global control_dist2_`year'     weighted_dist_y`year' c.weighted_dist_y`year'#c.weighted_dist_y`year'
            global control_dist3_`year'     weighted_dist_y`year' c.weighted_dist_y`year'#c.weighted_dist_y`year' c.weighted_dist_y`year'#c.weighted_dist_y`year'#c.weighted_dist_y`year'

        }

        foreach year of numlist 1950(10)1990 {

            reg mig_total_`year' weighted_sc_initial10_y`year' $control_dist1_`year' $control_pop_`year', robust            // linear dist control
            reg mig_total_`year' weighted_sc_initial10_y`year' $control_dist2_`year' $control_pop_`year', robust            // quadratic d control
            reg mig_total_`year' weighted_sc_initial10_y`year' $control_dist3_`year' $control_pop_`year', robust            // cubic dist. control
            
        }

First, I would like to define for globals for different control variables. With a foreach loop, I define these for every decadal year between 1950 and 1990. Then, when running the actual regressions, I would like to use these controls but only for the year in the dependent variable.

The result is that I try to call upon a local (from foreach loop) within a global (previously defined). For example,

Code:

$control_dist1_`year'

This does not work. Stata gives the error:

1950 [i.e. the first year in the foreach loop] invalid name

.

Did I miss something obvious? Is this possible at all?

Many thanks,
Milan

↧

stata course

March 3, 2017, 9:05 am

≫ Next: How do you make pie chart by combining two variables in stata?

≪ Previous: Use local in global

Dear all,

I want to enhance my stata programming skills from basic to intermediate. I looked up online stata course such as

http://www.stata.com/netcourse/progr...dvanced-nc152/

I wonder if some of you have taken these courses
http://www.stata.com/netcourse/

and recommend them. If you have other suggestions, please let me know.

Thanks,

Rochelle

↧

How do you make pie chart by combining two variables in stata?

March 3, 2017, 9:10 am

≫ Next: Markov switching model for panel data--Help Please

≪ Previous: stata course

How do you make pie chart by combining two variables in stata?. For example, make pie charts by sex (male, female) and studies (yes, no).

↧

Markov switching model for panel data--Help Please

March 3, 2017, 9:44 am

≫ Next: marginsplot after mimrgns

≪ Previous: How do you make pie chart by combining two variables in stata?

Dear Statalists,

I need help with Markov switching model for panel data. Does any one know how can i do this?

thanks

arshad

↧

marginsplot after mimrgns

March 3, 2017, 10:52 am

≫ Next: Interpreting poisson regression coefficients

≪ Previous: Markov switching model for panel data--Help Please

Hello everyone,
I was trying to use marginsplot command after mimrgns, but it shows the following error "previous command was not margins". Does anyone know where I did wrong. Thanks.
Lei

↧

Interpreting poisson regression coefficients

March 3, 2017, 12:14 pm

≫ Next: Filling missing values

≪ Previous: marginsplot after mimrgns

Hi,

I would like to understand how I could interpret the coefficients generated by poisson regression (and zero-inflated poisson if different from poisson). Is it simply exp (beta coeff) as the multiplication factor of the mean dependent variable? The regression equation and results is as follow:

dependent variable=treatment + after + treatment*after + error

Both treatment and after is a binary indicator for being in treatment group and after a service introduction. Thanks.

Weekly Visits	2.6403*** (0.0794)
Weekly Quantity	2.6168*** (0.1049)

↧

Filling missing values

March 3, 2017, 12:16 pm

≫ Next: continuous variable in logistic regression

≪ Previous: Interpreting poisson regression coefficients

Hi dear all,

I'm new to stata and I’m having a problem with filling the missing data here. I'm looking into the relation of crime rates and the share of the young / share of the young male, and got dozens of data like this:

.

year	cr_uk	y	ym
1800	0.9	0.2073
1810	1.1
1820	0.9
1830	1.1	0.1518	0.0837
1840	1.0
1850	0.9	0.2985	0.1236
1860	1.0	0.3060	0.1209
1870	0.9	0.3566
1880	0.8
1890	0.8
1900	0.6
1910	0.6
1920	1.0
1930	0.9
1940	0.7
1950	0.5	0.2140	0.1053
1960	0.8	0.1910	0.0955
1970	1.5	0.1997	0.1009
1980	1.7	0.2142	0.1083
1990	1.6	0.2327	0.1166
2000	1.6	0.2071	0.1030
2010	1.1	0.1989	0.1001

or like this:

.

year	cr_dk	y	ym
1800		0.2385	0.1194
1810
1820
1830
1840
1850		0.2096	0.1042
1860		0.2596	0.1229
1870
1880
1890
1900
1910
1920	0.6
1930	0.7
1940	0.9
1950	0.8	0.2169	0.1079
1960	0.6	0.1895	0.0947
1970	0.8	0.2218	0.1136
1980	1.2	0.2262	0.1161
1990	1.2	0.2274	0.1170
2000	0.9	0.2106	0.1071
2010	0.8	0.1782	0.0899

where cr stands for crime rates, uk & dk for two countries, y for the share of age 20-29 of total population and ym for the share of male aged 20-29 of total population.

I need to fill in the blanks so as to run a simple regression subsequently (reg cr y; reg cr ym). I was told to use some interpolation. So I read some solutions about mi impute and linear interpolation, but only get more confused in front of various alternatives. What method and which option should I best obtain here, for example, for the above two datasets? Should I distinguish the two kinds or would it be simpler if I pool all the countries together and interpolate? I'm totally lost.

For y and ym, I have complete UN data from 1950-2015 by every 5 year. To match with the timeline of cr, only data of every 10 year are listed above. Somehow i feel it wrong. But should I include all the y and ym data, I would have more gaps to fill in crime rates (or is it better?).

I'm terribly sorry for my English expression. If there is anything unclear please do address it and throw it at my face. By the way I am using Stata 12 for Windows. Some syntax I found on line sadly does not work on my laptop.

Can anyone drop a hint? Many thanks in advance!

What I have found
http://wlm.userweb.mwn.de/Stata/wstamiss.htm
http://www.stata.com/manuals13/mimii...pdf#mimiimpute
http://www.stata.com/manuals13/mimixeq.pdf
http://stats.idre.ucla.edu/stata/sem...stata_pt1_new/
http://www.stata.com/support/faqs/da...issing-values/

↧

continuous variable in logistic regression

March 3, 2017, 12:35 pm

≫ Next: How to decompose Erreygers concentration index in Stata

≪ Previous: Filling missing values

Hi Statalist.
I'm running a mixed-effects logistic regression (melogit) with a continuous predictor (which is a volume).

Using the untransformed variable, the OR I got is the OR for one-unit increase of this volume. If I want a 10-units increase, I include my variable/10 in my model.
The fact is that I need to log-transform this variable
So, my question has two aspects:
- in Stata, how can I get the OR for different points of this continuous variable. I know the xblc command, but it seems to allow only "real" values of the variable
- how can I combine rescaling and transforming the variable. My understanding is that I should rescale (i.e. x/10) before transforming. But how can I back transform then?
I'm using Stata 14.1.
Thanks

↧

How to decompose Erreygers concentration index in Stata

March 3, 2017, 1:31 pm

≫ Next: r(603) error

≪ Previous: continuous variable in logistic regression

Hi there,

I am in the process of writing a research paper. I am using Sierra Leone womens DHS dataset. I have used conindex to compute the concentration index (Erreygers). The variable of interest is binary = Contraceptive use (0=No, 1=Yes).

I would like to know if there's anyone who might have a do-file which computes the decomposition of the Erreygers concentration index (using the Wealth Index and Education as the socioeconomic variables). I want to decompose the index to further illustrate the contribution of certain background factors to inequality.

If anyone has worked on something similar, please let me know - or preferably send your do-file so that I can try to adapt it for my research.

I have already used Conindex to get the concentration index.

Thank you.

↧

r(603) error

March 3, 2017, 1:33 pm

≫ Next: item response theory on items with different probability of guessing correctly by chance

≪ Previous: How to decompose Erreygers concentration index in Stata

I am estimating a rolling regression but when I use the option saving and the name of the file to be replaced I get the following error:

rolling _b _se, window(400) r keep(date) saving(\\SGMII1178348\sector_real.dta, replace): arch deltaTRM L.deltaTRM deltaREAL deltaCDS deltaPPC, earch(1/1) egarch(1/1) distribution(ged)

(running arch on estimation sample)
file "\SGMII1178348\sector_real.dta," could not be opened

↧

item response theory on items with different probability of guessing correctly by chance

March 3, 2017, 2:19 pm

≫ Next: How to get R-squared for a scobit (skewed regression) model?

≪ Previous: r(603) error

I am relatively new to irt, so I'd be grateful for any feedback that would help me to use this correctly, and I apologize if my question is overly-simplistic. I would like to do a 3pl model, but I have an instrument where about half the items could be guessed correctly simply by chance about 20% of the time, and half the items about 50% of the time. So in theory I would like to use the sepguessing option to correctly account for these different probabilities, but the irt 3pl stata documentation warns against this, saying that this version of the model has identification problems--I have taken a look at the research papers referenced by the documentation, but since this is not my area, I'm having trouble understanding them properly. I'm wondering if someone on this forum has a bit of concrete advice about things I could do that would help me to determine whether using the sepguessing option in the 3pl model in this case is a problem? Or recommendations about an alternate approach to analyzing the instrument that I am currently working with? Thanks for taking the time to read my post, and thanks in advance for any advice...

↧

How to get R-squared for a scobit (skewed regression) model?

March 3, 2017, 4:33 pm

≫ Next: Evaluating an Integral

≪ Previous: item response theory on items with different probability of guessing correctly by chance

Hello,

I'm working with Stata 13 and complex survey data. I'm using scobit regression for my skewed binary outcome (experience of intimate partner violence: Yes/No). Running my model does not return all the usual statistics one would get with a logit model, so I'm wondering how to get them; specifically R-squared, AIC and BIC.

Thanks in advance for your help.
Som

↧

Evaluating an Integral

March 3, 2017, 4:39 pm

≫ Next: How to keep stkcd (company) with at least 5 consecutive (annual) observations?

≪ Previous: How to get R-squared for a scobit (skewed regression) model?

Hi everyone,

I am attempting to program the following integral in STATA:

Array

with the following code:

program define test
args lnf alph Xb sigma_u
quietly replace `lnf' = integ (normal((log(0.5)-`alph'-`Xb')/`sigma_u')*normal((log(0.27)-`alph'-`Xb')/`sigma_u')))
end

However STATA says that "integ ()" is not recognized. Any ideas?

↧

How to keep stkcd (company) with at least 5 consecutive (annual) observations?

March 3, 2017, 8:08 pm

≫ Next: GSEM - log-likelihood "not concave"

≪ Previous: Evaluating an Integral

Suppose that I have 3 companies, each with 9-year observations on x (some are missing). How can I keep the company (in the following case, stkcd=2) with at least 5 consecutive annual observations? Thanks.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(stkcd year x)
1 2000 .
1 2001 1
1 2002 2
1 2003 .
1 2004 3
1 2005 4
1 2006 .
1 2007 5
1 2008 6
2 2000 .
2 2001 1
2 2002 2
2 2003 3
2 2004 3
2 2005 4
2 2006 .
2 2007 5
2 2008 6
3 2000 .
3 2001 1
3 2002 2
3 2003 .
3 2004 3
3 2005 4
3 2006 4
3 2007 5
3 2008 .
end

↧

GSEM - log-likelihood "not concave"

March 3, 2017, 9:58 pm

≫ Next: How to calculate the nearest distance from a point to a line?

≪ Previous: How to keep stkcd (company) with at least 5 consecutive (annual) observations?

Hello

I used sem builder to estimate a model running maximum likelihood algorithm. However, no results come up and iterations just keep on going endlessly. The syntax is as seen below:

Code:

gsem (GovExpUSD -> GDPgrowth, ) (TRevUSD -> GDPgrowth, ) (GovEff -> GovExpUSD, ) (GovEff -> TRevUSD, ) (GovEff -> CorrINDX, ) (TaxStruct -> GDPgrowth, ) (TaxStruct -> GovExpUSD, ) (TaxStruct -> TRevUSD, ) (TaxStruct -> CorrINDX, ) (PolStab -> CorrINDX, ) (IndrTaxGDP -> TaxStruct, ) (DirTaxGDP -> TaxStruct, ) (CorrINDX -> GDPgrowth, ) (RegQual -> TRevUS D, ) (Law -> GovExpUSD, ) (Law -> TRevUSD, ) (Law -> CorrINDX, ) (Voice -> GovExpUSD, ) (Voice -> CorrINDX, ) (LFPR -> GDPgrowth, ) (LFPR -> TRevUSD, ), cov( e.GovExpUSD*e.TRevUSD e.CorrINDX*e.GovExpUSD) nocapslatent

Must there be something wrong with the model? I tried the model previously using different variables (e.g.: instead of Government Expenditures USD, I used Government expenditures as a percentage of GDP) and it went well. How can this be remedied?

Hoping for a reply
Thank you

-Krizia

↧

How to calculate the nearest distance from a point to a line?

March 4, 2017, 12:10 am

≫ Next: Appropriate model for binary dependent and count independent variable

≪ Previous: GSEM - log-likelihood "not concave"

Hi all,

I understand that we can use geonear to calculate the nearest distance from two points (coordinates), however, how do we do that with a point and a line (say, a railway line)? Since the line has almost infinity points (coordinates) on it. Can we still use geonear to do this?

Thank you very much.

↧

Appropriate model for binary dependent and count independent variable

March 4, 2017, 3:55 am

≫ Next: Elasticity and Marginal Effect of Tobit Regression

≪ Previous: How to calculate the nearest distance from a point to a line?

Dear Stata users. I need some assistance. I have data where the outcome variable is stunting (dummy from z-scores) and the independent variables are years of insurance (ranging from 0 to 11) and others. I am trying to estimate what is the effect of insurance on stunting. I have used the transformation of Mullahy (1998) Count data incorporating Terza's 2SRI inclusion. Question: I am not sure if I am doing the right thing. I have read severally that Count data models are applicable where the outcome variable is the non-negative integer (as opposed to one like a dummy in me example). I am doing the right thing to model my data in this manner? what are the alternative methods i can use for robustness? Thank you

↧