Channel: Statalist

P-Values and F-statistics changing slightly between iterations of "test"

Hello,

I am using Stata 14 with 64M of memory. Essentially, I am running a joint test of significance and, after running it a few times, have noticed that my p-values and F-statistics change slightly from run to run.

For more detail: I'm running a joint regression with a panel of stacked outcomes and then testing coefficients against each other, similar to what's described here. Essentially, I have stacked all my data across all my different outcomes into a single regression, and then created explanatory variables so that every single regression coefficient from all the separate regressions appears in this single regression. I then interact each of the explanatory variables with outcome type dummies, and also include the uninteracted outcome type dummies. This results in a dataset with 300k+ observations.

The code is something like:

Code:
regress y x1 x2 x3 dummies controls interaction terms

test x1 = x2
loc F_x1 = `r(F)'
loc p_x1 = `r(p)'

test x2 = x3
loc F_x2 = `r(F)'
loc p_x2 = `r(p)'


Then I store these local macros into a table using putexcel (v14). However, I noticed that when I ran this 3-4 times, I got slightly different results for my p-values and F-statistics. For example, on one run my F was 1.7588 and on another it was 1.7590; one run gave a p-value of 0.0694 and the next gave 0.0692.

I have reviewed my code several times and I'm unsure why this would be changing. Is something changing in the way Stata stores these values (perhaps it is using so much memory that Stata changes the way it rounds)? Some dummies are being dropped due to collinearity, and I thought maybe which variables happened to get dropped was affecting the p's and F's, but from what I have read that should not change the test results. Any other ideas?
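A small sketch of one way to see whether the stored results really differ between runs: display them at full numeric precision right after each test (the variable names are the placeholders used above).

Code:
test x1 = x2
display %21.0g r(F)
display %21.0g r(p)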

Thank you!
Lucia

Matrix Size Adjustment

Dear all,

I have two questions on the Stata Matrix Size.

1. In general, if you set the matrix size to 11000, Stata will use that size regardless of the matrix size you actually need (e.g., you may only need a matrix size of 100 to run the regression). Therefore, it will take much longer to run a simple OLS regression if you set matsize to 11000. Am I right about this?

2. If this is the case, is there any way to have Stata automatically adjust its matrix size according to what is needed? Going back to the OLS example: if my OLS regression only needs a matrix size of 100, but I do not know that and I set matsize to 11000, is there any way to make Stata adjust automatically to the size the regression really needs (i.e., matsize = 100)?

I am currently working on regressions for which one iteration of the MLE takes hours, and I have figured out that the size of the matrix is one of the reasons. Therefore, I would like to reduce the matrix size to the minimum level to speed up the MLE iterations, but I do not know the matrix size that is really needed.
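A minimal sketch of checking and lowering the setting; the value 300 is purely illustrative, and matsize must be at least as large as the number of parameters the model estimates.

Code:
display c(matsize)
set matsize 300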

Hope my question is not confusing. I look forward to hearing from you! Thank you very much in advance.

Best regards
Long

cannot understand source of error from coefplot

Hi, I am new to coefplot and have worked through many of the examples. My problem is that I do not see the source of the error in the following:

Code:
use anesN188, clear  // dataset of interval-level vars of mean respondent ideology (avg_*) by income level, stacked by policy type and party of candidates (N = 47 states)
reg candid avg_low avg_mid avg_high if policy==1 & pid==1
estimates store de
reg candid avg_low avg_mid avg_high if policy==1 & pid==0
estimates store re
reg candid avg_low avg_mid avg_high if policy==2 & pid==1
estimates store ds
reg candid avg_low avg_mid avg_high if policy==2 & pid==0
estimates store rs

coefplot (de, label(Democrats)) (re, label(Republicans)), bylabel(Economic Policies) ///
|| ds rs, by(label(Social Policies)) ||, xline(0) drop(_cons)
Then I get this error:

factor variables and time-series operators not allowed
r(101);

But I do not believe any of the variables are factor variables or involve time-series operators.
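A sketch of the same coefplot call with the subgraph options written consistently, using bylabel() in both panels as in the first; whether this inconsistency (by(label(...)) in the second panel) is what triggers the r(101) error is only a guess.

Code:
coefplot (de, label(Democrats)) (re, label(Republicans)), bylabel(Economic Policies) ///
    || (ds, label(Democrats)) (rs, label(Republicans)), bylabel(Social Policies) ///
    ||, xline(0) drop(_cons)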

Common elements between two lists of string variables

Hi everyone,

I am relatively new to Stata, so this is likely a basic question. I searched the Stata help files and the forum history, but couldn't find anything specific to string variables.

I have two datasets, one for quantities and one for prices. They have overlapping countries, but are not exactly the same. Under each country there are industry data, and the industries also differ between the datasets. The good news is that the country variables in both datasets use the same codes: say country in the quantities data contains ARG BRA CAN MEX USA, and country in the prices data contains ARG CAN JPN USA.

I would like to get a sense of how much overlap there is between the datasets. My thought is to start with the quantity dataset and keep the observations whose country also appears in the prices dataset. Something like this:
Code:
use quantity, clear
local price_country ARG CAN JPN USA 
keep if "the country is in the local price_country"
What is the Stata command that can achieve the last line, "keep if the country is in the local price_country"?
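A minimal sketch of one way to do that last step, assuming country is a string variable holding the three-letter codes:

Code:
local price_country ARG CAN JPN USA
gen byte in_prices = 0
foreach c of local price_country {
    replace in_prices = 1 if country == "`c'"
}
keep if in_prices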

Appreciate your help.

using foreach to run regressions for consecutive time periods

Greetings,

I have a panel dataset (country/years) for the years 1956-2014. I'd like to estimate a regression and save the output for each 10-year period (e.g., starting with 1956-1965, then 1957-1966, and so on). The intent is to explore how the model performs across time. My basic code structure is (I'm using version 13.1):

Code:
eststo model_56_65: quietly logit depvar var1 var2 var3 if year >= 1956 & year <= 1965
quietly estadd fitstat

Would I be able to use the foreach command to loop the estimations rather than copying the same block of code 50 times?

Apologies if this is an overly elementary question. I'm reading the online help on the foreach command, but I'm still a bit confused with how it works. Any advice would be much appreciated.
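A minimal sketch of the kind of loop being asked about, using forvalues over the window start years; it keeps the placeholder variable names from above, and estadd fitstat assumes the same fitstat add-on already in use.

Code:
forvalues start = 1956/2005 {
    local end = `start' + 9
    eststo model_`start'_`end': quietly logit depvar var1 var2 var3 ///
        if year >= `start' & year <= `end'
    quietly estadd fitstat
}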

Generating quarterly dummy variables

Hi all:

I am trying to generate a set of quarterly dummy variables for a range of dates but can't seem to figure out how to do it. I have the date variable formatted as %tq in one column. I tried tabulate with the gen() option, but that didn't work because the 1s did not correspond to the correct year and quarter. I am assuming the way to go would be to use min(date) as the beginning date and increment it one quarter at a time until reaching max(date), but I can't figure out how to do it. I would appreciate it if someone could show me the correct procedure to generate the dummies. Thanks.
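A minimal sketch of one approach, assuming the quarterly date variable is called qdate and is in %tq format; it creates one dummy for every quarter present in the data, labelled with the quarter it represents.

Code:
levelsof qdate, local(quarters)
foreach q of local quarters {
    local lbl : display %tq `q'
    gen byte qdum`q' = (qdate == `q')
    label variable qdum`q' "quarter `lbl'"
}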

r(503) conformability error when running margins command after svy: meoprobit

Good Evening Statalist,

I am attempting to run the margins postestimation command after running an meoprobit model using Stata 14, and I keep getting an r(503) conformability error. I do not get the error when I do not include the svy prefix.

My code looks something like this:

Code:
svyset varpsu, strata(varstr) || dupersid_panel, weight(longwt2) || _n, singleunit(scaled)
svy, subpop(if jobchanged==1 & DID2C==1): meoprobit IND i.dependent i.treatment rd1 rd2 rd3 rd4 rd5 || dupersid_panel:
margins, dydx(dependent) predict(outcome(5))
Here is the output from my margins command:


. margins, dydx(dependent) predict(outcome(5))
conformability error
r(503);


I've already tried omitting the rd5 variable, and that does not make a difference; I still get the conformability error. I've also tried omitting all of the rd* variables, and that did not make a difference either. The only thing that seems to work is omitting the svy prefix, but I cannot do that in practice for obvious reasons. Any help would be much appreciated.

Thanks!
Ryan

MI Impute Chained Error: Many "Perfect Predictors"

I am encountering an error using the mi impute chained command in Stata 14.1 to impute on a dataset with 1500 observations. I typed the following:

Code:
mi impute chained (ologit) guilty age educ income urbanicity (logit) male white black jewish protestant catholic other_christian non_judeochristian republican democrat independent northeast midwest farwest mountain, add(5)
Those are precisely the variables I will use to estimate "guilty."
I get this error:
Performing chained iterations ...
mi impute logit: perfect predictor(s) detected
Variables that perfectly predict an outcome were detected when logit
executed on the observed data. First, specify mi impute's option noisily to
identify the problem covariates. Then either remove perfect predictors from
the model or specify mi impute logit's option augment to perform augmented
regression; see
The issue of perfect prediction during imputation of
categorical data
in [MI] mi impute for details.
error occurred during imputation of guilty income urbanicity republican democrat
independent on m = 1

r(498);
I get the same error if I include any two or more of those six variables. Those are also the only variables with missing data -- so the whole point of the imputation is to have them predict each other. Eliminating all but one would defeat the purpose.

I have tried the augment option, but it takes a very long time to run, and this is a program I will need to run repeatedly, so I would like it to be fairly efficient.
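A sketch of the first diagnostic the error message points to: rerunning the same specification with mi impute's noisily option so that the problem covariates are listed explicitly.

Code:
mi impute chained (ologit) guilty age educ income urbanicity ///
    (logit) male white black jewish protestant catholic other_christian ///
    non_judeochristian republican democrat independent northeast midwest ///
    farwest mountain, add(5) noisily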

Thank you so much for any advice on how I might fix this problem.

Transpose data to have variable names.

Hi,

I am using Stata 13 on Windows 7. I have a dataset with one row of observations and would like to transpose it so that the variable names form the first column and the observations the second column.
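A minimal sketch using xpose; its varname option stores the original variable names in a new _varname variable, which gives the layout described above. This assumes the variables are numeric, since xpose does not carry string values across.

Code:
xpose, clear varname
order _varname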

Thanks.

Bounded Dependent Variable

Hello everyone, I have a question. I am running a panel regression for 29 cities over a period of 7 years. My dependent variable is CGI, which measures the level of income segregation and takes continuous positive values from 0 to 1. However, after running the regression, I found the constant to be 1.85 (greater than the maximum value of CGI), while the variable over65 (fraction of the population older than 64) has a coefficient of -1.27, which is below the minimum value of CGI. Should dependent variables with such characteristics receive a specific treatment? I know the latest version of Stata has options for beta and fractional regression, but I do not have access to it, and the logistic regression option seems implausible since the dependent variable is continuous from 0 to 1. Below I attach the result of the regression:

Code:
xtreg   cgi   gini  emp1 lowskill1  logpop   logmed  own hs25 sarjana25 eighteen over65 i.year,  fe  robust

Fixed-effects (within) regression               Number of obs      =       203
Group variable: id                              Number of groups   =        29

R-sq:  within  = 0.5303                         Obs per group: min =         7
       between = 0.1904                                        avg =       7.0
       overall = 0.0000                                        max =         7

                                                F(16,28)           =     32.74
corr(u_i, Xb)  = -0.6414                        Prob > F           =    0.0000

                                    (Std. Err. adjusted for 29 clusters in id)
------------------------------------------------------------------------------
             |               Robust
         cgi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        gini |   .2098422   .1119539     1.87   0.071    -.0194849    .4391694
        emp1 |  -.0026784   .0016391    -1.63   0.113     -.006036    .0006792
   lowskill1 |   .0852168   .0709025     1.20   0.239    -.0600204     .230454
      logpop |  -.0595715   .0732945    -0.81   0.423    -.2097085    .0905656
      logmed |  -.0617125   .0605833    -1.02   0.317    -.1858117    .0623867
         own |   .1888378   .0879009     2.15   0.040      .008781    .3688946
        hs25 |  -.0774792   .2259464    -0.34   0.734    -.5403094     .385351
   sarjana25 |   .5969337   .3611071     1.65   0.109    -.1427607    1.336628
    eighteen |   .0292786   .4995487     0.06   0.954    -.9940006    1.052558
      over65 |    -1.2718   .7602054    -1.67   0.105    -2.829011    .2854097
             |
        year |
       2006  |   .0143375   .0154605     0.93   0.362    -.0173319     .046007
       2007  |  -.0090962   .0205345    -0.44   0.661    -.0511593    .0329668
       2008  |   -.025067   .0349562    -0.72   0.479    -.0966716    .0465375
       2009  |   .0210048   .0308709     0.68   0.502    -.0422314    .0842409
       2010  |   .0274367   .0366644     0.75   0.461    -.0476669    .1025404
       2011  |   .0174045    .039606     0.44   0.664    -.0637247    .0985337
             |
       _cons |   1.849855   1.663708     1.11   0.276    -1.558097    5.257806
-------------+----------------------------------------------------------------
     sigma_u |  .09637064
     sigma_e |  .03855145
         rho |  .86204928   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Thank you!
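For a dependent variable bounded between 0 and 1, one widely used alternative that does not require the newest Stata is the fractional-logit approach of Papke and Wooldridge, estimated with glm. A minimal sketch using the same variables as above; note that this pooled version drops the city fixed effects used in the xtreg specification, so it is only a starting point.

Code:
glm cgi gini emp1 lowskill1 logpop logmed own hs25 sarjana25 eighteen over65 ///
    i.year, family(binomial) link(logit) vce(cluster id)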

Can xtscc be used in the case without cross-sectional dependence?

I am working with an unbalanced macro panel with N = 9 and T = 9 (75 observations). xttest3 shows that there is heteroskedasticity, and xtcsd (the Pesaran test) shows that there is no cross-sectional dependence (Pr. = 0.5757). So in this case, can I still use xtscc instead of xtreg, fe vce(robust)?

I know from Hoechle's paper that if there is heteroskedasticity only, then one should use xtreg, fe robust. But some people suggest that xtscc is a comprehensive command and a recommended choice even if there is no cross-sectional dependence, or if we are uncertain about heteroskedasticity and cross-sectional dependence in the model.

Actually, for one of my regressions, xtscc makes more variables significant (the coefficients are exactly the same as those from xtreg, fe robust). For the other regressions, xtscc produces basically the same results as xtreg, fe robust (only trivial differences in the p-values).
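A minimal sketch of the side-by-side comparison described above, with placeholder variable names; esttab is from the user-written estout package.

Code:
xtreg y x1 x2, fe vce(robust)
estimates store fe_robust
xtscc y x1 x2, fe
estimates store fe_xtscc
esttab fe_robust fe_xtscc, se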

Many thanks!

drop the largest observation by id

Hello, my data looks like this

id distance
1 300
1 20
2 500
2 450
3 6
3 780
4 9000
4 30

Each id appears twice because there are two distance values for each id. I would like to drop the larger distance value. For example, for id = 1, I want to keep 20 and drop 300.

I have tried combining the drop, by and sort commands, but so far nothing has worked for me.
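A minimal sketch: sort by distance within each id and keep only the first (smallest) observation per id.

Code:
bysort id (distance): keep if _n == 1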

Merge variables

I have two variables of this nature
age agem
. 49
. 49
. 48
. 49
16 46
. . 49
18 45
16 46
. 47

I need a command to replace only the missing values of age with the corresponding observation from agem. Please help. Thanks.
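A minimal sketch of the standard way to do this:

Code:
replace age = agem if missing(age)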

Why variables are insignificant

I am trying to identify the factors that most affect 'Access to electricity' using 24 countries over a two-year span. Even though this is a short panel dataset, the article I am replicating used a similar approach.
My results are:
Code:
 xtreg accesstoelectricityofpopulatione loans renew gdp rents edu var24, re

Random-effects GLS regression                   Number of obs      =        48
Group variable: country                         Number of groups   =        24

R-sq:  within  = 0.7520                         Obs per group: min =         2
       between = 0.0001                                        avg =       2.0
       overall = 0.0011                                        max =         2

                                                Wald chi2(6)       =     58.62
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
accesstoel~e |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       loans |   .0287223   .0651532     0.44   0.659    -.0989756    .1564202
       renew |  -.0032136   .1823182    -0.02   0.986    -.3605507    .3541234
         gdp |  -.0009196   .0005703    -1.61   0.107    -.0020374    .0001982
       rents |  -.0590546   .0385032    -1.53   0.125    -.1345195    .0164103
         edu |  -.0051925   .0076635    -0.68   0.498    -.0202127    .0098277
       var24 |   2.309331   .3686467     6.26   0.000     1.586796    3.031865
       _cons |   72.37429    5.85379    12.36   0.000     60.90107     83.8475
-------------+----------------------------------------------------------------
     sigma_u |  28.521862
     sigma_e |  1.1818464
         rho |  .99828596   (fraction of variance due to u_i)
------------------------------------------------------------------------------
My question is that not only are the variables insignificant, but the coefficients also do not have the expected signs. For example, GDP per capita (gdp in the model) should be positively correlated with the dependent variable, but that is not the case here.
Can someone please suggest a solution?
The data go like this:
time country access loans renew gdp rents edu
1 1 41 3.9 0 1629 2.37 45
2 1 43 4.3 0 1933 1.75 48
1 2 52.2 66 0 2401 4.5 53
2 2 59.6 85 0 2763 3.8 60

Stata v14 and v13 give different results

Running melogit and gsem on Stata v14 provides results, but when I re-run the same thing on Stata v13 (home installation) it gives me an error reading "initial values not feasible". Is the mechanism behind versions 14 and 13 different?

I have read Stata's documentation on convergence problems, changed the number of iterations, obtained parameter values to use as new starting values, and changed the integration method, but none of this helps.

Commands for regression in difference-in-difference design

Hi experts

I've searched, but couldn't find any threads about the actual execution of regression in a difference-in-difference design.

I'm interested in the effect of (X) employees' perception of their leaders' leadership style on (Y) employee sickness absence over a two-year period: before leadership training and after.
So: perceived leadership style --> sickness absence.

- For X (leadership style) I have three indexes, one for each leadership style, scored 0-100.
- For Y (sickness absence) I have a variable measured in number of days.
- I have a time variable coded 0 (before treatment) and 1 (after treatment).
- I have a treatment variable with four groups: one for each of the three leadership styles and a control group.
- An id variable with a unique number for each employee.
- Furthermore, a range of control variables.

I've set the dataset up as panel data.

Now, how should I proceed using a difference-in-difference design?
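A minimal sketch of the usual two-period difference-in-differences regression for this setup, with placeholder names (sickdays for the outcome, treatgroup for the four-group treatment variable, and the 0/1 time variable described above); the interaction terms carry the difference-in-differences estimates, and controls or fixed effects can be added as needed.

Code:
regress sickdays i.time##i.treatgroup, vce(cluster id)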

Thanks!

Procedure (long-run) error correction model

Hi,

I am currently trying to create my long-run relationship (cointegrating equation) where I mainly look at the relationship between A and B. I have never done an ECM model before, but with the help of others and the internet this is the current procedure I came up with. Can you guys tell me if I am working in the right direction?

To start, it is important to know whether the variables I am using are of the same order of integration. Therefore I test the dependent and independent variables that I possibly want to include in my long-run relationship.

For example, to test for the order of integration of a control variable age, I first look at the number of lags to use for the dfuller test (in this case 2, on the basis of the selection criteria):

The maxlag is set at 36 because I have monthly data (a total of 11 years). Verbeek's A Guide to Modern Econometrics recommends that with monthly data the maximum number of lags be set to at least 36.

Code:
varsoc age, maxlag(36)
dfuller age, lags(2)
I then conclude, by comparing the test statistic with the critical values, whether there is a unit root or not.

Code:
varsoc d.age, maxlag(36)
dfuller d.age, lags(4)
I then conclude from the critical values what the order of integration is.

I do this for all the variables that I want to potentially include in my long-run regression. If they are all of the same order of integration I start with "making" the long-run relationship.

This is done by simply starting with the simplest regression between the dependent and independent variable, then adding variables and checking whether they are significant --> if so, keep them in.

This results in my case in:
Code:
reg mean_A mean_B age ltv fund_c_deposits i.dummy_year
Then I predict the residual and check whether the residual is I(0).

Code:
predict e, resid
varsoc e, maxlag(36)
Lag selection criteria --> include 5 lags

Code:
dfuller e, lags(5)
Then I check whether the test statistic for the residual, compared with the critical values, indicates I(0).

Can you guys tell me if this is anywhere close to the appropriate procedure for estimating the long-run relationship between A and B?
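For completeness, a sketch of the usual second step of this Engle-Granger style procedure, in which the lagged residual from the long-run equation enters a regression in first differences as the error-correction term; this assumes the data have been tsset and keeps the names used above.

Code:
reg d.mean_A d.mean_B L.e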

counting the number of categories of a given variable

Hi,

Is there a command I could use to count how many categories a given variable in my dataset is comprised of, rather than having to count them manually?
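A minimal sketch using tabulate, which leaves the number of distinct (non-missing) values in r(r); myvar is a placeholder name. The user-written distinct command (from SSC) is another option.

Code:
quietly tabulate myvar
display r(r)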

Thanks in advance

Using cmp for discrete/continuous estimation

Hi everyone,

I had another thread about this sort of problem, but received good advice about using the cmp command. The context is this. I am estimating a discrete-continuous choice model where individuals choose where to live and, given where they live, how much to work, consume, and "use" their house. Each location has a different amenity (pollution), which also affects their utility.

However, leisure, consumption, and housing are all endogenous, so I also instrument for each of them. cmp is extremely useful to allow for this possibility and still estimate a discrete choice - truly a remarkable command.

However, I ran into some problems when estimating it that I do not know how to diagnose completely. I am copying a subset of the output (not the estimates) below, but I will direct attention to some matrices being ill-conditioned and to the collinear regressors. All of these variables work if I use reg3 -- there is no reason why they should be ill-conditioned or collinear. The only thing I can think of is that it is an extremely tough problem to optimize. I haven't even added the fixed effects yet...

Code:
cmp (lwage_hourly = lleisure lcons_nondur lpoll $aqX $aqstX) (lleisure = $ivweather) (lcons_nondur = $ivcons) (lhprice = lcons_house lcons_nondur lpoll $aqX $aqstX) (lcons_house = $ivhouse) (move = lleisure lcons_nondur lcons_house $aqX $aqstX) (location = lleisure lcons_nondur lcons_house $aqX $aqstX) [w=perwt], indicators($cmp_cont $cmp_cont $cmp_cont $cmp_cont $cmp_cont $cmp_probit $cmp_oprobit) cluster(county)
(sampling weights assumed)

Fitting individual models as starting point for full model fit.
Note: For programming reasons, these initial estimates may deviate from your specification.
For exact fits of each equation alone, run cmp separately on each.

-------------------------------------------------------------------------------

Warning: regressor matrix for lwage_hourly equation appears ill-conditioned. (Condition number = 1526747.4.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.


----------------------------------------------------------------------------------

Warning: regressor matrix for lleisure equation appears ill-conditioned. (Condition number = 196065.85.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.


--------------------------------------------------------------------------------------

Warning: regressor matrix for lcons_nondur equation appears ill-conditioned. (Condition number = 2253.5663.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.

      Source |       SS           df        MS          Number of obs    =  235581188
-------------+------------------------------------      F(34, 235581153) =          .
       Model |  32507335.3         34  956098.096       Prob > F         =     0.0000
    Residual |  99403557.6  235581153  .421950382       R-squared        =     0.2464
-------------+------------------------------------      Adj R-squared    =     0.2464
       Total |   131910893  235581187  .559938145       Root MSE         =     .64958


-------------------------------------------------------------------------------

Warning: regressor matrix for lhprice equation appears ill-conditioned. (Condition number = 1263661.2.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.

      Source |       SS           df        MS          Number of obs    =  238163538
-------------+------------------------------------      F(1, 238163536)  =   80511.68
       Model |   10561.4387         1  10561.4387       Prob > F         =     0.0000
    Residual |  31242044.7  238163536  .131178959       R-squared        =     0.0003
-------------+------------------------------------      Adj R-squared    =     0.0003
       Total |  31252606.2  238163537  .131223304       Root MSE         =     .36219

------------------------------------------------------------------------------
 lcons_house |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   ltrantime |   .0088839   .0000313   283.75   0.000     .0088225    .0089453
       _cons |   9.577865   .0001006  9.5e+04   0.000     9.577668    9.578062
------------------------------------------------------------------------------

Iteration 0: log likelihood = -1.279e+08
Iteration 1: log likelihood = -1.004e+08
Iteration 2: log likelihood = -98963302
Iteration 3: log likelihood = -98953638
Iteration 4: log likelihood = -98953634

Probit regression                               Number of obs   =    1947523
                                                LR chi2(34)     =   5.79e+07
                                                Prob > chi2     =     0.0000
Log likelihood = -98953634                      Pseudo R2       =     0.2263


-------------------------------------------------------------------------------

Warning: regressor matrix for moved equation appears ill-conditioned. (Condition number = 1110405.6.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.

Iteration 0: log likelihood = -2.679e+08
Iteration 1: log likelihood = -2.405e+08
Iteration 2: log likelihood = -2.392e+08
Iteration 3: log likelihood = -2.392e+08
Iteration 4: log likelihood = -2.392e+08


Note: 14 observations completely determined. Standard errors questionable.

Warning: regressor matrix for _cmp_y7 equation appears ill-conditioned. (Condition number = 1110405.6.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.

Fitting full model.

cmp_lnL(): 3499 halton2() not found
<istmt>: - function returned error
Mata run-time error
Mata run-time error
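For what it's worth, the run actually stops on the Mata error above (halton2() not found) rather than on the conditioning warnings. A common first check when an installed package's Mata function is reported as not found is to rebuild the Mata library index and, failing that, to reinstall cmp and its ghk2 dependency; this is a guess at the cause, not a confirmed fix.

Code:
mata: mata mlib index
* if that does not help, reinstalling the packages is another option:
* ssc install ghk2, replace
* ssc install cmp, replace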

error (109) "type mismatch" when using nlcom after reg3 (using aidsills)

Hello Statalisters,

I'm using the aidsills command to estimate a QUAIDS model (it uses reg3 to estimate a non-linear system). The command works well, but when I try to construct elasticities from the stored parameters using nlcom, I get error 109, "type mismatch".

Has anyone faced a similar problem with reg3? I am using the data and examples from the main article "Estimating almost-ideal demand systems with endogenous regressors" (2015). Here is the code:

Code:
webuse food.dta
aidsills w1-w4, pri(p1-p4) exp(expfd) qua

* Then, I try nlcom on the first parameter in e(b)
nlcom _b[w1:gamma_lnp1]
This is the report:

_nl_1: _b[w1:gamma_lnp1]
type mismatch
r(109)



Any ideas would be very helpful!