Channel: Statalist
Viewing all 65124 articles

Xtfisher vs xtunitroot fisher

Hello

Can someone tell me the difference between these two unit-root tests (xtfisher and xtunitroot fisher)? Or, at least, can someone confirm that xtfisher is a second-generation unit-root test?

mimrgns: inconsistent estimation sample

Greetings-

I am trying to use the SSC command mimrgns to compute predicted probabilities for eprobit models. The models are ERMs using endogenous treatment effects, if that matters, and are estimated using:

Code:
mi estimate, saving(filename) esample(sampname) cmdok: svy: eprobit ...
I then use:

Code:
mimrgns using filename, esample(sampname) dydx(*) predict(pr)
This works for some of the models, but in several cases I get the following error:

inconsistent estimation sample levels 0 and 1 of factor VARNAME
an error occurred when mi estimate executed mimrgns_estimate on m=2

r(459);

Any advice will be most appreciated. Thanks.

new variable based on levels of 2 other variables

Hi,

I have the following 2 variables :


1. tab pdlast

pdlast | Freq. Percent Cum.
------------+-----------------------------------
1 | 28 1.02 1.02
1.5 | 9 0.33 1.35
2 | 252 9.20 10.55
2.5 | 126 4.60 15.15
3 | 1,263 46.09 61.24
3.5 | 231 8.43 69.67
4 | 429 15.66 85.33
4.5 | 57 2.08 87.41
5 | 170 6.20 93.61
5.5 | 22 0.80 94.42
6 | 111 4.05 98.47
6.5 | 1 0.04 98.50
7 | 23 0.84 99.34
8 | 7 0.26 99.60
9 | 10 0.36 99.96
12 | 1 0.04 100.00
------------+-----------------------------------



2. bop_deepest

5 | Freq. Percent Cum.
------------+-----------------------------------
0 | 351 87.31 87.31
1 | 51 12.69 100.00

------------+-----------------------------------

I would like to create a new variable with 2 levels and with the following conditions:

0 if pdlast is less than 5 and bop_deepest is 0 or 1, OR if pdlast is 5 or 5.5 and bop_deepest == 0

1 if pdlast is 5 or 5.5 and bop_deepest == 1, OR if pdlast is 6 or higher and bop_deepest is 0 or 1
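To make the request concrete, here is a minimal sketch of how those conditions could be coded (an illustration only; it assumes pdlast and bop_deepest are numeric and that bop_deepest is never missing):

```stata
* hypothetical sketch; cutoffs are taken directly from the conditions above
gen byte newvar = .
replace newvar = 0 if pdlast < 5 | (inlist(pdlast, 5, 5.5) & bop_deepest == 0)
replace newvar = 1 if (inlist(pdlast, 5, 5.5) & bop_deepest == 1) | ///
                      (pdlast >= 6 & !missing(pdlast))
```

The `!missing(pdlast)` guard matters because in Stata missing values compare as larger than any number, so `pdlast >= 6` alone would misclassify missings as 1.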

Any help would be appreciated.

Thank you,

Nikos

Which second generation unit root test should I use for unbalanced panels?

I have tried cips and pescadf, but because my data has some gaps, I can't run them correctly. I can use xtfisher with no problems, but since I have cross-sectional dependence I don't know whether the Fisher test is adequate.
If someone can help me with this I would really appreciate it; I've been struggling for hours.

Trouble generating random, *large* correlation matrices

Hi all,

I've written an .ado file to randomly generate a correlation matrix, in order to draw correlated vectors using drawnorm. (I looked for existing code doing this, but couldn't find anything.) I realize that for just a few vectors, it's easy to simply write in a made-up correlation matrix. But I would like to draw about 100 vectors, each a standard normal, and all correlated with one another in a random fashion. It turns out that this task is much more difficult than I thought.

I have posted the code below: first the code that calls my randcorr command, and then randcorr.ado itself, which is basically just a mata sub-routine. And it works, for small correlation matrices... up to about 40x40. Diagonals are all 1, and correlation coefficients (off-diagonals) fall between -0.4 and 0.4, drawn from a normal distribution. But here is the problem: certain randomly drawn correlation matrices will NOT be positive semi-definite, even though they are symmetric, full rank, and have all off-diagonal values between -1 and 1. They look like correlation matrices, but they are not.

I dealt with this problem by checking the eigenvalues, after the construction of the correlation matrix, within the mata sub-routine of my .ado file. If the smallest eigenvalue is less than 0, I draw a new correlation matrix, so that the final correlation matrix handed back to randcorr will be PSD. That works... but as the matrix gets bigger (my parameter N) or as the vectors become more collinear (my parameter s, set in the code to be 0.1), the mata sub-routine has to draw more and more matrices before it finds a PSD one.

So, if you run the code below with N=10 through N=40, it runs quite quickly. If you run it with N=43 it takes a bit, and N=45 goes forever. I suspect the non-PSD problem relates to the transitivity of correlations... the 43rd correlation coefficient is surely constrained largely by the 1st-42nd correlation coefficients, in a way that I am not modeling.

I'm wondering 2 things. (1) Has somebody else already written such a user command, such that I can give up my own effort on the matter? (2) If not, does anyone have ideas on how I can better generate the random correlation matrix, either in terms of better values (more mathematically correct) or just faster (so that I can more quickly go through all the non-PSD matrices).

Thanks! Code below, and randcorr.ado below that.

Code:
clear
clear all
set more off
set matsize 10000

local N = 10            /* set number of IVs / size of corr matrix */
matrix M = J(1,`N',0)    /* vector of means (all IVs centered at 0) */
randcorr `N' .1            /* correlation matrix generated by randcorr.ado */

global Z z1             /* global for all IVs to be drawn */
forval i = 2/`N' {
    global Z $Z z`i'
}

** Draw the N IVs, according to correlation matrix and mean vector
drawnorm $Z , n(1000) corr(corr) means(M)

Code:
program randcorr
    version 14.2
    args n s
    mata: myfunction(`n', `s')
end

version 14.2
mata:
void myfunction(n, s)
{
    iok = 0
    while (iok == 0) {
        // symmetric matrix: unit diagonal, N(0, s) off-diagonals
        corr = diag(J(1, n, 1))
        for (j=1; j<=n; j++) {
            for (i=j+1; i<=n; i++) {
                corr[i,j] = rnormal(1, 1, 0, s)
                corr[j,i] = corr[i,j]
            }
        }
        // accept only full-rank, positive-definite draws
        rnk = rank(corr)
        eigenv = symeigenvalues(corr)
        imn = minmax(eigenv)[1,1]
        if (rnk == n & imn > 0) {
            iok = 1
        }
    }
    st_matrix("corr", corr)
}
end
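For reference, one standard construction — not from the post above; this is a sketch of an alternative I am assuming would fit the use case — guarantees positive semi-definiteness by building C = A*A' from a random matrix A and rescaling to unit diagonal, which avoids the accept/reject loop entirely:

```stata
mata:
n = 100                      // size of the desired correlation matrix
k = 200                      // columns of A; fewer columns -> stronger correlations
A = rnormal(n, k, 0, 1)      // random n x k matrix
C = A * A'                   // PSD by construction (a Gram matrix)
d = sqrt(diagonal(C))
C = C :/ (d * d')            // rescale so the diagonal is exactly 1
st_matrix("corr", C)
end
```

With k >= n the matrix is full rank with probability 1, and every draw is a valid correlation matrix, so N=100 is no harder than N=10. The spread of the off-diagonals is governed by k rather than by an explicit s parameter; mixing C with the identity matrix is one way to shrink the correlations further.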

Multiple choice questions combination

Hello everyone,

I am currently analysing a survey about the role of the state in Arab countries. I have 3 different questions which all refer to the role of the state. For example the first one is:
"If you have to choose only one, which one of the following statements would you choose as the most essential characteristics of a democracy?
1. Government narrows the gap between the rich and the poor.
2. People choose the government leaders in free and fair election.
3. Government does not waste any public money.
4. People are free to express their political views openly."

The problem is that the other two questions ask exactly the same thing but with different answer options. What I would like is to combine the three questions and end up with a general ranking of the most important role of the state within the population.
Each individual in the dataset answered the three questions.

Therefore I would like to know if there is a trick or a statistical method to solve this ?

Best

rowsum of many columns

Dear All, I found this question here (http://bbs.pinggu.org/thread-6390737-1-1.html). Suppose that I have 4914 variables (say, v1, v2, ..., v4914). I'd like to obtain the row sum of every 26 variables (s1=v1+...+v26; s2=v27+...+v52; ...; s189=v4889+...+v4914). Since the data set is too large, I cannot post a representative sample here. Thanks for any suggestions.
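A loop along these lines might do it (a sketch; it assumes v1-v4914 are stored in dataset order, so the hyphenated varlist v`first'-v`last' picks up exactly 26 consecutive variables):

```stata
forvalues s = 1/189 {
    local first = (`s' - 1) * 26 + 1   // v4889 when s = 189
    local last  = `s' * 26             // v4914 when s = 189
    egen s`s' = rowtotal(v`first'-v`last')
}
```

Note that egen's rowtotal() treats missing values as zero; if a row of all-missing should yield missing rather than 0, the missing option of rowtotal() would need to be added.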

Issues getting simulations and replications working!

Hi there, I am trying to run a Weibull simulation with replications in Stata. I keep getting errors, either as r(110), "u is not defined", or as 'sim' being an invalid name. I am trying to estimate lambda, defined as (1/n*sum(x^k))^(1/k), where n = 20, k = 0.5, and lambda's true value is 1.

My code is attached:

clear all
cd "...."

program weibullsim
    tempname sim
    postfile `sim' mean var using results, replace
    quietly {
        forvalues i = 1/20000 {
            drop _all                      // clear data from the previous replication
            set obs 20
            gen u = runiform()
            gen weibull = 1*(-ln(u))^2     // inverse-CDF draw, k = 0.5
            gen weibullpower = weibull^0.5
            summarize weibullpower
            post `sim' (r(mean)) (r(Var))
        }
    }
    postclose `sim'
end

clear
set seed 123456
weibullsim
use results, clear
gen estimate = mean^2
summarize estimate

Thanks!

Panel Data Modeling - Please Help

I'm working with a small unbalanced panel dataset (N=24, T=30, Obs.=590). My goal is a typical one... to test hypothesized relationships between Xs and Y. When using a FE or RE framework, I've noticed that my errors are serially correlated. As a result, I'm thinking about making my model dynamic by adding a lag of the dependent variable as a regressor. I'm aware that doing so creates an endogeneity issue, leading to biased estimates. From what I've read, this bias diminishes as T increases.


Q1) Is T=30 large enough to ignore the endogeneity bias issue? Or do I need to address it via instrumentation?

Q2) More generally when are T and N considered "large/small"?

Q3) Can/should time fixed effects be used in a dynamic model?

Q4) How can I decide whether a single lag of the dependent variable is enough? My dataset may be too small to support multiple lags, but I'd still like to know.


Perhaps a dynamic model isn't the way to go. Alternatively, I could use first differencing or a time polynomial to alleviate nonstationarity issues.
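For what it's worth, the two alternatives mentioned above could be sketched as follows (the variable names y, x, id, and year are placeholders, not from the post):

```stata
xtset id year

* first-differenced model with year dummies
regress D.y D.x i.year, vce(cluster id)

* FE model with a quadratic time trend instead of differencing
xtreg y x c.year##c.year, fe vce(cluster id)
```

Clustering by panel id in both cases gives standard errors robust to the serial correlation that motivated the question in the first place.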


Q5) How do I know which combination of these modeling techniques is most appropriate (dynamic methods, first-differencing, detrending, inclusion/exclusion of year dummies)?

Q6) At least one message board I found suggested that stationarity isn't a major concern when using panel data. I can't imagine how this could be true, as I think nonstationarity would lead to spurious results. Am I correct or am I missing something?


I've been reading message boards, online lecture notes, and academic papers for days but can't find practical answers to these questions.


If you can address ANY of these questions, I would greatly appreciate it. When doing so, please bear in mind that I'm looking for practical approaches and don't have the ability to understand highly technical/theoretical papers. Thank you!

constant Residual SS in nlsur

Hi,

I am using -nlsur- to estimate a system of equations. However, I find the Residual SS calculated in each iteration stays the same. Is this indicative of a problem in estimation? The following is what I get.
HTML Code:
Calculating NLS estimates...
Iteration 0:  Residual SS =  12537.91
Iteration 1:  Residual SS =  12537.91
Iteration 2:  Residual SS =  12537.91
Iteration 3:  Residual SS =  12537.91
Calculating FGNLS estimates...
Iteration 0:  Scaled RSS =  136761.4
Iteration 1:  Scaled RSS =  136761.4
Iteration 2:  Scaled RSS =  136761.4
Thanks!

Coefplot with interactions

I'm trying to display coefplot for multiple interactions. Here is the regression. When I try saving *mandatedclosure#* I on

poisson total ib1.mandatedclosure#i(73/84).timeg ib1.mandatedclosure#i(86/97).timeg did_other idblock#c.myear i.myear i.idblock, irr vce(cluster idblock)

coefplot, eform xline(1) ci(95) ciopts(recast(rcap)) keep(*timeg*)

This coefplot shows only the first set of interactions (mandatedclosure=0 # timeg) and not the second set (mandatedclosure=1 # timeg).

Any suggestions would save me the brute-force alternative of generating a dataset from the stored coefficients and then using twoway graphs.

Class member function error

Hi all,

I've been receiving an error when I enter the following:
meglm LTL_change i.Capsule c.Age || Village:, family(gaussian) link(identity)

The error pops up as:
_optlist.new: class member function not found
r(4023);

Can anybody explain this error to me? Suggestions? This code used to work and still works in my supervisor's Stata system, but no longer works in mine. My Stata has been recently updated. The dataset has not changed.

Thanks,
Shannon

Generate new group by id and other group in panel data

Dear all,
I know that the title is confusing, but I had no idea how to better describe my issue.
So, I have the following dataset:
id   group
 1     -1
 1     -1
 1     -1
 2      0
 2      0
 2      0
 3      1
 3      1
 3      1
 3      1
 4     -1
 4      1
 5     -1
 5      0
I would like to generate a new variable that would take different values, depending on the values from the group variable, per id.
So, if there are only -1, then the new variable will be 0 (or whatever number) for the respective person (in my example id 1 will have this value). If there are only 0 another number (id 2). If there are only -1 and 0 another number (id 5)... and so on.
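One way to do this (a sketch; the numeric codes are arbitrary, as the post allows) is to flag which group values occur within each id and then combine the flags into a single code:

```stata
* flag whether each value of -group- appears at least once within id
bysort id: egen byte has_m1 = max(group == -1)
bysort id: egen byte has_0  = max(group == 0)
bysort id: egen byte has_1  = max(group == 1)

* each combination of values present gets its own code:
* 1 = only -1, 2 = only 0, 4 = only 1, 3 = -1 and 0, 5 = -1 and 1, etc.
gen newvar = has_m1 + 2*has_0 + 4*has_1
```

In the example data this gives id 1 the code 1, id 2 the code 2, id 3 the code 4, id 4 the code 5, and id 5 the code 3.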

Thank you,
Dimi


Reporting SDs of the groups in a t-test - Commands "estpost" and "esttab"

Hello Everyone,

I want to report my descriptive statistics in a more telling way. I am trying to report a t-test of my main variables for two different samples (Regular and Non-Regular Recipients) in a LaTeX table using the commands estpost and esttab. The commands are:

estpost ttest variable_1 variable_2 variable_3... variable_n-1 variable_n, by(regular_recipients)

esttab using $dir\Table0_1.tex, noobs unstack se label replace cells("mu_1(fmt(4)) mu_2(fmt(4)) b(star fmt(4))" "se_1(par) se_2(par) se(par)") collabels("Non-Regular Recipients" "Regular Recipients" "Difference") star(* 0.1 ** .05 *** 0.01) booktabs varwidth(35) wrap

The result effectively displays the mean of the n variables for each sample, the difference of the means and the standard error of the difference. However, I would like to report the standard deviations of each group in parentheses, and the "estpost ttest" command does not allow me to do it.

Given this, I want to know whether it is possible to report the standard deviations of each group in parentheses using the commands "estpost" and "esttab", or whether it is necessary to use other commands (if so, which ones?).
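In case it is useful, one possible workaround (an assumption on my part, not a tested solution; the output file name is hypothetical) is estpost tabstat, which does store per-group standard deviations, at the cost of losing the t-test difference column:

```stata
estpost tabstat variable_1 variable_2, by(regular_recipients) ///
    statistics(mean sd) columns(statistics) listwise
esttab using Table0_2.tex, main(mean 4) aux(sd 4) unstack ///
    nostar noobs label booktabs replace
```

The means and SDs from this table could then be combined by hand (or with a second esttab call) with the difference column from the estpost ttest run.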

Thank you very much in advance,


Best


Édgar Hernando Sánchez Cuevas
Research Assistant
Faculty of Economics
Universidad de los Andes
Bogota, Colombia

Specifying Appropriate PSU or Cluster ID in svy

Hi Everyone,

I am analyzing a multi-stage cluster sample and am attempting to appropriately calculate the design effect. The Primary Sampling Units are districts that were selected using PPS with replacement. The population sizes of the districts are large enough that multiple clusters are allocated to a district. Population information below the district is not available and the remaining stages are selected using SRS.
District (dis)   Cluster ID (clus)   Household   Respondents in HH
      1                 1                1                3
      1                 1                2                1
      1                 1                3                4
      1                 1                4                1
      1                 2                1                1
      1                 2                2                3
      1                 2                3                2
      1                 2                4                1
      2                 3                1                1
etc.

When using the svyset command, should the PSU be specified as the district or should the PSU be specified as the cluster id?

For example, should it be:

svyset dis

or

svyset clus


Thank you for the help!




Interpreting stata result

Hello Everyone,
I would like to ask how to interpret my estimation results.
I am using panel data with GLS, fixed effects.

The model is:
logY_it = b0 + b1*logCAP_it + b2*logLF_it + b3*SP_it + b4*SS_it + b5*ST_it + e_it,

where Y, CAP, and LF are the real values of GDP, capital, and the labor force;
and SP, SS, and ST are, respectively, the shares of the labor force that completed primary, secondary, and tertiary school.

the result is as below:
logY= 12.8288 + 0.0377063 logCAP + 0.3179409 logLF + 1.763282 SS + 1.510425 ST

sigma_u | .84891852
sigma_e | .08279592
rho | .99057732

SP is not significant.

Would you please show me how to interpret the results in terms of "a 10% increase of ...", and how the logic works?
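For reference, the standard textbook reading of log-log and log-level coefficients, applied to the numbers above (this sketches only the interpretation logic, not the model's validity):

```latex
% log-log coefficient = elasticity:
\frac{\partial \log Y}{\partial \log CAP} = 0.0377
\quad\Rightarrow\quad
\Delta\% Y \approx 0.0377 \times 10\% = 0.377\%
\text{ for a 10\% rise in } CAP.

% log-level coefficient: a one-unit change in SS scales Y by e^{b}:
\frac{\partial \log Y}{\partial SS} = 1.763
\quad\Rightarrow\quad
\Delta\% Y \approx 100\,(e^{1.763 \times 0.01} - 1)\% \approx 1.78\%
\text{ for a one-percentage-point (0.01) rise in } SS.
```

The difference arises because CAP enters in logs (so its coefficient is an elasticity) while SS enters in levels (so its coefficient is a semi-elasticity), all holding the other regressors fixed.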
Thank you in advance.



Cointegration test for panel data Stata 13

Hi,

I have some questions. I am currently trying to do a residual-based cointegration test for panel data in Stata 13. I understand that it's possible to do it manually without Stata 15, since the Kao test can be carried out by hand, like the E-G test for time series. But I need some validation on whether I am using the right code.

Code:
xtreg y x1, fe vce(robust)
predict resid, u
xtunitroot ht resid, trend
Is this right or am I lost?

Best Regards, Filip Franzén

custom error bars with the lgraph package?

Hello Stata Gurus,

I have some panel data and am using the lgraph package to produce plots of the measured means over time

calling....

Code:
lgraph measurement timevar, errortype(sd)
produces a nice graph with standard deviations on the error bars. However, I would like the error bars to be 1/2 a standard deviation on each side of the mean instead.

Is anyone willing to share how to create these error bars instead?
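In case lgraph has no built-in scaling option, a manual workaround along these lines might produce the same picture (a sketch using only official commands; it rebuilds the plot rather than customizing lgraph):

```stata
preserve
* collapse to one row per time point, keeping mean and SD of the measurement
collapse (mean) m = measurement (sd) s = measurement, by(timevar)
gen lo = m - s/2                      // half an SD below the mean
gen hi = m + s/2                      // half an SD above the mean
twoway (rcap lo hi timevar) (connected m timevar), legend(off)
restore
```

The preserve/restore pair leaves the original panel data untouched after the graph is drawn.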

Thanks in advance,

Jonathan Tward

Cross sectional regression commands &amp; Fama MacBeth regression (xtfmb) issues

Hello everyone,

I would like to run cross-sectional regressions over 60 months following the Fama-MacBeth procedure. However, I can't figure out how to run it correctly: when I enter "xtfmb x y" with x and y as my variables, I get a series of "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx". Is this problem familiar to anyone?

On the other hand, do you know how to simply ask Stata to run a cross-sectional regression on a particular date? I mean, if I want to run a cross-sectional regression for month number 12, for example, using my entire data set, is that possible?
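On the second question, a cross-sectional regression for a single period is just an if-qualified regress (a sketch; the variable names and the month identifier are placeholders, not from the post):

```stata
* assuming a variable -month- indexes the 60 periods
regress excess_return beta if month == 12
```

Looping such a regression over all 60 months and averaging the stored coefficients is, in essence, what the Fama-MacBeth procedure does.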

Thanks in advance (I am a beginner),

Geolien,

Unable to destring variable: generate and replace unresponsive

Hi,

I'm working on a project and was able to destring some of my variables, but not one specific variable. I should first mention that I replaced the "n.a." values in this variable with blanks, and then tried to convert the remaining observations, which actually contain numbers, from strings into numerics. I also tried to recode the blanks to 0, but could not because of the destring issue. When I attempt to destring, Stata reports "dealvalue contains nonnumeric characters; no replace" and "dealvalue contains nonnumeric characters; no generate" for "destring dealvalue, replace" and "destring dealvalue, generate(dealval)", respectively (dealvalue is the name of my variable).
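A first diagnostic step (a sketch; it only inspects the variable, nothing here is a confirmed fix) would be to list the values that cannot be parsed as numbers — replacing "n.a." with blanks can leave stray spaces, which are themselves nonnumeric:

```stata
replace dealvalue = trim(dealvalue)          // strip leftover spaces
* real() returns missing for anything that is not a valid number
gen byte bad = missing(real(dealvalue)) & dealvalue != ""
list dealvalue if bad                        // see what is blocking destring
destring dealvalue, replace                  // retry once offenders are handled
```

Whatever shows up in the list (commas, currency symbols, remaining "n.a." variants) could then be removed, for example via destring's ignore() option.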

Thanks