
Compare correlation levels

Hi,

I am working with panel data with two periods and would like to examine the correlations among a set of variables, separately by year:

correlate var1 var2 var3 if year==0
correlate var1 var2 var3 if year==1

I would like to know whether there is a way to test for a statistically significant difference in a correlation across the two years (a t-test for correlations, if you will).
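
One standard approach (offered here as a sketch, not the only answer) is Fisher's z transformation for comparing two independent correlations; the pair var1/var2 below stands in for any pair of interest:

Code:
* Fisher z comparison of corr(var1,var2) across the two years
quietly correlate var1 var2 if year==0
local r0 = r(rho)
local n0 = r(N)
quietly correlate var1 var2 if year==1
local r1 = r(rho)
local n1 = r(N)
local z = (atanh(`r0') - atanh(`r1')) / sqrt(1/(`n0'-3) + 1/(`n1'-3))
display "z = " %6.3f `z' "   two-sided p = " %6.4f 2*normal(-abs(`z'))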

Thank you

Residuals Normality in Fixed Effect Regression

Dear all,

I am dealing with a fixed-effects regression that yields 378 observations. The residuals are not normally distributed.
Is that an issue for hypothesis testing, or is my sample large enough that the central limit theorem ensures correct inference anyway?
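
For concreteness, a minimal sketch of the usual residual diagnostics after a fixed-effects fit (variable names hypothetical):

Code:
* Inspect the idiosyncratic residuals from a fixed-effects fit
xtreg y x1 x2, fe
predict double eps, e     // idiosyncratic error component e_it
qnorm eps                 // graphical check against normal quantiles
sktest eps                // skewness/kurtosis test for normality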

Useful references on the topic would also be greatly appreciated.

Many thanks

Interpreting main effects with interaction term of continuous variables

Dear Statalisters,

I am working with an unbalanced panel (385 observations, T=21, N=21). I am trying to estimate a fixed-effects IV regression:
Code:
 xi: xtivreg2 y var1 var2 var4 var5 i.year (var3 var2_var3 = z var2_z), fe gmm cluster(id) partial(i.year)
var1, var2, var4, and var5 are exogenous regressors; var3 is endogenous and instrumented by z. var2_var3 = var2*var3 and is instrumented by var2_z = var2*z. (-xtivreg2- does not allow factor-variable notation, so I construct the interactions manually.)

Given the interaction term in my specification, the coefficient on a main effect, e.g. var2 alone, is interpreted as the marginal impact of var2 conditional on var3=0. However, var3=0 is not a sensible value in my context. Is there a way to calculate the marginal impact of var2 when var3 is instead equal to, say, its median or mean? I was exploring the possibility of estimating the above xtivreg2 command on mean-centred data (following https://www3.nd.edu/~rwilliam/stats2/l53.pdf).
Code:
foreach v of varlist var1 var2 var3 var4 var5 z {
    sum `v', meanonly
    gen c`v' = `v' - r(mean)
}

xi: xtivreg2 y cvar1 cvar2 cvar4 cvar5 i.year (cvar3 cvar2_cvar3 = cz cvar2_cz), fe gmm cluster(id) partial(i.year)
But I am not sure whether this is correct. (1) Should I demean ALL the variables or only those that are interacted (var2 and var3)? (2) Is there a more straightforward way to interpret the standalone main effects in such a model?
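
On (1), the handout's logic suggests centering only the interaction partners; a sketch under that assumption, reusing the names above:

Code:
* Center only the interacted variables (var2, var3) and the instrument z
foreach v of varlist var2 var3 z {
    summarize `v', meanonly
    generate double c`v' = `v' - r(mean)
}
generate double cvar2_cvar3 = cvar2*cvar3
generate double cvar2_cz    = cvar2*cz
xi: xtivreg2 y var1 cvar2 var4 var5 i.year (cvar3 cvar2_cvar3 = cz cvar2_cz), ///
    fe gmm cluster(id) partial(i.year)
* the coefficient on cvar2 is now the marginal effect of var2 at var3 = mean(var3)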

Many thanks,
Mihir

Calculating marginal effect for interaction term with missing values

Hi,

I am running a logit model and want to calculate the marginal effect of an interaction between a factor variable and a continuous variable. The continuous variable contains many missing values, so when I run the logit command with the interaction, Stata drops the observations with missing values. Instead, I want to code the missing values in the interaction term as 0 and fit the logit model without discarding those observations.

The factor variable indicates whether a person is on a list of wealthy individuals, and the continuous variable is the tax payment made by those on the list. The rationale for coding the missing values as 0 is that I want to measure the marginal effect of increasing the tax payments after accounting for the marginal effects of the other variables.

So far, I have tried two unsatisfactory approaches.

First, I ran the logit model after omitting the missing values. In this case, the coefficient estimates are conditional on tax==1.

logit o25 i.tax i.tax#c.payment i.finance z_count i.poors i.who i.high

Logistic regression                             Number of obs     =      6,024
                                                LR chi2(6)        =      13.88
                                                Prob > chi2       =     0.0310
Log likelihood = -514.25903                     Pseudo R2         =     0.0133

-------------------------------------------------------------------------------
o25 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
1.tax | 0 (omitted)
|
tax#c.payment |
1 | 2.04e-06 1.18e-06 1.74 0.083 -2.65e-07 4.35e-06
|
1.finance | .1642993 .2261713 0.73 0.468 -.2789883 .607587
z_count | -.0093753 .0658552 -0.14 0.887 -.138449 .1196984
1.poors | .0469446 .2100186 0.22 0.823 -.3646842 .4585735
1.who | .0740624 .3320942 0.22 0.824 -.5768304 .7249551
1.high | -.3124992 .2309713 -1.35 0.176 -.7651946 .1401963
_cons | -3.931822 .2292796 -17.15 0.000 -4.381202 -3.482443
-------------------------------------------------------------------------------

Second, I created the interaction term, denoted tp, and filled all its missing values with 0. In this case, the coefficient on tp does not account for the effect tax has on the interaction term.
Logistic regression                             Number of obs     =     42,993
                                                LR chi2(7)        =     176.67
                                                Prob > chi2       =     0.0000
Log likelihood = -1307.4252                     Pseudo R2         =     0.0633

------------------------------------------------------------------------------
o25 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.tax | 1.391375 .1456103 9.56 0.000 1.105984 1.676766
tp | 1.22e-06 9.96e-07 1.22 0.221 -7.34e-07 3.17e-06
1.finance | .3004204 .1635547 1.84 0.066 -.0201409 .6209816
z_count | .0523166 .0327925 1.60 0.111 -.0119555 .1165886
1.poors | .4336479 .1442931 3.01 0.003 .1508386 .7164571
1.who | .4105133 .239336 1.72 0.086 -.0585766 .8796032
1.high | -.3264569 .1621547 -2.01 0.044 -.6442743 -.0086394
_cons | -5.662968 .1661581 -34.08 0.000 -5.988632 -5.337304
------------------------------------------------------------------------------

Can you suggest a way to obtain the correct coefficient for the interaction term? Thank you.
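
For reference, a hedged sketch of a third route: zero-fill the payments explicitly but keep the factor-variable interaction, so that tax enters on its own and the payment slope is identified only within tax==1 (whether zero-filling is defensible is a design question):

Code:
* payment0: tax payment, with missings (non-listed individuals) set to 0
generate double payment0 = cond(missing(payment), 0, payment)
logit o25 i.tax i.tax#c.payment0 i.finance z_count i.poors i.who i.high
* marginal effect of an extra unit of payment among listed individuals
margins, dydx(payment0) at(tax=1)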

Execute(do) (Control+D) issue in program editor

Dear Statalist,

I have noticed that Execute (do) runs the selected lines from all open do-files, not just the active one: if more than one do-file is open with lines selected in each, clicking the Do button or pressing Ctrl+D runs every selection instead of only the lines in the currently active do-file.

Has anyone noticed this? Is this a bug?
Thanks

Command Window: How to paste and execute?

Hello everyone,
I could not find an answer online, but I'm pretty sure this was solved ages ago: is there a shortcut to paste an instruction into the Command window and run it? So far I use Ctrl+V and then hit Enter, but I wonder if there is a single paste-and-run shortcut. Thank you.


Unable to download a package

Dear Community members,

I am a user of Stata 14. I want to conduct bivariate nonparametric analysis, for which I wanted to use the user-written bidensity package.

However, the ssc install command gives the result:

Code:
. ssc install bidensity
connection timed out -- see help r(2) for troubleshooting
http://fmwww.bc.edu/repec/bocode/b/ either
  1)  is not a valid URL, or
  2)  could not be contacted, or
  3)  is not a Stata download site (has no stata.toc file).
r(2);
Furthermore, I tried to download the .ado file from http://fmwww.bc.edu/repec/bocode/b/, but it opens as text in the browser, and I do not know how to save that as an .ado file and install it.
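
Two common workarounds, sketched below: give the connection more time and retry, or point -net- directly at the Boston College mirror (the same URL as above):

Code:
set timeout1 300                  // allow more time for the initial connection
ssc install bidensity             // retry SSC
* or install straight from the RePEc mirror:
net install bidensity, from(http://fmwww.bc.edu/repec/bocode/b/)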

Kindly help, if you have suggestions.

Best,
Pranav

Help with interpreting a time-by-category interaction term in a logistic regression model to assess change over time

Hi,

I am looking for some help interpreting the tables and graphs I have attached, particularly the coefficients in the first table and the margins output. Are these log odds or predicted probabilities?

each wave is a different year
wave 1 = 1996
wave 2 = 2006
wave 3 = 2016
city coded as 0 city 1 non city

For the final table, is it correct to infer that the wave variable is significant on its own while the city variable is not, but that their interaction is significant and explains 10.57% of the variance in the model?

If anyone could give me an example of interpretation, it would be greatly appreciated!
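
For orientation: -logit- coefficients are on the log-odds scale, while -margins- reports predicted probabilities by default. A minimal sketch, with a hypothetical outcome name:

Code:
logit outcome i.wave##i.city
margins wave#city      // predicted probability for each wave-by-city cell
marginsplot            // plot the interaction across waves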

Kind Regards, Jodie.


[Attachments: regression table, margins output, and interaction graphs (not reproduced here)]


Question about the significance level of Durbin-Watson Test in Stata Manual

On p. 5 of the Stata manual entry for regress postestimation time series, the interpretation of the result of -estat dwatson- is based on the 1% significance level in the Durbin-Watson statistics table (K=1 and n=22: dL=0.997). However, in other materials people use the 5% significance level of the table.

In my case, I estimate a trend rate by fitting y = a + b*x by OLS, where x is the time variable and b is the trend rate. Whether autocorrelation is detected depends on which significance level (1% or 5%) I use.

So I would like to ask how to choose the significance level for the Durbin-Watson test.
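
For concreteness, a minimal sketch of the fit described above (time variable t is hypothetical); Durbin's alternative test reports a p-value directly, which sidesteps the table look-up:

Code:
tsset t
regress y t
estat dwatson      // Durbin-Watson d; compare with dL/dU at the chosen level
estat durbinalt    // Durbin's alternative test, with an explicit p-value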

Thank you very much!

diff package triple D showing values greater than expected

As shown below, I'm using the diff package to estimate a triple difference-in-differences for a variable that takes only the values zero and one (tfreq_bin). For some reason, under the "tfreq~n" column, the output shows average values of tfreq_bin greater than one for all the control and treated groups. Am I wrong to think that this column should show the mean of tfreq_bin for each subgroup in the pre- and post-treatment periods? I'm confused about why these values exceed one.


[Attachment: diff output table (not reproduced here)]

extract weight from string variables


If I have a string variable with values like "1 gram Imported Pure Meth Domestic Australia", is there a way to extract only the weight component?
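
If the weight always appears as a leading "number unit" pattern, a regular expression can extract it; a sketch, assuming the strings live in a variable named desc (hypothetical) and Stata 14 or later for ustrregexm:

Code:
* Pull the numeric weight and its unit from the start of the string
generate double weight = real(ustrregexs(1)) if ustrregexm(desc, "^(\d+\.?\d*)\s*(gram|g|kg|oz|ounce)")
generate str10  unit   = ustrregexs(2)       if ustrregexm(desc, "^(\d+\.?\d*)\s*(gram|g|kg|oz|ounce)")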

applying inlist on variables

hello

In the following example, is there a better way than brute-force looping to check whether, for each panel, every observation with b equal to 1 also has a equal to 1 (the converse need not hold)?

Code:
clear
set seed 2333
set obs 2
g panel=_n
expand 10
sort panel
g a = runiformint(0,1)
bysort panel: g b = _n<3

* expected result for this seed, hard-coded by inspection:
g b_in_a=.
replace b_in_a=1 if panel==1
replace b_in_a=0 if panel==2
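
One loop-free possibility, sketched below: flag, within each panel, whether b==1 ever occurs without a==1.

Code:
* b_in_a2 = 1 only if every b==1 observation in the panel also has a==1
bysort panel: egen byte any_violation = max(b==1 & a!=1)
generate byte b_in_a2 = !any_violation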

Using weights in Quantile regression

Dear all

I have unbalanced firm-year data from 15 countries. Some countries contribute far more observations than others; in one case, almost 30% of the sample belongs to a single country. I would like to run a quantile regression with weights to address this, but unfortunately the qreg command does not accept the aweight option. Please advise how I can apply weights in a quantile regression model.

Regards

Comparing mean vs summarize

I've recently noticed a difference in the behavior between mean and summarize. This code demonstrates the difference:

Code:
clear all
set more off
set obs 150

gen var1 = 0
replace var1 = 1 if _n < 111

gen var2 = .
replace var2 = 1 if _n < 74
replace var2 = 0 if _n > 73 & _n < 101

gen var3 = .
replace var3 = 1 if _n < 12
replace var3 = 0 if _n > 11 & _n < 16

gen var4 = 0
replace var4 = 1 if _n > 40

gen var5 = .
replace var5 = 1 if _n > 39
replace var5 = 0 if _n < 41

tab var1
tab var2
tab var3
tab var4
tab var5

sum *

mean *
The most illustrative output is:

Code:
. sum *

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        var1 |        150    .7333333    .4436981          0          1
        var2 |        100         .73     .446196          0          1
        var3 |         15    .7333333    .4577377          0          1
        var4 |        150    .7333333    .4436981          0          1
        var5 |        150    .7333333    .4436981          0          1

.
. mean *

Mean estimation                   Number of obs   =         15

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
        var1 |          1          0             .           .
        var2 |          1          0             .           .
        var3 |   .7333333   .1181874      .4798466      .98682
        var4 |          0  (omitted)
        var5 |          0  (omitted)
--------------------------------------------------------------
The output from the mean command for var1, var2, var4, and var5 does not match the summarize output for the same variables.

I see the reason for the conflicting output clearly enough: when mean calculates statistics for multiple variables, it restricts itself to observations with no missing values on any of them (casewise deletion), whereas summarize computes each variable's statistics over all of that variable's nonmissing observations.

I checked the help files to see whether mean has an option not to restrict its calculations to complete observations (I did not find one); does anyone know of an undocumented option for that? Likewise, I did not find a documented option for summarize that would impose such a restriction; does anyone on the list know of one?
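
Absent such an option, one workaround is to call mean once per variable, so each estimate uses all of that variable's nonmissing observations:

Code:
foreach v of varlist var1-var5 {
    mean `v'
}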

EDIT: Fixed odd chars from copy-paste.

Problem with latent class estat lcmean

Hi all,

I am having a problem getting estat lcmean to work after a latent class model. It works for a very simple model, but for the somewhat more complex one I am trying to fit, it simply spins indefinitely; I have left it running for several hours with no output. (Note: there are some threads, as yet unanswered as far as I can see, about an error some users get. I get no error, just no output, until I hit Break.) estat lcgof works fine, as does estat lcprob (slow, but minutes rather than hours).

Specifics: the model is a bit complex but not ridiculously so: 1,555 cases, 21 binary indicators. The code is below. I am running Stata/IC 15.1 on a Mac (two, actually: I have tried both desktop and laptop). I could not get the model to converge at all except with the nonrtolerance option, which I found in a separate topic; I am not sure why that should matter here.

Code:
gsem (v41safetymuslims v41moralsmuslims v41intolermuslims v41politicsmuslims v41commmuslims v41jobsmuslims v41welfaremuslims ///
      v41safetyjews v41moralsjews v41intolerjews v41politicsjews v41commjews v41jobsjews v41welfarejews ///
      v41safetyafams v41moralsafams v41intolerafams v41politicsafams v41commafams v41jobsafams v41welfareafams <- _cons) ///
      [pweight=weight1] if zWhiteR==1, logit startvalues(randomid, draws(8) seed(193693)) em(iter(5)) lclass(A 5) nodvheader nonrtolerance

Thanks for any help/advice!

Visually depict the interactive effect of a survival analysis

Hello there,

I would like some assistance, please, on how to visually depict an interaction effect in my survival analysis. This is a pretty long one, so kindly bear with me.
The setting is an estimation of the likelihood of a potential target firm being acquired, contingent on external factors such as industry competition (continuous) and the level of industry sales (continuous), internal factors of the potential target such as assets (continuous) and number of employees (continuous), and interaction variables such as competition × assets and competition × number of employees.

The analysis will involve streg (Parametric survival model):

stset LFE_REFYR, failure(acquired) id(OP_ID)

streg compet indsales assets employ compet_assets compet_employ, dist(e) nohr

where 'acquired' is a dichotomous categorical variable indicating if the potential target firm is acquired (1 = acquired; 0 = not acquired)
LFE_REFYR = year
OP_ID is the target firm's unique ID
compet = industry competition
indsales = level of industry sales
assets = assets of potential target firms
employ = number of employees of potential target firms
compet_assets = interaction variable between industry competition and assets
compet_employ = interaction variable between industry competition and number of employees

The plot of the interactive effect will show how the effect of competition on likelihood of acquisition changes with the level of potential target firm assets:
y-axis: probability of being acquired
x-axis: industry competition represented in standard deviation units [mean +/- standard deviation units in 0.5 unit increments]

The plot will show two curves, one for high-asset target firms and one for low-asset target firms; high-asset firms are those with assets above the top quartile, and low-asset firms are those with assets below it; all other variables will be held at their means.



Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(LFE_REFYR OP_ID compet indsales) double assets int employ byte acquired float(compet_assets compet_employ)
2000 12341   .462056 29802180 135560.76120556414 11 0  62636.66  5.082616
2000 12342  .3484394 50860816  289909.5652173914 13 0  101015.9 4.5297117
2000 12343  .2283101 50080384 225511.36363636362 10 0  51486.52  2.283101
2000 12344  .3229231 71408080 1019090.2366863905 15 0  329087.8  4.843847
2000 12345 .40260035 17703958  873875.2642706131  8 0  351822.5  3.220803
2000 12346 .50641453  5886413  638726.6355140187 15 0  323460.4  7.596218
2000 12347  .6789437 11288420  2207725.587144623  7 0 1498921.4  4.752606
2000 12348  .8942683  6824116   93530.0586510264 12 0  83640.96  10.73122
end
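
One possible route, sketched under heavy assumptions: refit with factor-variable notation so -margins- understands the interactions, then plot predictions on a competition grid at two illustrative asset levels. Note that -margins- after -streg- predicts median survival time by default, not an acquisition probability, so the y-axis would need rescaling or a different predict() choice.

Code:
* Refit using factor-variable interactions (same covariates as above)
streg c.compet##c.assets c.compet##c.employ indsales, dist(exponential) nohr

* Grid: competition at mean +/- 1 SD in 0.5-SD steps; assets at the
* 25th and 75th percentiles (illustrative cut-offs, not the post's quartile rule)
quietly summarize compet
local m = r(mean)
local s = r(sd)
quietly _pctile assets, p(25 75)
local alo = r(r1)
local ahi = r(r2)

margins, at(compet=(`=`m'-`s'' `=`m'-0.5*`s'' `=`m'' `=`m'+0.5*`s'' `=`m'+`s'') ///
    assets=(`alo' `ahi')) atmeans
marginsplot, xdimension(compet)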

Can you kindly help?
Thanks!

pc_simulate error messages

I am trying to use -pc_simulate- from SSC to do power calculations for a difference-in-differences model. I have a balanced panel of 206 CBSAs and 67 periods of weekly data.

I am getting an error message that I am not sure how to interpret:

Code:
. xtdes;

    cbsa:  10420, 10580, ..., 49660                          n =        206
    week:  2016w38, 2016w39, ..., 2017w52                    T =         67
           Delta(week) = 1 week
           Span(week)  = 67 periods
           (cbsa*week uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                        67      67      67        67        67      67      67

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------------------------------------------------------------------
      206    100.00  100.00 |  1111111111111111111111111111111111111111111111111111111111111111111
 ---------------------------+---------------------------------------------------------------------
      206    100.00         |  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

. isid cbsa week;

. pc_simulate ln_requests,
> model(DD) mde(.1)
> i(cbsa) t(week)
> p(0.5) n(206)
> tstart(`=tw(2017w50)')
> alpha(0.05)
> pre(64) post(3)
> nsim(500)
> vce(cluster cbsa)
> absorb(cbsa week)
> outfile(sem_pc.csv);

Error: Simulation dataset must be unique by cbsa and week
r(9);

Keyboard shortcut to launch the Variables Manager window from the command window

Hello,
I am wondering whether there is a keyboard shortcut to launch the Variables Manager window from the Command window, e.g. to remind myself of the variable names in the current dataset without reaching for the mouse to use the menu.

PS: I tried to use the db command but could not find the name of the Variables Manager dialog to invoke.
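
For what it's worth, the Variables Manager has its own command (documented in [D] varmanage), so it can be launched from the Command window without the mouse:

Code:
varmanage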

Thanks,

ardl quarterly lag

I have quarterly data for 11 years, with 2 years in the middle missing, so the total is (11-2)*4 = 36 quarterly observations.
When I run
Code:
reg y l.y x l.x
I get a regression with 33 observations, but when running
Code:
ardl y x
I get a regression with one lag but only 24 observations. It seems ardl drops a whole year of observations instead of one quarter. Does anyone have a solution?

Multinomial logistic model with imputed data: determining the variance inflation factor (VIF) or another measure of collinearity

Dear Statalisters,

I have the following issue I hope you can help me out with. I am using Stata 15.1 to fit a multinomial logistic model with imputed data, covering 398 regions over 15 years, thus 5,970 observations.

I want to obtain the variance inflation factor (VIF):

mi estimate: mlogit CLUBS LFPR EMPL_AQ EAST URBAN GDPCIN GDPDENSA, base(1)
estat vif

However, estat vif is not valid after mlogit because the model is not linear.


The second option I tried was the user-written collin package, which does not require regression results to determine the VIF (among other diagnostics):

collin LFPR EMPL_AQ EAST URBAN GDPCIN GDPDENSA

The problem I encountered here was that the number of observations increased to 68,058, which seems incorrect to me. Moreover, collin does not work in combination with mi estimate or mi estimate, cmdok.

Is it possible to compute the VIF with multiply imputed variables and the correct number of observations, or can I use the VIF estimated on the inflated number of observations?
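
One hedged possibility: since the VIF depends only on the covariates, run collin on a single dataset rather than on the stacked imputations, e.g. on the original (m=0) data via mi xeq:

Code:
mi xeq 0: collin LFPR EMPL_AQ EAST URBAN GDPCIN GDPDENSA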

Thanks in advance