Channel: Statalist

Error: numlist has too many elements

Hello,

I am trying to run an ROC analysis with 43,905 observations.

My code is:

dtroc compend sihi, detail graph using(si_all)

The output I get is:

---------------------------------------------------------------------
Valid cases (total): 43905
Area Under Curve: AUC = 0.58187 (95% CI: 0.57724 to 0.58649) (Exact)
---------------------------------------------------------------------

                                         Likelihood    Predictive Values
                                           Ratios    for SAMPLE Prevalence
                                Youden -------------  ---------------------
CUT-OFF    Se(%)  Sp(%)  Eff(%) J(%)   LR+    1/LR-  PV+(%)    PV-(%)
---------- ------ ------ ------ ------ ------ ------ --------- ---------
invalid numlist has too many elements


Is there a work around for this?

Thanks!
Elissa Butler

Conditional coloring on a histogram

Hi,

I would like to have a histogram where the coloring is conditional, for example red if the values are higher than a constant C. I found an old answer to the same question with a solution using frequencies:

https://www.stata.com/statalist/arch.../msg00922.html

Any pointers on doing this with fraction instead of freq?

Thanks.

estadd scalar not appearing in output table

I am trying to create a regression output table with estout or esttab. I am running three regressions and after each regression I am running a post-regression test and storing the p-value. I would like to add these 3 p-values to the combined regression output table. I'm trying to use estadd for this. Here's an example of what I'm trying to do:

Code:
eststo clear
sysuse auto, clear
eststo f1:  reg price mpg if foreign==1
eststo f0:  reg price mpg if foreign==0
suest f1 f0
test [f1_mean]mpg=[f0_mean]mpg
estadd scalar r(p)
esttab, stats(p_diff)
Version information:
(Stata MP 14.2. The commands are SSC installed.)

Code:
. which estadd
/Users/SSB/Library/Application Support/Stata/ado/plus/e/estadd.ado
*! version 2.3.5  05feb2016  Ben Jann

. which estout
/Users/SSB/Library/Application Support/Stata/ado/plus/e/estout.ado
*! version 3.21  19aug2016  Ben Jann

. which esttab
/Users/SSB/Library/Application Support/Stata/ado/plus/e/esttab.ado
*! version 2.0.9  06feb2016  Ben Jann
*! wrapper for estout
I recently ran -- ssc install estout, replace -- to get the latest version, but that did not help fix estadd scalar.

Code:
. ssc install estout, replace
checking estout consistency and verifying not already installed...

the following files will be replaced:
    /Users/SSB/Library/Application Support/Stata/ado/plus/e/estadd.ado
    /Users/SSB/Library/Application Support/Stata/ado/plus/e/estadd.hlp
    /Users/SSB/Library/Application Support/Stata/ado/plus/e/estout.ado
    /Users/SSB/Library/Application Support/Stata/ado/plus/e/estout.hlp
    /Users/SSB/Library/Application Support/Stata/ado/plus/e/eststo.hlp
    /Users/SSB/Library/Application Support/Stata/ado/plus/e/estpost.ado
    /Users/SSB/Library/Application Support/Stata/ado/plus/e/estpost.hlp
    /Users/SSB/Library/Application Support/Stata/ado/plus/e/esttab.ado
    /Users/SSB/Library/Application Support/Stata/ado/plus/e/esttab.hlp

installing into /Users/SSB/Library/Application Support/Stata/ado/plus/...
installation complete.
I'll really appreciate some advice! Also, if you think I'm better off using outreg2 for my purposes, please let me know.

Thanks,
Saika
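
One possible explanation, sketched here under assumptions rather than confirmed against the estout source: as far as I can tell, `estadd scalar r(p)` stores the value under the name `p` rather than `p_diff`, and it attaches it to the currently active estimates, which after `suest` are the combined results and not the stored models `f1` and `f0`. The sketch below names the scalar explicitly and adds it to both stored models. The name `p_diff` comes from the `stats()` option in the post; the local macro `pval` is introduced here for illustration only.

```stata
eststo clear
sysuse auto, clear
eststo f1: regress price mpg if foreign == 1
eststo f0: regress price mpg if foreign == 0

* suest leaves its own combined results active, so test refers to those
suest f1 f0
test [f1_mean]mpg = [f0_mean]mpg

* keep the p-value before any other command can overwrite r()
local pval = r(p)

* attach it to the stored models under the name that stats() asks for
estadd scalar p_diff = `pval' : f1 f0

esttab f1 f0, stats(p_diff)
```

The same p-value is written to both columns because it describes the comparison between the two models, not either model alone.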

Multilevel Model with Individual fixed effects

Hi Statalisters,

I have a panel dataset with around 30,000 individuals observed for four years all over the US. My dependent variable is a binary variable that depends on individual characteristics and state characteristics. I want to isolate the variance in my dependent variable across states while controlling for individual fixed effects. My approach has been to fit a multilevel model using the -mixed- command and -predict- to obtain the BLUP at the state level according to the following, where the individual identifier is ID and the state identifier is state:

Code:
mixed dependent_variable ID  || state:
predict  blup_state , reffects
My first question is whether this is actually the correct specification for the above stated purpose or whether I have to model the individual fixed effects also as random variables to obtain the correct variance decomposition.

Secondly, because I previously worked with a different version of Stata, which did not allow me to fit a model with -mixed- that controls for such a large number of categorical variables, I partialled out the individual fixed effects manually and ran -mixed- on the predicted residuals:

Code:
areg dependent_variable, absorb(ID)
predict dependent_variable_resid, resid
 
mixed dependent_variable_resid  || state:
predict  blup_resid_state, reffects
However, while I expected the standard errors to be incorrect, I assumed the predictions would match those from the first model. It turns out they are significantly different. Hence, my second question is whether I missed something obvious here or misspecified the model.

Comments are greatly appreciated.

Converting a variable to a quarterly date

Hey there,

I imported data in which quarters were formatted like Q12007, reshaped it from wide to long, and removed the "Q". Now I want to convert the variable called quarter into a proper quarterly date that can be merged with another database.

Any idea how to do that? I tried to create a new date variable based on "quarter" but it was not working.

Here is an excerpt from the data (some data for Q is missing but that does not relate to the question).


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str61 Description str7 ID_S str9 Datatypes float quarter str18 Q
"DR HOENLE AG - NET PROCEEDS FROM SALE/ISSUE C" "D13410M" "WC04251A" 12007 ""    
"DR HOENLE AG - EARNINGS BEF INTEREST & TAXES"  "D13410M" "WC18191A" 12013 "1490"
"DR HOENLE AG - NET PROCEEDS FROM SALE/ISSUE C" "D13410M" "WC04251A" 22014 ""    
"DR HOENLE AG - RESEARCH & DEVELOPMENT"         "D13410M" "WC01201A" 12015 "236" 
"DR HOENLE AG - PROPERTY, PLANT & EQUIP - NET"  "D13410M" "WC02501"  22010 "6027"
"DR HOENLE AG - PROPERTY, PLANT & EQUIP - NET"  "D13410M" "WC02501"  12009 "6210"
end
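
A minimal sketch of one way to do the conversion, assuming `quarter` always holds the quarter digit followed by a four-digit year (so 12007 means Q1 2007), as in the excerpt. The variable names `yr`, `qtr` and `qdate` are made up for this illustration, and the two-row dataset only stands in for the real one.

```stata
clear
input float quarter
12007
22014
end

* split the number into its two parts
gen int  yr  = mod(quarter, 10000)      // last four digits: the year
gen byte qtr = floor(quarter / 10000)   // leading digit: the quarter

* build a Stata quarterly date and give it a readable format
gen qdate = yq(yr, qtr)
format qdate %tq
list quarter qdate
```

For the merge to work, the other database would need a quarterly date built the same way, so that both files hold the same underlying values in `qdate`.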

panel data xtabs first and last occurrence

Hello,

I am new to the Stata community and was hoping for some help with panel data analysis. I have health care data with 'clientid' as the panel variable. However, a client may have multiple visits on the same day with different or missing data recorded on the outcome variable (surveyid provides a unique identifier for observations on the same day). Here is a sample of my dataset:

clientid date medfreq severity surveyid
1 01/02/18 1 4 12345
1 02/21/18 4 12789
1 03/08/18 12890
1 06/15/18 3 2 12990
2 01/02/18 4 12681
2 01/02/18 2 3 12688
2 03/08/18 1 3 13450
2 06/15/18 3 3 13560

I have declared my dataset as panel data without indicating the time variable since there are repeated dates, which I want to preserve. Is this the correct approach?

I want to identify the first and last occurrences that are ≥3 months apart with data recorded on "medfreq." I would then like to run a xtab to see how clients changed over time from their first to last occurrence. I didn't find any literature on this specific issue but would be grateful for any suggested code or if someone can kindly refer me to an existing reference.

Thank you,
Samira
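
A rough sketch of one way to pick the first and last visits with `medfreq` recorded and cross-tabulate them. Several things here are assumptions: the toy data are made up (loosely modeled on the sample above), `date` is taken to be a string in month/day/two-digit-year form, "3 months" is approximated as 90 days, and the names `visit`, `has_mf`, `first_mf`, `last_mf`, `gap_days` and `pick` are introduced only for this example.

```stata
clear
input clientid str8 date medfreq severity surveyid
1 "01/02/18" 1 4 12345
1 "02/21/18" . 4 12789
1 "06/15/18" 3 2 12990
2 "01/02/18" . 4 12681
2 "01/02/18" 2 3 12688
2 "06/15/18" 3 3 13560
end

* turn the string date into a Stata daily date
gen visit = date(date, "MD20Y")
format visit %td

* work only within visits that have medfreq recorded
gen byte has_mf = !missing(medfreq)
bysort clientid has_mf (visit surveyid): gen first_mf = medfreq[1] if has_mf
by clientid has_mf: gen last_mf  = medfreq[_N] if has_mf
by clientid has_mf: gen gap_days = visit[_N] - visit[1] if has_mf

* one row per client whose first and last recorded visits are far enough apart
egen byte pick = tag(clientid) if has_mf & gap_days >= 90
tabulate first_mf last_mf if pick == 1
```

Sorting on `surveyid` within a date is only a tie-breaker for same-day visits; a different rule may be more appropriate for the real data.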

Time fixed effects - once again

Hi guys,

While writing my master's thesis, I am having lots of issues implementing time effects (standard in previous literature).

Parameters:

5900 Observations
Group of three countries


my do-file so far:

import excel "path" firstrow

format CDS_Financials %12.0g
format CDS_Sovereign %12.0g

egen countries = group(Country)

xtset countries Date, daily

xtreg Relative_Repo CDS_Sovereign CDS_Financials OIS i.Date, fe robust // dealing with high cross-correlation


When I run xtreg without i.Date everything works perfectly fine; as soon as I add it, Stata asks me to increase matsize (set matsize 3074)
and my program (Stata 13) crashes.
I am dealing with high cross-sectional correlation, so I really need to control for that and thought GLS was a good idea.

Hope you can help me!

Best regards,

Matthias

Merging multiple cross tabulations that use a common grouped variable

Greetings. I'm trying to merge 4-5 cross tabulations, each using a common group variable as one of the two variables, into one table. For additional context, assume variable "A" with 3 categories forms the columns, while 4-5 other variables, each with 3-4 subcategories, form the rows. Can this be created by a command similar to "tab2", or is it better addressed via a series of commands from a program such as "tabout"?

Line continuation in the shell command

I am looking for a way to continue a shell escape on another line. Here is an example of what happens when I use /// to continue the line:

. ! echo asdf ///
asdf ///


As you can see, the /// seems to be taken as part of the shell command. I also tried /*<cr>*/. and note that -winexec- and -shell- behave in the same way. There is a space before the /// and none after, so that isn't the problem. Interestingly enough, the text of the command is scanned for macros, and macro substitution does take place. I am working with Stata 15 in Windows and Linux.

.
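
Since the post notes that macro substitution does take place in the shell command, one possible workaround is to assemble the command text in a local macro over several lines and pass the finished string to the shell. This is only a sketch; the macro name `cmd` is made up for the illustration.

```stata
* build the command piece by piece in a local macro ...
local cmd echo asdf
local cmd `cmd' asdf

* ... then hand the finished one-line string to the shell
shell `cmd'
```

Each `local` line is an ordinary Stata command, so the long shell command never has to be continued across lines itself.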

Static or dynamic panel data regression, test for serial correlation

Dear Statalist,

I am running a fixed effect model (with clustering at firm level) to model investment behaviour of firms in response to their past performance and past industry performance and I am unsure of whether I should include a lagged dependent variable in the regression (i.e. investment at t-1).

My supervisor now told me to start by reporting the results from the static regressions and potentially move onward to a dynamic regression.

I read in Cameron and Miller (2015) that the best rationale for deciding between a static and a dynamic model is to run a test of serial correlation. They refer to Inoue and Solon (2006), a portmanteau test.

Now, I already ran an xtserial test and it rejected the H0 of no serial correlation. Do I have to run an additional command like xtistest? Or would it essentially do the same work as xtserial?

Secondly, does clustering at firm level correct for serial correlation? I thought it does, but it would help me to get some confirmation here.

Many thanks for any advice on this!
Katharina

Need help to interpret the interaction term in longitudinal data

Hi statisticians,

I am new to longitudinal data analysis, and currently learning by myself to be able to carry out hypothesis testing.

I have short longitudinal data with limited repeated measures (only baseline and time-1 post baseline are available). There is also a group indicator ("studygroup") distinguishing the control and intervention groups. The goal of the analysis is to evaluate whether the trend of the outcome (from baseline to time-1) differs between groups (control vs. intervention).

Variables:
  • outcome: meanscore_sds (calculated based on a scale, ranging from 1-5)
  • time variable: time (0: baseline; 1: time-1)
  • group indicator: studygroup (0: control; 1: intervention)
  • sex: (0: boy; 1: girl)
  • id variable: surid
Code:
global restrict = "t1dropcase == 0 & t2dropcase == 0" // limit to eligible analytical sample 
mixed meanscore_sds time##studygroup if $restrict || surid: , residual(uns, t(time)) var ml
Results from above model:
Code:
                                                Wald chi2(3)      =      17.65
Log likelihood = -6160.3764                     Prob > chi2       =     0.0005

---------------------------------------------------------------------------------
  meanscore_sds |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
         2.time |   .0660626   .0308159     2.14   0.032     .0056645    .1264606
                |
     studygroup |
  Intervention  |   .0048707     .03559     0.14   0.891    -.0648845    .0746259
                |
time#studygroup |
2#Intervention  |   .0393069   .0432313     0.91   0.363     -.045425    .1240387
                |
          _cons |   4.276469   .0253691   168.57   0.000     4.226747    4.326191
---------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
surid: Identity              |
                  var(_cons) |   .0984436   1.990941      6.00e-19    1.61e+16
-----------------------------+------------------------------------------------
Residual: Unstructured       |
                     var(e1) |    .682875   1.991058      .0022516    207.1039
                     var(e2) |   .5772798   1.991008      .0006693     497.893
                  cov(e1,e2) |   .0536583   1.990978     -3.848587    3.955904
------------------------------------------------------------------------------
LR test vs. linear model: chi2(3) = 123.58                Prob > chi2 = 0.0000
My interpretations:
  • coefficient for "time": among control group, outcome change at time1 compared to baseline
  • coefficient for "studygroup": outcome in intervention compared to control at baseline
  • but I am not sure how to interpret the coefficient for interaction term. Is it the difference of baseline-time1 trend between two groups? Or is it the difference of outcome at time1 between intervention and control?
Thank you so much!!

Best,
Mengmeng
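
For reference, the fixed part of a model with `time##studygroup` and two time points can be written out to show what each coefficient compares. This is a sketch of the general algebra for any such two-by-two specification, not a reading of the specific output above; the symbols $\beta_0,\dots,\beta_3$, $T$ (time, 0 or 1) and $G$ (group, 0 or 1) are notation introduced here.

```latex
\begin{aligned}
E[Y \mid T, G] &= \beta_0 + \beta_1 T + \beta_2 G + \beta_3 (T \times G) \\
\beta_1 &= E[Y \mid T=1, G=0] - E[Y \mid T=0, G=0] \\
\beta_2 &= E[Y \mid T=0, G=1] - E[Y \mid T=0, G=0] \\
\beta_3 &= \bigl(E[Y \mid T=1, G=1] - E[Y \mid T=0, G=1]\bigr)
         - \bigl(E[Y \mid T=1, G=0] - E[Y \mid T=0, G=0]\bigr)
\end{aligned}
```

Under this parameterization $\beta_3$ is the between-group difference in the baseline-to-time-1 change, while the between-group difference at time-1 is $\beta_2 + \beta_3$.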

Determining the P value for trends of continuous and categorical variables over the years.

Dear All,

I would really be grateful if you could help me with this as I have spent more than a month figuring out the correct way to do this but still am confused.

I’m looking at the trends in the individual comorbidities, death, length of stay of a particular diagnosis over the years.

I found that trends in the proportions of comorbidities over the years can be tested with “ptrend”, but the data must be entered in a very specific way for this command. The problem is that I can’t manually enter data that runs into the millions. So I tried the following two ways. For example, for the CAD comorbidity: I calculate the proportion of CAD in each year (n1), take the remainder (100-n1) as n2, and put these data in a separate Excel file to be analyzed by Stata for trend over the years with the command “ptrend n1 n2 Year”. The other way is to create a dummy variable “n2=1 if n1==0” in the existing data and again run the analysis with the command “ptrend n1 n2 Year”. The problem is that I’m getting a different P value with these two methods, so I’m not sure whether I’m doing either of them correctly.

One of my colleagues said “nptrend” can also be used to assess the trends of categorical variables over the years. But my understanding was that “nptrend” is for looking at trends in continuous variables only (for example, length of stay in days). Is that correct?

Plus, my understanding is I have to expand the weight variable before running either of these commands. Is that correct?

I have read all the blogs and online questions available on this topic of trend analysis in Stata, but everything led to so much confusion. I just want to use the “ptrend” and the “nptrend” commands correctly in order to get the correct p values of trends. In summary, can I just use the “nptrend” or “vwls” command for both categorical (0/1) variables and continuous variables, after expanding the weights? Please guide.

What observations are at the conditional quantile that STATA is estimating

Hello,

I am trying to figure out whether there is a procedure that would provide me with an indicator of which observations are being used for a certain quantile estimation by the qreg command.


For example:


Code:
 clear
 input var1 var2
1 0
2 0
3 1
4 1
5 1
1 0
2 1
3 1
4 0
5 0
 end

 qreg var1 var2, q(.5)

**Some method to identify median values and generate indicator q50 variable below
var1 var2 q50
1 0
2 0
3 1 1
4 1
5 1
1 0
2 1
3 1 1
4 0
5 0
The tricky part is when the qreg command is run with covariates. Then I think the estimation is at q(.5) at each of the different strata of the covariates (e.g. q(.5) of var1 for females and males separately).

So how do I find the q(.5) for each covariate strata that is being used on the command?

e(sample) rightly tells me the whole sample is being used but that is not the answer I'm looking for...


Thanks in advance!

Creating box plots of the gap between two groups by deciles

Hello, and thanks for taking the time to read and answer my question,

I am working with Stata and I have math grades for two different groups: A and B. I want to see the gap that exists between both groups in each decile. In addition I want to do a box plot of this gap for each decile (I want to have 10 box plots, one for each decile which shows the gap between group grades).

What I first did was to compute the deciles using xtile for both groups:

xtile decileA= mat if group==1, nq(10)

xtile decileB= mat if group==0, nq(10)


However, groups A and B do not have the same number of observations nor the same distribution. I thought of computing quantiles for each decile and group and subtracting them to get the difference in each decile at each quartile to create the boxplot. However, I do not know how to proceed afterwards to create the graph, and given that I have a different number of observations in each group decile, I do not know if it is correct to proceed this way.

Now, if I try to use pctile and compute the difference at each decile, I lose all the variance in the data inside each decile. I only get median differences and not all the quantiles I want.

Ex:

pctile decileA= mat if group==1, nq(10)

pctile decileB= mat if group==0, nq(10)

gen qdiff= decileA- decileB if _n<10

gen qtau=_n/10 if _n<10

graph box qdiff, over(qtau)

I want to know if there is a way to do the graph I am intending to and if there is I would really appreciate your help.

Thanks, Karla

Linear regression

I have taken the square root of the dependent variable to deal with non-normally distributed residuals. The dependent variable is measured in centimetres, so its values are greater than 0. All the independent variables are dummy variables. My sample size is 1000. After running the linear regression and estimating the margins, I transformed the margins back to the original scale. I used these commands:

generate sqrtdep = sqrt(dependent)
regress sqrtdep i.indep1 i.indep2 i.indep3 i.indep4
margins indep1, expression(predict(xb)^2)

I have also read that, instead of transforming the dependent variable, I can use glm with a power link and then estimate margins. However, the estimated margins from glm are different from those of the linear regression. Any advice on why the margins differ and which model is more appropriate?

glm dependent i.indep1 i.indep2 i.indep3 i.indep4 , link(power 0.5)
margins indep1

Reghdfe Help

Hi guys, I'm trying to run a multi-fixed-effects regression using reghdfe.
When I type my code:

reghdfe log_tobin DiD1 $xlist if fisc_year>=2014 & fisc_year<=2017, a( co_code fisc_year sic_code) vce(robust)

I get an error saying "class FixedEffects Undefined"

How would I go about solving this issue?

chow test for mixed models

Hi,

I have a question about the chow test for comparing coefficients in mixed models.

This Stata FAQ page discusses the Chow test:

"You can include the dummy variables in a regression of the full model and then use the test command on those dummies. You could also run each of the models and then write down the appropriate numbers and calculate the statistic by hand—you also have access to functions to get appropriate p-values."

I am running two growth-curve models by males and females separately using stata mixed procedure. I then use chow test to see if the coefficients for the two mixed models are significantly different by males and females. Can I do the following:

My code: g2 is my gender variable with g2: 1=female; 0=male

Code:
mixed cesd3w i.w1noprarg_2c##i.g2 c.ctage1##i.g2 c.ctage1#c.ctage1##i.g2  ///
                       c.ctage1#c.ctage1#c.ctage1##i.g2 ///
                       i.w1noprarg_2c#c.ctage1##i.g2  i.w1noprarg_2c#c.ctage1#c.ctage1##i.g2 ///
                      b1.w1raceth##i.g2 b2.w1predu_4c##i.g2 b1.w1famst_4c##i.g2 b2.w34phyabuse3##i.g2 ///
                      b2.w34sexabuse3##i.g2 i.w1sleeprblm##i.g2 c.w1234si_tt##i.g2 ///
                       if `f'==1 [pweight=w1wt_fmch3] || aid: ctage1, pweight(schwt1)  ///
                       pwscale(size) nolog cov(un) mle variance

contrast g2 w1noprarg_2c#g2 g2#c.ctage1  g2#c.ctage1#c.ctage1 g2#c.ctage1#c.ctage1#c.ctage1 ///
               w1noprarg_2c#g2#c.ctage1 w1noprarg_2c#g2#c.ctage1#c.ctage1, overall
Thanks,

Alice

Non-linear graphs comparison

Good morning or evening, dear community,
I am struggling with how best to compare two non-linear graphs based on logistic models. The first logistic model uses the X variable as is, while the second uses the logged X variable (I used log(X.var + 1) because log(0) is not defined).
I would really appreciate your help with a question I have for my research process:
How would you describe the non-linear relationship in the first graph and in the second graph, without relying on statistical output?

Kind regards,
John G.

multiple fixed effects with areg

Hi.
I ran this code but got an error.
Please help me.
xi: areg inv RE_value index_state yr*, a(gvkey) cl(id)
error: variable yr* not found
It ran yesterday, but now it produces this error.
Note 1: the regression must include firm fixed effects and year fixed effects.
Note 2: areg must be used.
Note 3: a similar regression with industry and state fixed effects runs without error:
xi: areg REAL_ESTATE0 qsset2-qsset5 qqroa2-qqroa5 qqage1 st1* st2* st3* st4* st5* st6* st7* st8* st9* if year==1993, a(sic2)
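
The message suggests that no variables whose names start with `yr` exist in the data currently in memory, for instance because the dummies were generated in an earlier session and never saved. The sketch below shows two ways to bring year effects back into an `areg` call. It runs on a made-up toy panel, not the real data: all values are simulated, `index_state` is left out, clustering is on `gvkey` instead of `id`, and a variable named `year` is assumed to exist, as the `if year==1993` condition in the post implies.

```stata
* toy panel standing in for the real data: 10 firms, 4 years (all values made up)
clear
set seed 12345
set obs 40
gen gvkey = ceil(_n / 4)
bysort gvkey: gen year = 1992 + _n
gen RE_value = rnormal()
gen inv = 0.5 * RE_value + rnormal()

* option 1: recreate the yr* dummies that the wildcard expects
tabulate year, generate(yr)
areg inv RE_value yr*, absorb(gvkey) cluster(gvkey)

* option 2: skip the dummies and let factor-variable notation handle year
areg inv RE_value i.year, absorb(gvkey) cluster(gvkey)
```

Option 2 relies on factor-variable notation, which needs Stata 11 or later; with the `xi:` prefix used in the post, `i.year` can also be expanded in older versions.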

Setting e(sample) to compute bootstrap standard errors

Hello,

I am trying to obtain direct and indirect effects for multiple mediators with multiply imputed data using the inverse odds weighting approach. I’ve been working with the following code (from Sheikh, M. A., Abelsen, B., & Olsen, J. A. (2017). Education and health and well-being: direct and indirect effects with multiple mediators and interactions with multiple imputed data in Stata. Journal of Epidemiology and Community Health, 71(11), 1037-1045. doi:10.1136/jech-2016-208671):

Code:
program IOWMI , rclass
 
capture drop linpred predprob inverseodds wt_iow

* Fit a logistic regression model for IV (0, 1) conditional on the mediators and confounding variables.
mi estimate, saving(miest): logit IV M1 M2 M3 M4 confounding_variables

*Calculate linear prediction for each observation and use that to calculate predicted probabilities and inverse odds;
mi predict linpred using miest, xb
mi passive: gen predprob = exp(linpred)/(1+exp(linpred))
mi passive: gen inverseodds = ((1-predprob)/predprob)

*Calculate inverse odds weights, assign the IOW of each observation in the unexposed group (IV= 0) equal to 1.
mi passive: gen wt_iow = 1 if IV==0

* Compute an IOW by taking the inverse of the predicted log odds for each observation in the exposed group (IV=1).
mi passive: replace wt_iow = inverseodds if IV==1
 
*Estimate the total effect of IV using a generalized linear model (family=Poisson) of the regression of the outcome on IV and confounding variables, with link=log function;
mi estimate, saving(miest1) eform post: glm outcome IV confounding_variables , fam(poisson) link(log) vce(robust)
matrix bb_total= e(b_mi)
scalar b_total=(bb_total[1,1])
return scalar b_total=bb_total[1,1]
 
* Estimate the natural direct effect of IV via weighted generalized linear model *(family=Poisson) of the regression of the outcome on IV and confounding factors, with *link=log function and the weights obtained earlier
mi estimate, saving(miest2) eform post: glm outcome IV confounding_variables [pweight=wt_iow], fam(poisson) link(log) vce(robust)
matrix bb_direct = e(b_mi)
scalar b_direct=(bb_direct[1,1])
return scalar b_direct=bb_direct[1,1]
 
* Calculate the natural indirect effects of IV on the outcome via the proposed mediators by subtracting the direct effects from the total effects as;
return scalar b_indirect = b_total-b_direct
end
 
*Estimate 95% confidence intervals with bootstrapping
bootstrap exp(r(b_indirect)) exp(r(b_direct)) exp(r(b_total)), seed(12345) reps(100): IOWMI
estat bootstrap, all
Every time I run the code, I get the following warning:

Warning: Because IOWMI is not an estimation command or does not set e(sample), bootstrap has no way to determine which observations are used in calculating the statistics and so assumes that all observations are used. This means that no observations will be excluded from the resampling because of missing values or other reasons.

If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded. Be sure that the dataset in memory contains only the relevant data.
And the following error code:

insufficient observations to compute bootstrap standard errors no results will be saved
r(2000);
I’ve tried including esample(total) and esample(direct) in the respective estimation codes to save the estimation samples e.g.

Code:
* Estimate the natural direct effect of IV via weighted generalized linear model (family=Poisson) of the
* regression of the outcome on IV and confounding factors, with link=log function and the weights
* obtained earlier
mi estimate, saving(miest2) eform  esample(direct) post: glm outcome IV confounding_variables [pweight=wt_iow], fam(poisson) link(log) vce(robust)
matrix bb_direct = e(b_mi)
scalar b_direct=(bb_direct[1,1])
return scalar b_direct=bb_direct[1,1]
However, I still get the same warning and error code.

When I check to see how the e(sample) is set using
Code:
. estimates esample
I get the following message:
e(sample) not set (0 assumed)
I know that with regular estimation you can set e(sample) using
Code:
. estimates esample: exp
But I don’t know how to adapt it for this situation. I’d appreciate any help you can provide.

Thanks,
Wumi