Please ignore, I discovered my error
↧
reg with -if var==1- or -inlist- omits more observations than satisfy the criteria
↧
Diff-in-diff: Can a pre-treatment placebo test be used to control for the anticipatory effect?
Hi.
I am examining whether a policy change impacted return on assets. The policy was announced in 2002 but implemented in 2005. Can you examine the anticipatory effect by performing a diff-in-diff using 2002 as the placebo date?
Would including a lead or lag in my baseline regression aid the detection of a potential anticipatory effect?
Many thanks!
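A minimal sketch of one way to look at anticipation (the variable names roa, treated, year, and firm_id below are placeholders, not from the post): define an "announced but not yet implemented" window (2002-2004) alongside the post-implementation window, and interact each with the treatment group. The first interaction picks up the anticipatory/placebo effect.
Code:
* Hypothetical variable names: roa, treated (0/1 group), year, firm_id
generate post_announce  = inrange(year, 2002, 2004)
generate post_implement = year >= 2005
regress roa i.treated i.post_announce i.post_implement         ///
        i.treated#i.post_announce i.treated#i.post_implement,  ///
        vce(cluster firm_id)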
↧
↧
With -bayesmh-, what is the proper syntax for a custom prior density for a factor variable's parameter that is a function of the parameter?
With bayesmh, you can create a custom prior density with the density() option of prior().
That probability density function can itself be a function of the parameter (see the first example below), and a parameter for a factor variable can have a custom density (see the second example below).
But when I try to combine the two, that is, to give the parameter for a factor variable a custom density that is a function of that parameter, I get an error message whose stated causes I have already ruled out (third example below).
Here's the code (I have also attached a do-file containing the complete code for two reproducible examples)
and here's the result (I've also attached an SMCL log file that contains the result).
. /* Not a problem for the custom density to be a function of a parameter (or its linear combination) */
. bayesmh foreign, likelihood(logistic) ///
>         prior({foreign:_cons}, density(exp({foreign:_cons}) / (1 + exp({foreign:_cons}))^2)) ///
>         nomodelsummary

Burn-in ...
Simulation ...
Bayesian logistic regression                     MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =         74
                                                 Acceptance rate  =      .4486
Log marginal likelihood = -47.010554             Efficiency       =      .2163

------------------------------------------------------------------------------
             |                                                Equal-tailed
     foreign |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
       _cons | -.8417671   .2630011   .005654  -.8390754  -1.384746  -.3504125
------------------------------------------------------------------------------

. 
. /* Not a problem for a factor variable to have custom density prior */
. bayesmh foreign i.k, likelihood(logistic) ///
>         prior({foreign:_cons}, normal(0, 5)) ///
>         prior({foreign:i1.k}, density(1)) ///
>         nomodelsummary

Burn-in ...
Simulation ...
Bayesian logistic regression                     MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =         74
                                                 Acceptance rate  =      .2666
                                                 Efficiency:  min =       .136
                                                              avg =      .1436
Log marginal likelihood = -47.085067                          max =      .1513

------------------------------------------------------------------------------
             |                                                Equal-tailed
     foreign |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
         1.k | -.0355851   .4936124    .01269  -.0221173  -.9684661   .9078394
       _cons | -.8504277   .3452926   .009364  -.8448572   -1.54843   -.212137
------------------------------------------------------------------------------

. 
. /* Problem when custom density prior is a function of the factor variable's
>    parameter (or its linear combination) */
. bayesmh foreign i.k, likelihood(logistic) ///
>         prior({foreign:_cons}, normal(0, 5)) ///
>         prior({foreign:i1.k}, density(exp({foreign:i1.k}) / (1 + exp({foreign:i1.k}))^2))
no prior specified for foreign:i1.k
    Prior distributions must be specified for all model parameters. You may have omitted
    option prior() for some of the parameters or mistyped the names of the parameters in the
    specified prior() options.
r(198);
Code:
quietly sysuse auto
generate byte k = mod(_n, 2)

/* Not a problem for the custom density to be a function of a parameter (or its linear combination) */
bayesmh foreign, likelihood(logistic) ///
    prior({foreign:_cons}, density(exp({foreign:_cons}) / (1 + exp({foreign:_cons}))^2)) ///
    nomodelsummary

/* Not a problem for a factor variable to have custom density prior */
bayesmh foreign i.k, likelihood(logistic) ///
    prior({foreign:_cons}, normal(0, 5)) ///
    prior({foreign:i1.k}, density(1)) ///
    nomodelsummary

/* Problem when custom density prior is a function of the factor variable's
   parameter (or its linear combination) */
bayesmh foreign i.k, likelihood(logistic) ///
    prior({foreign:_cons}, normal(0, 5)) ///
    prior({foreign:i1.k}, density(exp({foreign:i1.k}) / (1 + exp({foreign:i1.k}))^2))

exit
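One hedged possibility (an assumption on my part, not a confirmed diagnosis or fix): the expression parser inside density() may simply not recognize factor-variable parameter names such as {foreign:i1.k}. A sketch of a workaround that avoids factor-variable notation by entering the dummy as a plain variable, so the parameter gets an ordinary name:
Code:
* Sketch only: same model, but k enters without the i. prefix so that the
* parameter is named {foreign:k} rather than {foreign:i1.k}
quietly sysuse auto, clear
generate byte k = mod(_n, 2)
bayesmh foreign k, likelihood(logistic) ///
    prior({foreign:_cons}, normal(0, 5)) ///
    prior({foreign:k}, density(exp({foreign:k}) / (1 + exp({foreign:k}))^2))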
↧
virtual joins
I noticed JMP 13 and Spotfire 7.5 have virtual joins or similar for multiple data tables. So you don't have to physically join or merge data tables to pull data together with a unique identifier. Does Stata have this capability? Should it? It is supposed to save memory and make data changes faster in large data sets in some situations.
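For context, the conventional "physical" join that a virtual join would avoid looks something like the sketch below (the dataset and key names are hypothetical):
Code:
* Bring customer attributes into a transactions file by merging on a unique key
use transactions, clear
merge m:1 customer_id using customers, keep(master match) nogenerate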
↧
Stata 15 release
Good morning,
There were a lot of reports (and excitement!) that Stata 15 was going to be released this week, e.g. www.econjobrumors.com/topic/stata-15-will-be-released-on-tuesday, and many, many, more. Obviously this did not happen, so a lot of users are understandably very anxious. Does StataCorp have an update on the release date?
Thanks, E.
↧
↧
Clustering Diff-in-Diff
Hi.
I am working on a diff-in-diff analysis of a policy intervention on wage gaps in two different sectors. I know that sector- and time-specific random effects could generate a possible clustering problem, which may ruin the inference of my analysis, and I only have two clusters, the two sectors. However, what would be the consequence of clustering the standard errors at the individual level?
Thank you.
Fredrik
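A minimal sketch of the two options being weighed, using hypothetical variable names (wage, treated, post, sector, person_id):
Code:
* Clustering at the individual level: allows arbitrary within-person correlation
* over time, but ignores common sector-by-time shocks
regress wage i.treated##i.post, vce(cluster person_id)

* Clustering at the sector level: the natural level for sector shocks, but with
* only two clusters the cluster-robust variance estimator is unreliable
regress wage i.treated##i.post, vce(cluster sector)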
↧
Saving Tempfiles
Hi there,
Using the code below, I've appended all Excel files in my "ImportantFiles" folder to a tempfile called datasets.
How do I save the content in the tempfile as a .dta file in C:/Documents?
Code:
cd C:/Documents/ImportantFiles
local myfiles : dir . files "*.xlsx"
display `"`myfiles'"'
tempfile datasets
foreach file of local myfiles {
    import excel `"`file'"', firstrow clear
    capture append using "`datasets'"
    save "`datasets'", replace
}
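A minimal sketch of one way to finish, assuming it runs in the same do-file immediately after the loop above (a tempfile disappears when the program ends); the file name combined.dta is only illustrative:
Code:
* The appended data are still in memory after the final -save-, so this suffices:
save "C:/Documents/combined.dta", replace

* Or, to be explicit, reload the tempfile first and then save it permanently:
use "`datasets'", clear
save "C:/Documents/combined.dta", replace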
Thanks!
Erika
↧
Differential effects of a continuous variable
I am looking at the effect of private schooling on partisanship and have found an interesting result: some private schooling appears to make individuals decidedly more liberal than those with no private schooling, but the liberalizing effect slows dramatically after 4 years of private school and actually reverses after 6 years. To illustrate:
[attached figure]
My guess is that the "liberalizing" effect is being driven by students who attend private elementary schools. I do have variables telling me whether the individual was in a public or private school at each grade level, so this should be testable. Is there a more practical way other than creating dummies to test my hypothesis?
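One possible sketch (the variable names partisanship and private_g1-private_g12 are placeholders standing in for the outcome and the per-grade private-school indicators mentioned above): split total private-school exposure into elementary and later years, let each enter with its own slope, and test the slopes against each other rather than creating a dummy for every grade.
Code:
* Years of private school in elementary grades vs. later grades (names are placeholders)
egen priv_elem  = rowtotal(private_g1-private_g6)
egen priv_later = rowtotal(private_g7-private_g12)

* Each count gets its own slope; the test addresses the hypothesis directly
regress partisanship c.priv_elem c.priv_later, vce(robust)
test priv_elem = priv_later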
↧
generate quantile regression graphs for dummy variable
Hi all
I am trying to generate a quantile regression coefficients graph using the grqreg command. I ran the following commands (note: y is my dependent variable; gender, x1, x2, x3 are independent variables; gender is a dummy variable that takes the value 1 for female and 0 otherwise):
qui sqreg y i.gender x1 x2 x3, quantile(.05 .10 .25 .50 .75 .90 .95)
grqreg, ols olsci
After I run these commands, I get the following message
1b.gender invalid name
r(198);
Is it that this command does not accommodate graphs for dummy variables, or am I making a mistake?
Kindly help
Regards
Karim
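A hedged guess (an assumption, not something confirmed here): grqreg may be unable to parse factor-variable coefficient names such as 1b.gender. A sketch of a workaround is to pass the 0/1 variable directly rather than through i.:
Code:
* gender is already coded 0/1, so it can enter without the i. prefix
quietly sqreg y gender x1 x2 x3, quantile(.05 .10 .25 .50 .75 .90 .95)
grqreg, ols olsci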
↧
↧
spmap error
I am looking at how a specific policy diffuses through states. I put together a very basic data set with 50 observations, one for each state, and no missing data. It's in wide format with a binary variable for whether the policy exists in each year (e.g., p2009, p2010, ..., p2014). I am trying to create an individual map for each of these to show which states had already adopted the policy in each year. I am running the following code:
spmap p2009 using uscoord if statename!="Alaska" & statename!="Hawaii", id(id) fcolor(Blues)
for years 2009 through 2014. Each one brings up the map with the correct data, except for 2011 and 2014. On these two years I receive the following error:
When no attribute variable is specified, option fcolor() does not accept palette names
r(198);
Even when I drop the fcolor specification (e.g., spmap p2011 using uscoord if statename!="Alaska" & statename!="Hawaii", id(id)), I get the exact same error. There doesn't appear to be anything different about the data in these two years, so I cannot figure out why only 2 of the 6 maps won't show up. I'm new to spmap, so any advice would help. I didn't see any similar issues in other spmap posts, so I hope this is not a duplicate question.
Thanks,
James
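One thing worth checking (a guess on my part, not a confirmed diagnosis): spmap can fall back to its "no attribute variable" behavior when the variable it is handed contributes nothing usable in the plotted sample, for example because it is entirely missing there or is stored as a string. Comparing a working year with the problem years may reveal the difference:
Code:
* Compare storage types and values for a year that works and the years that do not
describe p2009 p2011 p2014
tabulate p2011 if statename != "Alaska" & statename != "Hawaii", missing
tabulate p2014 if statename != "Alaska" & statename != "Hawaii", missing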
↧
Multiplying or dividing coefficients to create risk score
Greetings statalist,
I'm planning to do an analysis similar to the one found in: Balkus, Jennifer E., et al. "An empiric HIV risk scoring tool to predict HIV-1 acquisition in African women." JAIDS Journal of Acquired Immune Deficiency Syndromes 72.3 (2016): 333-343. (https://pdfs.semanticscholar.org/54e...30ffe9e5d5.pdf). Taken from the paper:
"To identify the combination of factors that best predicted HIV risk, we used forward and backward stepwise Cox proportional hazards model that evaluated the inclusion or exclusion of potential predictors at each step. All models were stratified by study site. The model with the lowest Akaike information criterion was chosen as the final model for the risk score. Individual predictors included in the final model were assigned a score by dividing the coefficient for the predictor in the final model by the lowest coefficient among all predictors in the model and rounding to the nearest integer. The sum of the values for each predictor represented the total score for each participant, and the HIV incidence for each total score category was calculated. The predictive ability of the total score and each predictor was assessed by calculating area under the receiver operating characteristic curve. The score was internally validated using 10-fold cross- validation, and the area under the curve (AUC) for the final model was compared with the mean AUC of the 10 different models. Additional performance characteristics (sensitivity, specificity, positive predictive value, and negative predictive value) were calculated using risk score cut-points that corresponded to an HIV incidence in the risk score category of approximately >3% and >5%. Incidence curves were generated to assess cumulative HIV incidence by risk score cut-point."
Conceptually, I understand what was done. Unfortunately, I do not know how to execute the analyses after the regression models. I've read similar papers that multiplied the regression coefficient by 10 and then round to the nearest integer. I would greatly appreciate assistance in identifying the STATA commands associated with executing this: "Individual predictors included in the final model were assigned a score by dividing the coefficient for the predictor in the final model by the lowest coefficient among all predictors in the model and rounding to the nearest integer." as well as "The score was internally validated using 10-fold cross- validation, and the area under the curve (AUC) for the final model was compared with the mean AUC of the 10 different models."
Thanks so much for your assistance.
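A minimal sketch of the point-assignment step only (the cross-validation step is a separate task). It assumes the data are already stset, that the final Cox model has been chosen, and that x1, x2, x3 and site are placeholder names; following the paper's description, each coefficient is divided by the smallest coefficient (taken in absolute value here) and rounded to the nearest integer.
Code:
* Placeholder predictors x1-x3 and stratification variable site
stcox x1 x2 x3, strata(site)

* Find the smallest coefficient in absolute value among the predictors
local lowest = .
foreach v in x1 x2 x3 {
    if abs(_b[`v']) < `lowest' local lowest = abs(_b[`v'])
}

* Assign integer points per predictor and build each participant's total score
generate double risk_score = 0
foreach v in x1 x2 x3 {
    display "`v': " round(_b[`v']/`lowest') " point(s)"
    replace risk_score = risk_score + round(_b[`v']/`lowest') * `v'
}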
↧
Testing Fama-Macbeth; Shanken-Correction
Hi all,
I estimated some asset pricing models with the Fama-MacBeth method and I would like to apply the Shanken correction, but how could I add this to my code?
Thanks,
Thomas Jansen
Code:
quietly {
    foreach var of varlist BH1 BL1 SH1 SL1 BH2 BL2 SH2 SL2 BH3 BL3 SH3 SL3 BH4 BL4 SH4 SL4 BH5 BL5 SH5 SL5 BH6 BL6 SH6 SL6 BH7 BL7 SH7 SL7 {
        replace `var' = `var' - Rf
    }
}

** Number of observations defining **
local obs = 14802

tempname betas
postfile `betas' beta_c beta_HML_Factor using "D:\betas.dta", replace
quietly {
    foreach y of varlist BH1 BL1 SH1 SL1 BH2 BL2 SH2 SL2 BH3 BL3 SH3 SL3 BH4 BL4 SH4 SL4 BH5 BL5 SH5 SL5 BH6 BL6 SH6 SL6 BH7 BL7 SH7 SL7 {
        regress `y' HML_Factor if fyear >= 1962 & fyear <= 2017
        scalar beta_c = _b[_cons]
        scalar beta_HML_Factor = _b[HML_Factor]
        post `betas' (beta_c) (beta_HML_Factor)
    }
}
postclose `betas'

drop HML_Factor
xpose, clear varname
drop in 1
merge 1:1 _n using "D:\betas.dta"

tempname lambdas
local obs = 14802
postfile `lambdas' lambda_c lambda_HML_Factor lambda_r2 using "D:\lambdas.dta", replace
quietly {
    forvalues i = 1(1)`obs' {
        regress v`i' beta_HML_Factor
        scalar lambda_c = _b[_cons]
        scalar lambda_HML_Factor = _b[beta_HML_Factor]
        scalar lambda_r2 = e(r2)
        post `lambdas' (lambda_c) (lambda_HML_Factor) (lambda_r2)
    }
}
postclose `lambdas'

snapshot restore 1
use "D:\lambdas.dta", clear
local obs = 14802
quietly {
    foreach var of varlist lambda_c lambda_HML_Factor {
        summarize `var'
        scalar `var'_mean = r(mean)
        scalar `var'_tratio = sqrt(`obs')*r(mean)/r(sd)
    }
    summarize lambda_r2
    scalar lambda_r2_mean = r(mean)
}
display lambda_c_mean
display lambda_c_tratio
display lambda_HML_Factor_mean
display lambda_HML_Factor_tratio
display lambda_r2_mean
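A hedged sketch of a single-factor Shanken (1992) adjustment, not a definitive implementation. It uses one commonly quoted form of the correction, Var(lambda_hat) = [(1 + c)(W - Var_f) + Var_f]/T with c = lambda^2/Var_f, where W is the sample variance of the period-by-period lambda estimates; it also assumes a scalar sig2_f holding the sample variance of HML_Factor was saved (for example, summarize HML_Factor followed by scalar sig2_f = r(Var)) before the factor was dropped in the code above.
Code:
use "D:\lambdas.dta", clear
quietly summarize lambda_HML_Factor
scalar T       = r(N)                    // number of cross-sectional regressions
scalar lam_hat = r(mean)                 // Fama-MacBeth estimate of the premium
scalar W       = r(Var)                  // variance of the period-by-period lambdas
scalar c       = lam_hat^2 / sig2_f      // Shanken's multiplier term
scalar V_sh    = ((1 + c)*(W - sig2_f) + sig2_f) / T
display "Shanken-corrected t-statistic: " lam_hat / sqrt(V_sh)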
↧
Goodness-of-fit tests and variable selection for a zero-inflated negative binomial model
Hello - I am working on an analysis of over-dispersed count data using zero-inflated negative binomial regression and am having difficulty figuring out the appropriate goodness-of-fit test(s) to use and selecting the parameters for the model. Here is an overview of my data:
- Dependent variable: "phys_hlth," which is the number of days in the last month when the respondent's self-reported physical health was not good (0-30)
- Main predictor: "DOV_LGBT," which is LGBT identity (0=not LGBT-identified, 1=LGBT-identified)
- Other possible predictors/controls: age (continuous; "PPAGE" in the commands below) and recent experience of discrimination (binary; "discrim" in the commands below), as well as others such as race (5 categories), gender (binary), insurance status (binary), etc.
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
phys_hlth | 1,727 4.10249 8.11829 0 30
Because of the size of the variance relative to the mean, I moved from a ZIP model to ZINB. Putting some of the predictors mentioned above into the ZINB model returns good numbers for overall chi2, alpha, and the Vuong test, as well as for my main predictor (DOV_LGBT):
. zinb phys_hlth DOV_LGBT discrim PPAGE, inflate (PPAGE DOV_LGBT) vuong zip
Zero-inflated negative binomial regression Number of obs = 1,554
Nonzero obs = 634
Zero obs = 920
Inflation model = logit LR chi2(3) = 12.45
Log likelihood = -3113.399 Prob > chi2 = 0.0060
------------------------------------------------------------------------------
phys_hlth | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
phys_hlth |
DOV_LGBT | .2159001 .096089 2.25 0.025 .0275691 .404231
discrim | .2479829 .1694438 1.46 0.143 -.0841209 .5800867
PPAGE | .0087431 .0030843 2.83 0.005 .0026981 .0147882
_cons | 1.592725 .1875341 8.49 0.000 1.225165 1.960286
-------------+----------------------------------------------------------------
inflate |
PPAGE | .0126378 .0039695 3.18 0.001 .0048578 .0204179
DOV_LGBT | -.4053969 .1249087 -3.25 0.001 -.6502134 -.1605805
_cons | -.4047859 .2509455 -1.61 0.107 -.8966301 .0870582
-------------+----------------------------------------------------------------
/lnalpha | .2658189 .104051 2.55 0.011 .0618826 .4697553
-------------+----------------------------------------------------------------
alpha | 1.304499 .1357345 1.063837 1.599603
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0: chibar2(01) = 3673.90 Pr>=chibar2 = 0.0000
Vuong test of zinb vs. standard negative binomial: z = 6.47 Pr>z = 0.0000
What I am struggling with is the following:
1) Are there other goodness-of-fit tests that I should be running to ensure that the ZINB model is a good fit? I have used the margins command to estimate the predicted means after doing a robust ZINB regression, and these estimates are close to the actual means (even without the analytical weight), but I want to make sure I'm not stumbling blindly into the ZINB model because I can't think of any other approaches:
. margins DOV_LGBT
Predictive margins Number of obs = 1,554
Model VCE : Robust
Expression : Predicted number of events, predict()
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
DOV_LGBT |
0 | 3.432679 .2549347 13.46 0.000 2.933016 3.932342
LGBT | 5.237526 .35971 14.56 0.000 4.532507 5.942545
------------------------------------------------------------------------------
Actual means:
. mean phys_hlth [aw=weight_1], over(DOV_LGBT)
Mean estimation Number of obs = 1,727
_subpop_1: DOV_LGBT = 0
LGBT: DOV_LGBT = LGBT
--------------------------------------------------------------
Over | Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
phys_hlth |
_subpop_1 | 3.555177 .2547231 3.055579 4.054776
LGBT | 4.984303 .3125431 4.3713 5.597306
--------------------------------------------------------------
2) Is there a method to selecting which variables to inflate in the ZINB model?
3) Relatedly, what is the best method to use with ZINB regression to select which variables to include in the model at all? E.g., I have taken gender out because it was consistently showing up as nonsignificant whether I inflated it or not, but is testing each of my ~15 possible independent variables one by one like that my only option? Can I use something like <gvselect> or forward/backward selection with a ZINB model, and if so, how?
Thank you!
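One further check that is sometimes used for the goodness-of-fit part of question 1 (a minimal sketch, not an endorsement of any particular model): fit the competing count models on the same variables as in the output above and compare information criteria with the official -estat ic-.
Code:
* Compare AIC/BIC across a plain negative binomial and the zero-inflated version
quietly nbreg phys_hlth DOV_LGBT discrim PPAGE
estat ic
quietly zinb phys_hlth DOV_LGBT discrim PPAGE, inflate(PPAGE DOV_LGBT)
estat ic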
↧
↧
Padding names - string match
Hey all,
I had an earlier post http://www.statalist.org/forums/foru...-large-dataset
in which one of the listers shared a portion of his code (#3). He suggested padding the names before doing name cleaning:
Code:
* Make sure that we have a space at the beginning and end so that we can
* search words delimited by spaces.
replace shortname = " " + shortname + " "
listdiff shortname, reset
replace shortname = subinstr(shortname," INCORPORATED ", " ",.)
Suppose the content of the name (shortname) is
Code:
First Eagle Corp
Would this only insert a blank before First and one blank after Corp, or would it insert blanks before and after each word?
A more general question: if our dataset is a commercial database that includes firm names, is it always necessary to pad the names before we do a task such as
Code:
replace shortname = subinstr(shortname," INCORPORATED ", " ",.)
thanks,
Rochelle
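For what it's worth, a quick self-contained check (a sketch using the example name from the post) shows that this padding adds exactly one space at the very beginning and one at the very end, not around every word:
Code:
clear
set obs 1
generate str30 shortname = "First Eagle Corp"
replace shortname = " " + shortname + " "
display "|" + shortname + "|"      // prints | First Eagle Corp |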
↧
Stuck on effect sizes
Hi all!
I have two independent groups who went through the same treatment. I want to see the effect size. I know the following code is often used:
esize twosample math, by(grouptype) cohensd hedgesg glassdelta
However, there is a problem. The study has a pre-test/post-test set up. For the pre-test scores, one group was significantly higher than the other. Therefore, it seems like looking at the post-test effect size isn't helpful. It might be more helpful to find an effect size for the differences in scores. However, for the life of me I can't figure out how to do this on Stata. I'd love any advice. Thanks in advance for the help!
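A minimal sketch of one common way to handle this (the variable names math_pre and math_post for the two test scores are placeholders): compute each participant's gain and take the effect size of the gains across groups.
Code:
* Gain score per participant, then the between-group effect size of the gains
generate gain = math_post - math_pre
esize twosample gain, by(grouptype) cohensd hedgesg
An alternative, if baseline adjustment is preferred over gain scores, is an ANCOVA-style model such as regress math_post i.grouptype math_pre.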
↧
Looking for advice on computing the difference between two variables with different scales to compute a third variable
Hello
I was hoping someone could help me with the query below; if so, it would be greatly appreciated:
- I have two variables with separate scales/metrics: (i) self-report extroversion (possible score of 1-5); (ii) friends report of extroversion (possible score of 1-8)
- I would like to calculate the meaningful difference between "self-report extroversion" and "friends report of extroversion" for each participant in my sample but I need the two variables to be on the same scale to do this. (I would like to create a new variable based on this difference called "extroversion congruence").
- Would anybody be able to advise me on the best way to handle the above in Stata? (One possible approach is sketched just after this list.)
- I hope this makes sense but if not I am happy to clarify
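A minimal sketch of one common approach (the variable names self_extroversion and friend_extroversion are placeholders, and z-scoring is only one of several defensible ways to put the two reports on a common metric):
Code:
* Standardize each report within the sample, then take the signed difference
egen z_self   = std(self_extroversion)
egen z_friend = std(friend_extroversion)
generate extroversion_congruence = z_self - z_friend
Note that a difference of standardized scores measures relative rather than absolute agreement; another option is to rescale both measures to a common range (say 0-1) before differencing.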
All the best
Conal
↧
Help: Graph Supply and Demand from Regression
Attachment: Reg. With IV.do
Hey!
I'm pretty new to Stata and have lately been trying to obtain the demand and supply of natural gas for the US (short and long term).
I have been successful with my regressions so far (log(C_t) = log(C_t-1) + log(P_t) + Weather + date + u_i, with absorbed fixed effects (by month and region), robust errors, and an instrumental variable for the price). Also, my regressions are separated between winter and summer elasticities (with dummies) for consumption to find the difference between winter and summer demand.
areg log_con_total hat_log_spot_price winter_hat lag_log_con_total w_lag_log_con_total hdd date, absorb(region_x_mes) robust
areg log_prod_total hat_log_spot_price winter_hat w_lag_log_prod_total lag_log_prod_total date hdd, absorb (region_x_mes) robust
(hat_log_spot_price is the IV, w_ implies the winter dummy, hdd is for weather)
Now I want to graph the supply and demand for those regressions, but do not know how (how do I get the supply and demand equations from results?); I especially want to work with short term supply and demand (so I don't care about long term elasticities) and separate winter and summer demand.
Any comments might help,
Thanks for your time
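A minimal sketch of one way to draw the curves (all numbers below are made up for illustration): with constant-elasticity estimates each curve is log-linear, so its inverse form P = P0*(Q/Q0)^(1/elasticity) can be drawn through an observed price-quantity point with -twoway function-.
Code:
* Hypothetical short-run elasticities: demand -0.2, supply 0.5, through (Q0,P0) = (100,3)
twoway (function y = 3*(x/100)^(1/-0.2), range(80 120)) ///
       (function y = 3*(x/100)^(1/0.5),  range(80 120)), ///
       xtitle("Quantity") ytitle("Price") ///
       legend(order(1 "Demand" 2 "Supply"))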
↧
↧
Order in Bar Graphs
I am having difficulty putting my bars in a specific order. Stata automatically orders the bars from left to right alphabetically, but I want each president in chronological order. How do I do this? My current code is this:
graph bar laborforce, over(president) ytitle("Unemployment Rate") title("Average Monthly Unemployment Rate") subtitle("for each U.S. President")
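A minimal sketch (assuming a numeric variable, here called term_start, that records when each president's term began and therefore orders them chronologically): the sort() suboption of over() controls the bar order.
Code:
graph bar laborforce, over(president, sort(term_start)) ///
    ytitle("Unemployment Rate") title("Average Monthly Unemployment Rate") ///
    subtitle("for each U.S. President")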
↧
Data Cleaning
I have a variable service_number. I know it has to have 10 characters. How can I generate an indicator for all the observations for which service_number does not have 10 characters? Also, how can I delete these observations in one command?
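A minimal sketch, assuming service_number is stored as a string (if it is numeric, convert it first with tostring):
Code:
* Flag observations whose service number is not exactly 10 characters long
generate byte bad_length = strlen(service_number) != 10

* Delete those observations in a single command
drop if strlen(service_number) != 10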
↧
Recursive regression with fixed training window
Dear All,
Once again I am encountering a specific regression that I cannot get to execute correctly. The idea is: let's say we have a time-series sample for 1930-2010. We split this sample at 1980 and use the prior observations as a fixed window, t=49, {1,...,t}, and run a regression of some variable on another. At 1980 we start running recursive regressions using this window as the base, so now we run the regression on {1,...,t+1}, and so on until we exhaust the remaining time series in 2010, i.e. {1,...,t+1},...,{1,...,t+30}. Maybe someone has some ideas or has had a similar issue?
Thank you in advance!
Marijus
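A minimal sketch of one way to do this by hand (the variable names y, x, and year are placeholders; the fixed 1930 start and the 1980-2010 expansion follow the post), collecting each window's slope with -postfile-. If memory serves, the official -rolling- prefix can also produce expanding windows via its recursive option.
Code:
tempname rec
postfile `rec' endyear b_x using recursive_betas.dta, replace
forvalues endyear = 1980/2010 {
    * expanding window: always starts in 1930, ends in `endyear'
    quietly regress y x if inrange(year, 1930, `endyear')
    post `rec' (`endyear') (_b[x])
}
postclose `rec'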
↧