Channel: Statalist

Forest plot of odds ratios in a case-control study (not derived from logistic regression)

Dear Stata experts,
I usually produce forest plots for the graphical display of logistic regression results using COEFPLOT, which works perfectly.
I wonder how it would be possible to generate a similar-looking forest plot for manually entered data, without needing to use the METAN command.
For example, when I do a case-control study, I have my results as odds ratios, but I want to display these odds ratios graphically in a forest plot. COEFPLOT won't work, since the odds ratios here are not derived from logistic regression.
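For what it's worth, a forest plot can be built by hand with -twoway-. A minimal sketch with hypothetical odds ratios and confidence limits (-labmask- is from the labutil package on SSC):

```stata
* Enter the odds ratios and confidence limits by hand (hypothetical values)
clear
input str10 exposure or lb ub
"Smoking"  2.10 1.40 3.15
"Diabetes" 1.55 0.95 2.53
"Male sex" 0.80 0.55 1.16
end
gen row = _n
labmask row, values(exposure)        // attach exposure names as value labels

twoway (rcap lb ub row, horizontal)              ///
       (scatter row or, msymbol(square)),        ///
       xline(1) xscale(log) xlabel(0.5 1 2 4)    ///
       ylabel(1/3, valuelabel angle(0)) ytitle("") ///
       xtitle("Odds ratio (log scale)") legend(off)
```

The xscale(log) and xline(1) options give the usual forest-plot look; add rows to the input block for each of your exposures.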
Appreciate your help
Sincerely
Abugroun

Dropping observation with survey weighted data

I would like to run analyses on all youth included in the 2010-2014 Hospital Cost and Utilization Project (HCUP) National Emergency Department datasets. These datasets are too large for my computer to run even simple regression commands, despite dropping every variable I do not need. Because I am only interested in youth, I would like to be able to drop the adult observations while still using the survey weights. I am unable to run analyses using the subpop() option given how much memory is required. I understand the standard errors will be incorrect if you simply drop observations from survey data. HCUP provides some guidance for subsetting data: "The alternate method for calculating appropriate standard errors is to subset the nationwide database to the observations of interest. Then, append one "dummy" observation for each of the hospitals included in the nationwide database that is not represented in the subset. The dummy observations ensure that all the hospitals in the sample are taken into account, resulting in the accurate calculation of standard error." However, their example uses SAS code. Is anyone familiar with using a subset of survey data in Stata while still getting correct SEs without subpop(), or able to point me in the right direction?
Please let me know if any of this is unclear; I apologize in advance.
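Before giving up on subpop(), it may be worth shrinking the file so that subpop() fits in memory: load only the variables you need and compress their storage types. A sketch, assuming NEDS-style design variables (hosp_ed, neds_stratum, discwt) and a hypothetical outcome variable:

```stata
* Load only the needed variables from the (hypothetical) core file
use age hosp_ed neds_stratum discwt outcome using neds_core, clear
compress                                   // shrink variables to smallest types

gen byte youth = age < 18
svyset hosp_ed [pweight=discwt], strata(neds_stratum)
svy, subpop(youth): mean outcome           // SEs computed over the full design
```

With only the analysis variables loaded and compressed, the memory footprint is often a small fraction of the full file, which may make subpop() feasible again.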
Thank you!

Interpretation of fixed effects constant

Dear statalists,

I am looking at the development of corporate investment over a period of 17 years. My data are an unbalanced panel with about 6,500 firm-year observations.
I declared my data to be time-series data with the tsset command (yearly).
To investigate the relationship between my dependent variable and my independent variables in different time periods, I divided the 17 years into 3 separate periods and ran OLS and fixed-effects regressions (the latter supported by a Hausman test) for each period separately.
Code:
reg depvar idepvar1 idepvar2 idepvar3 idepvar4 idepvar5 idepvar6 idepvar7, cluster (firm)
xtreg depvar idepvar1 idepvar2 idepvar3 idepvar4 idepvar5 idepvar6 idepvar7, fe cluster(firm)
As I analysed the results, I found that the constants of the OLS regressions declined across the 3 periods, while the constants of the fixed-effects regressions increased.
I am now searching for a possible explanation for that.

1. Might an increase in time-invariant effects during these periods provide an answer?

2. Can one infer that there is less variability in the depvar during the later periods?

I attached the results of the regressions.

Thank you for your help !

Multiple imputation: problem with storage method

Hi,

I am experiencing a problem with the mi command.

First of all I am running the following commands:

Code:
mi set mlong
mi register imputed outcome
mi impute chained (regress) outcome = covariates, orderasis add(5) burnin(100) rseed(012345) dots augment
The outcome of my multiple imputation is shown below for a single observation:

[attached image]

The problem is that I would like the observed value (which in the example shown above is the outcome .325 at t=0, i.e., the very first observation) to be included in each imputed dataset.

My intuition, from reading this UCLA Stata page, is that the standard storage method for mi should include the observed outcome at t=0 within each imputed dataset.

I need this because I am using the
Code:
mi passive:
command to generate some variables with the imputed values but I also need the observed values to do so.

Any thoughts?

Thanks,

Lukas

Observed vs. predicted curves using competing-risks regression

Dear all

I have developed a model to predict hepatic complications using variables selected through competing-risks regression models. I am trying to calibrate the model, but I cannot find a way to recreate the observed and predicted curves for a model based on competing-risks regression analysis. Any ideas on how to do this?
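In case it helps, after -stcrreg- the model-based cumulative incidence function (CIF) can be drawn with -stcurve, cif-, and the observed (nonparametric) curve with -stcompet- from SSC. A sketch with hypothetical variable names and event codes:

```stata
stset time, failure(status == 1)          // event of interest coded 1
stcrreg x1 x2, compete(status == 2)       // competing event coded 2
stcurve, cif at1(x1=0) at2(x1=1)          // predicted CIF curves

* Observed cumulative incidence for comparison (ssc install stcompet)
stcompet obscif = ci, compet1(2)
line obscif _t if status == 1, sort
```

Overlaying the observed and predicted curves (e.g., within risk-score groups) would then give a visual calibration check.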
Many thanks

Help forecasting out of sample

Hello everyone
I am currently trying to forecast the sales of a company using time-series models, more specifically ARIMA/SARIMA models.
I am able to run the models successfully and obtain the predicted values within the sample, but I cannot seem to find a way to predict OUT of sample.
By typing "predict var_prediction, xb" I obtain the third column shown below (see attached file).

What I would like is for Stata to forecast beyond t = 23, for instance up to t = 30. I simply cannot find a way to do this.
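A sketch of one way to get true out-of-sample forecasts, assuming t is an integer time variable and sales is the series being modelled:

```stata
tsset t
arima sales, arima(1,1,1)        // fit the model on t = 1..23

tsappend, add(7)                 // extend the dataset to t = 30
predict yhat, y dynamic(24)      // dynamic forecasts from t = 24 onward
list t sales yhat if t > 20
```

-tsappend- adds empty future periods to the time axis, and the dynamic() option makes -predict- feed its own forecasts forward instead of requiring observed values.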

Thanks everyone in advance!

xtabond2 for system GMM: please help with the coding

Dear all,

I am working with the xtabond2 command in Stata to address the endogeneity problem in my estimation. I have read David Roodman's guide to xtabond2. However, I am still unsure whether my coding is right or wrong. Could you please help?

Specifically, I want to write a code with xtabond2 command for system GMM as follows:

" lag2 and lag3 of the levels of the firm performance variable, the corporate governance variables (female, nonexe, dual, lnsize), and the control variables (fsize lev) are employed as GMM-type instrumental variables for the first-differenced equation. Meanwhile, first lagged differences of firm performance, corporate governance, and control variables are used as GMM-type instruments for the levels equation."
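A heavily hedged sketch of how that instrument set might be written, with the variable names taken from your description and a hypothetical dependent variable perf; the behavior of the eq() suboptions should be checked against Roodman's 2009 paper, since it is easy to get wrong:

```stata
xtabond2 perf L.perf female nonexe dual lnsize fsize lev,                 ///
    gmm(perf female nonexe dual lnsize fsize lev, lag(2 3) eq(diff))      ///
    gmm(perf female nonexe dual lnsize fsize lev, lag(1 1) eq(level))     ///
    twostep robust
```

In xtabond2, gmm(..., eq(diff)) uses lagged levels as instruments for the differenced equation, while gmm(..., eq(level)) uses lagged differences for the levels equation, which appears to match the specification quoted above.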

Thank you very much in advance.

Regards,
Celine

diameter sign in graph

I would like to use a diameter sign (Unicode U+2300) in a graph, but I am not able to do so with the Unicode code point. Of course, using {c O/} is a good workaround, but nonetheless I wonder why Stata 15.1 gives me:

[attached image]

And with the -grtext- tool (from SSC) I get:

[attached image]

This does not look like the sign I want to have. The Unicode technical documentation shows that it should look like:

[attached image]

Is there another good way to get this original unicode diameter sign into the graph?
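One possibility worth trying: uchar() converts a Unicode code point to a character (Stata 14+), and whether U+2300 actually renders depends on the graph font containing that glyph, so switching fonts may be necessary (the font name below is just an example):

```stata
local diam = uchar(8960)                 // 8960 = 0x2300, the diameter sign
graph set window fontface "DejaVu Sans"  // a font that includes U+2300
sysuse auto, clear
scatter price mpg, title("Piston `diam' vs. price")
```

If the sign still appears as a box or a different glyph, the chosen font most likely lacks the U+2300 glyph and Stata is substituting one.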

how to calculate a proportion with a complex condition

Hello, I am using Stata 15.2 to analyze survey data. I want to calculate the average household size among households that have at least 1 person who is formally employed, without double-counting households in which several people are formally employed. My variables are hhid, hhsize, and employment_group (formal employment is employment_group==5). Please help!
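A minimal sketch using the variable names from the post: count formally employed members per household, tag one observation per household, and summarize over the tagged rows so each household counts once.

```stata
* number of formally employed members in each household
bysort hhid: egen n_formal = total(employment_group == 5)

egen byte tag = tag(hhid)                 // marks exactly one obs per household
summarize hhsize if tag & n_formal > 0    // mean size among qualifying households
```

Because the summarize is restricted to the tagged observation, households with several formally employed members enter the calculation only once.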

How to extend cross sectional units in a panel data

Dear all,


I have a dataset in the current format:

ID  Year  X    Y    Z
1   2000  3    1    9
1   2001  4    2    5
1   2002  12   15   21
2   2000  0    7    4
2   2001  ...
2   2002
3   2001
3   2002
3   2003
4   2000
4   2001
4   2002




And I would like each observation (each cross-sectional unit in each year) to be repeated 3 times, as follows:

ID  Year  X    Y    Z
1   2000  3    1    9
1   2000  3    1    9
1   2000  3    1    9
1   2001  4    2    5
1   2001  4    2    5
1   2001  4    2    5
1   2002  12   15   21
1   2002  12   15   21
1   2002  12   15   21
2   2000  0    7    4
2   2000  0    7    4
2   2000  0    7    4
2   2001  ...
2   2001
2   2001
2   2002
2   2002
2   2002
3   2001
3   2001
3   2001
3   2002
3   2002
3   2002
3   2003
3   2003
3   2003
4   2000
4   2000
4   2000
4   2001
4   2001
4   2001
4   2002
4   2002
4   2002



I tried several commands and interpolations, but so far I have not managed to get what I want. Could you please help me with this?
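If the goal is simply to repeat every observation three times, -expand- does this directly:

```stata
expand 3            // each observation becomes 3 identical copies
sort ID Year        // keep the copies of each unit-year together
```

No interpolation is needed; -expand- duplicates rows in place, and the sort restores the panel ordering.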

Many thanks.

Kodi

Create dummies for percentiles

Hello, please help create the following.

I would like to create dummies according to percentiles: below the 25th, 25th-50th, 50th-75th, and above the 75th.

I tried:

replace c_total_dum50 = 1 if c_total_assets < r(p50)
replace c_total_dum50 = 0 if c_total_assets < r(p25)

to create the 25th-50th percentile dummy, but it did not work. Please help.
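A sketch of one way to avoid computing the cutoffs by hand: -xtile- assigns quartile groups directly, and -tab, gen()- converts them into dummies (using the variable name from the post):

```stata
xtile quart = c_total_assets, nq(4)    // 1 = below p25, ..., 4 = above p75
tab quart, gen(c_total_dum)            // creates c_total_dum1 ... c_total_dum4
```

Note that r(p25) and r(p50) are only available right after a command that leaves them behind, such as -summarize, detail- or -_pctile-, which is likely why the replace statements did not work as expected.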

Matching 2 databases on industry (SIC 2-digit code) and size (total assets) for a study on audit fees

Hello, I currently have a sample of 143 public firms that were involved in merger and acquisition (M&A) activity in 2016. I am studying the effect of this M&A activity on audit fees.
I have created 2 databases of listed firms: one with the 143 public firms associated with M&A activity and one with 2,478 public firms NOT associated with M&A activity. Both databases use the ticker symbol as the company identifier.

I would like to match the 2 databases, so that every public firm with M&A activity has 1 uniquely matched (without replacement) observation, ultimately doubling my M&A database with non-M&A observations.
I want matches to have equal 2-digit SIC codes (Standard Industrial Classification codes) and total assets as close as possible, while still keeping all the variables from my databases (such as audit fees, auditor, audit opinion, total inventories, etc.).

Is there any way this can actually be done?
Someone suggested I try calipmatch, which I have installed using ssc install calipmatch, but I can't figure out how it actually works.

I've tried the following code (note: at = total assets, acquirortic is the company identifier/ticker):

Code:
  
use "C:\Users\Gebruiker\Desktop\Thesis samples\2016 M&A final.dta", clear
rename at case_at
rename acquirortic case_acquirortic  
joinby sic2 using "C:\Users\Gebruiker\Desktop\Thesis samples\matching sample 2017.dta"

gen delta = abs(at - case_at)
drop if delta >= 500
drop if delta < 1  

set seed 15
gen double shuffle1 = runiform()
gen double shuffle2 = runiform()

by acquirortic (delta shuffle1 shuffle2), sort: keep if _n == 1

by case_acquirortic (delta shuffle1 shuffle2), sort: keep if _n == 1
drop delta shuffle1 shuffle2
This does appear to match the 2 databases, but it gets rid of the majority of my secondary database's variables, and it does not create new observations but just places them in the columns behind the current observations.

Any suggestions/help/code to improve this? I hope my question is clear; any help would be appreciated.

Thank you in advance for any help!

Robin

Propensity score Matching

Good morning, I am a student at the University of Messina. In my thesis I am analyzing the differences in the level of job satisfaction between Italians and foreigners. The problem is that I have many missing values for foreigners' satisfaction. My professor and I thought of using a matching technique to generate these values. I will briefly explain my approach:
- creation of a dummy 'dumjs', which is equal to 0 if the individual has not answered the question on job satisfaction, and equal to 1 otherwise;
- probit on the probability of replying (dependent variable: the job satisfaction dummy; independent variables: sex, age, nationality, marital status, life satisfaction);
- calculation of the propensity score.

Now I would like to generate the missing values of job satisfaction (to proceed with other calculations I am not going to explain here) through propensity score matching.

Can someone explain the commands to be used in Stata?
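A sketch of the first steps, with hypothetical covariate names; -psmatch2- is a community-contributed command (ssc install psmatch2):

```stata
* Probit for the probability of responding, then the propensity score
probit dumjs i.sex age i.nationality i.marital lifesat
predict pscore, pr

* 1-to-1 nearest-neighbor matching on the propensity score
psmatch2 dumjs, pscore(pscore) neighbor(1)
```

After matching, each non-respondent has a matched respondent whose observed job satisfaction could be carried over; whether filling in missing outcomes this way is defensible is a methodological choice worth discussing with your professor.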

Thank you.

Re-arrange individual observations into a family setting

Hi,
I'm struggling to find a way to transfer observations at the individual level into rows at the family level. You can find a picture of the data attached. Most labels are straightforward: famnr is the individual family number based on city and family, and fampos is the position of the individual within the family (1 = child, 2 = spouse, 3 = household head).

The final arrangement should include information on the age of each individual (mother, father, or child) so that each family can be displayed in one row. For children, there should be variables sex_firstchild, age_firstchild, sex_secondchild, age_secondchild, and so on.

Does anybody have an idea how to start?
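One possible starting point, sketched with the variable names from the post plus assumed sex and age variables: build a string key per family member (head, spouse, child1, child2, ...) and then -reshape wide-.

```stata
* role key: head, spouse, child1, child2, ...
gen str10 role = cond(fampos == 3, "head", cond(fampos == 2, "spouse", ""))
bysort famnr (fampos): gen childnum = sum(fampos == 1)   // number the children
replace role = "child" + string(childnum) if fampos == 1

keep famnr role sex age
reshape wide sex age, i(famnr) j(role) string
```

This produces one row per famnr with variables such as sexhead, agehead, sexchild1, agechild1, and so on, which can then be renamed to taste.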

Thanks in advance!

Cheers,
NM

estimate wtp with doubleb command - interpretation

Hi,
I am estimating WTP for health using the doubleb command. The command line and the results are shown below:

doubleb BID1 BID2 ANSWER1 ANSWER2

initial: log likelihood = -<inf> (could not be evaluated)
feasible: log likelihood = -8086.9392
rescale: log likelihood = -992.84364
rescale eq: log likelihood = -905.60652
Iteration 0: log likelihood = -905.60652
Iteration 1: log likelihood = -880.20383
Iteration 2: log likelihood = -877.53287
Iteration 3: log likelihood = -877.5177
Iteration 4: log likelihood = -877.51769

Number of obs = 318
Wald chi2(0) = .
Log likelihood = -877.51769 Prob > chi2 = .

------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Beta |
_cons | 2093.911 55.61786 37.65 0.000 1984.902 2202.92
-------------+----------------------------------------------------------------
Sigma |
_cons | 939.4247 43.78531 21.46 0.000 853.6071 1025.242
------------------------------------------------------------------------------

First-Bid Variable: BID1
Second-Bid Variable: BID2
First-Response Dummy Variable: ANSWER1
Second-Response Dummy Variable: ANSWER2

If I am right, the Beta coefficient is the mean WTP (that is, about 2,094 euros).

Now I want to add control variables to my model. I have categorical variables for income (1 = very high, 2 = high, 3 = low, 4 = very low), education (1 = primary, 2 = secondary, 3 = tertiary), working status (1 = unemployed, 2 = employed), etc.

Can I run the following command? Does it make sense given the type of my data?

doubleb BID1 BID2 ANSWER1 ANSWER2 age education work

If I do this, I get the following table:

initial: log likelihood = -<inf> (could not be evaluated)
feasible: log likelihood = -14499.403
rescale: log likelihood = -1203.9853
rescale eq: log likelihood = -1169.0702
Iteration 0: log likelihood = -1169.0702
Iteration 1: log likelihood = -1144.0228
Iteration 2: log likelihood = -1140.6283
Iteration 3: log likelihood = -1140.6185
Iteration 4: log likelihood = -1140.6185

Number of obs = 318
Wald chi2(3) = 8.11
Log likelihood = -1140.6185 Prob > chi2 = 0.0437

--------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------------+----------------------------------------------------------------
Beta |
income | 34.71665 55.11304 0.63 0.529 -73.30293 142.7362
education | 153.331 89.25955 1.72 0.086 -21.61448 328.2765
work | 56.95611 31.21453 1.82 0.068 -4.223234 118.1355
_cons | 1387.917 337.3133 4.11 0.000 726.7952 2049.039
---------------+----------------------------------------------------------------
Sigma |
_cons | 918.901 42.87693 21.43 0.000 834.8637 1002.938
--------------------------------------------------------------------------------

First-Bid Variable: BID1
Second-Bid Variable: BID2
First-Response Dummy Variable: ANSWER1
Second-Response Dummy Variable: ANSWER2

If the command I used is right, how do I interpret the coefficients? As education "increases", does the WTP increase by 153 euros?
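One caveat worth flagging: entering education or income as a single numeric regressor forces the effect to be linear across the categories (primary to secondary assumed equal to secondary to tertiary). Since doubleb is community-contributed and may not accept factor-variable notation, a sketch with explicit indicator variables:

```stata
tab education, gen(edu_)     // edu_1 edu_2 edu_3
tab income,    gen(inc_)     // inc_1 ... inc_4

* omit one category of each as the base
doubleb BID1 BID2 ANSWER1 ANSWER2 age edu_2 edu_3 inc_2 inc_3 inc_4 work
```

Each coefficient is then the shift in mean WTP relative to the omitted base category, which is usually easier to defend than "per one-unit increase in education".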

Any help would be precious!
Thanks!

Esttab: Show multiple Regressions in one column/ Delete empty cells

Hi all,

for my master's thesis I am trying to display 70 logit regression outcomes in one table. Please note that for each regression I only want to display the main independent variable (i.e., I drop all other control variables, etc., in esttab). I have 7 different outcomes, and for each outcome I have 10 specifications, each with a different independent variable. Ideally I would like a 10-row by 7-column table showing only the coefficient of interest from each regression. However, I only manage to get a 10-row by 70-column table with the 70 different coefficients (displayed as 7 diagonals, one per outcome) and 630 empty cells.

The relevant code looks like this:


...

Code:
foreach var of global subjects {
    foreach independent in ue_rate n_ue_rate n_ue_rate_change p50 p75 p90 recession gdpgrowthrate DEUREC svrat {
    eststo : qui logit `var' `independent' par_uni_educ female migration grade i.state i.year, vce(cluster year)
}
}

noisily : esttab, se compress no type  ///
keep (ue_rate n_ue_rate n_ue_rate_change p50 p75 p90 recession gdpgrowthrate DEUREC svrat)

esttab using "latex\tables\majorshares_comparison_logit.tex", se compress no type ///
keep (ue_rate n_ue_rate n_ue_rate_change p50 p75 p90 recession gdpgrowthrate DEUREC svrat) replace
...


What I would like to do is either delete all empty cells or 'compress' all coefficients into one column for each outcome.

I already tried https://www.stata.com/statalist/arch.../msg00636.html but I'm not sure whether it works for multivariate regressions (I get a factor-variable error).

Maybe there is an easier way, e.g., merging the tables via outreg?
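One way to get the 10-by-7 layout is to skip -eststo- entirely, collect the coefficients of interest into a matrix, and pass that to esttab; a sketch adapted to the loop in the post:

```stata
matrix B = J(10, 7, .)
local i = 0
foreach var of global subjects {
    local ++i                 // column = outcome
    local j = 0
    foreach x in ue_rate n_ue_rate n_ue_rate_change p50 p75 p90 recession gdpgrowthrate DEUREC svrat {
        local ++j             // row = independent variable
        qui logit `var' `x' par_uni_educ female migration grade i.state i.year, vce(cluster year)
        matrix B[`j', `i'] = _b[`x']
    }
}
matrix rownames B = ue_rate n_ue_rate n_ue_rate_change p50 p75 p90 recession gdpgrowthrate DEUREC svrat
esttab matrix(B)
```

esttab's matrix() mode prints exactly the cells filled in, so the 630 empty cells disappear; standard errors would need a second matrix built the same way from _se[].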



Any help is very much appreciated!

Thank you
Andi

error help: at level for factor var not present in estimation

Hi everyone,

I'm having trouble understanding what an error message means when running the margins command after a logistic regression I have run.
The logistic and margins code is:
Code:
logistic r_ks4_level2_em_37 i.r_mo_ligmodheavyalc_9 i.r_mo_highestedqual_4  i.r_mo_drugsyn_9 i.r_mo_nosmokedlast2wk_3 i.r_mo_alccon1stmove_3 i.r_mo_prenataldrugyn_3  i.r_mo_familyincomeweek_9 i.r_mo_anxietydr_9 i.r_mo_depressiondr_9 c.r_mz028b i.cr_mo_ethnicgrp_4 i.r_mo_ligmodheavyalc_9#i.r_mo_highestedqual_4

margins r_mo_ligmodheavyalc_9, at(r_mo_highestedqual_4=(1(1)6)) 
marginsplot, noci ytitle(KS4 outcomes) name(alcoholqual6gcse5ACengmath)
The error says:
Code:
  at level for factor r_mo_highestedqual_4 not present in estimation
I've looked at the Stata documentation and searched online, but I can't seem to make sense of this error. The estimation sample has 5,700 observations. The alcohol variable has 4 categories (no drinking, light drinking, moderate drinking, and heavy drinking) and the qualifications variable has 6 categories (no qualifications, CSE, skilled qualification, O-level, A-level or equivalent, and degree).
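That error usually means one of the values listed in at() never occurs in the sample actually used to fit the model (observations with missing covariates are dropped, so a rare category can vanish). A sketch of how to check and adapt:

```stata
* which education levels survive into the estimation sample?
levelsof r_mo_highestedqual_4 if e(sample)

* restrict at() to the levels that are present, e.g. if level 6 is absent:
margins r_mo_ligmodheavyalc_9, at(r_mo_highestedqual_4=(1(1)5))
```

If a level is missing from e(sample), margins cannot evaluate the interaction at that value, hence the "not present in estimation" message.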

Hope this is enough information,

Emily

Editing individual markers in graph dot

Hello list readers.

I'm constructing a dot chart in Stata using the graph dot command. I'd like to be able to edit individual markers, so as to change their size, colour, etc., as appropriate. However, if I change one, all change. The following example uses the auto data often used for illustration purposes.

sysuse auto, clear
graph dot (count), over(rep78) over(foreign)

Is it possible to edit individual markers, or will I have to construct the graph differently? Thank you for any advice you can offer.
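With a single (count) statistic, graph dot draws every marker with the same style, so per-marker edits are not available there. One workaround is to compute the counts yourself and draw the chart with twoway scatter, where each group gets its own marker options; a sketch on the same auto example:

```stata
sysuse auto, clear
gen one = 1
collapse (sum) n = one, by(rep78 foreign)    // the counts graph dot would plot

twoway (scatter rep78 n if foreign == 0, msymbol(O) mcolor(navy))              ///
       (scatter rep78 n if foreign == 1, msymbol(D) mcolor(maroon) msize(large)), ///
       ylabel(1/5, angle(0)) ytitle("Repair record (rep78)") xtitle("Count")   ///
       legend(order(1 "Domestic" 2 "Foreign"))
```

Each scatter layer (or even finer if-conditions) can then carry its own size and colour, which is the per-marker control graph dot lacks.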

Incidence Rate Ratio using Poisson Regression in Stata

I have a dataset and I am trying to model rates of infection over 5 years.

The dataset is similar to the one below, with data going up to December 2015:

yearmo (month and year)   infx (infections caused by organism x)   inftot (total infections)
jan 2010                  54                                       664
feb 2010                  44                                       566
...
dec 2015                  114                                      894

I am trying to model this count data using a Poisson regression to get the rates of infection caused by organism x over the 5 years. As the total number of infections reported varies over time, I have used the offset() option to take this into account, which requires me to log the total number of infections. I have therefore created a variable loginftot = ln(inftot) for the regression below.

My current code looks like this:
Code:
poisson infx yearmo, irr offset(loginftot)
My questions are the following and would really appreciate if anyone could help answer them:
1) The resulting incidence rate ratio will be the change in the rate of infection caused by organism x per month. Am I interpreting this correctly? For example, if my rate ratio is 0.98, does that mean the rate of infections caused by organism x decreases by a factor of 0.98 each month?

2) I was wondering whether this is the correct use of the offset() option. As previously mentioned, I use the offset to account for the fact that different numbers of infections are reported over time.

3) Finally, is this code correct in Stata?

Thanks for your help in advance!

2 stage model with interaction terms

Dear Stata Users,

I have two questions regarding 2-stage models and interaction terms:

1. Assume model (1) y = a1x1 + a2x2 + a3(x1*x2)

with x1 endogenous and x2 exogenous. If I have an instrument z1 for x1, can I run the first stage without the interaction term (x1*x2), calculate the prediction of x1 (call it x1hat), and then run a 2-stage estimation of model (1) with instruments z1 and x1hat*x2, correcting the standard errors with bootstrapping? Or is using z1 and z1*x2 directly as instruments also correct?

2. Now assume the same model with both x1 and x2 endogenous, and that I have two instruments z1 and z2, one for each. Can I use z1*z2 as a third instrument and run a 2-stage model with three endogenous variables (x1, x2, and x1*x2) and three instruments (z1, z2, and z1*z2)?
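For what it's worth, a common textbook recommendation is to treat the interaction as its own endogenous regressor and instrument everything jointly in one call; -ivregress- then produces correct standard errors without any manual two-step or bootstrapping. A sketch of both cases, with the variable names from the questions above:

```stata
* Question 1: x1 endogenous, x2 exogenous; z1*x2 serves as the extra instrument
ivregress 2sls y x2 (x1 c.x1#c.x2 = z1 c.z1#c.x2), robust

* Question 2: x1 and x2 both endogenous
ivregress 2sls y (x1 x2 c.x1#c.x2 = z1 z2 c.z1#c.z2), robust
```

The factor-variable notation c.x1#c.x2 builds the interactions on the fly, so no generated regressors enter the estimation by hand.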

I looked this up and found quite a few threads suggesting different approaches.

Any suggestions are highly appreciated.

Emmanouil

Emmanouil Avgerinos, Ph.D.
Assistant Professor in Decision Sciences
Operations & Technology Area
IE Business School (Instituto de Empresa)
María de Molina, 12 - 5th Floor
28006 Madrid