Channel: Statalist

Converting data in national currencies in panel data set into one common currency

Hello everyone,

I am currently working on my Master's thesis and need help converting data in national currencies into a single currency, the euro.
My data set consists of a number of European countries. In some of them the national currency is the euro; in others (Denmark, Great Britain, etc.) it is something else. Could you please help me figure out how to convert the non-euro currencies in the easiest way? I would like to avoid doing it manually, as that leaves too much room for mistakes.

I have found the Stata module for importing exchange rates (version 1.4.0, 2013/02/22) by Damian C. Clarke & Pavel Luengas Sierra, but I am not entirely sure how to use it.
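In case it is useful, here is a minimal merge-based sketch of the conversion step itself; the file and variable names (panel.dta, rates.dta, eur_per_lcu, and so on) are invented for illustration and are not the module's actual syntax:

Code:
* rates.dta is assumed to hold one record per country-year with
* eur_per_lcu = euros per unit of local currency (1 for euro countries)
use panel, clear
merge m:1 country year using rates, keep(match master) nogenerate
gen value_eur = value * eur_per_lcu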

I will be grateful for any advice!

Thank you very much and have a lovely day.

Natalia

Trying to calculate an index

Hi,

I'm aware that the -duncan- command exists, but I need to do this manually for some other reasons.

So I'm trying to calculate Duncan's dissimilarity index using the formula here:

https://en.m.wikipedia.org/wiki/Dunc...regation_Index



I have the variables mi, fi, M and F. However, I'm struggling with how to translate the sigma notation into a Stata expression. Completely at a loss here.
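For reference, the index is D = (1/2) Σᵢ |mᵢ/M − fᵢ/F|. A minimal sketch, assuming each observation is one unit i and that mi, fi, M and F are variables already in the data (as the post suggests):

Code:
* half the sum of the absolute differences between male and female shares
gen double share_diff = abs(mi/M - fi/F)
quietly summarize share_diff
display "Duncan dissimilarity index = " r(sum)/2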

Any help would be much appreciated!

Alternatives to ttest using svy? Comparison of means between male and female respondents

I am using Stata 14.1. I have a dataset obtained via a survey, with circa 10 variables consisting of 1-5 rankings (answers to questions such as "from 1 to 5, how much do you identify the following statement with party Y?"), circa 5 consisting of 0-10 rankings ("from 0, left, to 10, right, where do you position party Y?"), and demographic variables (gender, age, self-positioning on the left-right scale, ...).
Since the respondents were not representative in terms of proportions (too many males, too left-leaning, ...) but I had a good number of responses (circa 6,000), I also created a weight variable stating how much weight each observation should have, based on known data on gender and left-right self-positioning.

I want to perform a t test to find out whether there is a significant difference in means by gender and by party membership, and between male and female members of the party. I have come to understand that a t test cannot be performed with weights, but I am not aware of any alternatives.

I have tried the following code:

Code:
 svyset [iw=gndrlrweight]
svy: ttest lrPP, by(gndr)
svy: ttest liberalism if member==2, by(gndr)
and

Code:
svyset [iw=gndrlrweight]
svy: regttest lrPP, by(gndr)
I have read about the possibility of using the following code, but I do not know whether it is equivalent to a t test or how to interpret it (how to assess significance).

Code:
svyset [iw=gndrlrweight]
svy: mean lrPP, over(gndr)    // assumed missing line: the test below needs these estimates
test [lrPP]Male = [lrPP]Female
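For what it's worth, a common alternative (a sketch, assuming gndr has exactly two categories): the weighted analogue of the two-sample t test is a svy regression of the outcome on the group indicator, and the adjusted Wald test reported for the coefficient plays the role of the t test.

Code:
svyset [pw=gndrlrweight]    // pweights are the usual choice for svyset
svy: regress lrPP i.gndr
svy: regress liberalism i.gndr if member==2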

Any help? Thanks in advance!

Diff-in-Diff

I am trying to do a diff-in-diff analysis of the effect of endorsement on a product. This is what my data set looks like:

dates          times   price  YeezyShoe  date_of_purchase  Yandhi_Release  Ye_Release  Kanye_Chicago
April 2, 2019  3:55PM  128    0          21641             0               0           0

At the moment, I have these commands to create the treatment date ranges:

Code:
gen date_of_purchase = date(dates, "MDY")
gen byte Yandhi_Release = inrange(date_of_purchase, td(29sep2018), td(03oct2018))
gen byte Ye_Release = inrange(date_of_purchase, td(01jun2018), td(05jun2018))
gen byte Kanye_Chicago = inrange(date_of_purchase, td(15aug2018), td(19aug2018))

If the shoe is endorsed, YeezyShoe=1, and if the shoe is the control shoe, it is YeezyShoe=0.

This is what my current diff-in-diff regression looks like; however, I am having a hard time isolating this effect and formulating a regression.

gen treatXafter = Yandhi_Release * YeezyShoe

reg __ YeezyShoe Yandhi_Release treatXafter
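A minimal sketch of the standard two-by-two DiD, assuming price is the intended outcome (factor-variable notation builds the interaction automatically):

Code:
* the coefficient on 1.YeezyShoe#1.Yandhi_Release is the DiD estimate
reg price i.YeezyShoe##i.Yandhi_Release, vce(robust)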

DiD (difference-in-differences) for count data using nbreg (negative binomial)

Dear Statalisters,

I have a data set of treated individuals (employees purchasing stocks through a firm's stock option scheme) and non-treated individuals (employees not purchasing stocks through a firm's stock option scheme), with two periods (before and after treatment) and several controls.

Only those employees who purchased stock options for the first time are considered in the treatment group. Hence, the data set is very unbalanced, as the control (non-treatment) group is several times larger than the treatment group.

My dependent variable is a count of ideas submitted to an idea suggestion scheme; we are interested in whether employees owning stocks submit more ideas than employees not owning stocks in the firm.

Variables are:
DV: newidea_a_did_1
treatment dummy: did_eso_treatment
period dummy: period
interaction period x treatment: treatment_X_period
+ several controls

I have attached an excerpt of my data below.

The question is: can I run a difference-in-differences regression using nbreg just as I would with the common reg command? I think nbreg is more appropriate because the count data are extremely skewed (a large mass at zero)?

reg command: reg newidea_a_did_1 period did_eso_treatment treatment_X_period year fulltime_did_1 size_did_1 dummy_function_1_did_1 dummy_level_1_did_1, vce(robust)
nbreg command: nbreg newidea_a_did_1 period did_eso_treatment treatment_X_period year fulltime_did_1 size_did_1 dummy_function_1_did_1 dummy_level_1_did_1
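A hedged sketch: nbreg accepts the same right-hand side, but because the model is nonlinear, the raw interaction coefficient is not the DiD effect on the count scale; margins helps with interpretation. The factor-variable notation below assumes the dummies are coded 0/1:

Code:
nbreg newidea_a_did_1 i.period##i.did_eso_treatment year fulltime_did_1 size_did_1 dummy_function_1_did_1 dummy_level_1_did_1, vce(robust)
* expected idea counts in each period-by-treatment cell
margins period#did_eso_treatment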


Thanks for your help!
Felix


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte period long newid int(year newidea_a_did_1) byte(fulltime_did_1 dummy_function_1_did_1 dummy_level_1_did_1) int size_did_1 byte did_eso_treatment float treatment_X_period
0 164876 2014 0 1 0 0 1 0 0
1  12837 2014 0 1 0 0 1 0 0
1 136451 2015 0 1 0 0 1 0 0
1  95503 2013 0 1 0 0 1 0 0
0 148296 2013 0 1 0 0 1 0 0
0 164616 2014 0 1 0 0 1 0 0
1  79008 2011 0 1 0 0 1 0 0
1 113462 2015 0 1 0 0 1 0 0
1 104390 2012 0 1 0 0 1 0 0
0   5472 2012 4 1 0 0 1 0 0
0 129275 2015 0 1 0 0 1 0 0
1  47902 2015 0 1 0 0 1 0 0
1  89282 2013 0 1 0 0 1 0 0
1 154119 2013 0 1 0 0 1 0 0
0  80340 2013 0 1 0 0 1 0 0
0   5542 2014 3 1 0 0 1 0 0
1 159958 2014 0 1 0 0 1 0 0
1  30037 2015 0 1 0 0 1 0 0
1  68050 2015 0 1 0 0 1 0 0
0  26429 2014 0 1 0 0 1 0 0
0  18680 2013 0 1 0 0 1 0 0
0 127988 2015 1 1 0 0 1 0 0
1  55030 2013 0 1 0 0 1 0 0
0   1123 2014 1 1 0 0 1 0 0
0   7311 2012 0 1 0 0 1 0 0
1 132880 2012 0 1 0 1 1 0 0
0 114821 2015 0 0 0 0 1 0 0
0  12697 2015 1 1 0 0 1 0 0
1  22619 2011 0 1 0 0 1 0 0
1  13878 2014 0 1 0 0 1 0 0
1  21819 2014 0 1 0 0 1 0 0
0 108467 2013 0 1 0 0 1 0 0
0  23320 2013 0 1 0 0 1 0 0
0  38465 2015 0 1 0 0 1 0 0
1  67225 2011 0 1 0 0 1 0 0
1 108023 2013 0 1 0 0 1 0 0
1  78626 2015 1 1 0 0 1 0 0
1 162525 2015 1 1 0 0 1 0 0
0  88884 2014 0 1 0 0 1 0 0
1  21763 2013 0 1 0 0 1 0 0
0  13552 2011 0 1 0 0 1 0 0
1  68124 2015 0 1 0 0 1 0 0
0  13595 2011 0 1 0 0 1 0 0
0 140693 2012 0 1 0 1 1 0 0
1  68069 2014 0 1 0 0 1 0 0
0  69566 2013 0 1 0 0 1 0 0
1 116535 2012 0 1 0 0 1 0 0
0   5935 2011 0 1 0 0 1 0 0
0  37895 2012 0 1 1 0 1 0 0
1 124789 2011 0 1 0 0 1 0 0
1  53398 2013 0 1 0 0 1 0 0
1 145305 2015 3 1 0 0 1 0 0
0   5975 2013 0 1 0 0 1 0 0
0   5991 2011 0 1 0 0 1 0 0
0   5991 2013 0 1 0 0 1 0 0
1  13616 2015 0 1 0 0 1 0 0
0   5999 2015 0 1 0 0 1 0 0
0 150473 2012 0 1 0 0 1 0 0
1 164520 2011 0 1 0 0 1 0 0
0 147783 2015 0 1 0 0 1 0 0
0  79014 2015 0 1 0 0 1 0 0
0 154112 2011 0 1 0 0 1 0 0
0  32056 2015 0 1 0 0 1 0 0
1  77614 2012 0 1 0 1 1 0 0
0  78915 2015 0 1 0 0 1 0 0
1 125923 2011 0 1 1 0 1 0 0
0  22439 2014 2 1 0 0 1 0 0
1 127811 2015 0 1 0 0 1 0 0
0   5542 2015 0 1 0 0 1 0 0
1  79014 2012 0 1 0 0 1 0 0
0 160757 2013 0 1 0 0 1 0 0
0 133963 2015 0 1 0 0 1 0 0
0  69247 2011 0 1 0 0 1 0 0
0 108467 2014 1 1 0 0 1 0 0
0   6297 2014 0 1 0 0 1 0 0
1 109064 2015 0 1 0 0 1 0 0
0   5944 2015 0 1 0 0 1 0 0
1  13399 2014 0 1 0 0 1 0 0
1  67948 2013 0 1 0 0 1 0 0
0  26745 2013 0 1 0 0 1 0 0
1 124741 2014 0 1 0 0 1 0 0
1  80122 2014 3 1 0 0 1 0 0
1 131175 2012 0 1 0 0 1 0 0
0 164807 2014 0 1 0 0 1 0 0
1  33422 2011 0 1 1 0 1 0 0
1 115241 2011 0 1 0 0 1 0 0
0 127199 2013 0 1 0 0 1 0 0
0  13878 2015 0 1 0 0 1 0 0
0 147779 2015 0 1 0 0 1 0 0
0  87807 2011 0 1 0 0 1 0 0
0   6627 2015 0 0 0 0 1 0 0
0  96166 2013 0 1 0 0 1 0 0
0   6661 2011 0 1 0 0 1 0 0
0   6664 2014 0 1 0 0 1 0 0
1 161085 2012 0 1 0 0 1 0 0
0  21961 2015 0 1 0 0 1 0 0
1 160757 2012 0 1 0 0 1 0 0
1  22469 2011 0 1 0 0 1 0 0
0   6804 2011 0 1 0 0 1 0 0
0   6804 2013 0 1 0 0 1 0 0
end

Predict command

Hi,

I estimated the probability of default of loans using borrower characteristics. First, I ran a probit regression. The dependent variable is a dummy that equals one if the loan defaulted and zero otherwise. The independent variables are borrower characteristics, such as a homeowner dummy, credit history length, etc. I use the predict command in Stata to estimate the probability of default of each loan as a percentage.

I'm now trying to run the probit regression on only half of the loans in my sample and use the coefficients from that regression to predict the probability of default, as a percentage, for the whole sample. How can I do this? Please see the following data sample.

Code:
input float(default_1 homeowner_1) long amount_delinquent float bankcard_utilization long revolving_balance float credit_history_length byte delinquencies_over60_days
0 0 0 .83 22144 19.164955 0
0 1 0 .45 23427 21.08145 3
1 1 0 .4 29815 26.55989 0
0 1 0 .2 7484 16.347708 0
0 1 0 .75 3622 21.141684 0
0 0 0 .2 331 14.557153 0
1 1 0 .67 67001 18.283367 0
0 1 0 .95 62094 17.18549 0
0 1 0 0 1092 12.364134 0
0 1 0 .5 15593 23.140314 1
0 1 0 .33 14145 28.249144 0
0 1 0 .77 50439 27.54278 0
0 1 0 .69 27053 24.0219 1
1 1 0 .71 15149 33.99589 2
0 1 232 .75 58265 19.28268 0
0 0 0 .01 29 12.427105 0
1 0 0 .59 55194 22.29432 0
end
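A minimal sketch: estimate on a subsample, then predict for everyone. By default, predict fills in all observations with nonmissing covariates, not just those in e(sample). The random 50/50 split below is an assumption about how the halves are chosen:

Code:
set seed 12345
gen byte estsample = runiform() < .5     // assumed estimation half
probit default_1 homeowner_1 amount_delinquent bankcard_utilization ///
    revolving_balance credit_history_length delinquencies_over60_days if estsample
predict double pd, pr                    // probabilities for the whole sample
gen pd_pct = 100*pd                      // expressed as percentages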

Rolling percentiles for large dataset

Hi,

I have a large dataset containing more than 30 million observations.
The dataset aggregates daily trading data for about 35,000 individuals (variables "date" and "ind").
I need to compute rolling 1st, 5th, 95th, and 99th percentiles of a certain variable X for each of those individuals over the respective past 60 days.
So, something like

Code:
xtset ind date
mvsumm X, stat(p1) win(60) gen(p1)
would do the job for a 60-day window. However, this command is relatively slow: it runs for more than a day, and since I need this multiple times, I need a quicker option.
Is there any other, faster way to do this?
My impression is that rangestat is faster.
Unfortunately, there seems to be no option to compute percentiles with rangestat.
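For what it's worth, a heavily hedged sketch: if I recall correctly, rangestat (SSC) accepts user-written Mata functions, which would make rolling percentiles feasible. mm_quantile() below comes from the moremata package (ssc install moremata), and both the interface and the result-variable naming are worth verifying against rangestat's help file:

Code:
mata:
real rowvector rolling_pct(real matrix X)
{
    // 1st, 5th, 95th and 99th percentiles of the window's values
    return(mm_quantile(X[.,1], 1, (.01 \ .05 \ .95 \ .99))')
}
end
rangestat (rolling_pct) X, interval(date -59 0) by(ind)
* results should appear as rolling_pct1-rolling_pct4, per the naming
* convention for user-written functions (worth double-checking)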

Thanks for any input!

coefplot - creating a vertical coefficient plot with multiple lines across the entire x-axis

Dear Stata users,

I'm having a problem creating a coefficient plot with Stata that produces vertical lines across the entire x-axis.

The code works fine, but it somehow clusters my coefplot lines at the left of the x-axis, leaving most of the axis unused.

It is mainly this line of code I am concerned with, which uses some locals produced in the loop at the bottom of my post.

Code:
coefplot `all_stores',  keep(sibling_1_female) saving(COEFPLOT_ALL_SEPARATELY_`sex', replace) ytitle("") xlabel(`xlabels') xscale(range(1(1)6)) ylabel(-3(1)3) yscale(range(`lower_bound' `upper_bound')) scheme(s2mono) graphregion(color(white)) vertical yline(0) legend(off)        //create one coefficient plot for each dataset
And the code above produces the following figure:
[figure: coefficient plot in which all plotted lines are clustered at the left end of the x-axis]

What I want, though, is one vertical line aligned with each of the x-axis labels, so that the first line is above BHPSUKHLS, the second above CFPS, the third above HILDA, etc.

Does anyone know what I'm missing? Maybe there is an option I should be adding to the coefplot call?
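A hedged guess at a fix: when several stored models each contribute the same kept coefficient, coefplot places them all at one axis position. If I remember Ben Jann's options correctly, aseq plus swapnames makes each model its own category along the axis:

Code:
coefplot `all_stores', keep(sibling_1_female) aseq swapnames vertical yline(0) ylabel(-3(1)3) scheme(s2mono) graphregion(color(white)) legend(off)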

Please let me know if anything is unclear and I will try to provide more detailed information as needed.

Many thanks,
Thomas


Below is the loop that creates the locals and stored estimates used in the code above.

Code:
replace age = .                        //first, we need to make sure we use the appropriate age variable: we replace age with the respective age variable for each dependent variable !!!
replace age = age_risk_tol
replace age = age_gamblerisk_tol if age == .
replace age = age_finarisk_tol if age == .
replace age_squared = age^2
replace age_cubed = age^3

local depvar Z_risk                //then, we need to make sure we use the appropriate dependent variable: define the local to equal to the name of the respective dependent variable !!!
global depvar Z_risk

quietly reg $depvar female if $sample_restrictions , robust
quietly ttest $depvar if $sample_restrictions, by(female)
global mmean= round(r(mu_1), .001)
global wmean= round(r(mu_2), .001)
global msd= round(r(sd_1), .001)
global wsd= round(r(sd_2), .001)
local pval= r(p)

global title         "Risk Tolerance [0-10] "    
global summary        "Summary statistics of first-borns' $title"

global dataset        BHPSUKHLS CFPS HILDA IFLS LISS MCS NLSY79 SOEP                 //must also change the list of datasets for each dependent variable !!!


forval i=1/2{    //begin gender loop
    local sex : word `i' of $gender
    local j = `i' -1
    local coefplotcombine    //reset the accumulator locals on each pass
    local all_stores
    local xlabels

global Title        "First-born `sex''s $title"

//all datasets combined in one.
quietly reg $depvar sibling_1_female $good_controls if female==`j' & $sample_restrictions, robust
est store ALL_`sex'
outreg2 using `depvar'_`sex', title(The Effect of Having a Younger Sister on $Title ) ctitle(`sex') stats(coef se ci) paren(se) bracket(ci) bdec(3) sdec(3) cdec(3) adjr2 nocons addtext(Dataset, ALL) nonotes addnote(*** p<0.01 ** p<0.05 * p<0.1. , Robust standard errors in parentheses and confidence intervals in brackets. , "Controlling for subject's age, age spacing between siblings and parents, " , $summary, Men: mean=$mmean stdev=$msd . Women: mean=$wmean stdev=$wsd . DiM p-value=$pval .) replace

local lower_bound = ( _b[sibling_1_female] - invttail(e(df_r),0.025)*_se[sibling_1_female] ) *2        //must change 0.025 to another value if not estimated for level(95)
local upper_bound = ( _b[sibling_1_female] + invttail(e(df_r),0.025)*_se[sibling_1_female] ) *2

coefplot ALL_`sex', title(The Effect of Having a Younger Sister on $Title ) xtitle("All datasets") keep(sibling_1_female) saving(COEFPLOT_ALL_`sex', replace) ytitle("") yscale(range(`lower_bound' `upper_bound')) scheme(s2mono) graphregion(color(white)) vertical yline(0) legend(off)
    
    //for each dataset separately.
    forval k=1/6{    //begin dataset loop        --NOTE: the bound must equal the number of datasets used (see dataset list above); a separate counter so it does not clobber the gender loop's `i'
    local data : word `k' of $dataset
    
    quietly reg $depvar sibling_1_female $good_controls if female==`j' & dataset=="`data'" & $sample_restrictions, robust
    est store `data'_`sex'                //stored regression results used for coefplot below
    outreg2 using `depvar'_`sex', ctitle(`sex') stats(coef se ci) paren(se) bracket(ci) bdec(3) sdec(3) cdec(3) adjr2 nocons addtext(Dataset, `data') append excel        //append the output into the same Excel file
    
    coefplot `data'_`sex',  keep(sibling_1_female) saving(COEFPLOT_`data'_`sex', replace) xtitle("`data'") ytitle("") nolabels yscale(range(`lower_bound' `upper_bound')) scheme(s2mono) graphregion(color(white)) vertical yline(0) legend(off)        //create one coefficient plot for each dataset
    
    local coefplotcombine "`coefplotcombine' COEFPLOT_`data'_`sex'.gph"
    local all_stores `all_stores' `data'_`sex'
    local xlabels `xlabels' `k' "`data'"

    }    //end dataset loop incl controls
}    //end gender loop

Clusteff command for panel data

Dear Statalisters, could anyone tell me whether the command "clusteff" can also be applied to panel data? I have looked everywhere and could not find a conclusive answer.
Any help would be much appreciated!
Kind regards
Karolin

Survey data: choosing between Poisson regression and negative binomial regression

Dear Stata users,

I have a question: I am using survey data and deciding which regression model to use for a count variable. Right now I have narrowed the options down to Poisson regression and negative binomial regression. However, I don't know the Stata code to test the equality/inequality of the mean and variance of the observed counts. Can anyone let me know the code? Thanks a lot!
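A quick sketch (ycount and the covariates x1 x2 are hypothetical names): compare the raw mean and variance directly, and let nbreg's likelihood-ratio test of alpha = 0 judge overdispersion. Note that likelihood-ratio tests are not available under the svy prefix, so this check is usually run unweighted as a specification test:

Code:
quietly summarize ycount
display "mean = " r(mean) "   variance = " r(Var)
nbreg ycount x1 x2    // the footer reports the LR test of alpha = 0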

Piccolo

Marginal probability change induced by a one-standard deviation change in the independent variables (probit)

Hello everyone,

I would like to run a probit regression in which the dependent variable, target, equals 1 if a firm is a hedge fund target and 0 otherwise. My explanatory variables are mkvalt (market value), B/M (market-to-book ratio), SALESGROWTH, CF (cash flow over assets), LEV (book leverage), DIVYLD (dividend yield), RND (R&D over assets), and CAPEX (capital expenditures over assets).

I would like to know the impact on the probability of being targeted of a one-standard-deviation increase or decrease in each predictor.
Is there any way of doing so?
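One sketch: standardize the predictors so that average marginal effects are per one standard deviation. Variable names follow the post, with BM standing in for B/M, since "/" is not legal in a Stata variable name:

Code:
foreach v in mkvalt BM SALESGROWTH CF LEV DIVYLD RND CAPEX {
    egen z_`v' = std(`v')
}
probit target z_mkvalt z_BM z_SALESGROWTH z_CF z_LEV z_DIVYLD z_RND z_CAPEX
margins, dydx(*)    // average marginal effect of a 1-SD change in each predictor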

Thank you in advance!

Stacked charts

Hi, I am trying to represent my data in a stacked chart. I have data on exports and imports. What I am trying to show is the percentage of observations in which I observe both export and import, either export or import, or neither. To be more precise, I'm trying to visualise it in a similar manner to Helpman et al. (2008) in "Estimating Trade Flows: Trading Partners and Trading Volumes", which looks like this:
[figure: stacked bar chart of trade-status shares by year, as in Helpman et al. (2008)]

I have created three variables. TradeBoth is 1 if both export and import have a positive value. TradeOne is 1 if one of export or import has a positive value but the other is unobserved. A variable "missing" is 1 if both export and import are missing. Where these conditions do not hold, the variables are ".".


I initially tried to run something like this, intending to build on it. However, I can't get percent to work.

Code:
graph bar (percent) TradeBoth TradeOne missing, over(year) stack title("Data distribution")
I'm trying to do this for the years 1996 to 2017.
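A sketch of a common workaround: collapse the three indicators into one categorical variable, then let graph bar treat its categories as y-variables and stack percentages within each year (worth checking help graph bar for how percentages interacts with multiple over() groups):

Code:
gen byte status = 1 if TradeBoth==1
replace status = 2 if TradeOne==1
replace status = 3 if missing==1
label define status 1 "Both" 2 "One" 3 "None"
label values status status
graph bar if inrange(year,1996,2017), over(status) over(year) asyvars stack percentages title("Data distribution")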

Thank you for any helpful answers!

Sincerely,
Jonathan

McDonald's omega for measurement model

With a view to SEM, I want to evaluate a measurement model with six items (expressing behaviour frequencies) and 596 observations. The items are heavily skewed and weakly to moderately correlated (Spearman's rho 0.01-0.5). To assess model reliability I have so far used Cronbach's alpha, which is easily obtained through the alpha command. However, McDonald's omega may be a more appropriate statistic. How can I calculate it, and what could be the citation to justify it? I've found a recommendation to calculate it as [(sum of factor loadings)^2 / (sum of uniquenesses + (sum of factor loadings)^2)], but I can't figure out whether the sum of loadings should include only the first factor or several factors, and I still lack a relevant citation.
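For reference, for a congeneric one-factor model the quoted recipe is usually written as

ω = (Σᵢ λᵢ)² / [ (Σᵢ λᵢ)² + Σᵢ θᵢᵢ ]

where λᵢ are the standardized loadings on that single factor and θᵢᵢ are the uniquenesses (error variances of the standardized items); the commonly cited source for this form is McDonald (1999), Test Theory: A Unified Treatment.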

Best regards,
Jan

How to prevent an incorrect (Windows) file-time attribute on graphic files produced with graph export

Saving a graph with graph export for the first time under a name such as graphtime.wmf creates the graphic file (Windows Metafile format in this case) with the correct file date and time, as displayed e.g. in Windows Explorer.
When re-running the same code, the graph file is updated properly, but the file date and time are not.
How can I make Stata update the file-time attribute each time a graphic file is replaced by an updated version?

Code:
sysuse auto
hist foreign,  note("Filename: graphtime.wmf, Filedate and time: `c(current_date)' `c(current_time)'")  
graph export "graphtime.wmf", replace
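One possible workaround (an untested sketch): erase the old file before exporting, so the replacement is created fresh rather than overwritten in place:

Code:
capture erase "graphtime.wmf"    // capture ignores the error if the file is absent
graph export "graphtime.wmf", replace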
I am using
Stata/SE 15.1 for Windows (64-bit x86-64)
Windows 10 Pro, Version 1809

I'd be grateful to receive a hint
Stephan Brosig

set n(#) in a kdensity loop with different sample sizes

Hi Statalist, I am new here, but this forum has been helpful several times.

I am using Stata 15, and here's my problem:

I have a large number of datasets, and I am looping over them to produce kernel density estimates and compare them. Each dataset has a different sample size.
I am entering the following command inside my loop:

kdensity variable, generate(x y) n() nograph

I am trying to find a way to put automatically the sample size inside the option n().

If I write:

count
kdensity variable, generate(x y) n(r(N)) nograph

Stata says: option n() not allowed r(198);
The same happens if I try to use a local with r(N).

What can I do? I would like to avoid doing them one by one.
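A sketch that should work: the contents of an option are not evaluated, so the count must be expanded into the option as text, either via a local macro or inline with `=r(N)':

Code:
count
local N = r(N)
kdensity variable, generate(x y) n(`N') nograph
* or inline, immediately after count:
* kdensity variable, generate(x y) n(`=r(N)') nograph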



How to interact a variable with fixed effects. Help!

Hi!
I am working on my final paper and could use some help.
What I'm trying to do is investigate whether a tax implemented in the second quarter of 2018 had any effect on total passengers travelling. I want to take GDP in the starting period (the first quarter of 2015) and interact that variable with my quarterly fixed effects. That is, if a country's GDP was 1300 in the first quarter of my data, I want to interact that value with each quarterly dummy, giving a regressor that exists for every quarter. How can I do this in Stata?
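A minimal sketch (all variable names are assumptions): copy each country's first-quarter GDP to all of its observations, then interact it with the quarter dummies using factor-variable notation:

Code:
* baseline GDP: the first observed value, assuming the panel starts in 2015q1
bysort country (quarter): gen gdp0 = gdp[1]
reg passengers i.post_tax i.quarter##c.gdp0, vce(cluster country)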

SEM for longitudinal data!

Hi to all,
I am new to Stata and have to build an SEM for my research, but I am struggling to figure out how to get it done.
My research is longitudinal, conducted on 3 organizations. Employees filled in a questionnaire of 35 questions, where each question is a variable in Stata and each set of variables represents a factor for me (say, questions 1 to 8 form factor 1), and so on. The employees filled in the questionnaire in 2017 and again in 2018, but I don't know the identities of the employees; I only have the company as an identifier. I tried to build the SEM where the five factors are the latent variables and linked them to the questions, but I am not sure how to deal with the year issue. I am considering CFA for my data analysis.
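Since respondents cannot be linked across years, one hedged sketch treats the two waves as independent groups and fits the same CFA in both. The item names q1-q35 and the factor structure below are placeholders, and the varlist shorthand assumes the items are stored contiguously:

Code:
* two-group CFA by year; ginvariant() controls which parameters are held
* equal across 2017 and 2018 (mcoef = measurement coefficients)
sem (F1 -> q1-q8) (F2 -> q9-q16) (F3 -> q17-q24) (F4 -> q25-q30) (F5 -> q31-q35), group(year) ginvariant(mcoef)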
Can anybody please help me?

Dummy variable interpretation

Code:
     Source |       SS           df       MS      Number of obs   =     1,315
-------------+----------------------------------   F(28, 1286)     =    273.58
       Model |  946328.028        28  33797.4296   Prob > F        =    0.0000
    Residual |  158872.156     1,286  123.539779   R-squared       =    0.8563
-------------+----------------------------------   Adj R-squared   =    0.8531
       Total |  1105200.18     1,314  841.096031   Root MSE        =    11.115

------------------------------------------------------------------------------
       enrol |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         CPI |   .1847218   .0313612     5.89   0.000     .1231971    .2462465
 newmarriage |    -.32916   .0818746    -4.02   0.000    -.4897824   -.1685375
      depend |  -.2282612   .0387019    -5.90   0.000     -.304187   -.1523355
     edspend |   2.111124   .2682445     7.87   0.000     1.584879    2.637368
   mortality |  -.3557503   .0344384   -10.33   0.000     -.423312   -.2881887
 newfemteach |   .3355659   .0259151    12.95   0.000     .2847253    .3864065
       urban |   .1179031   .0243665     4.84   0.000     .0701007    .1657055
       lnGDP |   2.257544   .9004769     2.51   0.012     .4909787    4.024109
          UM |  -2.232604   1.267144    -1.76   0.078    -4.718501    .2532922
          LM |  -1.696402   1.918191    -0.88   0.377    -5.459528    2.066725
           L |  -2.078713   2.783371    -0.75   0.455    -7.539159    3.381733
             |
        year |
       2001  |   .2627674   1.840499     0.14   0.886    -3.347943    3.873477
       2002  |  -.1156184   1.834911    -0.06   0.950    -3.715366     3.48413
       2003  |   .1621663   1.842056     0.09   0.930    -3.451598     3.77593
       2004  |  -.7602889   1.842209    -0.41   0.680    -4.374354    2.853777
       2005  |  -.8418906    1.84367    -0.46   0.648    -4.458822    2.775041
       2006  |  -1.357138   1.844013    -0.74   0.462    -4.974742    2.260465
       2007  |  -1.241886   1.846762    -0.67   0.501    -4.864884    2.381112
       2008  |  -1.327063   1.850116    -0.72   0.473    -4.956639    2.302513
       2009  |  -1.505653   1.861639    -0.81   0.419    -5.157836    2.146531
       2010  |  -.7093271   1.864264    -0.38   0.704     -4.36666    2.948006
       2011  |  -.2339199   1.863438    -0.13   0.900    -3.889632    3.421792
       2012  |  -.5687192   1.864502    -0.31   0.760    -4.226518     3.08908
       2013  |     2.6685   1.867584     1.43   0.153    -.9953454    6.332345
       2014  |   3.751158    1.87445     2.00   0.046     .0738434    7.428473
       2015  |   4.425729   1.869453     2.37   0.018     .7582165    8.093242
       2016  |   5.518136   1.867712     2.95   0.003      1.85404    9.182233
       2017  |   5.279886   1.869216     2.82   0.005     1.612839    8.946933
             |
       _cons |    50.3543   9.066823     5.55   0.000     32.56691    68.14169
-----------------------------------------------------------------------------

Help with regression of (unbalanced) panel data - xtreg and statsby/regressby generating different results

Hi Statalist!

First of all, thank you for all the help you give on a daily basis! It's been very helpful to lurk here, but now I find myself in a situation where I simply need to ask.

I'm trying to regress Return (depvar) on RMRF, SMB, HML, and LIQ (indepvars) using panel data. My panel dimensions are companynum and date (monthly data, 2007m3 to 2016m12); however, the panel is unbalanced.

Now, when I use the different xtreg options (fe, re, et cetera) and xtgls, I get different results than when I use statsby or regressby. Furthermore, my regression results are quite different from what I expected, but less so when using statsby or regressby.

I'm now asking: 1) which method should I use (xt or statsby), and 2) am I using them incorrectly?
I'm posting a smaller sample of my data and my regression commands. Not sure if the dataex output has been formatted correctly for your use, but the columns are date, companynum, RMRF, SMB, HML, LIQ. Date was reformatted by dataex, but is normally in a YYYYmm format (e.g. 2007m3).

The different commands I've been using:
xtreg Return RMRF SMB HML LIQ, re vce(robust)
statsby _b, by(companynum): reg Return RMRF SMB HML LIQ, robust noconstant

The means of my coefficients from statsby are larger than the coefficients from xtreg. Shouldn't the mean of the statsby coefficients equal the coefficients from the other regression commands, or is it wrong to think of it that way?

Please let me know if you find anything that is incorrectly posted and I will correct accordingly.
Many thanks!

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(date companynum) double(RMRF SMB HML LIQ)
677 1  -.03567645   .031857625   .035097148 -.0038981533515944945
678 1   .05532546  .0022137638  .0075254366  .0028282892888514353
679 1  .024670409  .0083233304  .0063983239   .048738782080504486
680 1  .017835448   .025180005  -.006078342    .02920958419799065
681 1 -.009465334  .0084097693   .012648277  -.011058678221505731
682 1  .014410426  .0064111673  -.012598168   -.01494227358556188
683 1  .029204069  -.026093747   .018178428  -.014232915409939827
566 2  .052961133   .021395301 -.0039642137  -.036748350946197265
567 2  .064811036   -.02813188  -.015092619  -.017005912917779073
568 2  .016662389  -.046779487   .004310017   .003553448955587013
569 2 -.024995016   .011651467  -.011661232  -.023518709608667893
570 2 -.017945765  .0046412922    .03236885  .0056249091016424176
571 2 -.029406879 -.0016394301 -.0072570862   .025491101902319215
end
format %tm date


Logistic Regression with Clustering

Hello users,

I am new to the forum, but hoped you would be able to settle a small dispute we are having. We are currently analyzing data from a hospital setting. Patients have been recruited based on the presence of respiratory symptoms (yes/no), e.g. cough, sore throat, inspiratory wheezing, etc.
The patients have been recruited from 6 different hospitals over the course of four influenza seasons (the winters of 2010, 2011, 2012, and 2013). On this basis, we are planning to run a logistic model that investigates the associations between the different respiratory symptoms and the disease of interest (the outcome, e.g. Human Metapneumovirus). Since it is a hospital setting, all patients are sick, and the ones that are negative for Human Metapneumovirus will have some other respiratory disease. Which other disease(s) are most prevalent among the "controls" will differ somewhat by season (e.g. season one was a strong influenza season, while season two was dominated more by Respiratory Syncytial Virus).

The discussion has ended up with four different views on how to address this problem.
  1. Keeping it simple: Using a standard logistic regression with hospital and season as independent variables in the model:
    logistic disease cough i.hospital i.season
  2. Clustering 1: Taking the possible clustering arising from having different hospitals into account by using melogit (or possibly xtgee).
  3. Clustering 2: Taking the possible clustering arising from both different hospitals and season into account by using melogit
  4. Clustering 3: Since it is the same 4 seasons and the same hospitals in all seasons, there was a suggestion of viewing this as a "two-way crossed random effects" situation (not sure how to do this in Stata; see the sketch after this list).
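For item 4, a hedged sketch of crossed random intercepts with melogit, using the _all: R.varname device from the mixed-models manual (variable names follow the example in item 1):

Code:
* hospital and season as crossed random intercepts; the second equation
* uses the nesting shortcut to keep estimation tractable
melogit disease cough || _all: R.hospital || season: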

This is still at the discussion stage, so unfortunately I cannot give an example of the data. But I am interested in hearing what others outside our small cluster think before the argument heats up when the data arrive...

Best wishes,
Jon