Channel: Statalist
Viewing all 65052 articles

Complex merge or perhaps other strategy?

Dear all,

Brief background: some areas in a resort have adopted a new service strategy. The main goal is to see whether this new service reduces the number of guest complaints, so the aim is to compare complaint rates before and after the “program”.

I have two datasets: one records the area movements of the guests (a guest can move between rooms/areas during a stay) and the other records complaints. The areas started the new service program at different times. I used the movement data to flag guests who stayed in the program areas after the start date of the program, so the movement file only shows movements that happened in the program areas after the program's start date (participants).

I could flag which complaints happened before or after using just the program's start date; the problem is that after the program starts a guest can be moved to a non-program area, and if a complaint happened during that period (in the non-program area), we should not count it. For example, this guest was in a non-program area on 17 and 18 May (after the program's start date) and reported a complaint, and was then sent back to a program area.

Stay Area In Out Programdate
33 Lower1 12-May-16 13-May-16 9-May-16
33 Fac1 13-May-16 16-May-16 9-May-16
33 Lower1 19-May-16 30-Jun-16 9-May-16

Stay ComplaintID Date
33 1 10-Apr-16
33 2 22-Apr-16
33 3 28-Apr-16
33 4 15-May-16
33 5 17-May-16
33 6 25-May-16

What would be the right merge strategy for this task? Is there a way to merge based on time parameters so that the first three complaints are marked as happening before the program, complaint 4 corresponds to the second movement, and complaint 6 corresponds to the third movement? Or should I just append the datasets and then work my way through assigning the complaints to the specific movements?

Code:
clear
input byte Stay str6 Area int(In Out Programdate)
33 "Lower1" 20586 20587 20583
33 "Fac1"   20587 20590 20583
33 "Lower1" 20593 20635 20583
end
format %tddd-Mon-YY In
format %tddd-Mon-YY Out
format %tddd-Mon-YY Programdate
Code:
clear
input byte(Stay ComplaintID) int Date
33 1 20554
33 2 20566
33 3 20572
33 4 20589
33 5 20591
33 6 20599
end
format %tddd-Mon-YY Date
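For what it's worth, one common pattern for this kind of interval matching is joinby followed by inrange(). A minimal sketch, assuming the two example datasets above are saved as movements.dta and complaints.dta (hypothetical filenames):

```stata
use complaints, clear
joinby Stay using movements, unmatched(master)  // all complaint x movement pairs
gen byte in_program_area = inrange(Date, In, Out)
* keep one row per complaint: its matching interval if one exists
bysort Stay ComplaintID (in_program_area): keep if _n == _N
* count the complaint only if it fell inside a program-area interval
* on or after the program start date
gen byte count_it = in_program_area & Date >= Programdate
list Stay ComplaintID Date In Out count_it, noobs
```

With the example data this flags complaints 4 and 6 and leaves 1-3 and 5 uncounted, which matches the intended assignment.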
Thank you

Computing and Storing Variance, Covariance, ANOVA

Hi Everyone,

I have a very straightforward question, but I am stuck and would appreciate your help. I want to calculate and store variance/covariance matrices and ANOVA numbers (e.g., SSA, SSE, MSA, MSE) for the data in the attached file. Any help with the Stata code for this would be appreciated.
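Without seeing the attached file, here is a generic sketch (y, x, and group are hypothetical variable names) of where Stata stores these quantities: correlate leaves the covariance matrix in r(C), and anova leaves the sums of squares and degrees of freedom in e():

```stata
correlate y x, covariance
matrix V = r(C)                 // variance/covariance matrix
matrix list V

anova y group
scalar SSA = e(mss)             // between-group (model) sum of squares
scalar SSE = e(rss)             // within-group (error) sum of squares
scalar MSA = e(mss) / e(df_m)   // mean square between
scalar MSE = e(rss) / e(df_r)   // mean square within
scalar list SSA SSE MSA MSE
```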

Thanks,

Moeen

Multilevel model with the xtmixed command in Stata: how do I weight it?

Good morning fellows, I'm currently working on a multilevel model in Stata, and the dataset I'm using is a household survey, one per year. My question is whether I should weight the model with the expansion factor that comes with each survey or not. When is it proper to do so?
Thank you!

Error - "repeated time values in sample" - varsoc command

Hi everybody! I am using strongly balanced panel data of 14 countries from 1870-2008 containing information on financial aggregates and economic indicators.

egen ccode = group(iso)
tab iso, gen(ccode)
sort ccode year
xtset ccode year


panel variable: ccode (strongly balanced)
time variable: year, 1870 to 2008
delta: 1 unit

When trying to determine the optimal lag length between credit and real GDP, the following error message appears: "repeated time values in sample".

After following the instruction given here
http://www.stata.com/support/faqs/data-management/repeated-time-values/ , I find that my panel is correctly identified and does not contain duplicates. Next, I deleted the missing observations for the variables to be included in the varsoc command, which reduces my sample by 13.8%. Nonetheless, the error message remains.





Are there any other reasons for this error message?


Update: reducing the number of countries from 14 to 1 makes the command work perfectly fine.
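That update is the clue: varsoc is a time-series command and expects a single time series, so with 14 countries in the sample it sees repeated values of each year. One workaround is a per-country loop; a sketch, with credit and rgdp as hypothetical variable names:

```stata
levelsof ccode, local(countries)
foreach c of local countries {
    display as text "=== country `c' ==="
    varsoc credit rgdp if ccode == `c'
}
```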

Marginal effect of a median change?

Dear Statalist,

I'm currently writing my MSc thesis. I estimate the effect of a tax increase on investment. The data is cross-country and at the country-level.

I run the command

reg inv LD.tax i.year i.id_country other_controls,cluster(id_country)


I do not fully understand how to calculate the marginal effects, especially since I was asked to produce a graph showing the median tax increase.
Am I correct in doing:

lincom 'median of LD.tax' * LD.tax
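If the intent is the effect of a median-sized tax change, one sketch is to pull the median out of r(p50) after summarize, detail and feed it to lincom as a multiplier on the coefficient (this assumes the data are tsset/xtset so LD.tax is defined):

```stata
quietly summarize LD.tax, detail
local med = r(p50)
lincom `med' * LD.tax    // effect of a median-sized tax increase
```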

Thank you very much in advance.

Greetings from Cologne
Lennart

How to save means as a new dataset/matrix


Hey there, I was wondering how to store the estimated mean values of several subpopulations as a new variable in my dataset, or as a matrix, or in anything I can load into another dataset. This is the command I am using for the mean values: mean trust, over(region). I then get the mean values of trust for all 200 regions and now want to use these in another dataset. How do I use them? Storing them doesn't get me any further. Ideally I'd like to save them separately as a .sav file. Is there a way of doing so? Thank you very much for helping. Chris
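One low-tech sketch is to build the means as their own dataset with collapse and save that file (region_means is a hypothetical filename); it can then be merged into any other dataset that has region:

```stata
preserve
collapse (mean) trust_mean = trust, by(region)
save region_means, replace
restore

* later, in the other dataset:
* merge m:1 region using region_means
```

Writing a .sav file directly may require an external converter, depending on your Stata version; Stata itself saves .dta.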

Need help on xtgraph!

Hello Statalist,

I have a problem with the xtgraph command: I need to show a graph for my panel data with medians, but without the lower and upper bounds. Instead, I need to show the 25th and 75th percentiles (like a boxplot). As far as I know, xtgraph doesn't offer that option. Could you please help me figure out how to show this with xtgraph?
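If xtgraph turns out not to support it, the same picture can be drawn by hand: collapse to the 25th/50th/75th percentiles per time point and overlay a band and a line. A sketch with hypothetical names y (outcome) and t (time):

```stata
preserve
collapse (p25) p25 = y (p50) p50 = y (p75) p75 = y, by(t)
twoway (rarea p25 p75 t, color(gs13)) (line p50 t), ///
    legend(order(2 "median" 1 "25th-75th percentile"))
restore
```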

Thank you!
Suparit

Get outreg2 to report AIC in summary statistics

Greetings,

I am using outreg2 to create a table presenting multinomial logit results and would like to add the AIC to the summary statistics. In keeping with advice I picked up from an archived Statalist post (on the same question but for outreg - not outreg2)
HTML Code:
http://www.stata.com/statalist/archive/2013-03/msg00216.html
and example 9 in the help for outreg2
HTML Code:
http://repec.org/bocode/o/outreg2
I have used the following code but have tried to substitute the "addstat" option for the "addrows" option.

Code:
mlogit DepVar IndVar1 IndVar2, base(2) r
est store DV1M1, title(Model 1)
estat ic
mat es_ic = r(S)
local AIC: display %4.1f es_ic[1,5]
outreg2 using Table_Dev_Pref.xls, dec(3) addstat(AIC, 'AIC') groupvar(IndVar1 IndVar2) replace

Stata returns the error:
unknown function ()
r(133);

How can I correct this code to get the AIC into my outreg2 table output? Alternatively, is there an easier or other way to get the AIC into my outreg2 table?
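One likely culprit, for what it's worth: 'AIC' in the addstat() option uses straight/right quotes, but Stata local macros are dereferenced with a left quote and a right quote, and the stray quote characters are what trips the "unknown function ()" parser error. A sketch of the corrected lines:

```stata
estat ic
matrix es_ic = r(S)
local AIC : display %4.1f es_ic[1,5]
outreg2 using Table_Dev_Pref.xls, dec(3) addstat(AIC, `AIC') ///
    groupvar(IndVar1 IndVar2) replace
```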

Kind regards,
Christiana

Problem with forvalues command and return calculations

First, I use Stata 12 SE for Mac.

This is my dataex extract to illustrate my data:

input str12 acquirerisin float(statatime announcementdate set group_id datenum td dif event_window estimation_window predicted_return id spotstockreturn spotmarketreturn)
"AT0000652011" 14609 15558 1 1 1 678 -677 0 0 . 1 . .
"AT0000652011" 14612 15558 1 1 2 678 -676 0 0 . 1 3.943417 0
"AT0000652011" 14613 15558 1 1 3 678 -675 0 0 . 1 -.2624908 0
"AT0000652011" 14614 15558 1 1 4 678 -674 0 0 . 1 -1.560981 0
"AT0000652011" 14615 15558 1 1 5 678 -673 0 0 . 1 0 0
"AT0000652011" 14616 15558 1 1 6 678 -672 0 0 . 1 4.4841413 0
"AT0000652011" 14619 15558 1 1 7 678 -671 0 0 . 1 1.2232277 0
"AT0000652011" 14620 15558 1 1 8 678 -670 0 0 . 1 -1.5702645 0
"AT0000652011" 14621 15558 1 1 9 678 -669 0 0 . 1 -.21062852 .7025968
"AT0000652011" 14622 15558 1 1 10 678 -668 0 0 . 1 .25636232 -.3601329
"AT0000652011" 14623 15558 1 1 11 678 -667 0 0 . 1 .7197889 .2818298
"AT0000652011" 14626 15558 1 1 12 678 -666 0 0 . 1 -.318253 -.9759812
"AT0000652011" 14627 15558 1 1 13 678 -665 0 0 . 1 -.7404442 -2.4385405

I'm trying to perform an event study to compute cumulative abnormal returns, and in the process I ran the following code:
forvalues i=1(1)61 {
l id acquirerisin if id==`i' & dif==0
reg spotstockreturn spotmarketreturn if id==`i' & estimation_window==1
predict p if id==`i'
replace predicted_return = p if id==`i' & event_window==1
drop p
}

However, the forvalues loop always aborts and returns the following message:
(option xb assumed; fitted values)
(250500 missing values generated)
(41 real changes made)

+-------------------+
| id acquirerisin |
|-------------------|
50297. | 13 DK0010274414 |
+-------------------+
no observations
r(2000);

end of do-file

r(2000);


I would like to compute as many CARs as possible (which is done after the forvalues loop), but even deleting the ID in question does not solve my problem: I can still only compute the CARs for 12 of the 61 entities in my dataset.
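Since the abort happens when one id has no observations in its estimation window, a common sketch is to wrap the regression in capture and skip the ids that fail, rather than letting r(2000) kill the whole loop:

```stata
forvalues i = 1/61 {
    capture regress spotstockreturn spotmarketreturn ///
        if id == `i' & estimation_window == 1
    if _rc == 0 {
        predict p if id == `i', xb
        replace predicted_return = p if id == `i' & event_window == 1
        drop p
    }
    else display as error "id `i' skipped (rc = `=_rc')"
}
```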

Help would be greatly appreciated.

Thank you guys in advance.

I look forward to the advice of the Stata cracks out here on Statalist.


Does 'version' command remove need for 'saveold'?

Does using the "version" command eliminate the need to use the "saveold" command?

That is, if I need to make some v12 data using v14, do I have to change every save command to saveold, or will one "version 12" at the top of the .do file mean that every dataset created by that .do file will automatically be saved as v12 style data?

Post Estimation for Variance and standard deviation Multilevel model

Dear all,

I ran some multilevel analyses. Below you can find the commands:

. xtgls TotGLStressWeek Gender Workinghours Previousexp Arrivaldatemonths Age, nmk


However, I need to find out both the variances (at levels 1 and 2) and the standard deviations. Does anyone know what command I should use?
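A caveat first: xtgls is a GLS panel estimator, not a multilevel model, so it does not estimate level-1 and level-2 variance components. A sketch of the same model refit as a two-level mixed model (team is a hypothetical level-2 identifier; on Stata 12 and older the command is xtmixed):

```stata
mixed TotGLStressWeek Gender Workinghours Previousexp Arrivaldatemonths Age ///
    || team:, stddeviations
* without the option, mixed reports the level-2 and residual (level-1)
* variances; stddeviations reports their square roots instead
```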


Exceeded Memory for Fisher's Exact Test

I got the following error when trying to run a Fisher's exact test: "exceeded memory limits using exact(1); try again with larger #".
The sample size is only 120, but some cell counts are 0. I thought this was precisely what Fisher's exact test was for?
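Fisher's exact is indeed fine with zero cells; the message is about the workspace for the exact-test algorithm, and its "larger #" refers to the argument of the exact() option. A sketch (row and col stand in for the actual variables):

```stata
tabulate row col, exact(5)   // default is exact(1); a larger # allows more memory
```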

This neophyte is grateful for the help.

How can a CFA model estimated with SEM yield an SRMR>1?

I just estimated a simple CFA model with sem and obtained an SRMR (standardized root mean squared residual) of 2.212. Unless I am mistaken, it should have a maximum value of 1.0. How is this possible?

The data are from a sample of 742 observations, and the items are scored on a 10-point scale.

Command:
sem ( pih1-pih12 <- PIH), method(ADF)

The factor loadings look reasonable (no Heywood cases)

The model clearly does not fit well (not unexpected), and here are the fit indices:

-> estat gof, stats(all)

----------------------------------------------------------------------------
Fit statistic | Value Description
---------------------+------------------------------------------------------
Discrepancy |
chi2_ms(54) | 319.043 model vs. saturated
p > chi2 | 0.000
chi2_bs(66) | 748.898 baseline vs. saturated
p > chi2 | 0.000
---------------------+------------------------------------------------------
Population error |
RMSEA | 0.081 Root mean squared error of approximation
90% CI, lower bound | 0.073
upper bound | 0.090
pclose | 0.000 Probability RMSEA <= 0.05
---------------------+------------------------------------------------------
Baseline comparison |
CFI | 0.612 Comparative fit index
TLI | 0.526 Tucker-Lewis index
---------------------+------------------------------------------------------
Size of residuals |
SRMR | 2.212 Standardized root mean squared residual
CD | 0.926 Coefficient of determination
----------------------------------------------------------------------------


Of note, many of the standardized residuals could not be calculated and are thus missing.

Fixed Effects Dummy Variables _Omitted and Best Model

Hi Stata Intellectuals,

I have one more quick question on fixed effects. I have panel data with firm-level characteristics over 10 years. I am currently running a model, but all fixed-effect control dummies for year and industry are collinear (all of them are omitted). What is the reason for that? I found previous discussions in this forum about one (which is normal) or two omitted dummy variables, but not about all of them.

Here is the code:
(1)
xtset firm year
xtreg y x1 x2 x3 i.year i.industry, fe


in this case all dummies for year and industry are all omitted

When I run (as suggested in a previous discussion) :
(2)
xtreg y x1 x2 x3 i.year i.industry

the variables are not omitted, but if I don't specify fixed effects or random effects, what is Stata running when I use xtreg?

An alternatively suggested model is the following:
(3)
egen both= group(year indu)
xtreg y x1 x2 x3 i.both, fe


or finally
(4)
xtreg y x1 x2 x3 , i(both) fe


In summary: when I use (1), why are all dummies omitted? What is xtreg running when neither fe nor re is specified? What is the difference between (3) and (4)? And what is the recommended approach?


Thank you

Marco

Listing observations and then saving them into a string variable

Hello dear forum members,

I am seeking your help with the following task.

So, I have a list of MSA (metropolitan statistical area) codes (N=104) and also a list of corresponding ZIP codes (N="a lot"). I use the following command to list ZIP codes that fall under a given MSA:

HTML Code:
 list ZIPCODE if msa == 10420, noobs clean

    ZIPCODE  
      44056  
      44067  
      44087  
      44201  
      44202  
      44203  
      44210  
      44211  
      44216  
      44221  
      44222  
      44223  
      44224  
      44231  
      44232  
      44234  
      44236  
      44237  
      44238  
      44240  
      44241  
      44242 
I further need to save the observed ZIP codes for each MSA into a new string variable in the following format: "44056, 44067, 44087, ..., 44242".

Could you please suggest code to carry out this task?
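Two sketches, depending on whether the list is needed for one MSA or for all of them at once (both assume ZIPCODE is numeric; drop the string() calls if it is already a string):

```stata
* one MSA at a time: levelsof builds the delimited list for you
levelsof ZIPCODE if msa == 10420, local(zips) separate(", ") clean
display "`zips'"

* all MSAs at once: accumulate the list within each MSA, keep one row per MSA
bysort msa (ZIPCODE): gen ziplist = string(ZIPCODE)
by msa: replace ziplist = ziplist[_n-1] + ", " + string(ZIPCODE) if _n > 1
by msa: keep if _n == _N
keep msa ziplist
```

With very long ZIP lists, the accumulated string may hit the str2045 limit in older Stata versions.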

Thankfully,
Anton

Fun loop challenge

Dear all,

I am trying to create a loop using two local arrays and numbers.

Here is what I have so far:

Code:
svyset    [pweight = trendwt], strata (nis_stratum) psu (hospid)

local var1 "pneumoall pneumo sepsisall sepsis otherall other mrsa sensitive resistant"
local output `""S. aureus pneumonia" "MRSA pneumoniae" "S. aureus Septicemia" "MRSA Septicemia" "S. aureus Others" "MRSA Others" "MRSA Totals" "Total MRSA Infections" "Total MSSA Infections""'
tempfile alldata2010
save "`alldata2010'"

capture confirm file "Results.xls"
foreach name of local var1 {
                    forvalues i=1/30 {
                                    if _rc!=0 {
                                    svy: total dischgs
                                    regsave, ci
                                    gen year=2010
                                    replace var="Total Discharges"
                                    export excel using "Results", firstrow(variables)
                                    }
                                    else {
                                    use `alldata2010', replace
                                    svy: total dischgs, subpop(`"name"')
                                    regsave, ci
                                    gen year=2010
                                    replace var=`"output"' if
                                    export excel using "Results", sheet("Sheet1") sheetmodify cell(A`i')  
And ultimately, the output will look something like this on excel. (Numbers are not important, I just wanted to show what the final product would look like).
var coef stderr ci_lower ci_upper N year
Total Discharges 3225712 734.9375 38188 3836 7441 2010
S. aureus pneumonia 2345.05469 365.130859 836.6875 9601.42188 7441 2010
MRSA pneumonia 348.12891 233.0855 599.40625 6806.85156 7441 2010
S. aureus Septicemia 12221.52344 244.5088 864.78125 9560.26563 7441 2010
MRSA Septicemia 11248.63672 1853.1337 456.69531 50.57813 7441 2010
S. aureus Others 33044.9688 146.44727 4053.875 5236.0625 7441 2010
MRSA Others 32714.9375 745.66748 2963.0938 336.7813 7441 2010
The first part of my 'if' statement is there to create the Excel file with the first row, Total Discharges. If the Excel file already exists, the loop is supposed to move on to the 'else' section and, for each element of my array var1, calculate the totals and put them on the next line of my Excel file. The tricky part is replacing "var" with the element of the local macro "output" that corresponds to whichever element of `var1' was used in the subpop. For example, if the loop ran svy: total dischgs, subpop(pneumo), I would like var to be replaced with "MRSA pneumoniae" (the second element in BOTH arrays). In PHP / Java, elements of an array can be identified with brackets: replace var=`"output[2]"' if `"name[2]"'. Is it possible to do something like this in Stata, or to get around it in another way?

Also, how can I make the export excel cell keep increasing by 1 (i++) until the loop itself finishes, without knowing in advance how many times the loop will need to run? I'm not sure how many Excel rows will need to be added, so I don't think a statement like forvalues i=1/30 makes a lot of sense.
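On both counts, for what it's worth: Stata's closest analogue to array indexing on a local is the `: word # of` extended macro function (which respects the compound quotes around multi-word elements), and the same counter local can double as the Excel row, so no fixed forvalues bound is needed. A sketch under those assumptions:

```stata
local var1 "pneumoall pneumo sepsisall sepsis otherall other mrsa sensitive resistant"
local output `""S. aureus pneumonia" "MRSA pneumoniae" "S. aureus Septicemia" "MRSA Septicemia" "S. aureus Others" "MRSA Others" "MRSA Totals" "Total MRSA Infections" "Total MSSA Infections""'

local i = 0
foreach name of local var1 {
    local ++i
    local label : word `i' of `output'   // the i-th element of the parallel list
    display `"subpop(`name') -> "`label'" -> cell A`=`i' + 1'"'
    * e.g. replace var = `"`label'"' and export excel ... cell(A`=`i'+1')
}
```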

Thanks so much for your help.



Testing regression coefficients using xtmixed

I use xtmixed to estimate a mixed-effects multilevel model with
Code:
xtmixed dep var1 var2 var3 var4 var5 var6 || clustervar, covariance(independent)
Afterwards, I want to test the joint significance of the first two variables and the constant using
Code:
test _cons var1 var2
Does the Wald test make sense in this case? And why does t:_cons not work for the constant term? Using _cons instead produces

Code:
[t]_cons = 0
[lns1_1_1]_cons = 0
[lnsig_e]_cons = 0
[t]var1 = 0
[t]var2 = 0

chi2(   4) = 25.51
Prob > chi2 = 0.0000
However, I don't want to include [lns1_1_1]_cons = 0 and [lnsig_e]_cons = 0 in the test...
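If the fixed-effects equation really is named t (as the output above suggests), the reason t:_cons fails is that test wants equation names in square brackets rather than colon syntax, and an unqualified _cons matches the constant in every equation, including the variance components lns1_1_1 and lnsig_e. A sketch of the restricted test:

```stata
* restrict the Wald test to the fixed-effects equation only
test [t]var1 [t]var2 [t]_cons
```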

Thanks for your help!

How to model intra-EU trade dummy variables for Gravity Model studying EU Membership effect on trade flows

Hi,

I am estimating a gravity model using an unbalanced country-pair panel dataset covering European countries from 1948-2006, with about 48,000 observations. The dataset is the one used in Head et al. (2010), “The erosion of colonial trade linkages after independence”. I am looking at the EU membership effect, i.e. the gains in trade due to EU membership.

I am struggling to model the intra-EU trade dummies that estimate the EU membership effect. The problem is that European countries joined the EU at different dates, which makes it difficult to identify a single EU membership effect.

How I have gone about coding it so far, with little success, is by generating multiple intra-EU dummy variables, adding new members as each enlargement took place. The following is the code I used. (Also, iso_o represents the exporter and iso_d the importer, so the dummies equal 1 if both the exporter and the importer are EU members.)

* Original Members
gen intra_EU_1 = (iso_o=="FRA" | iso_o=="DEU" | iso_o=="ITA" | iso_o=="BEL" | iso_o=="NLD" | iso_o=="LUX") & (iso_d=="FRA" | iso_d=="DEU" | iso_d=="ITA" | iso_d=="BEL" | iso_d=="NLD" | iso_d=="LUX")
replace intra_EU_1 = 0 if year<1964

*
* Original Members + First Expansion ( i.e plus Ireland UK Denmark and Greenland)
gen intra_EU_2 = (iso_o=="FRA" | iso_o=="DEU" | iso_o=="ITA" | iso_o=="BEL" | iso_o=="NLD" | iso_o=="LUX" | iso_o=="IRL" | iso_o=="GBR" | iso_o=="DNK" | iso_o=="GRL") & (iso_d=="FRA" | iso_d=="DEU" | iso_d=="ITA" | iso_d=="BEL" | iso_d=="NLD" | iso_d=="LUX" | iso_d=="IRL" | iso_d=="GBR" | iso_d=="DNK" | iso_d=="GRL")
replace intra_EU_2 = 0 if year<1973

*
* Original Members + First Expansion + Second Expansion (i.e plus Greece)
gen intra_EU_3 = (iso_o=="FRA" | iso_o=="DEU" | iso_o=="ITA" | iso_o=="BEL" | iso_o=="NLD" | iso_o=="LUX" | iso_o=="IRL" | iso_o=="GBR" | iso_o=="DNK" | iso_o=="GRL" | iso_o=="GRC") & (iso_d=="FRA" | iso_d=="DEU" | iso_d=="ITA" | iso_d=="BEL" | iso_d=="NLD" | iso_d=="LUX" | iso_d=="IRL" | iso_d=="GBR" | iso_d=="DNK" | iso_d=="GRL" | iso_d=="GRC")
replace intra_EU_3 = 0 if year<1981

*
* Original Members + First Expansion + Second Expansion + Third Expansion (i.e plus Portugal and Spain)
gen intra_EU_4 = (iso_o=="FRA" | iso_o=="DEU" | iso_o=="ITA" | iso_o=="BEL" | iso_o=="NLD" | iso_o=="LUX" | iso_o=="IRL" | iso_o=="GBR" | iso_o=="DNK" | iso_o=="GRL" | iso_o=="GRC" | iso_o=="PRT" | iso_o=="ESP") & (iso_d=="FRA" | iso_d=="DEU" | iso_d=="ITA" | iso_d=="BEL" | iso_d=="NLD" | iso_d=="LUX" | iso_d=="IRL" | iso_d=="GBR" | iso_d=="DNK" | iso_d=="GRL" | iso_d=="GRC" | iso_d=="PRT" | iso_d=="ESP")
replace intra_EU_4 = 0 if year<1986

*
* Final Model with all members upto the fourth enlargement (i.e Plus Austria, Sweden and Finland)
gen intra_EU_5 = (iso_o=="FRA" | iso_o=="DEU" | iso_o=="ITA" | iso_o=="BEL" | iso_o=="NLD" | iso_o=="LUX" | iso_o=="IRL" | iso_o=="GBR" | iso_o=="DNK" | iso_o=="GRL" | iso_o=="GRC" | iso_o=="PRT" | iso_o=="ESP" | iso_o=="AUT" | iso_o=="SWE" | iso_o=="FIN") & (iso_d=="FRA" | iso_d=="DEU" | iso_d=="ITA" | iso_d=="BEL" | iso_d=="NLD" | iso_d=="LUX" | iso_d=="IRL" | iso_d=="GBR" | iso_d=="DNK" | iso_d=="GRL" | iso_d=="GRC" | iso_d=="PRT" | iso_d=="ESP" |iso_d=="AUT" | iso_d=="SWE" | iso_d=="FIN")
replace intra_EU_5 = 0 if year<1995

Using this method, I get very confusing results: some of the intra-EU dummies' coefficients are positive and some are negative, and many are insignificant.
I know that studying EU membership has been done many times before, and I was wondering if anyone could advise me on how to go about generating intra-EU dummy variables.
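One way to make this less error-prone, sketched below: record each country's accession year once with inlist(), then derive a single membership dummy. The cutoff years here just follow the enlargements listed in the post and are yours to adjust (the first dummy above uses 1964, for example):

```stata
gen acc_o = .
replace acc_o = 1958 if inlist(iso_o,"FRA","DEU","ITA","BEL","NLD","LUX")
replace acc_o = 1973 if inlist(iso_o,"IRL","GBR","DNK","GRL")
replace acc_o = 1981 if iso_o == "GRC"
replace acc_o = 1986 if inlist(iso_o,"PRT","ESP")
replace acc_o = 1995 if inlist(iso_o,"AUT","SWE","FIN")

gen acc_d = .
replace acc_d = 1958 if inlist(iso_d,"FRA","DEU","ITA","BEL","NLD","LUX")
replace acc_d = 1973 if inlist(iso_d,"IRL","GBR","DNK","GRL")
replace acc_d = 1981 if iso_d == "GRC"
replace acc_d = 1986 if inlist(iso_d,"PRT","ESP")
replace acc_d = 1995 if inlist(iso_d,"AUT","SWE","FIN")

* both partners are members in year t
gen intra_EU = (year >= max(acc_o, acc_d)) if !missing(acc_o, acc_d)
replace intra_EU = 0 if missing(intra_EU)
```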

Thanks, Amran

Cox model, PSM after Multiple imputation

Dear all,

I have run into a problem with Cox models after multiple imputation.
In Stata, I use mi impute chained to get the imputed datasets; the censored time variable (dur) was used in the imputation model.
When I run "mi stset dur, fail(censor=1)", it does not work because "dur was registered as imputed".
This means I have to run the analysis on each imputed dataset and then combine the results using Rubin's rules.
But how? I still haven't figured out how to use Rubin's rules to combine the estimates, standard errors, and p-values, given that I cannot use the mi estimate command.

Thank you very much

collapse of all obs vs collapse of a fraction of obs

Hello. Could anyone help me figure out why the two commands below give different results? Each row of my dataset corresponds to a delivery in a given health facility, and the variable tp_par indicates the type of delivery. I would like to collapse my dataset to get a variable with the sum of all type-5 deliveries by health facility.

1st code
Code:
. use SINASC_12a13_all, clear

. keep if tp_par==5
(3598297 observations deleted)

. gen um=1

. collapse (sum) npar_semtp=um, by(cnes)

. su npar_semtp

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
  npar_semtp |      1880    715.2697    1237.922          1      26083

2nd code
Code:
. use SINASC_12a13_all, clear

. gen npar_semtp = (tp_par==5)

. replace npar_semtp=. if tp_par==.
(574065 real changes made, 574065 to missing)

. collapse (sum) npar_semtp, by(cnes)

. su npar_semtp

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
  npar_semtp |      1894    709.9826    1234.857          0      26083
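For what it's worth, the 14-facility gap (1,880 vs 1,894 obs) is consistent with facilities that have no type-5 deliveries at all: the first code drops them before the collapse, while the second keeps them with a sum of 0 (collapse's sum ignores missings). A tiny reproducible sketch:

```stata
clear
input cnes tp_par
1 5
1 3
2 3
2 .
end

preserve
keep if tp_par == 5          // facility 2 has no type-5 rows: it vanishes
gen um = 1
collapse (sum) npar = um, by(cnes)
list                          // one row: cnes 1, npar 1
restore

gen npar = (tp_par == 5)
replace npar = . if tp_par == .
collapse (sum) npar, by(cnes)
list                          // two rows: cnes 1 npar 1, cnes 2 npar 0
```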

