Channel: Statalist
Viewing all 65052 articles

Complex merge or perhaps other strategy?

Dear all,

Brief background: some areas in a resort have adopted a new service strategy. The main goal is to see whether this new service reduces the number of guest complaints, so the aim is to compare complaint rates before and after the “program”.

I have two datasets: one records the area movements of the guests (a guest can move between rooms/areas during a stay) and the other records complaints. The areas started the new service program at different times. I used the movement data to flag guests who stayed in the program areas after the start date of the program, so the movement file only shows movements that happened in the program areas after the program's start date (participants).

I could flag which complaints happened before or after using just the program's start date; the problem is that after the program starts a guest can be moved to a non-program area, and if a complaint happened during that period (in the non-program area), we should not count it. For example, this guest was in a non-program area on 17 and 18 May (after the program's start date) and reported a complaint, and was then sent back to a program area.

Stay Area In Out Programdate
33 Lower1 12-May-16 13-May-16 9-May-16
33 Fac1 13-May-16 16-May-16 9-May-16
33 Lower1 19-May-16 30-Jun-16 9-May-16

Stay ComplaintID Date
33 1 10-Apr-16
33 2 22-Apr-16
33 3 28-Apr-16
33 4 15-May-16
33 5 17-May-16
33 6 25-May-16

What would be the right merge strategy for this task? Is there a way to merge based on time parameters so that the first three complaints are marked as happening before the program, complaint 4 corresponds to the second movement, and complaint 6 corresponds to the third movement? Or should I just append the datasets and then work my way through assigning the complaints to the specific movements?

Code:
clear
input byte Stay str6 Area int(In Out Programdate)
33 "Lower1" 20586 20587 20583
33 "Fac1"   20587 20590 20583
33 "Lower1" 20593 20635 20583
end
format %tddd-Mon-YY In
format %tddd-Mon-YY Out
format %tddd-Mon-YY Programdate
Code:
clear
input byte(Stay ComplaintID) int Date
33 1 20554
33 2 20566
33 3 20572
33 4 20589
33 5 20591
33 6 20599
end
format %tddd-Mon-YY Date
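For what it's worth, one common pattern for this kind of interval matching is joinby followed by inrange(). A minimal sketch, assuming the two example datasets above are saved as movements.dta and complaints.dta (hypothetical filenames):

```stata
use complaints, clear
joinby Stay using movements, unmatched(master)  // all complaint x movement pairs
gen byte in_program_area = inrange(Date, In, Out)
* keep one row per complaint: its matching interval if one exists
bysort Stay ComplaintID (in_program_area): keep if _n == _N
* count the complaint only if it fell inside a program-area interval
* on or after the program start date
gen byte count_it = in_program_area & Date >= Programdate
list Stay ComplaintID Date In Out count_it, noobs
```

With the example data this flags complaints 4 and 6 and leaves 1-3 and 5 uncounted, which matches the intended assignment.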
Thank you

Computing and Storing Variance, Covariance, ANOVA

Hi Everyone,

I have a very straightforward question, but I am stuck and would appreciate your help. I want to calculate and store variance/covariance matrices and ANOVA numbers (e.g., SSA, SSE, MSA, MSE) for the data in the attached file. Any help with the Stata code for this would be appreciated.
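Without seeing the attached file, here is a generic sketch (y, x, and group are hypothetical variable names) of where Stata stores these quantities: correlate leaves the covariance matrix in r(C), and anova leaves the sums of squares and degrees of freedom in e():

```stata
correlate y x, covariance
matrix V = r(C)                 // variance/covariance matrix
matrix list V

anova y group
scalar SSA = e(mss)             // between-group (model) sum of squares
scalar SSE = e(rss)             // within-group (error) sum of squares
scalar MSA = e(mss) / e(df_m)   // mean square between
scalar MSE = e(rss) / e(df_r)   // mean square within
scalar list SSA SSE MSA MSE
```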

Thanks,

Moeen

Multilevel model with the xtmixed command in Stata: how do I weight it?

Good morning fellows, I'm currently working on a multilevel model in Stata, and the dataset I'm using is a household survey, one per year. My question is whether I should weight the model with the expansion factor that comes with each survey or not. When is it proper to do so?
Thank you!

Error - "repeated time values in sample" - varsoc command

Hi everybody! I am using strongly balanced panel data of 14 countries from 1870-2008 containing information on financial aggregates and economic indicators.

egen ccode = group(iso)
tab iso, gen(ccode)
sort ccode year
xtset ccode year


panel variable: ccode (strongly balanced)
time variable: year, 1870 to 2008
delta: 1 unit

When trying to determine the optimal lag length between credit and real GDP, the following error message appears: "repeated time values in sample".

After following the instruction given here
http://www.stata.com/support/faqs/data-management/repeated-time-values/ , I find that my panel is correctly identified and does not contain duplicates. Next, I deleted the missing observations for the variables to be included in the varsoc command, which reduces my sample by 13.8%. Nonetheless, the error message remains.





Are there any other reasons for this error message?


Update: reducing the number of countries from 14 to 1 makes the command work perfectly fine.
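That update is the clue: varsoc is a time-series command and expects a single time series, so with 14 countries in the sample it sees repeated values of each year. One workaround is a per-country loop; a sketch, with credit and rgdp as hypothetical variable names:

```stata
levelsof ccode, local(countries)
foreach c of local countries {
    display as text "=== country `c' ==="
    varsoc credit rgdp if ccode == `c'
}
```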

Marginal effect of a median change?

Dear Statalist,

I'm currently writing my MSc thesis. I estimate the effect of a tax increase on investment. The data is cross-country and at the country-level.

I run the command

reg inv LD.tax i.year i.id_country other_controls,cluster(id_country)


I do not fully understand how to calculate the marginal effects, especially since I was asked to produce a graph showing the median tax increase.
Am I correct in doing:

lincom 'median of LD.tax' * LD.tax
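If the intent is the effect of a median-sized tax change, one sketch is to pull the median out of r(p50) after summarize, detail and feed it to lincom as a multiplier on the coefficient (this assumes the data are tsset/xtset so LD.tax is defined):

```stata
quietly summarize LD.tax, detail
local med = r(p50)
lincom `med' * LD.tax    // effect of a median-sized tax increase
```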

Thank you very much in advance.

Greetings from Cologne
Lennart

How to save means as a new dataset/matrix


Hey there, I was wondering how to store the estimated mean values of several subpopulations as a new variable in my dataset, or as a matrix, or in anything I can load into another dataset. This is the command I am using for the mean values: mean trust, over(region). I then get the mean values of trust for all 200 regions and now want to use these in another dataset. How do I use them? Storing them doesn't get me any further. Ideally I'd like to save them separately as a .sav file. Is there a way of doing so? Thank you very much for helping. Chris
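One low-tech sketch is to build the means as their own dataset with collapse and save that file (region_means is a hypothetical filename); it can then be merged into any other dataset that has region:

```stata
preserve
collapse (mean) trust_mean = trust, by(region)
save region_means, replace
restore

* later, in the other dataset:
* merge m:1 region using region_means
```

Writing a .sav file directly may require an external converter, depending on your Stata version; Stata itself saves .dta.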

Need help on xtgraph!

Hello Statalist,

I have a problem with the xtgraph command: I need to show a graph for my panel data with medians, but without the lower and upper bounds. Instead, I need to show the 25th and 75th percentiles (like a boxplot). As far as I know, xtgraph doesn't offer that option. Could you please help me figure out how to show this with xtgraph?
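If xtgraph turns out not to support it, the same picture can be drawn by hand: collapse to the 25th/50th/75th percentiles per time point and overlay a band and a line. A sketch with hypothetical names y (outcome) and t (time):

```stata
preserve
collapse (p25) p25 = y (p50) p50 = y (p75) p75 = y, by(t)
twoway (rarea p25 p75 t, color(gs13)) (line p50 t), ///
    legend(order(2 "median" 1 "25th-75th percentile"))
restore
```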

Thank you!
Suparit

Get outreg2 to report AIC in summary statistics

Greetings,

I am using outreg2 to create a table presenting multinomial logit results and would like to add the AIC to the summary statistics. In keeping with advice I picked up from an archived Statalist post (on the same question but for outreg - not outreg2)
HTML Code:
http://www.stata.com/statalist/archive/2013-03/msg00216.html
and example 9 in the help for outreg2
HTML Code:
http://repec.org/bocode/o/outreg2
I have used the following code but have tried to substitute the "addstat" option for the "addrows" option.

Code:
mlogit DepVar IndVar1 IndVar2, base(2) r
est store DV1M1, title(Model 1)
estat ic
mat es_ic = r(S)
local AIC: display %4.1f es_ic[1,5]
outreg2 using Table_Dev_Pref.xls, dec(3) addstat(AIC, 'AIC') groupvar(IndVar1 IndVar2) replace

Stata returns the error:
unknown function ()
r(133);

How can I correct this code to get the AIC into my outreg2 table output? Alternatively, is there an easier or other way to get the AIC into my outreg2 table?
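One likely culprit, for what it's worth: 'AIC' in the addstat() option uses straight/right quotes, but Stata local macros are dereferenced with a left quote and a right quote, and the stray quote characters are what trips the "unknown function ()" parser error. A sketch of the corrected lines:

```stata
estat ic
matrix es_ic = r(S)
local AIC : display %4.1f es_ic[1,5]
outreg2 using Table_Dev_Pref.xls, dec(3) addstat(AIC, `AIC') ///
    groupvar(IndVar1 IndVar2) replace
```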

Kind regards,
Christiana

Problem with forvalues command and return calculations

First, I use Stata 12 SE for Mac.

This is my dataex extract to illustrate my data:

input str12 acquirerisin float(statatime announcementdate set group_id datenum td dif event_window estimation_window predicted_return id spotstockreturn spotmarketreturn)
"AT0000652011" 14609 15558 1 1 1 678 -677 0 0 . 1 . .
"AT0000652011" 14612 15558 1 1 2 678 -676 0 0 . 1 3.943417 0
"AT0000652011" 14613 15558 1 1 3 678 -675 0 0 . 1 -.2624908 0
"AT0000652011" 14614 15558 1 1 4 678 -674 0 0 . 1 -1.560981 0
"AT0000652011" 14615 15558 1 1 5 678 -673 0 0 . 1 0 0
"AT0000652011" 14616 15558 1 1 6 678 -672 0 0 . 1 4.4841413 0
"AT0000652011" 14619 15558 1 1 7 678 -671 0 0 . 1 1.2232277 0
"AT0000652011" 14620 15558 1 1 8 678 -670 0 0 . 1 -1.5702645 0
"AT0000652011" 14621 15558 1 1 9 678 -669 0 0 . 1 -.21062852 .7025968
"AT0000652011" 14622 15558 1 1 10 678 -668 0 0 . 1 .25636232 -.3601329
"AT0000652011" 14623 15558 1 1 11 678 -667 0 0 . 1 .7197889 .2818298
"AT0000652011" 14626 15558 1 1 12 678 -666 0 0 . 1 -.318253 -.9759812
"AT0000652011" 14627 15558 1 1 13 678 -665 0 0 . 1 -.7404442 -2.4385405

I'm trying to perform an event study to compute cumulative abnormal returns, and in the process I ran the following code:
forvalues i=1(1)61 {
l id acquirerisin if id==`i' & dif==0
reg spotstockreturn spotmarketreturn if id==`i' & estimation_window==1
predict p if id==`i'
replace predicted_return = p if id==`i' & event_window==1
drop p
}

However, the forvalues loop always aborts and returns the following message:
(option xb assumed; fitted values)
(250500 missing values generated)
(41 real changes made)

+-------------------+
| id acquirerisin |
|-------------------|
50297. | 13 DK0010274414 |
+-------------------+
no observations
r(2000);

end of do-file

r(2000);


I would like to compute as many CARs as possible (which is done after the forvalues loop), but even deleting the ID in question does not solve my problem: I can still only compute the CARs for 12 of the 61 entities in my dataset.
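Since the abort happens when one id has no observations in its estimation window, a common sketch is to wrap the regression in capture and skip the ids that fail, rather than letting r(2000) kill the whole loop:

```stata
forvalues i = 1/61 {
    capture regress spotstockreturn spotmarketreturn ///
        if id == `i' & estimation_window == 1
    if _rc == 0 {
        predict p if id == `i', xb
        replace predicted_return = p if id == `i' & event_window == 1
        drop p
    }
    else display as error "id `i' skipped (rc = `=_rc')"
}
```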

Help would be greatly appreciated.

Thank you guys in advance.

I look forward to the advice of the Stata cracks out here on Statalist.


Does 'version' command remove need for 'saveold'?

Does using the "version" command eliminate the need to use the "saveold" command?

That is, if I need to make some v12 data using v14, do I have to change every save command to saveold, or will one "version 12" at the top of the .do file mean that every dataset created by that .do file will automatically be saved as v12 style data?

Post Estimation for Variance and standard deviation Multilevel model

Dear all,

I ran some multilevel analyses. Below you can find the commands:

. xtgls TotGLStressWeek Gender Workinghours Previousexp Arrivaldatemonths Age, nmk


However, I need to find out both the variances (at levels 1 and 2) and the standard deviations. Does anyone know what command I should use?
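A caveat first: xtgls is a GLS panel estimator, not a multilevel model, so it does not estimate level-1 and level-2 variance components. A sketch of the same model refit as a two-level mixed model (team is a hypothetical level-2 identifier; on Stata 12 and older the command is xtmixed):

```stata
mixed TotGLStressWeek Gender Workinghours Previousexp Arrivaldatemonths Age ///
    || team:, stddeviations
* without the option, mixed reports the level-2 and residual (level-1)
* variances; stddeviations reports their square roots instead
```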


Exceeded Memory for Fisher's Exact Test

I got the following error when trying to run a Fisher's exact test: "exceeded memory limits using exact(1); try again with larger #".
The sample size is only 120, but some cell counts are 0. I thought this was precisely what Fisher's exact test was for?
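Fisher's exact is indeed fine with zero cells; the message is about the workspace for the exact-test algorithm, and its "larger #" refers to the argument of the exact() option. A sketch (row and col stand in for the actual variables):

```stata
tabulate row col, exact(5)   // default is exact(1); a larger # allows more memory
```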

This neophyte is grateful for the help.

How can a CFA model estimated with SEM yield an SRMR>1?

I just estimated a simple CFA model with sem and obtained an SRMR (standardized root mean squared residual) of 2.212. Unless I am mistaken, it should have a maximum value of 1.0. How is this possible?

The data are from a sample of 742 observations, and the items are scored on a 10-point scale.

Command:
sem ( pih1-pih12 <- PIH), method(ADF)

The factor loadings look reasonable (no Heywood cases)

The model clearly does not fit well (not unexpected), and here are the fit indices:

-> estat gof, stats(all)

----------------------------------------------------------------------------
Fit statistic | Value Description
---------------------+------------------------------------------------------
Discrepancy |
chi2_ms(54) | 319.043 model vs. saturated
p > chi2 | 0.000
chi2_bs(66) | 748.898 baseline vs. saturated
p > chi2 | 0.000
---------------------+------------------------------------------------------
Population error |
RMSEA | 0.081 Root mean squared error of approximation
90% CI, lower bound | 0.073
upper bound | 0.090
pclose | 0.000 Probability RMSEA <= 0.05
---------------------+------------------------------------------------------
Baseline comparison |
CFI | 0.612 Comparative fit index
TLI | 0.526 Tucker-Lewis index
---------------------+------------------------------------------------------
Size of residuals |
SRMR | 2.212 Standardized root mean squared residual
CD | 0.926 Coefficient of determination
----------------------------------------------------------------------------


Of note, many of the standardized residuals could not be calculated and are thus missing.

Fixed Effects Dummy Variables _Omitted and Best Model

Hi Stata Intellectuals,

I have one more quick question on fixed effects. I have panel data with firm-level characteristics over 10 years. I am currently running a model, but all fixed-effect control dummies for year and industry are collinear (all of them are omitted). What is the reason for that? I found previous discussions in this forum about one (which is normal) or two omitted dummy variables, but not about all of them.

Here is the code:
(1)
xtset firm year
xtreg y x1 x2 x3 i.year i.industry, fe


in this case all dummies for year and industry are all omitted

When I run (as suggested in a previous discussion) :
(2)
xtreg y x1 x2 x3 i.year i.industry

the variables are not omitted, but if I don't specify fixed effects or random effects, what is Stata running when I use xtreg?

An alternatively suggested model is the following:
(3)
egen both= group(year indu)
xtreg y x1 x2 x3 i.both, fe


or finally
(4)
xtreg y x1 x2 x3 , i(both) fe


In summary: when I use (1), why are all dummies omitted? What is xtreg running when neither fe nor re is specified? What is the difference between (3) and (4)? And what is the recommended approach?


Thank you

Marco

Listing observations and then saving them into a string variable

Hello dear forum members,

I am seeking your help with the following task.

So, I have a list of MSA (metropolitan statistical area) codes (N=104) and also a list of corresponding ZIP codes (N="a lot"). I use the following command to list ZIP codes that fall under a given MSA:

HTML Code:
 list ZIPCODE if msa == 10420, noobs clean

    ZIPCODE  
      44056  
      44067  
      44087  
      44201  
      44202  
      44203  
      44210  
      44211  
      44216  
      44221  
      44222  
      44223  
      44224  
      44231  
      44232  
      44234  
      44236  
      44237  
      44238  
      44240  
      44241  
      44242 
I further need to save the observed ZIP codes for each MSA into a new string variable in the following format: "44056, 44067, 44087, ..., 44242".

Could you please suggest code to carry out this task?
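Two sketches, depending on whether the list is needed for one MSA or for all of them at once (both assume ZIPCODE is numeric; drop the string() calls if it is already a string):

```stata
* one MSA at a time: levelsof builds the delimited list for you
levelsof ZIPCODE if msa == 10420, local(zips) separate(", ") clean
display "`zips'"

* all MSAs at once: accumulate the list within each MSA, keep one row per MSA
bysort msa (ZIPCODE): gen ziplist = string(ZIPCODE)
by msa: replace ziplist = ziplist[_n-1] + ", " + string(ZIPCODE) if _n > 1
by msa: keep if _n == _N
keep msa ziplist
```

With very long ZIP lists, the accumulated string may hit the str2045 limit in older Stata versions.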

Thankfully,
Anton

Fun loop challenge

Dear all,

I am trying to create a loop using two local arrays and numbers.

Here is what I have so far:

Code:
svyset    [pweight = trendwt], strata (nis_stratum) psu (hospid)

local var1 "pneumoall pneumo sepsisall sepsis otherall other mrsa sensitive resistant"
local output `""S. aureus pneumonia" "MRSA pneumoniae" "S. aureus Septicemia" "MRSA Septicemia" "S. aureus Others" "MRSA Others" "MRSA Totals" "Total MRSA Infections" "Total MSSA Infections""'
tempfile alldata2010
save "`alldata2010'"

capture confirm file "Results.xls"
foreach name of local var1 {
                    forvalues i=1/30 {
                                    if _rc!=0 {
                                    svy: total dischgs
                                    regsave, ci
                                    gen year=2010
                                    replace var="Total Discharges"
                                    export excel using "Results", firstrow(variables)
                                    }
                                    else {
                                    use `alldata2010', replace
                                    svy: total dischgs, subpop(`"name"')
                                    regsave, ci
                                    gen year=2010
                                    replace var=`"output"' if
                                    export excel using "Results", sheet("Sheet1") sheetmodify cell(A`i')  
And ultimately, the output will look something like this on excel. (Numbers are not important, I just wanted to show what the final product would look like).
var coef stderr ci_lower ci_upper N year
Total Discharges 3225712 734.9375 38188 3836 7441 2010
S. aureus pneumonia 2345.05469 365.130859 836.6875 9601.42188 7441 2010
MRSA pneumonia 348.12891 233.0855 599.40625 6806.85156 7441 2010
S. aureus Septicemia 12221.52344 244.5088 864.78125 9560.26563 7441 2010
MRSA Septicemia 11248.63672 1853.1337 456.69531 50.57813 7441 2010
S. aureus Others 33044.9688 146.44727 4053.875 5236.0625 7441 2010
MRSA Others 32714.9375 745.66748 2963.0938 336.7813 7441 2010
The first part of my 'if' statement is there to create the Excel file with the first row, Total Discharges. If the Excel file already exists, the loop is supposed to move on to the 'else' section and, for each element of my array var1, calculate the totals and put them on the next line of my Excel file. The tricky part is replacing "var" with the element of the local macro "output" that corresponds to whichever element of `var1' was used in the subpop. For example, if the loop ran svy: total dischgs, subpop(pneumo), I would like var to be replaced with "MRSA pneumoniae" (the second element in BOTH arrays). In PHP / Java, elements of an array can be identified with brackets: replace var=`"output[2]"' if `"name[2]"'. Is it possible to do something like this in Stata, or to get around it in another way?

Also, how can I make the export excel cell keep increasing by 1 (i++) until the loop itself finishes, without knowing in advance how many times the loop will need to run? I'm not sure how many Excel rows will need to be added, so I don't think a statement like forvalues i=1/30 makes a lot of sense.
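On both counts, for what it's worth: Stata's closest analogue to array indexing on a local is the `: word # of` extended macro function (which respects the compound quotes around multi-word elements), and the same counter local can double as the Excel row, so no fixed forvalues bound is needed. A sketch under those assumptions:

```stata
local var1 "pneumoall pneumo sepsisall sepsis otherall other mrsa sensitive resistant"
local output `""S. aureus pneumonia" "MRSA pneumoniae" "S. aureus Septicemia" "MRSA Septicemia" "S. aureus Others" "MRSA Others" "MRSA Totals" "Total MRSA Infections" "Total MSSA Infections""'

local i = 0
foreach name of local var1 {
    local ++i
    local label : word `i' of `output'   // the i-th element of the parallel list
    display `"subpop(`name') -> "`label'" -> cell A`=`i' + 1'"'
    * e.g. replace var = `"`label'"' and export excel ... cell(A`=`i'+1')
}
```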

Thanks so much for your help.



Testing regression coefficients using xtmixed

I use xtmixed to estimate a mixed-effects multilevel model with
Code:
xtmixed dep var1 var2 var3 var4 var5 var6 || clustervar, covariance(independent)
Afterwards, I want to test the joint significance of the first two variables and the constant using
Code:
test _cons var1 var2
Does the Wald test make sense in this case? And why does t:_cons not work for the constant term? Using _cons instead produces

Code:
[t]_cons = 0
[lns1_1_1]_cons = 0
[lnsig_e]_cons = 0
[t]var1 = 0
[t]var2 = 0

chi2(   4) = 25.51
Prob > chi2 = 0.0000
However, I don't want to include [lns1_1_1]_cons = 0 and [lnsig_e]_cons = 0 in the test...
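If the fixed-effects equation really is named t (as the output above suggests), the reason t:_cons fails is that test wants equation names in square brackets rather than colon syntax, and an unqualified _cons matches the constant in every equation, including the variance components lns1_1_1 and lnsig_e. A sketch of the restricted test:

```stata
* restrict the Wald test to the fixed-effects equation only
test [t]var1 [t]var2 [t]_cons
```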

Thanks for your help!

How to model intra-EU trade dummy variables for Gravity Model studying EU Membership effect on trade flows

Hi,

I am estimating a gravity model using an unbalanced country-pair panel dataset covering European countries from 1948-2006, with about 48,000 observations. The dataset is the one used in Head et al. (2010), “The erosion of colonial trade linkages after independence”. I am looking at the EU membership effect, i.e. the gains in trade due to EU membership.

I am struggling to model the intra-EU trade dummies that estimate the EU membership effect. The problem is that European countries joined the EU at different dates, which makes it difficult to identify a single EU membership effect.

How I have gone about coding it so far, with little success, is by generating multiple intra-EU dummy variables, adding new members as each enlargement took place. The following is the code I used. (Also, iso_o represents the exporter and iso_d the importer, so the dummies equal 1 if both the exporter and the importer are EU members.)

* Original Members
gen intra_EU_1 = (iso_o=="FRA" | iso_o=="DEU" | iso_o=="ITA" | iso_o=="BEL" | iso_o=="NLD" | iso_o=="LUX") & (iso_d=="FRA" | iso_d=="DEU" | iso_d=="ITA" | iso_d=="BEL" | iso_d=="NLD" | iso_d=="LUX")
replace intra_EU_1 = 0 if year<1964

*
* Original Members + First Expansion ( i.e plus Ireland UK Denmark and Greenland)
gen intra_EU_2 = (iso_o=="FRA" | iso_o=="DEU" | iso_o=="ITA" | iso_o=="BEL" | iso_o=="NLD" | iso_o=="LUX" | iso_o=="IRL" | iso_o=="GBR" | iso_o=="DNK" | iso_o=="GRL") & (iso_d=="FRA" | iso_d=="DEU" | iso_d=="ITA" | iso_d=="BEL" | iso_d=="NLD" | iso_d=="LUX" | iso_d=="IRL" | iso_d=="GBR" | iso_d=="DNK" | iso_d=="GRL")
replace intra_EU_2 = 0 if year<1973

*
* Original Members + First Expansion + Second Expansion (i.e plus Greece)
gen intra_EU_3 = (iso_o=="FRA" | iso_o=="DEU" | iso_o=="ITA" | iso_o=="BEL" | iso_o=="NLD" | iso_o=="LUX" | iso_o=="IRL" | iso_o=="GBR" | iso_o=="DNK" | iso_o=="GRL" | iso_o=="GRC") & (iso_d=="FRA" | iso_d=="DEU" | iso_d=="ITA" | iso_d=="BEL" | iso_d=="NLD" | iso_d=="LUX" | iso_d=="IRL" | iso_d=="GBR" | iso_d=="DNK" | iso_d=="GRL" | iso_d=="GRC")
replace intra_EU_3 = 0 if year<1981

*
* Original Members + First Expansion + Second Expansion + Third Expansion (i.e plus Portugal and Spain)
gen intra_EU_4 = (iso_o=="FRA" | iso_o=="DEU" | iso_o=="ITA" | iso_o=="BEL" | iso_o=="NLD" | iso_o=="LUX" | iso_o=="IRL" | iso_o=="GBR" | iso_o=="DNK" | iso_o=="GRL" | iso_o=="GRC" | iso_o=="PRT" | iso_o=="ESP") & (iso_d=="FRA" | iso_d=="DEU" | iso_d=="ITA" | iso_d=="BEL" | iso_d=="NLD" | iso_d=="LUX" | iso_d=="IRL" | iso_d=="GBR" | iso_d=="DNK" | iso_d=="GRL" | iso_d=="GRC" | iso_d=="PRT" | iso_d=="ESP")
replace intra_EU_4 = 0 if year<1986

*
* Final Model with all members upto the fourth enlargement (i.e Plus Austria, Sweden and Finland)
gen intra_EU_5 = (iso_o=="FRA" | iso_o=="DEU" | iso_o=="ITA" | iso_o=="BEL" | iso_o=="NLD" | iso_o=="LUX" | iso_o=="IRL" | iso_o=="GBR" | iso_o=="DNK" | iso_o=="GRL" | iso_o=="GRC" | iso_o=="PRT" | iso_o=="ESP" | iso_o=="AUT" | iso_o=="SWE" | iso_o=="FIN") & (iso_d=="FRA" | iso_d=="DEU" | iso_d=="ITA" | iso_d=="BEL" | iso_d=="NLD" | iso_d=="LUX" | iso_d=="IRL" | iso_d=="GBR" | iso_d=="DNK" | iso_d=="GRL" | iso_d=="GRC" | iso_d=="PRT" | iso_d=="ESP" |iso_d=="AUT" | iso_d=="SWE" | iso_d=="FIN")
replace intra_EU_5 = 0 if year<1995

Using this method, I get very confusing results: some of the intra-EU dummies' coefficients are positive and some are negative, and many are insignificant.
I know that studying EU membership has been done many times before, and I was wondering if anyone could advise me on how to go about generating intra-EU dummy variables.
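One way to make this less error-prone, sketched below: record each country's accession year once with inlist(), then derive a single membership dummy. The cutoff years here just follow the enlargements listed in the post and are yours to adjust (the first dummy above uses 1964, for example):

```stata
gen acc_o = .
replace acc_o = 1958 if inlist(iso_o,"FRA","DEU","ITA","BEL","NLD","LUX")
replace acc_o = 1973 if inlist(iso_o,"IRL","GBR","DNK","GRL")
replace acc_o = 1981 if iso_o == "GRC"
replace acc_o = 1986 if inlist(iso_o,"PRT","ESP")
replace acc_o = 1995 if inlist(iso_o,"AUT","SWE","FIN")

gen acc_d = .
replace acc_d = 1958 if inlist(iso_d,"FRA","DEU","ITA","BEL","NLD","LUX")
replace acc_d = 1973 if inlist(iso_d,"IRL","GBR","DNK","GRL")
replace acc_d = 1981 if iso_d == "GRC"
replace acc_d = 1986 if inlist(iso_d,"PRT","ESP")
replace acc_d = 1995 if inlist(iso_d,"AUT","SWE","FIN")

* both partners are members in year t
gen intra_EU = (year >= max(acc_o, acc_d)) if !missing(acc_o, acc_d)
replace intra_EU = 0 if missing(intra_EU)
```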

Thanks, Amran

Cox model, PSM after Multiple imputation

Dear all,

I have run into a problem with Cox models after multiple imputation.
In Stata, I use mi impute chained to get the imputed datasets; the censored time variable (dur) was used in the imputation model.
When I run "mi stset dur, fail(censor=1)", it does not work because "dur was registered as imputed".
This means I have to run the analysis on each imputed dataset and then combine the results using Rubin's rules.
But how? I still haven't figured out how to use Rubin's rules to combine the estimates, standard errors, and p-values, given that I cannot use the mi estimate command.

Thank you very much

collapse of all obs vs collapse of a fraction of obs

Hello. Could anyone help me figure out why the two commands below give different results? Each row of my dataset corresponds to a delivery in a given health facility, and the variable tp_par indicates the type of delivery. I would like to collapse my dataset to get a variable with the sum of all type-5 deliveries by health facility.

1st code
Code:
. use SINASC_12a13_all, clear

. keep if tp_par==5
(3598297 observations deleted)

. gen um=1

. collapse (sum) npar_semtp=um, by(cnes)

. su npar_semtp

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
  npar_semtp |      1880    715.2697    1237.922          1      26083

2nd code
Code:
. use SINASC_12a13_all, clear

. gen npar_semtp = (tp_par==5)

. replace npar_semtp=. if tp_par==.
(574065 real changes made, 574065 to missing)

. collapse (sum) npar_semtp, by(cnes)

. su npar_semtp

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
  npar_semtp |      1894    709.9826    1234.857          0      26083
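For what it's worth, the 14-facility gap (1,880 vs 1,894 obs) is consistent with facilities that have no type-5 deliveries at all: the first code drops them before the collapse, while the second keeps them with a sum of 0 (collapse's sum ignores missings). A tiny reproducible sketch:

```stata
clear
input cnes tp_par
1 5
1 3
2 3
2 .
end

preserve
keep if tp_par == 5          // facility 2 has no type-5 rows: it vanishes
gen um = 1
collapse (sum) npar = um, by(cnes)
list                          // one row: cnes 1, npar 1
restore

gen npar = (tp_par == 5)
replace npar = . if tp_par == .
collapse (sum) npar, by(cnes)
list                          // two rows: cnes 1 npar 1, cnes 2 npar 0
```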

