A version controlled -save- command

October 18, 2016, 9:57 am

In a recent announcement, Michael Stepner discusses the problem of different dta-file formats when collaborators work with different versions of Stata. Although the new saveold command can write files that will open in Stata 11, the command used to create such a file will not work in a version prior to Stata 14. His solution is to have commands of the form save# that explicitly state which Stata version a file is saved for.

I very much like the idea and propose yet another solution, vsave, which can be downloaded from the SSC archives. The idea is to save datasets in the file format implied by version control. Although it is kind of contrary to the basic concept of version control (see this critical discussion) I believe this behavior is desirable in some situations. The command has a version() option, mirroring the one introduced in the new saveold, that may be specified to emphasize that the data is saved to some fixed dta-file format. vsave will work with Stata 11 up to 14 (and perhaps beyond).

I hope some of you will find this useful and I am interested in any comments or questions.

Best
Daniel

↧

modify charlson.ado to add new codes

October 18, 2016, 11:15 am

Dear all,
I'm trying to use the charlson program to find renal patients in a database, but the program doesn't identify all the codes that I need... or perhaps I'm using the program the wrong way.
I have to find code 584.9, which is acute kidney failure, but the charlson program doesn't recognize it.
Does anybody have an idea how to modify the code of the program?

Thanks a lot

↧

shading areas on a graph

October 18, 2016, 12:24 pm

I am using Newson's -eclplot- (from SSC) to show confidence intervals; I want to include a shaded area showing the CI for the "combined" data; here is the command I am using:

Code:

eclplot cat ll ul studynum11 if group==11 in 1/6, hor xla(0 "0%" .2 "20%" .4 "40%" .6 "60%" .8 "80%" 1 "100%") xti("Percentage Emerged from DOC") yti("Study") ti("Results for 1 Mo. Baseline/3 Mo. Outcome with DOC as Dx", span) xline(.34286, lw(*50) lc(gs14)) xline(.3429)

and here is the resulting graph; note that because the CI is not symmetric, my use of a relative size in the linewidth (lw) option is wrong (it gives me symmetric shading around the midpoint).

↧

Looping over the same GDP variable over vintages of time

October 18, 2016, 1:09 pm

Hi all
I have Gross Domestic Product (gdp) data that come in vintages: the gdp variable is revised over time, so gdp65Q4 below is the gdp series as revised at the vintage date 1965 Q4. Each column therefore holds gdp for every quarter as measured on a specific vintage date, and the next column shows how gdp for the same quarters was revised at the subsequent vintage date, and so on. To clarify, the first cell in the north-west corner, 306.4, is gdp for 1947 Q1 as revised in 1965 Q4; the next cell to the right, 307.4, is gdp for 1947 Q1 again, but as revised in 1966 Q1; and so on.

date	gdp65Q4	gdp66Q1	gdp66Q2	gdp66Q3
1947:Q1	306.4	307.4	309.4	302.4
1947:Q2	309	301	302	309
1947:Q3	309.6	304.6	308.6	302.6
1947:Q4	314.5	315.5	314.5	310.5
1948:Q1	317.1	311.2	314.3	319.1
1948:Q2	322.9	325.7	333.9	322.9

What I want to do is generate a variable that represents the gdp vintage shock, measured as the residual from a regression of gdp at a given vintage date on the same gdp at the previous vintage date.
I do not know how to write a loop that estimates this.

Thank you
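
One possible way to set up such a loop is sketched below. It assumes the vintage columns are stored in the dataset in vintage order, so that each gdp* variable can be regressed on the one just before it; the shock_ prefix is a made-up name, and the toy data are simply the six rows shown in the post.

```stata
* Toy data copied from the table in the post
clear
input str7 date gdp65Q4 gdp66Q1 gdp66Q2 gdp66Q3
"1947:Q1" 306.4 307.4 309.4 302.4
"1947:Q2" 309   301   302   309
"1947:Q3" 309.6 304.6 308.6 302.6
"1947:Q4" 314.5 315.5 314.5 310.5
"1948:Q1" 317.1 311.2 314.3 319.1
"1948:Q2" 322.9 325.7 333.9 322.9
end

* Expand gdp* into the list of vintage variables (dataset order is assumed
* to be vintage order)
unab vintages : gdp*

* Regress each vintage on the previous one and keep the residuals
local prev
foreach v of local vintages {
    if "`prev'" != "" {
        quietly regress `v' `prev'
        predict double shock_`v', residuals   // hypothetical variable name
    }
    local prev `v'
}

list date shock_*
```

The first vintage has no predecessor, so it gets no shock variable; every later vintage gets one.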

↧

problem to summarize my variables by year

October 18, 2016, 2:07 pm

I am new to Stata and need your help. I am trying to summarize all my variables by year (from 2002 to 2015),
but I only know how to summarize for one particular year,
like this:
sum if year ==2002

Is it possible to do this for every year?
I have attached the Excel file that I used in Stata.
Thanks in advance
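
A minimal sketch of one way to do this, assuming the year variable is called year as in the command above. The toy data and the variables x and y are invented for illustration only.

```stata
* Toy data standing in for the poster's file; x and y are hypothetical
clear
set obs 28
generate year = 2002 + mod(_n - 1, 14)   // years 2002 to 2015, twice each
generate x = _n
generate y = 2 * _n

* One block of summary statistics per year
bysort year: summarize x y

* Compact alternative: a single table with one row per year
tabstat x y, by(year) statistics(mean sd min max)
```

Leaving the variable list off summarize (bysort year: summarize) would cover every variable in the dataset.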

↧

Twoway graph with defined x scale and x label still clusters into small section of x axis

October 18, 2016, 2:57 pm

Hi there, I am using Stata 14. The title pretty much explains my problem. My code is as follows:

twoway (scatter PTemp ABC if wave==5, mlabel(ccode) mcolor(green) mlabcolor(green)) /*
*/ (scatter PTemp ABC if wave==6, mlabel(ccode) mcolor(blue) mlabcolor(blue))/*
*/ (lfit PTemp ABC if ABC<=2.5, msymbol(i) lcolor(black)) /*
*/ , subtitle("Womens Part Time employment and ABC") /*
*/ xlabel(1(.5)2.5) xtitle("ABC") xscale(r (1 2.5))/*
*/ legend(col(2) colgap(20) lab(1 "WVS Wave 5") lab(2 "WVS Wave 6") /*
*/ region(lstyle(none)) symxsize(14) keygap(2) textwidth(20) )

and the resulting graph is attached.

How can I extend the x axis to cover the full graph space?

I have run into a similar problem in the past and the problem had been concerning the lfit line, but that is not the case in this code.

Any help would be greatly appreciated

↧

Beginner's tutorial

October 18, 2016, 6:26 pm

Hi all,

I'm not sure if this is the correct place to come with a general curiosity about learning Stata, but it's hit me a couple of times over the past few weeks that I'd like to start picking up how to work with it. I'm an economics major waiting to take an econometrics class in the spring, and I'd like to find tutorials on how to operate Stata so that I can build on that foundation.

Any information is greatly appreciated.

↧

2 dimensional clustering

October 18, 2016, 7:39 pm

Hello, I am following the methodology of a paper. According to it, I want to estimate an OLS regression with two-dimensional clustering at the firm and year level. As far as I can tell, Stata allows me to cluster by only one variable. Can anyone please explain how to carry out two-dimensional clustering in Stata? Thanks a lot.

↧

Working life tables

October 19, 2016, 12:17 am

Dear Statalist Users,

I am currently generating a labor statistics report using Stata. I am using Stata 13 on Windows 7 and would like to know whether there is a Stata command that can generate working life tables. Below is the desired output.

Thanks in advance.
Stephen.

↧

Warning Message when trying to implement Propensity Score Matching using PSMATCH2

October 19, 2016, 8:23 am

Dear Statalisters,

I keep getting a warning message when I try to implement propensity score matching using psmatch2. The message is as follows:

***There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.***

Can anyone kindly advise me on how to make the sort order of the data random?
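
A common way to randomize the sort order is to sort on a uniform random number after fixing the seed, so that the order is reproducible. The sketch below uses the auto dataset shipped with Stata as a stand-in; the seed value and the variable name u are arbitrary choices.

```stata
* Toy data; replace with your own dataset
sysuse auto, clear

* Fix the seed so the random order can be reproduced
set seed 20161019

* Draw one uniform random number per observation and sort on it
generate double u = runiform()
sort u

* psmatch2 (user-written, from SSC) would be called here, after the sort
```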

↧

Average of a dummy variable

October 19, 2016, 8:37 am

Hi All,

let's say I want to control for gender effect in a given market. Then I set up a binary variable "Male" which takes on value 1 for males and 0 otherwise. Does it make any sense to average the dummy variable at a market level? So, assuming that I have 10 agents in a market and 8 of them are males, the dummy would assume value 0.8 and will provide the model with the information that the market was "male-dominated".

Would that be correct?

Thanks
Simone
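
For the mechanics only (not the modelling question), a market-level share of males could be computed as sketched below. The variable names market and male and the ten-agent toy market are assumptions taken from the example in the post.

```stata
* Toy market with 10 agents, 8 of them male, as in the example above
clear
set obs 10
generate market = 1
generate male = _n <= 8

* Market-level mean of the dummy, i.e. the share of males in each market
egen share_male = mean(male), by(market)

list market male share_male in 1/3
```

Here share_male comes out as 0.8 for every agent in the market.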

↧

How to aggregate an ordinal variable

October 19, 2016, 9:18 am

Dear Stata Forum,

I have an ordinal variable on teacher's experience t_exp: 1 2 3 4 5 6

Considering that one school has many teachers, I need to aggregate this variable at school level.

Calculating the mean does not look sensible, as 6 would weigh more than 1, for instance. So a school with six teachers with 1 year of experience would have the same mean as a school with one teacher with 6 years of experience. Well, this does not sound right to me.

Any help on how to aggregate an ordinal variable in a sensible way? Thanks!

↧

metandi and 0 false positives

October 19, 2016, 12:09 pm

Dear Statalist,

I'm having trouble with metandi from SSC in Stata 13.1.

From reading previous posts about this, it seems that the trouble arises from the zero values in two of the studies. Can I transform the values somehow to make Stata accept the command? force does not help. The only alternative is to conduct the analyses in the RevMan software (from The Cochrane Collaboration), but I find that the Stata outputs are much more informative and also look better, so I would prefer using Stata. As the manuscript will include two meta-analyses, I find that I need to use the same software for both. Stata runs the other meta-analysis just fine.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str20 author float(n tp fp fn tn)
"Ni"         98  44  0 17  37
"Bertino"   248 154 13 26  55
"Kraft"     205  68 13  9 115
"De Vito"    73  27  0  6  40
"Stanikova"  63   .  .  .   .
end

When running metandi on these data, Stata informs me that:

Hessian has become unstable or asymmetric
r(504);

I hope that you will be able to help me.

Kind regards,
Anne-Kirstine

↧

max(n, . ) = n? Why is that a reasonable behaviour?

October 19, 2016, 12:28 pm

For example:

. display max(-5, . )
-5

I had assumed if one of the arguments was missing, the max function would also return missing. I would certainly prefer that behavior, at least in my current project.
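
If a missing result is wanted whenever any argument is missing, one workaround is to wrap max() in cond() and missing(). This is only a sketch; the variables x, y and strict_max are made up for illustration.

```stata
* Default behaviour: max() ignores missing arguments
display max(-5, .)                             // -5, as in the post

* Return missing if any argument is missing
display cond(missing(-5, .), ., max(-5, .))    // .

* The same idea applied to variables
clear
input x y
-5 .
 3 7
end
generate strict_max = cond(missing(x, y), ., max(x, y))
list
```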

↧

Graph: Many Stata graphs in one document (PDF)

October 19, 2016, 12:59 pm

Hello,

I have done some research on this but could not find a concrete solution. I created 4 graphs, resulting in 4 single PDF documents.

Code:

graph bar (max) preUOFtotr (max) inUOFtotr (max) postUOFtotr,by(unit)
etc
graph export "H:\graph1.pdf", as (pdf) replace

I would like to "append" all 4 graphs into one single PDF document- one page per graph. How can I do this? Perhaps I can save the graphs in Stata format and then somehow combine them. I don't think the graph combine command will work since it combines graphs into 1 page/view and I want each graph in one page.

I would appreciate any help since combining these single pdf pages using other program is very time consuming.

Thank you,
Marvin

↧

Simulation question: fit logit model on 90% of sample and test on 10%--how to save predictions to compare with actual DV?

October 19, 2016, 1:40 pm

I am trying to run a simulation that fits a logit model on 90% of the sample and then tests it on the remaining 10%. Next, I want to compare the predictions with the actual outcomes. I'm not sure how to save the predicted outcomes to compare with the actual outcomes. This is what I have so far, but it is not working. Should I save p1 in a separate dta file with the actual Y and compare as a second step after the simulation has run? If so, how do I do that?

program define simcheck
* drop all variables to create an empty dataset
drop _all
* get dataset
use "F:\Master.dta"
* set sample size. Set to 90% (estimation here based on 200)
generate random = runiform()
sort random
gen group = 1 + (_n > 180)
* retain the variables of interest
keep DV X1 X2 X3 X4 X5 group
* run logit model ON group == 1
logit DV X1 X2 X3 X4 X5 if group == 1
* get predictions from model and test on other 10%. How many errors?
predict p1 if group == 2
generate predicted_error = DV-p1 if group == 2
* close programming language
end

simulate predicted_error, reps(10): simcheck

↧

don't display missing dates in twoway line graph

October 19, 2016, 1:51 pm

I am creating a line graph with the variable ratio. I would like my graph to show no line when data for that month is missing.

twoway (line ratio date if treated == 1) (line ratio date if treated == 0), title("Test")
graph export test.emf, replace

date is a Stata date with a month format and 1 for each day:

date
1/1/2012 - Jan12
2/1/2012 - Feb12
3/1/2012 - Mar12

My data-set has missing values for certain months, but from the graph it looks like continuous data. Is it possible to show missing months as blank (no line)?

Thank you for your assistance.
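
One approach that may help is sketched below, under two assumptions: the missing months are absent rows (rather than rows with a missing ratio), and the time variable is a true monthly date. With daily dates such as 1/1/2012, a monthly date could first be derived with mofd(). The toy data are invented; tsfill adds the absent months as rows with missing values, and cmissing(n) tells the line not to bridge them.

```stata
* Toy monthly series with May and June 2012 absent from the data
clear
set seed 1
set obs 12
generate date = ym(2012, _n)
format date %tm
generate ratio = runiform()
drop if inlist(_n, 5, 6)

* Add the absent months back as observations with missing ratio
tsset date
tsfill

* cmissing(n): do not connect the line across missing values
twoway line ratio date, cmissing(n) title("Test")
```

With two groups, as in the post, the data would be declared as a panel first (for example tsset treated date) so that tsfill fills gaps within each group.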

↧

Best practices for using tempfiles

October 19, 2016, 4:11 pm

I was wondering if anyone has a good sense of when it is better (for efficiency and to avoid errors) to use a tempfile vs. a .dta file. If there are resources you can point me to, I would appreciate that as well.

I've just begun using tempfiles in loops when using .dta files is impractical. Now I'm wondering if I should be converting all intermediate .dta files I create in my .do files to tempfiles. If I don't plan to use a file outside of a .do file, is there ever a case when I shouldn't use tempfiles? Are tempfiles more or less efficient than .dta files? If I create many tempfiles, is there a reason (and way) to delete them within my .do file?

Thank you,
Krista
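
For reference, a minimal sketch of the usual tempfile pattern for an intermediate dataset, using the auto data shipped with Stata; the names foreign_means and mean_price are made up. Stata erases tempfiles automatically when the do-file or program that created them ends, so no explicit cleanup is needed in this sketch.

```stata
sysuse auto, clear

* Reserve a temporary filename in the local macro foreign_means
tempfile foreign_means

* Build the intermediate dataset without losing the data in memory
preserve
collapse (mean) mean_price = price, by(foreign)
save `foreign_means'
restore

* Merge the group means back onto the original observations
merge m:1 foreign using `foreign_means', nogenerate
```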

↧

Multiple Imputation - Truncreg leads to missing imputations and error

October 19, 2016, 4:20 pm

Hi all,

I am using multiple imputation on a dataset. Of the ~220 observations, ~40 require imputation for three active variables and a handful of passive variables. I transform these variables to restrict their ranges: the number of wells should be only positive, so I impute ln(wells); the proportion of agricultural wells should be between 0 and 1, so I impute logit(percent_ag); and the same for proportion near the coast.

The missing observations follow a monotone pattern, so I use mi impute monotone. Because the imputation procedure was giving me many extreme results (e.g., almost all proportions coming out to 99% or <1%), I tried to restrict the range of the regression on the logit-transformed variables (see below, e.g., truncreg, ll(-2) ul(2)). This reduced the imputation sample slightly, but not by much.

The issue that arises is that the imputation datasets are now no longer complete. Of the 20 imputations, a handful will have missing values for some observations for one or more of these variables. When I then try to run estimation afterwards, I get the following error: "estimation sample varies between m=1 and m=2; click here for details r(459)" This makes sense. Does anyone have an idea for why Stata is not imputing full datasets when I restrict the range of the dep vars using a truncated regression??? Is it because it wants to impute values outside the truncated range and then drops the imputation (m) when it cannot do so? That would seem odd...

The important parts of the code are:

//Make necessary transformations
gen epsilon=.0001
gen ln_wells = ln(num_wells_exog+epsilon) //We don't want this to be ln(0)
//Logit undefined if p=0 or p=1
gen logit_ag = logit(percent_ag_wells_exog)
replace logit_ag = logit(percent_ag_wells_exog+epsilon) if percent_ag_wells_exog==0
replace logit_ag = logit(percent_ag_wells_exog-epsilon) if percent_ag_wells_exog==1
replace prop_wells_1000m_coast=0 if dum_coast==0 //Conditional Value
gen logit_prop_coast = logit(prop_wells_1000m_coast)
replace logit_prop_coast = logit(prop_wells_1000m_coast+epsilon) if prop_wells_1000m_coast==0
replace logit_prop_coast = logit(prop_wells_1000m_coast-epsilon) if prop_wells_1000m_coast==1

mi set wide
mi register imputed ln_wells logit_ag logit_prop_coast
mi register regular wellyieldavg mean_precip_19502014 dum_coast avggrowth_1950_2010 type_num swp_connect totalarea_acres mean_spatialvariance_19502014 nfarms_avg19401959

mi impute monotone (truncreg, ll(-10) ul(12)) ln_wells (truncreg, ll(-2) ul(2)) logit_ag (truncreg if dum_coast==1, ll(-10) ul(10)) logit_prop_coast = wellyieldavg mean_precip_19502014 dum_coast avggrowth_1950_2010 type_num swp_connect totalarea_acres mean_spatialvariance_19502014 nfarms_avg19401959, noisily force add(20) rseed(47)

mi passive: gen mi_wells_exog = exp(ln_wells)
mi passive: replace mi_wells_exog = num_wells_exog if num_wells_exog!=.

mi passive: gen mi_percent_ag_wells_exog = invlogit(logit_ag)
mi passive: replace mi_percent_ag_wells_exog = percent_ag_wells_exog if percent_ag_wells_exog!=.

mi passive: gen mi_prop_wells_1000m_coast = invlogit(logit_prop_coast)
mi passive: replace mi_prop_wells_1000m_coast = prop_wells_1000m_coast if prop_wells_1000m_coast!=.

mi passive: gen mi_prop_1000m_sq = mi_prop_wells_1000m_coast^2
mi passive: replace mi_prop_1000m_sq = prop_1000m_sq if prop_1000m_sq!=.

mi passive: gen mi_percent_nonag_wells_exog = (1-mi_percent_ag_wells_exog)
mi passive: replace mi_percent_nonag_wells_exog = percent_nonag_wells_exog if percent_nonag_wells_exog!=.

mi passive: gen mi_well_heterogeneity_exog = (mi_percent_ag_wells_exog)*(mi_percent_nonag_wells_exog)
mi passive: replace mi_well_heterogeneity_exog = well_heterogeneity_exog if well_heterogeneity_exog!=.

mi passive: gen mi_wells_per_acre_exog = (mi_wells_exog)/(totalarea_acres)
mi passive: replace mi_wells_per_acre_exog = wells_per_acre_exog if wells_per_acre_exog!=.

mi estimate, post: ologit type_num wellyieldavg mean_precip_19502014 dum_coast mi_wells_per_acre_exog avggrowth_1950_2010 mi_percent_ag_wells_exog, robust
//Here is where the error is encountered

↧

Using egen, seq and cond together to create a list of days in each month

October 19, 2016, 6:01 pm

I have used expand so that my monthly data is repeated for each day in the given month. So there are 30 repeated observations for September and 31 for October. I would like to create a variable "day" that lists the days in order (1 to 30, etc.)

If all months had 30 days, I would use:

Code:

egen month =seq(), f(1) t(30)

Instead, I tried

Code:

egen day = cond(days_in_month=30, seq(), f(1) t(30), ///
    cond(days_in_month=31, seq() f(1) t(31), ///
    cond(days_in_month=28, seq() f(1) t(28), ///
    seq() f(1) t(29))))

But this gives the error: "unknown egen function cond()". I also tried moving the cond() to within t(), but this gave the error "'cond' found where integer expected". Is there a way to do this?

Thank you.
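
One way around this that avoids egen altogether is to number the observations within each month using by: and _n. The sketch below assumes some variable identifies each month; month_id and the two-month toy data are hypothetical.

```stata
* Toy monthly data: one row per month with its number of days
clear
input str3 mon days_in_month
"Sep" 30
"Oct" 31
end
generate month_id = _n

* Repeat each monthly observation once per day, as in the post
expand days_in_month

* Number the copies within each month: 1..30 for Sep, 1..31 for Oct
bysort month_id: generate day = _n

tabulate mon, summarize(day)
```

Because the counter restarts within each group, months of 28, 29, 30 or 31 days are handled without any cond().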

↧