Collapsing Nested Variable

December 3, 2016, 10:18 pm

Hello,

I am trying to generate a variable that indicates whether more than 1/3 of the population (perc_pop_block) in a census tract (geo_id) lives at least 1 mile from a clinic (mile_1). Distance is measured at the block group level (block_group), and block groups are nested within census tracts. I can easily code this variable for tracts where a single block group that is at least a mile from a clinic holds more than 1/3 of the tract's population:

Code:

gen mile_33=1 if mile_1==1 & perc_pop_block>(1/3)

but I also need to create this variable for tracts in which several block groups that are more than 1 mile from a clinic together account for more than 33% of the population. The data below show an example tract in which more than 33% of the population is more than 1 mile from a clinic, but no single block group accounts for more than 33%.

I'd appreciate any help. Thank you.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 geo_id byte block_group float near_distance int(pop_block tract_pop) byte mile_1 float perc_pop_block
"A29189210600" 1  .8881201 1336 6425 0 .20793775
"A29189210600" 2 1.3657404 1443 6425 1 .22459143
"A29189210600" 3  1.742405  711 6425 1 .11066148
"A29189210600" 4 1.1704314 1260 6425 1 .19610895
"A29189210600" 5  .7445323 1675 6425 0 .26070037
end
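
One possible approach, sketched here rather than tested on the full data: sum the population shares of the block groups flagged mile_1 == 1 within each tract, then compare that tract-level total with 1/3. The names far_share and mile_33_tract are hypothetical, and the sketch assumes perc_pop_block is each block group's share of tract_pop and that mile_1 is never missing.

Code:

* tract-level share of population in block groups at least 1 mile from a clinic
bysort geo_id: egen far_share = total(perc_pop_block * (mile_1 == 1))
* flag tracts where that combined share exceeds one third
gen byte mile_33_tract = far_share > 1/3

For the example tract, block groups 2, 3, and 4 together hold about 53% of the population, so the tract would be flagged even though no single block group exceeds 33%.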

↧

How to adjust graph's transparency?

December 4, 2016, 12:10 am

Hello,

I tried to draw a graph like this:
[image not preserved]

but one curve is covered by the other:
[image not preserved]

Code:

clear
set obs 100

generate x1 = rnormal(0,1)
generate x2 = rnormal(0.5,1.3)

summarize x1
local M1 = r(mean)
local SD1 = r(sd)

summarize x2
local M2 = r(mean)
local SD2 = r(sd)

twoway function x1 = normalden(x, `M1', `SD1') ,recast(area)  range(-4 4) lcolor(black) fcolor(blue*0.04)  || ///
function x2 = normalden(x, `M2', `SD2'), recast(area) range(-4 5) lcolor(black) fcolor(red*0.03)

When the areas are filled, the curve drawn first is covered by the one drawn on top of it. I want the outline of the lower curve to remain visible and the overlapping area to be marked. Can the transparency of the fill be adjusted, or is there another method?

I have installed the "drarea" command, which helps to highlight the overlapping area of two graphs, and am still working out how to apply it to normal distribution plots.

I'd appreciate any help or suggestion. Thank you!
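
A minimal sketch of the transparency route, assuming Stata 15 or later, where a color may carry a %opacity suffix (earlier versions do not accept it). For brevity it plugs in the nominal parameters from the simulation above instead of the estimated `M1', `SD1', `M2', `SD2'; the 30% opacity and the common range are arbitrary choices.

Code:

* semi-transparent fills let the lower curve and the overlap show through
twoway function y = normalden(x, 0, 1), recast(area) range(-4 5) ///
           lcolor(black) fcolor(blue%30) || ///
       function y = normalden(x, 0.5, 1.3), recast(area) range(-4 5) ///
           lcolor(black) fcolor(red%30)

With both fills at partial opacity, the overlapping region appears as a blend of the two colors, and both outlines stay visible.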

↧

[sspace] setting an initial value for the variance-covariance matrix

December 4, 2016, 5:59 am

Hello everyone

I am trying to replicate the Laubach and Williams paper on measuring the natural rate of interest. Link: http://www.frbsf.org/economic-resear.../wp2016-11.pdf
However, between the first and the second stage of the model, I need to impose a relationship between the variances of two state equations.
The authors provide the original code for this paper at http://www.frbsf.org/economic-resear...john-williams/, under (+) supplement, which includes a HLW_codeguide.
My question relates to matrix Q in section 7.4 on page 10.

I know that the sspace command lets me specify whether the covariance matrix of the state equations is unstructured, diagonal, or identity.
But how do I impose such a covariance matrix in Stata?
Is it possible to impose such a relationship between the errors of different state equations?

Thank you in advance!

↧

How to find the value for which my continuous variable stops being significant?

December 4, 2016, 7:18 am

I have a continuous distance measure as an explanatory variable (a property's distance to an amenity, ranging from 0 to 60km). How, if it is possible at all, can I find the value at which distance to an amenity stops being significant? For example, an amenity that is 1km away has explanatory power, but anything beyond 30km has none. Is there a special Stata command I can use to find this?

↧

Merging Datasets with Two Unique Identification Variables

December 4, 2016, 7:23 am

Hi everyone,

I am attempting to merge two panel data sets. The documentation of the data tells me that I should use two unique identification variables to merge the data, named 'household_id2' and 'hh_s5aq00'. However, only one of the variables is present in my master data set (household_id2). So, using the command:

merge m:1 individual_id2 hh_s5aq00 using "dataset_location"

returns the error 'variable hh_s5aq00 not found'.

Any help would be much appreciated, thanks!

↧

-GSEM- Adding higher-level predictors to a multilevel random intercept model

December 4, 2016, 9:11 am

Hello!

I have a question on generalized structural equation modeling. Say, I fit the following two-level random intercept model:

Code:

gsem (x1 -> y, ) (x2 -> y, ) (M1[hospital_id] -> y, ), covstruct(_lexogenous, diagonal) vce(robust) latent(M1 ) nocapslatent

where x1, x2, and y are measured at the doctor level and M1[hospital_id] is a latent variable denoting the random intercept at the upper (hospital) level.

Now, in my data set I also have several hospital-level predictors (say, b1 and b2). Is it valid to add them to the model and estimate it like this:

Code:

gsem (x1 -> y, ) (x2 -> y, ) (M1[hospital_id] -> y, ) (b1 -> y, ) (b2 -> y, ), covstruct(_lexogenous, diagonal) vce(robust) latent(M1 ) nocapslatent

I recall that in multilevel estimation the dependent variable (y) must be at the lowest level (the doctor level in my case), so I am not sure whether I can estimate the effects of the higher-level b1 and b2 on y.

Thank you in advance for help.

↧

The use of robust standard errors for correction of estimation problems

December 4, 2016, 10:44 am

Hello all,

I'm performing a convergence analysis of income growth rates based on panel data and for this I'm estimating my models by fixed effects or random effects, depending on the Hausman test result, using the 'xtreg' command.
Before completing the estimates, I carry out the following tests:

a) Serial correlation (through the xtserial command);

b) Heteroscedasticity (through the xttest3 command); and

c) Cross-sectional dependence / contemporaneous correlation (via the xtcsd command, weigh abs).

My first question concerns the heteroskedasticity test. Since the 'xttest3' command can only be run after fixed-effects estimation, how can I test for heteroskedasticity in random-effects models? Do I even need to, given that they are estimated by generalized least squares, a method that, unless I am mistaken, already eliminates the problems caused by heteroskedasticity?

The second question concerns what to do when serial correlation, heteroskedasticity, and cross-sectional dependence are all present at the same time. Can I eliminate the problems they cause by using robust standard errors in the fixed-effects or random-effects estimations? That is:

xtreg depvar [indepvars], fe r

xtreg depvar [indepvars], re r

I hope I can count on your help.

Best regards

Girlan

↧

Labeling Boxplot elements

December 4, 2016, 2:21 pm

Hello,

In the following data I want to add the names of the corresponding countries as labels to the min, max, p25, p50, and p75 lines in the boxplots. Is it possible?

Code:

sysuse lifeexp, clear
drop if region==2
graph box lexp, over(region)

Thanks

↧

Homework:/

December 4, 2016, 2:31 pm

I wonder if someone could help me answer this question:
Generate a categorical variable with four categories, called cingr, each category representing a quartile of its distribution.

↧

Seemingly Unrelated regression and Bootstrap

December 5, 2016, 8:49 am

Hello! I am using a seemingly unrelated regression to estimate the impact of terrorist attacks on tourism flows from four different nationalities. Although the independent variables are the same across the equations, I decided to use this method because it yields different standard errors than the individual OLS regressions.

The model is specified like this:

local rhs d1 T1e T2e T3e T1t T2t T3t LT1e LT2e LT3e LT1t LT2t LT3t L2T1e L2T2e L2T3e L2T1t L2T2t L2T3t
sureg (lYg = `rhs') (lYf = `rhs') (lYuk = `rhs') (lYus = `rhs'), level(90) corr

However, I am pretty sure that my equations suffer from serial correlation (and possibly heteroskedasticity) and the SUR model does not account for that. I was wondering if using the bootstrap method would correct this problem.

I would greatly appreciate any advice on this matter, as this is for my master's dissertation and I really want to get it right.

Thanks in advance

↧

recursive bivariate probit with instrumental variable

December 5, 2016, 9:02 am

Hi Statalist users,
I am going to use a recursive bivariate probit model to estimate the factors that affect the decision to move for study and for work (just after graduation):
mobility for study = x + b + c
mobility for work = mobility for study + b + c
Then I want to estimate the return (in wages) to mobility for work.

In this case the endogenous variable is "mobility for work" (a dummy variable).
I have found an instrumental variable (the x in the first equation) that I am going to exclude from the second equation (mobility for work).
The next step is to estimate the wage equation:
ln w = a + mobility for work + error

If I understand correctly, the bivariate probit model is an alternative to two-stage least squares, where I need an instrumental variable in order to establish a causal relationship between mobility for work and wages.

I don't understand whether, after the steps described above, I can take the predicted value from the recursive bivariate probit (I use the predicted value for mobility for work == 1) and insert it in the wage equation.

Do I have to use 2SLS to check the result?

Could someone familiar with this topic give me some advice?
Best regards
Alessandro

↧

Outputting Results from Loop into Excel

December 5, 2016, 9:05 am

Hi everyone,

I am trying to export the results from a loop I ran using the ranksum command. I want to graph the sums of the ranks for the two groups in the test. I am running the test for all the variables in my data set, which consists of 11 different categories sampled in 9 waves (=99 variables), so entering the data into Excel by hand would take a long time. Here is the code I have so far:

Code:

foreach var of varlist lang_0-gtotal_9 {
ranksum `var', by(treatment)
}

Thanks for the help.
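
One way this could be done, sketched under assumptions: collect the stored results of each ranksum call with postfile and export the collected file to Excel. The file name ranksum_results and the posted variable names are hypothetical. The second group's rank sum is computed from the identity that all ranks add up to N(N+1)/2, since ranksum stores only the first group's observed rank sum in r(sum_obs).

Code:

tempname h
postfile `h' str32 varname double(ranksum1 ranksum2 z) using ranksum_results, replace
foreach var of varlist lang_0-gtotal_9 {
    quietly ranksum `var', by(treatment)
    * total number of observations used in this test
    local N = r(N_1) + r(N_2)
    post `h' ("`var'") (r(sum_obs)) (`N'*(`N'+1)/2 - r(sum_obs)) (r(z))
}
postclose `h'

* write the collected results to Excel without losing the data in memory
preserve
use ranksum_results, clear
export excel using ranksum_results.xlsx, firstrow(variables) replace
restore

The resulting sheet has one row per variable, which should be straightforward to graph.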

↧

Taylor estimation via sspace (Kalman filter)

December 5, 2016, 9:05 am

Hi guys,

I have an issue. I assume that the trend growth rate (g) of GDP equals the real interest rate, that the inflation trend (i_trend) is just a crude proxy for the inflation target, and that the inflation target changes over time. I want to estimate these changes in interest rate policy.

The model from Flaig and Wollmershäuser (2007) is the following:
[equation image not preserved]

where
[equation images not preserved]

All variables except c and epsilon are observed.
I'm not very familiar with the Kalman filter. I tried to understand it and read some examples and papers, but I failed to estimate c in Stata. Do you have any advice on what the Stata command should look like?

↧

Cases dropped in xtlogit

December 5, 2016, 9:40 am

Hi All,

I'm using xtlogit, fe. Some cases are dropped. I wonder how I can figure out exactly which cases are dropped. I want to make sure that the fixed-effects model didn't lead to a selective sample.

note: multiple positive outcomes within groups encountered.
note: 3,861 groups (20,002 obs) dropped because of all positive or
all negative outcomes.

When I tried to identify those cases with all positive or all negative outcomes within each group, I used
bysort id:egen die=mean(next_die)
gen flag=1 if die==0|die==1
tab flag

But the number of cases from my calculation is always smaller than the number of cases dropped by xtlogit. Why?

Thanks!
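
One possible explanation, offered as an assumption rather than a diagnosis: xtlogit first sets aside observations with missing values on any model variable and only then checks for outcome variation within each group, so a group can lose its variation after incomplete observations are excluded. The sketch below marks the estimation sample with e(sample) and repeats the no-variation check on complete cases only. The covariates x1 and x2 are placeholders for the model's own regressors, the new variable names are hypothetical, and the data are assumed to be xtset on id.

Code:

* x1 and x2 are placeholder covariates; substitute the model's own regressors
xtlogit next_die x1 x2, fe
gen byte used = e(sample)

* repeat the no-variation check on complete cases only
gen byte complete = !missing(next_die, x1, x2)
bysort id: egen die_cc = mean(cond(complete, next_die, .))
gen byte novar = complete & inlist(die_cc, 0, 1)

tab novar used

Observations with used == 0 and novar == 0 would then be the ones excluded for missing values rather than for lack of variation.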

↧

twoway loop - how to make title out of the country name + country code?

December 5, 2016, 10:45 am

Hi,

I have several panel units and I want to plot two different variables as lines, which I did using a loop.
Currently each graph's title is its country code, but I want it to be a combination of the country name and the country code (e.g. USA_111).
I am using this code, but the title only comes out as "_111". Anyone?

qui levelsof ifs_code, local(allCn)
foreach cn of local allCn {
twoway (line RGDP year, lcolor(navy)) (line NGDP year, lcolor(orange_red)) if ifs_code == `cn', ///
title("`country'_`cn'") legend(label(1 "RGDP") label(2 "NGDP"))
graph export CompareGDP_`cn'.pdf, replace
}
end
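
The title comes out as "_111" because no local macro named country is ever defined, so `country' expands to nothing. A hedged sketch of one fix, assuming the data contain a string variable holding the country name (called country here, which is a guess) and that ifs_code is numeric: look the name up inside the loop with levelsof and use that local in the title. A foreach loop is closed by its brace, so no end is needed.

Code:

quietly levelsof ifs_code, local(allCn)
foreach cn of local allCn {
    * fetch the country name that goes with this code
    quietly levelsof country if ifs_code == `cn', local(cname) clean
    twoway (line RGDP year, lcolor(navy)) (line NGDP year, lcolor(orange_red)) ///
        if ifs_code == `cn', ///
        title("`cname'_`cn'") legend(label(1 "RGDP") label(2 "NGDP"))
    graph export CompareGDP_`cn'.pdf, replace
}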

↧

Shrinking the database according to date

December 5, 2016, 11:19 am

Hi all,
I am rather new to Stata and I am stuck on a small problem.
I have a database with a patient ID, a treatment date (%td), and indicators for the different drug types that were administered (yes or no). The problem is that within each month the drugs were administered on different days (for example, the first drug was administered on the 1st and 15th of January and on the 2nd, 10th, and 24th of February, etc.; the second on the 1st and 15th of January and on the 3rd, 10th, and 27th of February, etc.). I want to reduce the number of observations: I am interested in knowing whether a specific drug was administered in a given month, so I would like a single observation per month, with a single value per drug (yes or no) indicating whether that drug was administered in that month.

Thank you,
Dimi
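
A minimal sketch of one way to do this, assuming the drug indicators are numeric 0/1 variables: derive a monthly date from the daily date and collapse to one row per patient and month, keeping the maximum of each indicator. All variable names here (patient_id, treat_date, drug1, drug2) are placeholders.

Code:

* monthly date derived from the daily treatment date
gen month = mofd(treat_date)
format month %tm

* one row per patient-month; a drug is 1 if given on any day that month
collapse (max) drug1 drug2, by(patient_id month)

Note that collapse replaces the data in memory, so the original file should be saved first.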

↧

Splitting yearly panel data to monthly data in preparation for survival analysis

December 5, 2016, 11:40 am

Hello All,

I'd appreciate some help, please:
How can I split my yearly panel data into monthly data in preparation for survival analysis?
In the dataset below, 'Fail' is the failure event variable. In the split data, for firms that fail, the fail indicator (1) should appear only in the last month, while all other months should be '0'. The other yearly data should be replicated for all the months in that year.

LFE_REFYR is year; OP_ID is the company ID; T4_NUM_EMP is the number of employees.

Thank you

Kele

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(LFE_REFYR CD CSD NAICS OP_ID) byte Fail int BIRTH_DATE byte T4_NUM_EMP
2000 2113 31106 111 12476 0 2000 29
2001 2113 31106 111 12476 0 2000 40
2002 2113 31106 111 12476 0 2000 22
2003 2113 31106 111 12476 0 2000 32
2004 2113 31106 111 12476 1 2000 34
2000 2112 31105 334 12477 0 2000 38
2001 2112 31105 334 12477 0 2000 33
2002 2112 31105 334 12477 0 2000 39
2003 2112 31105 334 12477 0 2000 24
2004 2112 31105 334 12477 0 2000 21
2000 2112 31105 517 12478 0 2000 32
2001 2112 31105 517 12478 0 2000 30
2002 2112 31105 517 12478 0 2000 24
2003 2112 31105 517 12478 0 2000 30
2004 2112 31105 517 12478 1 2000 35
2000 2112 31104 111 12479 0 2000 26
2001 2112 31104 111 12479 0 2000 39
2002 2112 31104 111 12479 0 2000 38
2003 2112 31104 111 12479 0 2000 26
2004 2112 31104 111 12479 1 2000 40
2000 2112 31104 112 12480 0 2000 34
2001 2112 31104 112 12480 0 2000 23
2002 2112 31104 112 12480 0 2000 32
2003 2112 31104 112 12480 0 2000 21
2004 2112 31104 112 12480 0 2000 36
2000 2111 31103 721 12481 0 2000 35
2001 2111 31103 721 12481 0 2000 20
2002 2111 31103 721 12481 0 2000 38
2003 2111 31103 721 12481 0 2000 23
2004 2111 31103 721 12481 1 2000 25
end
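
One way the split could be sketched, assuming one observation per firm-year and that a failure year is always the firm's last year in the data: expand each row into 12 copies, number the copies as months, and keep the failure flag only in the final month. The variables month and mdate are hypothetical names.

Code:

* twelve monthly copies of each firm-year row
expand 12
bysort OP_ID LFE_REFYR: gen byte month = _n
gen mdate = ym(LFE_REFYR, month)
format mdate %tm

* keep Fail == 1 only in the firm's final month
bysort OP_ID (mdate): replace Fail = 0 if Fail == 1 & _n < _N

All other yearly variables, such as T4_NUM_EMP, are simply repeated across the twelve months of their year.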

↧

Homework policy statement added at http://www.statalist.org/forums/help#adviceextras

December 5, 2016, 12:58 pm

We have reinstated a formal statement of policy advising that people do not ask or answer homework questions. It is at http://www.statalist.org/forums/help#adviceextras

If you want to know more, read as far as you feel inclined.

Some recent threads have posted homework questions and asked for help.

All the replies that I have seen from experienced members explained a personal policy of not helping with homework. (In addition, and worth noting, is the obvious fact that many people just ignored the requests.)

Statalist has been in existence since 1994 and has posted generic advice as its FAQ Advice in one form or another for most of that time. Sometimes there are little flurries of one kind of behaviour that seems inappropriate and we post advice of the form "Please don't do X, and here's why" (with whatever positive advice we can add).

When modified like this the FAQ Advice tends to grow and in turn every now and again we slim it down, cutting out advice that no longer seems of primary importance.

We're all busy people and those who maintain the FAQ, at present myself but in consultation with several others, want the FAQ Advice to be concise but also comprehensive, ideals that are difficult to satisfy together. (Some may sympathise with the once prominent member, long since departed, who complained first that the FAQ was badly written and then that he hadn't had the time ever to read it.)

By 2014 when Statalist was re-launched as this forum, requests to do homework had become rare, so we cut it out. Now that there has been a new flurry of such questions, we thought we should reinstate such a policy statement if only so that people could point to a policy.

This has not seemed contentious in the past, but the world is full of surprises.

Who is we? would be a fair question here.

In a strong sense, Statalist is run by StataCorp and we all depend on that. StataCorp are responsible for occasional sharp decisions that you may never have even noticed, such as removing spam (very rare) and removing very offensive posts (even rarer, less than once per year). Although I am billed as FAQ maintainer, I could not change a single word of the FAQ without passing it by StataCorp, so there are checks and balances.

In another equally strong sense, Statalist is just run by its members. Posting a question and trying to follow guidelines is itself a positive contribution to the list. Asking someone else to post a dataset example or show their code or anything like that is a positive contribution to the list. Answering questions is a positive contribution! And so on. Everyone who contributes positively, which means almost everybody, helps to run the list.

But if you disagree with our practices, or our principles, then what could you do?

1. Post on Statalist and explain why you disagree. Then the discussion begins.

2. Post directly to the list administrators. That's a good method if you have a complaint about named individuals.

3. Find a forum more congenial to you. That's not meant aggressively! Statalist doesn't try to suit all tastes and its presumptions about what you should and should not do may seem ill-judged to you. Other forums such as Cross Validated, Stack Overflow, TalkStats or Quora may work more to your taste. Note that many people here participate in some of those forums, so commitment to Statalist doesn't rule out anything else.

↧

Labeling Variables Using A Loop

December 5, 2016, 1:15 pm

Hi All,

I am working with a dataset that has employment data from 1969 to 2015 in different classifications (e.g. Private Employment 1969, 1970, 1971.....2015; Total Employment 1969, 1970...2015). Each year's employment figure is stored in a separate variable. I would like to label each variable with the correct employment classification and the correct year. So, for Private Employment, I want all the variables to share a constant label with only the year changing, so that my variable for Private Employment 1969 (p1969) is labeled "Private Employment 1969" and my variable for Private Employment 2015 (p2015) is labeled "Private Employment 2015".

I have tried a couple of loops using foreach, forvalues, and setting a local macro but I can't quite get it. Does anyone have any advice on this one?

Thanks,

jack
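
A minimal sketch of such a loop, assuming a variable named p followed by the year exists for every year from 1969 to 2015 (the loop stops with an error at the first missing one):

Code:

forvalues y = 1969/2015 {
    label variable p`y' "Private Employment `y'"
}

The other classifications would follow the same pattern with their own variable prefix and label text.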

↧

merge with dates within time frames

December 5, 2016, 2:17 pm

Hello. I hope someone can help me.

I have 3 datasets related to a large group of patients, divided into:

SET A. patients (id for patients) with demographic data
SET B. treatments (with id for treatments) AND ID FOR PATIENTS (key variable)
SET C. adverse events (with id for adverse events) and ID FOR PATIENTS

Each patient has many treatments, each with a start and a finish date. Each treatment may have no adverse events, one, or several (each with a date that falls within the treatment).

Problem: I need to match each adverse event to the corresponding treatment.

I can merge SET C -events - into the SET B -treatments- or viceversa. the key variable is ID PATIENT.

Which is the command for doing this? Some time ago, someone has kindly sent me some codes, but this time, they don't seem to work.

Thanks so much in advance

The structures of the databases are:

A. Patients:

id patient	sex	age	diagnosis
1	0	20	1
2	1	34	2

B. Treatments

id treatment	id patient	start date	finish date
1	1	1jan2014	20march2015
2	2	1march2013	20jun2013
3	2	21jun2013	.

C. Adverse events.

id adverse event	id patient	date adverse event
1	2	5march2013
2	1	2sep2014
3	1	3dec2014

What I need:

ID treatment	start date	finish date	ID event	date event
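
One possible approach, sketched with placeholder names: form every treatment-event pair within a patient using joinby, then keep the pairs where the event date falls inside the treatment window. The file names (treatments, events) and variable names (id_patient, id_treatment, id_event, start_date, finish_date, date_event) are assumptions, as is the premise that all dates are stored as Stata daily dates and that a missing finish date means the treatment is ongoing.

Code:

* file and variable names below are placeholders
use treatments, clear
* all treatment-event combinations within each patient
joinby id_patient using events

* keep events dated inside the treatment window (open-ended if finish_date is missing)
keep if !missing(date_event) & date_event >= start_date ///
    & (date_event <= finish_date | missing(finish_date))

keep id_treatment start_date finish_date id_event date_event
sort id_treatment date_event

Treatments without any adverse event, and events that fall outside every treatment window, are dropped by this sketch.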

↧