Failure of CODE delimiters
Several times, I've seen text apparently posted between CODE delimiters that does not display as code. For an example, see post #5 here.
↧
outreg2 how to show the "string" name of dummy variables
Dear all,
I'm working on a regression with many dummy variables and I want to report my results in Excel using outreg2.
Everything looks fine, except the variable names.
My dummy variable is country; it has values like US, UK, Canada and is stored as a string.
In order to run the regression, I first convert it to a numeric variable:
Code:
egen country1 = group(country), label
Then I run the regression:
Code:
regress Y X i.country1
Then I export the results to Excel:
Code:
outreg2 using regression.xls, replace
What appears in Stata is:
Y |
X |
US |
UK |
Canada |
But what outreg2 shows is:
Y |
X |
1.country1 |
2.country1 |
3.country1 |
What can I do to change the names of the dummy variables in the outreg2 output?
Thanks in advance.
Any help will be highly appreciated.
↧
New version of cprdutil on SSC
Thanks once again to Kit Baum, a new version of the cprdutil package is now available for download from SSC. In Stata, use the ssc command to do this, or adoupdate if you already have an old version of cprdutil.
The cprdutil package is described as below on my website. In the new version, I have removed from the component ado-files most of the tabulations, summarisations and assert checks, which may be time-consuming in large datasets. The user therefore has the freedom, and the responsibility, to add whatever tabulations, summarisations and/or assert checks the user wishes to add. However, the modules producing keyed datasets still check that the key variables are nonmissing, and identify the observations uniquely.
I am currently planning further developments to the cprdutil package, to be added in due course.
Best wishes
Roger
-----------------------------------------------------------------------------
package cprdutil from http://www.imperial.ac.uk/nhli/r.newson/stata13
-----------------------------------------------------------------------------
TITLE
cprdutil: Utilities for inputting CPRD datasets into Stata
DESCRIPTION/AUTHOR(S)
The cprdutil package is a suite of utility programs for inputting
text datasets produced by the Clinical Practice Research Datalink
(CPRD), and outputting Stata datasets and/or do-files to create value
labels. CPRD text datasets may contain XYZ lookup tables, non-XYZ
lookup tables, or non-lookup datasets, with one text row for each of
a set of things of a kind known to the primary-care sector of the
British Health Service, such as patients, primary-care practices,
clinical events or prescriptions. All of these may be output into
Stata datasets, with one output observation per input text row. An
XYZ lookup table may alternatively be translated into a set of Stata
value labels, created using a generated do-file, to be assigned to
variables in other Stata datasets after running the do-file in the
dataset. The cprdutil package uses the SSC packages keyby, lablist,
chardef, intext and msdirb, which need to be installed for cprdutil
to work.
Author: Roger Newson
Distribution-Date: 22march2016
Stata-Version: 13
INSTALLATION FILES
cprd_additional.ado
cprd_batchnumber.ado
cprd_bnfcodes.ado
cprd_clinical.ado
cprd_common_dosages.ado
cprd_consultation.ado
cprd_entity.ado
cprd_immunisation.ado
cprd_medical.ado
cprd_nonxyzlookup.ado
cprd_packtype.ado
cprd_patient.ado
cprd_practice.ado
cprd_product.ado
cprd_referral.ado
cprd_scoremethod.ado
cprd_staff.ado
cprd_test.ado
cprd_therapy.ado
cprd_xyzlookup.ado
cprd_additional.sthlp
cprd_batchnumber.sthlp
cprd_bnfcodes.sthlp
cprd_clinical.sthlp
cprd_common_dosages.sthlp
cprd_consultation.sthlp
cprd_entity.sthlp
cprd_immunisation.sthlp
cprd_medical.sthlp
cprd_nonxyzlookup.sthlp
cprd_packtype.sthlp
cprd_patient.sthlp
cprd_practice.sthlp
cprd_product.sthlp
cprd_referral.sthlp
cprd_scoremethod.sthlp
cprd_staff.sthlp
cprd_test.sthlp
cprd_therapy.sthlp
cprd_xyzlookup.sthlp
cprdutil.sthlp
-----------------------------------------------------------------------------
↧
price elasticity
I am trying to estimate price elasticities from choice-experiment data in which each individual makes repeated choices.
[Screenshot of the data not available]
I run this code:
asclogit choice price, case(hhid) alternatives(brand) casevars(age income)
and I get this error message:
variable brand has replicate levels for one or more cases; this is not allowed
How do I fix it?
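A possible reading of the error, offered as a sketch rather than a diagnosis of this particular dataset: asclogit expects each alternative to appear at most once within a case, so when a household answers several choice sets, case(hhid) sees the same brand repeatedly. The variable occasion below is hypothetical (it is not in the post) and stands for whatever numbers the choice sets within a household.

```stata
* Check whether any household has the same brand more than once
duplicates report hhid brand

* Hypothetical: "occasion" numbers the choice sets within each household
egen long caseid = group(hhid occasion)
asclogit choice price, case(caseid) alternatives(brand) casevars(age income)
```

If duplicates report shows surplus copies, building the case identifier from household and choice occasion, as above, is one way to give each choice set its own case.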
↧
move a column up
How can I do this? I want to go from:
title |
1 |
2 |
3 |
4 |
5 |
6 |
to:
title |
2 |
3 |
4 |
5 |
6 |
. |
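A minimal sketch of one way to shift a column up by one row, using explicit subscripting on toy data. The variable title_up is a name introduced here for illustration; writing the result to a new variable keeps the original intact for checking.

```stata
clear
set obs 6
generate title = _n            // toy data: 1, 2, ..., 6

* Lead by one observation: each row takes the value from the row below.
* The last row has no row below it, so it becomes missing.
generate title_up = title[_n+1]
list
```

The result depends on the current sort order, so it is worth sorting on a stable key first if the data have one.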
↧
How to remove growth rate for observations when they do not exist
I have data that is a longer version of the following:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str1(Lender Borrower) float(Year Amount)
"A" "B" 1979 1000
"A" "B" 1980 10000
"A" "B" 1981 20000
"A" "B" 1982 23444
end
Here, A is a lender and B is a borrower (there are many more lenders and borrowers); A lends B Amount X in Year Y. I have declared my data set as a panel and am in the process of calculating growth rates of the loan amount: first-differencing, dividing by the value at the beginning of the period and multiplying by 100. However, many borrowers do not borrow in year 1 but do in year 2, so the growth rate is infinite. I do not know how to deal with such observations (they should probably be deleted, as the growth rate does not exist). Any help is much appreciated.
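A minimal sketch of the growth-rate calculation on the example data, assuming each lender-borrower pair is the panel unit. The names pair and growth are introduced here for illustration. With time-series operators, a missing previous year gives a missing growth rate, and Stata also returns missing for division by zero, so the problematic observations come out as missing without being deleted.

```stata
clear
input str1(Lender Borrower) float(Year Amount)
"A" "B" 1979 1000
"A" "B" 1980 10000
"A" "B" 1981 20000
"A" "B" 1982 23444
end

egen pair = group(Lender Borrower)      // one panel unit per lender-borrower pair
xtset pair Year
generate growth = 100 * D.Amount / L.Amount
list
```

Whether a jump from no loan to a positive loan should be treated as missing or handled some other way is a substantive choice that the code does not settle.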
↧
Is there a way to use a varlist local macro as the row or column names of a matrix?
I am looking to take all of the variable names in my data that start with p and use those as some of the column names in my matrix. So far, I have:
Code:
ds p*
local plist=r(varlist)
...
matrix colname B_coeffs = lnxp "`plist'" Constant
but I am getting error r(198), saying that the string in plist is an invalid name. It seems to be trying to assign all of the variable names saved in plist as one long name for a single column of the matrix. I would like it to take each component of the string (where components are separated by spaces) as a separate column name. I am guessing it can be done with the right combination of quotes and maybe tokenize?
I am an R user just using Stata to run these regressions. An alternative solution would be another (non-matrix) way to output a compact dataset of the coefficient values, where the row names are the dependent variable names (I am using sureg for the regressions), the column names are the independent variable names, and the values are the appropriate coefficients. I'm just trying to get the coefficients back into R, where I will perform some more calculations with them, and this was the best way I have been able to find to export them. Let me know if you have any suggestions, thanks!
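A minimal sketch of the naming step on Stata's shipped auto data, where t* stands in for p* and the matrix is filled with zeros purely for illustration. The point it tries to show is that the macro is expanded without surrounding double quotes, so each variable name becomes its own column name.

```stata
sysuse auto, clear
ds t*                                   // trunk and turn in the auto data
local plist `r(varlist)'
local k : word count `plist'

matrix B_coeffs = J(1, `k' + 2, 0)      // placeholder values
matrix colnames B_coeffs = lnxp `plist' Constant
matrix list B_coeffs
```

The number of names supplied has to match the number of columns, which is why the sketch sizes the matrix from the word count of the list.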
↧
Create economic indicator merging variables
Dear Stata community,
I would like to generate an indicator for real estate prices. The explanatory variables I use are migration, mortgage prices and a consumer mood index.
So far I have run xcorr and a principal component analysis. The trouble now is how to merge these three variables into one indicator that I can use to predict real estate prices. I have searched this and other forums, but with no success. What can I do?
Thank you very much for your help in advance,
Best
Martin
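One common way to turn a principal component analysis into a single indicator is to use the scores on the first component. The sketch below shows only the mechanics on Stata's shipped auto data; price, mpg and weight are stand-ins for migration, mortgage prices and the mood index, and the name indicator is introduced here. Whether the first component is a sensible predictor of real estate prices is a separate question that this does not answer.

```stata
sysuse auto, clear
pca price mpg weight          // stand-ins for the three series in the post
predict indicator, score      // scores on the first principal component
summarize indicator
```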
↧
Cleaning strings
Hi Statalist,
I have a messy string variable that's 3-16 numbers (yes, I could destring it). The variable is 4-14 characters long, begins with either 07 or 08 (ignore the one beginning with 01) and looks something like:
0174
07080301010040
070804020056
080301040224
0803020185
For all observations longer than 12 characters, I want to remove the 07 from the beginning. I'm not sure how to do this with varying string lengths, but I've been tinkering with subinstr(). Any ideas?
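A minimal sketch on the five example values, using substr() and strlen() rather than subinstr(). The variable name code is introduced here. The condition also checks that the string actually starts with 07, which is an assumption about what is wanted.

```stata
clear
input str16 code
"0174"
"07080301010040"
"070804020056"
"080301040224"
"0803020185"
end

* Drop the leading "07" only from strings longer than 12 characters
replace code = substr(code, 3, .) if strlen(code) > 12 & substr(code, 1, 2) == "07"
list
```

Note that the third value is exactly 12 characters long, so under a strict "longer than 12" rule it keeps its leading 07.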
↧
creating an indicator for matching observations
Hello,
My data includes an ID variable for an individual, and variables for the IDs of the person's siblings: sibling2, ..., sibling5 within the household. I also have a variable bullied = 0 if the individual was never bullied, 1 if the person was bullied in one time period, and 2 if the person was bullied in more than one time period. What I'd like to do is to create a variable sibbullied = 1 if the individual (ID) has one or more siblings who have a bullied value of 1 or 2; 0 otherwise. I'd also like to create a variable totsibbullied = total number of siblings who have bullied values of 1 or 2; 0 otherwise. For example:
ID | sibling2 | sibling3 | sibling4 | sibling5 | bullied | sibbullied | totsibbullied |
1 | 2 | 4 | . | . | 1 | 1 | 1 |
2 | 1 | 4 | . | . | 0 | 1 | 2 |
3 | . | . | . | . | 1 | . | . |
4 | 1 | 2 | . | . | 2 | 1 | 1 |
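A sketch of one possible approach on the four example rows: look up each sibling's bullied value by merging the data onto itself, once per sibling variable. The helper names nsib and sibbul2 to sibbul5 are introduced here, and the sketch assumes ID is unique and ignores the household dimension. Following the example table, individuals with no siblings get missing rather than 0.

```stata
clear
input ID sibling2 sibling3 sibling4 sibling5 bullied
1 2 4 . . 1
2 1 4 . . 0
3 . . . . 1
4 1 2 . . 2
end

egen nsib = rownonmiss(sibling2-sibling5)     // number of listed siblings
generate totsibbullied = 0

forvalues j = 2/5 {
    * Build a lookup file: sibling's ID -> sibling's bullied value
    preserve
    keep ID bullied
    rename (ID bullied) (sibling`j' sibbul`j')
    tempfile sib
    save `sib'
    restore
    merge m:1 sibling`j' using `sib', keep(master match) nogenerate
    replace totsibbullied = totsibbullied + inrange(sibbul`j', 1, 2)
    drop sibbul`j'
}

replace totsibbullied = . if nsib == 0
generate sibbullied = totsibbullied > 0 if nsib > 0
sort ID
list ID bullied sibbullied totsibbullied
```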
↧
State Space Model - Kalman Filter
I am currently working on market beta instability, and my analysis is based on the Kalman filter. However, I am having difficulty modelling time-varying coefficients in the observation equation in Stata.
Y(t) = X(t)B(t) + e(t) - Observation Equation
B(t) = Z*B(t-1) + u(t) - State Equation
Has anyone managed to implement time-varying coefficients in Stata, and specifically X(t) in the constraints section? I would be grateful if someone could point me in the right direction to solve this issue or advise me to use other software.
↧
ivreg2 robustness check for xtabond2
Dear Stata users,
I am investigating the impact of regulations on financial stability using xtabond2. The data are an unbalanced panel of 10 countries with more than 200 banks. The code is as follows:
xtabond2 Z l.Z llerner_bar lhhi_bar own1 own2 operation size L.linflation L.lgrowth L.loverallfreedomscore L.capreq L.restrict L.supervisor L.marketdisc L.llerner_capreq_bar L.llerner_restrict_bar L.llerner_supervisor_bar L.llerner_marketdisc_bar yr2001 yr2002 yr2003 yr2004 yr2005 yr2006 yr2007 yr2008 yr2009 yr2010 yr2011 yr2012, gmmstyle(L.Z, lag (1 1) collapse) gmmstyle(L.capreq L.restrict L.supervisor L.marketdisc, lag(2 2) collapse) ivstyle(llerner_bar lhhi_bar size own1 own2 operation L.linflation L.lgrowth L.loverallfreedomscore L.llerner_capreq_bar L.llerner_restrict_bar L.llerner_supervisor_bar L.llerner_marketdisc_bar yr2001 yr2002 yr2003 yr2004 yr2005 yr2006 yr2007 yr2008 yr2009 yr2010 yr2011 yr2012) twostep robust orthogonal
The means of some variables (those with the _bar suffix) were subtracted to reduce the correlation among variables. However, the reviewer wants a robustness check using an alternative estimator. Since I have endogenous variables (i.e. L.capreq L.restrict L.supervisor L.marketdisc), I have been using ivreg2 with the following code:
ivreg2 Z L.Z llerner_bar lhhi_bar own1 own2 operation size L.linflation L.lgrowth L.loverallfreedomscore L.llerner_capreq_bar L.llerner_restrict_bar L.llerner_supervisor_bar L.llerner_marketdisc_bar yr2001 yr2002 yr2003 yr2004 yr2005 yr2006 yr2007 yr2008 yr2009 yr2010 yr2011 yr2012 (L.capreq L.restrict L.supervisor L.marketdisc = L2.capreq L2.restrict L2.supervisor L2.marketdisc), gmm cluster(id)
However, ivreg2 gives different results. I have tried various specifications, but all produce different results. I have searched but failed to find any information that would help me identify what I did wrong with ivreg2. Thank you for your help.
Regards,
Ruslan
↧
move a column down and delete last obs in column
I have this table:
I need to move var1 down one row and delete the first and last observations of var1.
  | date | var1 | var2 |
1 | 02jan2014 | 10 | 3 |
2 | 02feb2014 | 20 | 6 |
3 | 02mar2014 | 30 | 9 |
4 | 02apr2014 | 40 | 12 |
5 | 02may2014 | 50 | 15 |
6 | 02jun2014 | 60 | 18 |
7 | 02jul2014 | 70 | 21 |
8 | 02aug2014 | 80 | 24 |
9 | 02sep2014 | 90 | 27 |
Code:
expand 2 in l
sort date
replace date = dofm(mofd(date)+1) in l
  | date | var1 | var2 |
1 | 02jan2014 | . | 3 |
2 | 02feb2014 | 10 | 6 |
3 | 02mar2014 | 20 | 9 |
4 | 02apr2014 | 30 | 12 |
5 | 02may2014 | 40 | 15 |
6 | 02jun2014 | 50 | 18 |
7 | 02jul2014 | 60 | 21 |
8 | 02aug2014 | 70 | 24 |
9 | 02sep2014 | 80 | 27 |
10 | 02oct2014 | . | 27 |
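A minimal sketch of the shift itself on toy values that mimic var1 and var2 (dates are left out). The name var1_down is introduced here. The result goes into a new variable because replace works through the observations in order, so replace var1 = var1[_n-1] would pick up values it had already overwritten.

```stata
clear
set obs 9
generate var1 = 10 * _n        // toy stand-in: 10, 20, ..., 90
generate var2 = 3 * _n
expand 2 in l                  // append a copy of the last row, as in the post

generate var1_down = var1[_n-1]   // lag by one row; first row becomes missing
replace var1_down = . in l        // blank out the appended last row
list
```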
↧
Testing nonlinearity in a MLM with xtmixed
I have been searching high and low for a straightforward answer to this question: I am using xtmixed to run a three-level regression. I suspect my data might be non-linear. How can I test whether there is a non-linear effect in this case? If there IS a non-linear effect, how do I proceed?
↧
Export Mata string matrix to Stata and Excel
Hello
I'm working with string and numeric vectors in Mata, and I need to export them to Excel. For numeric vectors, I convert them to Stata matrices and then use the putexcel command, but for string vectors I don't know what to do, because Stata matrices cannot hold strings. Any help will be welcome!
Best
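One route that avoids Stata matrices altogether is Mata's xl() class, which writes to Excel directly. This is a sketch assuming Stata 13 or later, where that class is available. The matrix S and the file name strings_demo are made up for illustration.

```stata
mata:
S = ("alpha", "beta" \ "gamma", "delta")   // hypothetical 2 x 2 string matrix
b = xl()
b.create_book("strings_demo", "Sheet1")    // fails if the workbook already exists
b.put_string(1, 1, S)                      // upper-left corner at row 1, column 1
b.close_book()
end
```

Numeric vectors can go through the same object with put_number(), so both kinds of data can end up in one workbook.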
↧
Using mi xeq for making new variables
Hi,
I am having problems creating a new variable in a multiply imputed dataset.
I need to create a new variable (h) that equals 1 if x1-x5 are all equal to 1, and 0 otherwise (excluding observations with any missing value on x1-x5, which arise from missing values in the m=0 dataset).
I used the following commands, but the number of nonmissing observations of h in datasets m=1-100 is still equal to that in the m=0 dataset (basically, these commands did nothing for observations with missing values in datasets m=1-100).
mi xeq: gen h =1 if !missing(x1, x2, x3, x4, x5)
mi xeq: replace h=0 if (x1 == 1 & x2 == 1 & x3 == 1 & x4 == 1 & x5 == 1)
Can anyone tell how to fix this?
Thankfully,
Massao
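A sketch of how the variable could be built with mi passive, which creates the variable in every imputation and registers it as passive. It assumes the data are already mi set and that x1-x5 have been imputed. It follows the verbal description (h = 1 when all five equal 1), which is the reverse of the 1/0 coding in the posted commands.

```stata
* Assumes the data are already -mi set- and x1-x5 have been imputed
mi passive: generate byte h = (x1 == 1 & x2 == 1 & x3 == 1 & x4 == 1 & x5 == 1) ///
    if !missing(x1, x2, x3, x4, x5)

* Compare the original data (m=0) with the first imputation (m=1)
mi xeq 0 1: tabulate h, missing
```

In m=0 the rows with missing x values stay missing on h, while in m=1 and later they should be filled in from the imputed values.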
↧
Endogenous Price
I want to run a demand model that is BLP in nature. How do I test whether price is endogenous?
↧
Rolling variance
Hello, I am using Stata 12 and I would like to compute a rolling variance with a window of 3 for my variable Profitbeforetax.
Could anyone help me with a command?
Thanks in advance
Victor
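A minimal sketch for a window of 3 (the current and two previous periods) using time-series operators, on a toy series. The time variable year and the names m3 and var3 are introduced here. With panel data, xtset would take the place of tsset. The divisor 2 gives the sample variance (n - 1).

```stata
clear
set obs 8
generate year = 2000 + _n
generate Profitbeforetax = _n^2          // toy series: 1, 4, 9, ...
tsset year

* Window mean, then the sample variance around it
generate double m3 = (Profitbeforetax + L.Profitbeforetax + L2.Profitbeforetax) / 3
generate double var3 = ((Profitbeforetax - m3)^2 + (L.Profitbeforetax - m3)^2 ///
    + (L2.Profitbeforetax - m3)^2) / 2
list
```

The first two periods have no complete window, so var3 is missing there.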
↧
remove strange symbol from split string variable
I tried to split a string variable, and one response includes the symbol "�". How do I get rid of this weird symbol? The ultimate goal is to generate a new 0-1 variable for each response.
. tab AttentionSeeker1

                   AttentionSeeker1 |      Freq.     Percent        Cum.
------------------------------------+-----------------------------------
           Causes class disruptions |      1,443       43.57       43.57
                              Other |         75        2.26       45.83
       Talks at inappropriate times |        856       25.85       71.68
Wants teacher�s undivided attention |        938       28.32      100.00
------------------------------------+-----------------------------------
                              Total |      3,312      100.00
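If the end goal is only the 0-1 variables, one option is to sidestep the symbol by matching on a part of each response that does not contain it. The sketch below uses toy data with an ordinary apostrophe, and the indicator names are made up. It does not repair the character itself, which would need knowledge of the file's encoding.

```stata
clear
input str40 AttentionSeeker1
"Causes class disruptions"
"Other"
"Talks at inappropriate times"
"Wants teacher's undivided attention"
end

* strpos() returns the position of the substring, or 0 if it is absent
generate byte wants_attention = strpos(AttentionSeeker1, "undivided attention") > 0
generate byte disrupts        = strpos(AttentionSeeker1, "class disruptions") > 0
list
```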
↧
Estimating constants that solve a system of non-linear equations
Hi all, I have a question about how to solve a system of non-linear equations with
uncorrelated data. I've researched nl and nlsur for this purpose and I'm able to get
nlsur to work for a particular type of data set. Here are the details. I have
p_3days = 1-(1-p_1day*bias)*(1-p_1day*bias^2)*(1-p_1day*bias^3)
p_6days = 1-(1-p_1day*bias)*(1-p_1day*bias^2)*(1-p_1day*bias^3)*...*(1-p_1day*bias^6)
p_9days = 1-(1-p_1day*bias)*(1-p_1day*bias^2)*(1-p_1day*bias^3)*...*(1-p_1day*bias^9)
where p_3days, p_6days, and p_9days are 3 variables for which I have yes/no (1/0)
responses over a series of observations. p_1day and bias are constants that I want to
estimate.
Now, when there is data for p_3days, p_6days and p_9days for each observation, then
nlsur works fine and I get results. I made an evaluator function that works on
simulated data and estimates the constants very closely. e(V) has a nice covariance
matrix too that I can use.
However, the data I have in the real world is not repeated measure data. I only have
p_3days, p_6days and p_9days from independent samples. How do I set this up so I can
get estimates of my two constants that include a variance estimate? Yes, all the
covariances will be zero I think, but can't I still get an estimate of the variance of
p_1day and bias? I successfully set the problem up in mata using solvenl and received
the correct parameters, but this gives no information on how well the two constants
are estimated.
I've tried the nlsur approach using 3 different variables for the p_3days, p_6days and
p_9days with missing values in the variables for observations for which there is no
data, but I receive a no observations error when run with nlsur. I have example code
that creates a simulated data set below and walks you through the ways I have tried to
solve the problem. The data is set up in a way that represents how I think the data
generating process works. This gives repeated measures type data. As I said, nlsur
works fine on a data like this. After this, you will see an example of the type of
data I actually receive, and how I can't get things to work.
Perhaps another approach is to use mata solvenl to get a solution to the deterministic
equations:
.71 = 1-(1-p_1day*bias)*(1-p_1day*bias^2)*(1-p_1day*bias^3)
.79 = 1-(1-p_1day*bias)*(1-p_1day*bias^2)*(1-p_1day*bias^3)*...*(1-p_1day*bias^6)
.84 = 1-(1-p_1day*bias)*(1-p_1day*bias^2)*(1-p_1day*bias^3)*...*(1-p_1day*bias^9)
as an example with estimates from the data for p_3day, etc., then use proportion to
estimate the variance of p_3days, p_6days and p_9days and apply the algebraic solution
(found from symbolic computation software like Maple or Matlab) using nlcom to get the
variance of the bias and p_1day? Depending on how I solve them I might have 2 or three
estimates for p_1day and bias, that I could then take the average of? I'm not sure but
it seems like nl or nlsur would be much easier.
Any help on this would be greatly appreciated.
Matt Hurst
Public Health Agency of Canada
*Example code using Stata 13.1
*Make a simulated data set
clear
set obs 10000
*parameters
local p = .5
local b1= .9
local b2= .7
*make variables
*Period 1. Can be thought of as days 1 to 6
forvalues i=1/6 {
capture drop d`i' b`i' p`i'
gen d`i' = 0
gen b`i' = 0
gen p`i' = 0
replace p`i' = 1 if uniform() <= `p'
replace b`i' = 1 if uniform() <= `b1'^`i'
replace d`i' = p`i'*b`i'
}
*Period 2. Can be thought of as days 7 to 9
*Note use of a b2 instead of b1
forvalues i=7/9 {
capture drop d`i' b`i' p`i'
gen d`i' = 0
gen b`i' = 0
gen p`i' = 0
replace p`i' = 1 if uniform() <= `p'
replace b`i' = 1 if uniform() <= `b2'^`i'
replace d`i' = p`i'*b`i'
}
capture drop d1_3 d1_6 d1_9
gen double d1_3 = 0
replace d1_3 = 1 if d1+d2+d3 >= 1
gen double d1_6 = 0
replace d1_6 = 1 if d1+d2+d3+d4+d5+d6 >= 1
gen double d1_9 = 0
replace d1_9 = 1 if d1+d2+d3+d4+d5+d6+d7+d8+d9 >= 1
sum d*
*Overall mean values, to verify what we should roughly get
capture drop p_tot
gen double p_tot = (p1+p2+p3+p4+p5+p6+p7+p8+p9)/9
mean b*
matrix A = e(b)
scalar A1 = A[1,1]
scalar A2 = A[1,2]
scalar A3 = A[1,3]
scalar A4 = A[1,4]
scalar A5 = A[1,5]
scalar A6 = A[1,6]
scalar A7 = A[1,7]
scalar A8 = A[1,8]
scalar A9 = A[1,9]
capture drop b1_tot
gen double b1_tot = (A1+A2^(1/2)+A3^(1/3)+A4^(1/4)+A5^(1/5)+A6^(1/6))/6
capture drop b2_tot
gen double b2_tot = (A7^(1/7)+A8^(1/8)+A9^(1/9))/3
sum b?_tot
*Example solving for p=p_1day, b1=bias in period 1, b2=bias in period 2
*Example that works. Repeated measure data.
capture program drop nlsursolver_mh3_6_9
program nlsursolver_mh3_6_9
syntax varlist(min=1 max=3) [if], at(name)
local y1 : word 1 of `varlist'
local y2 : word 2 of `varlist'
local y3 : word 3 of `varlist'
tempname p b1 b2
scalar `p' = `at'[1,1]
scalar `b1' = `at'[1,2]
scalar `b2' = `at'[1,3]
replace `y1' = 1-(1-`p'*`b1')*(1-`p'*`b1'^2)*(1-`p'*`b1'^3)
replace `y2' = 1-(1-`p'*`b1')*(1-`p'*`b1'^2)*(1-`p'*`b1'^3)*(1-`p'*`b1'^4)*(1-`p'*`b1'^5)*(1-`p'*`b1'^6)
replace `y3' = 1-(1-`p'*`b1')*(1-`p'*`b1'^2)*(1-`p'*`b1'^3)*(1-`p'*`b1'^4)*(1-`p'*`b1'^5)*(1-`p'*`b1'^6)*(1-`p'*`b2'^7)*(1-`p'*`b2'^8)*(1-`p'*`b2'^9)
end
*Use nlsur to solve the system of equations in the evaluator program
nlsur solver_mh3_6_9 @ d1_3 d1_6 d1_9, parameters(p b1 b2) initial(p 0.4 b1 0.4 b2 0.4) nequations(3)
*Usually works very well.
*Convergence is achieved and estimates are correct.
*The data I actually get is from independent samples.
*So, to simulate that, I have created dlong and stack_ind
*Note, the fact that they all have 10,000 observations is purely
uncorrelated data. I've researched nl and nlsur for this purpose and I'm able to get
nlsur to work for a particular type of data set. Here are the details. I have
p_3days = 1-(1-p_1day*bias)*(1-p_1day*bias^2)*(1-p_1day*bias^3)
p_6days = 1-(1-p_1day*bias)*(1-p_1day*bias^2)*(1-p_1day*bias^3)*...*(1-p_1day*bias^6)
p_9days = 1-(1-p_1day*bias)*(1-p_1day*bias^2)*(1-p_1day*bias^3)*...*(1-p_1day*bias^9)
where p_3days, p_6days, and p_9days are three variables for which I have yes/no (1/0)
responses across a series of observations. p_1day and bias are constants that I want
to estimate.
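To make the three formulas concrete, here is a minimal Stata sketch that evaluates them for one pair of values. The values 0.5 and 0.9 and the scalar names p1d, bias and q are assumptions chosen only for illustration; they are not estimates from any data.

```stata
* Illustrative values only (assumptions, not estimates)
scalar p1d  = 0.5
scalar bias = 0.9
* q holds the running product of (1 - p_1day*bias^i)
scalar q = 1
forvalues i = 1/9 {
    scalar q = q*(1 - p1d*bias^`i')
    * store and show the cumulative probability at days 3, 6 and 9
    if inlist(`i', 3, 6, 9) {
        scalar p_`i'days = 1 - q
        display "p_`i'days = " %6.4f p_`i'days
    }
}
```

Each extra day multiplies the running product by a factor below 1, so the three probabilities increase from p_3days to p_9days.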
Now, when there is data for p_3days, p_6days and p_9days for each observation, then
nlsur works fine and I get results. I made an evaluator function that works on
simulated data and estimates the constants very closely. e(V) has a nice covariance
matrix too that I can use.
However, the data I have in the real world is not repeated measure data. I only have
p_3days, p_6days and p_9days from independent samples. How do I set this up so I can
get estimates of my two constants that include a variance estimate? Yes, all the
covariances will be zero I think, but can't I still get an estimate of the variance of
p_1day and bias? I successfully set the problem up in mata using solvenl and received
the correct parameters, but this gives no information on how well the two constants
are estimated.
I've tried the nlsur approach using 3 different variables for the p_3days, p_6days and
p_9days with missing values in the variables for observations for which there is no
data, but I receive a no observations error when run with nlsur. I have example code
that creates a simulated data set below and walks you through the ways I have tried to
solve the problem. The data is set up in a way that represents how I think the data
generating process works. This gives repeated measures type data. As I said, nlsur
works fine on data like this. After this, you will see an example of the type of
data I actually receive, and how I can't get things to work.
Perhaps another approach is to use mata solvenl to get a solution to the deterministic
equations:
.71 = 1-(1-p_1day*bias)*(1-p_1day*bias^2)*(1-p_1day*bias^3)
.79 = 1-(1-p_1day*bias)*(1-p_1day*bias^2)*(1-p_1day*bias^3)*...*(1-p_1day*bias^6)
.84 = 1-(1-p_1day*bias)*(1-p_1day*bias^2)*(1-p_1day*bias^3)*...*(1-p_1day*bias^9)
as an example with estimates from the data for p_3days, etc., then use proportion to
estimate the variance of p_3days, p_6days and p_9days, and apply the algebraic solution
(found from symbolic computation software like Maple or Matlab) using nlcom to get the
variance of bias and p_1day? Depending on how I solve them, I might have two or three
estimates for p_1day and bias that I could then average. I'm not sure, but
it seems like nl or nlsur would be much easier.
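For reference, the deterministic system can be written for Mata's solvenl() roughly as follows. This is only a sketch under stated assumptions: it uses three unknowns (p, b1, b2, as in the example code further down) so that the system is square, the function name resid369 is made up for illustration, and the starting values are rough guesses. Convergence is not guaranteed, and the solution carries no standard errors.

```stata
* Drop any earlier copy of the (hypothetical) evaluator so the block can be rerun
capture mata: mata drop resid369()
mata:
// x = (p, b1, b2)'; values receives target minus model for each equation
void function resid369(real colvector x, real colvector values)
{
    real scalar q, i
    q = 1
    for (i = 1; i <= 9; i++) {
        // days 1 to 6 use b1, days 7 to 9 use b2
        q = q*(1 - x[1]*(i <= 6 ? x[2] : x[3])^i)
        if (i == 3) values[1] = 0.71 - (1 - q)
        if (i == 6) values[2] = 0.79 - (1 - q)
        if (i == 9) values[3] = 0.84 - (1 - q)
    }
}
S = solvenl_init()
solvenl_init_evaluator(S, &resid369())
solvenl_init_type(S, "zero")
solvenl_init_technique(S, "newton")
solvenl_init_numeq(S, 3)
// rough starting values (an assumption)
solvenl_init_startingvals(S, (0.6 \ 0.7 \ 0.7))
x = solvenl_solve(S)
x
end
```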
Any help on this would be greatly appreciated.
Matt Hurst
Public Health Agency of Canada
*Example code using Stata 13.1
*Make a simulated data set
clear
set obs 10000
*parameters
local p = .5
local b1= .9
local b2= .7
*make variables
*Period 1. Can be thought of as days 1 to 6
forvalues i=1/6 {
capture drop d`i' b`i' p`i'
gen d`i' = 0
gen b`i' = 0
gen p`i' = 0
replace p`i' = 1 if uniform() <= `p'
replace b`i' = 1 if uniform() <= `b1'^`i'
replace d`i' = p`i'*b`i'
}
*Period 2. Can be thought of as days 7 to 9
*Note use of a b2 instead of b1
forvalues i=7/9 {
capture drop d`i' b`i' p`i'
gen d`i' = 0
gen b`i' = 0
gen p`i' = 0
replace p`i' = 1 if uniform() <= `p'
replace b`i' = 1 if uniform() <= `b2'^`i'
replace d`i' = p`i'*b`i'
}
capture drop d1_3 d1_6 d1_9
gen double d1_3 = 0
replace d1_3 = 1 if d1+d2+d3 >= 1
gen double d1_6 = 0
replace d1_6 = 1 if d1+d2+d3+d4+d5+d6 >= 1
gen double d1_9 = 0
replace d1_9 = 1 if d1+d2+d3+d4+d5+d6+d7+d8+d9 >= 1
sum d*
*Overall mean values, to verify what we should roughly get
capture drop p_tot
gen double p_tot = (p1+p2+p3+p4+p5+p6+p7+p8+p9)/9
mean b*
matrix A = e(b)
scalar A1 = A[1,1]
scalar A2 = A[1,2]
scalar A3 = A[1,3]
scalar A4 = A[1,4]
scalar A5 = A[1,5]
scalar A6 = A[1,6]
scalar A7 = A[1,7]
scalar A8 = A[1,8]
scalar A9 = A[1,9]
capture drop b1_tot
gen double b1_tot = (A1+A2^(1/2)+A3^(1/3)+A4^(1/4)+A5^(1/5)+A6^(1/6))/6
capture drop b2_tot
gen double b2_tot = (A7^(1/7)+A8^(1/8)+A9^(1/9))/3
sum b?_tot
*Example solving for p=p_1day, b1=bias in period 1, b2=bias in period 2
*Example that works. Repeated measure data.
capture program drop nlsursolver_mh3_6_9
program nlsursolver_mh3_6_9
syntax varlist(min=1 max=3) [if], at(name)
local y1 : word 1 of `varlist'
local y2 : word 2 of `varlist'
local y3 : word 3 of `varlist'
tempname p b1 b2
scalar `p' = `at'[1,1]
scalar `b1' = `at'[1,2]
scalar `b2' = `at'[1,3]
replace `y1' = 1-(1-`p'*`b1')*(1-`p'*`b1'^2)*(1-`p'*`b1'^3)
replace `y2' = 1-(1-`p'*`b1')*(1-`p'*`b1'^2)*(1-`p'*`b1'^3)*(1-`p'*`b1'^4)*(1-`p'*`b1'^5)*(1-`p'*`b1'^6)
replace `y3' = 1-(1-`p'*`b1')*(1-`p'*`b1'^2)*(1-`p'*`b1'^3)*(1-`p'*`b1'^4)*(1-`p'*`b1'^5)*(1-`p'*`b1'^6)*(1-`p'*`b2'^7)*(1-`p'*`b2'^8)*(1-`p'*`b2'^9)
end
*Use nlsur to solve the system of equations in the evaluator program
nlsur solver_mh3_6_9 @ d1_3 d1_6 d1_9, parameters(p b1 b2) initial(p 0.4 b1 0.4 b2 0.4) nequations(3)
*Usually works very well.
*Convergence is achieved and estimates are correct.
*The data I actually get is from independent samples.
*So, to simulate that, I have created dlong and stack_ind
*Note, the fact that they all have 10,000 observations is purely
*coincidental. The real data could have 400 observations, or 600
*for instance for the variables d1_3, d1_6, d1_9.
capture drop dlong
set obs 30000
gen double dlong = d1_3
replace dlong = d1_6[_n-10000] if _n >= 10001
replace dlong = d1_9[_n-20000] if _n >= 20001
capture drop stack_ind
gen byte stack_ind = 1
replace stack_ind = 2 if _n >= 10001
replace stack_ind = 3 if _n >= 20001
*Now make separate variables for d1_3 d1_6 d1_9
capture drop dlong_1_3
gen double dlong_1_3 = d1_3
capture drop dlong_1_6
gen double dlong_1_6 = dlong if stack_ind == 2
capture drop dlong_1_9
gen double dlong_1_9 = dlong if stack_ind == 3
*verification
tab dlong stack_ind, col
*From here I don't know what command to use or what variables to use
*I tried using nlsur with 3 uncorrelated variables
nlsur solver_mh3_6_9 @ dlong_1_3 dlong_1_6 dlong_1_9, parameters(p b1 b2) initial(p 0.4 b1 0.4 b2 .4) nequations(3)
/* error text
(obs = 0)
cannot have fewer observations than parameters
r(2001);
*/
*I also tried nlsur using 1 variable.
*Note the use of the in qualifier at the end of each equation evaluation line
capture program drop nlsursolver_mh3_6_9_long
program nlsursolver_mh3_6_9_long
syntax varlist(min=1 max=3) [if], at(name)
local y1 : word 1 of `varlist'
local y2 : word 2 of `varlist'
local y3 : word 3 of `varlist'
tempname p b1 b2
scalar `p' = `at'[1,1]
scalar `b1' = `at'[1,2]
scalar `b2' = `at'[1,3]
replace `y1' = 1-(1-`p'*`b1')*(1-`p'*`b1'^2)*(1-`p'*`b1'^3)*(1-`p'*`b1'^4)*(1-`p'*`b1'^5)*(1-`p'*`b1'^6) in 1/10000
replace `y2' = 1-(1-`p'*`b1')*(1-`p'*`b1'^2)*(1-`p'*`b1'^3)*(1-`p'*`b1'^4)*(1-`p'*`b1'^5)*(1-`p'*`b1'^6) in 10001/20000
replace `y3' = 1-(1-`p'*`b1')*(1-`p'*`b1'^2)*(1-`p'*`b1'^3)*(1-`p'*`b1'^4)*(1-`p'*`b1'^5)*(1-`p'*`b1'^6)*(1-`p'*`b2'^7)*(1-`p'*`b2'^8)*(1-`p'*`b2'^9) in 20001/30000
end
nlsur solver_mh3_6_9_long @ dlong dlong dlong, parameters(p b1 b2) initial(p 0.5 b1 0.5 b2 0.5) nequations(3)
/* error
(obs = 30000)
Calculating NLS estimates...
(10000 real changes made)
(10000 real changes made)
(10000 real changes made)
could not evaluate equation 1
starting values invalid or some RHS variables have missing values
r(480);
*/
*Then I thought maybe there is a way to use nl
capture program drop nlsolver_mh3_6_9_long
program nlsolver_mh3_6_9_long
syntax varlist(min=1 max=1) [if], at(name)
local y1 : word 1 of `varlist'
tempname p b1 b2
scalar `p' = `at'[1,1]
scalar `b1' = `at'[1,2]
scalar `b2' = `at'[1,3]
replace `y1' = 1-(1-`p'*`b1')*(1-`p'*`b1'^2)*(1-`p'*`b1'^3)*(1-`p'*`b1'^4)*(1-`p'*`b1'^5)*(1-`p'*`b1'^6) in 1/10000
replace `y1' = 1-(1-`p'*`b1')*(1-`p'*`b1'^2)*(1-`p'*`b1'^3)*(1-`p'*`b1'^4)*(1-`p'*`b1'^5)*(1-`p'*`b1'^6) in 10001/20000
replace `y1' = 1-(1-`p'*`b1')*(1-`p'*`b1'^2)*(1-`p'*`b1'^3)*(1-`p'*`b1'^4)*(1-`p'*`b1'^5)*(1-`p'*`b1'^6)*(1-`p'*`b2'^7)*(1-`p'*`b2'^8)*(1-`p'*`b2'^9) in 20001/30000
end
nl solver_mh3_6_9_long @ dlong, parameters(p b1 b2) initial(p 0.4 b1 0.4 b2 0.4)
*This runs, but two coefficients are not in the right range and
*the model blew up when estimating the variance.
*Forcing the b2 term to be between 0 and 1:
capture program drop nlsolver_mh3_6_9_long
program nlsolver_mh3_6_9_long
syntax varlist(min=1 max=1) [if], at(name)
local y1 : word 1 of `varlist'
tempname p b1 b2 bnew
scalar `p' = `at'[1,1]
scalar `b1' = `at'[1,2]
scalar `b2' = `at'[1,3]
scalar `bnew' = 1/(1+exp(`b2'))
replace `y1' = 1-(1-`p'*`b1')*(1-`p'*`b1'^2)*(1-`p'*`b1'^3)*(1-`p'*`b1'^4)*(1-`p'*`b1'^5)*(1-`p'*`b1'^6) in 1/10000
replace `y1' = 1-(1-`p'*`b1')*(1-`p'*`b1'^2)*(1-`p'*`b1'^3)*(1-`p'*`b1'^4)*(1-`p'*`b1'^5)*(1-`p'*`b1'^6) in 10001/20000
replace `y1' = 1-(1-`p'*`b1')*(1-`p'*`b1'^2)*(1-`p'*`b1'^3)*(1-`p'*`b1'^4)*(1-`p'*`b1'^5)*(1-`p'*`b1'^6)*(1-`p'*`bnew'^7)*(1-`p'*`bnew'^8)*(1-`p'*`bnew'^9) in 20001/30000
end
nl solver_mh3_6_9_long @ dlong, parameters(p b1 b2) initial(p 0.4 b1 0.4 b2 0.4)
*I'm just getting the boundary of 0, because it wants it to be negative,
*so I don't think I'm solving what I intend to solve here
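One possible way to express the stacked setup is a single nl evaluator that picks the product length from stack_ind. The sketch below assumes the variables dlong and stack_ind created above; the program name nlstack369 and the choice of vce(robust) are assumptions made for illustration, and it is untested against real data. Unlike the nl attempts above, which apply the six-term product to observations 1/10000, it applies the three-term product to the rows that hold the 3-day sample.

```stata
capture program drop nlstack369
program nlstack369
    version 13
    * first variable is the outcome nl asks us to fill, second is the sample indicator
    syntax varlist(min=2 max=2) [if], at(name)
    local y : word 1 of `varlist'
    local s : word 2 of `varlist'
    tempname p b1 b2
    scalar `p'  = `at'[1,1]
    scalar `b1' = `at'[1,2]
    scalar `b2' = `at'[1,3]
    * cumulative products of (1 - p*b^i) through days 3, 6 and 9
    tempvar q3 q6 q9
    generate double `q3' = (1-`p'*`b1')*(1-`p'*`b1'^2)*(1-`p'*`b1'^3)
    generate double `q6' = `q3'*(1-`p'*`b1'^4)*(1-`p'*`b1'^5)*(1-`p'*`b1'^6)
    generate double `q9' = `q6'*(1-`p'*`b2'^7)*(1-`p'*`b2'^8)*(1-`p'*`b2'^9)
    * each row gets the prediction that matches its own sample
    replace `y' = 1 - cond(`s' == 1, `q3', cond(`s' == 2, `q6', `q9')) `if'
end
nl stack369 @ dlong stack_ind, parameters(p b1 b2) initial(p 0.4 b1 0.4 b2 0.4) vce(robust)
```

Because every observation has a nonmissing outcome and a prediction, this avoids both the (obs = 0) problem and the missing-value error; whether it converges to sensible values on real data is a separate question.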
↧