Dropping obs with string variables of value "0" / converting string to numeric when string contains "X"

March 15, 2019, 4:54 am

Hello,

I am trying to drop all observations in my dataset that have the value "0" in variable "occsoc". Unfortunately, the command

drop if occsoc=0

is returning 'type mismatch'.

This may be because the variable has leading blanks. So to play it safe I created an identical variable 'occsoc_num' and removed the leading blanks with strtrim(occsoc_num). But the command

drop if occsoc_num=0

is still returning 'type mismatch'.

Please see my screenshot below (I know we are not supposed to post screenshots in the forum, but I wasn't able to make dataex work for this example):

I think the 'type mismatch' error may have something to do with the fact that occsoc (and therefore occsoc_num) are string variables. However, these variables contain 'X' in addition to numbers (see data extract below). Can I destring a variable that includes the character X? Numerically, all X's could be replaced with a 0.

See example below:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 occsoc
"     0"
"     0"
"436010"
"173020"
"     0"
"     0"
"     0"
"436010"
"     0"
"471011"
"395012"
"311010"
"319091"
"252020"
"513020"
"412010"
"     0"
"     0"
"     0"
"434051"
"     0"
"     0"
"     0"
"491011"
"212011"
"     0"
"537064"
"     0"
"372012"
"499071"
"     0"
"     0"
"     0"
"373010"
"514XXX"
"436010"
"     0"
"113031"
"232011"
"399011"
"434181"
"351012"
"372012"
"     0"
"332011"
"252020"
"     0"
"     0"
"37201X"
"37201X"
"     0"
"     0"
"472111"
"436010"
"37201X"
"412010"
"212011"
"352010"
"     0"
"     0"
"372012"
"435081"
"     0"
"     0"
"     0"
"292021"
"519196"
"411011"
"     0"
"131070"
"151122"
"     0"
"     0"
"     0"
"     0"
"516063"
"311010"
"     0"
"352010"
"     0"
"471011"
"     0"
"471011"
"273043"
"     0"
"     0"
"412010"
"411011"
"     0"
"434051"
"291141"
"151150"
"291060"
"253000"
"     0"
"     0"
"     0"
"111021"
"395012"
"     0"
end

Many thanks for your consideration.
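
A minimal sketch of one way to do both steps, run on four of the posted values. It assumes that every "X" may be treated as 0, as stated above; the names occsoc_x0 and occsoc_n are made up for illustration. Because occsoc is a string variable, the comparison has to be against the string "0", not the number 0.

```stata
* Four of the posted values, enough to show both steps
clear
input str6 occsoc
"     0"
"436010"
"514XXX"
"37201X"
end

* occsoc is a string: trim the blanks and compare with a string literal
drop if strtrim(occsoc) == "0"

* Replace every "X" with "0" (assumption taken from the post), then convert
generate occsoc_x0 = subinstr(occsoc, "X", "0", .)
destring occsoc_x0, generate(occsoc_n)

list
```

If the X codes should stay distinguishable from true zeros, keeping the original string variable alongside the numeric one is probably the safer choice.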

↧

Forval and In not working together.

March 15, 2019, 1:39 pm

I'm using Stata 14

I'm trying to put the ORs from a regression into successive rows of the variables test1 and test2. I have 1500 observations in my data set. I will eventually want to change the number of iterations from 5 to a large number.
The correct values from the matrix log_tbl end up in test1 and test2, but not in the first 5 rows. Both values do land on the same row, but the rows seem to be randomly selected. If I run the loop again, the ORs end up on different rows.
The disp shows up as 1,2,3,4,5 as expected.

In the second for loop below, the values in the 1st 5 rows are 1 to 5 just as I'd expect.
And if I enter replace test1 = log_tbl[1,1] in 1 on the command line, it shows up in the first line.

Is this a known bug, or am I missing something obvious?
Thanks

Code:

gen test1 = .
gen test2 = .

forval w = 1/5 {    
    bsample 100 if strats == 2 & sex == 0, strata(strats) weight(Qwts)
    quiet replace Qwts = 1 if strats == 0 
    quiet logit outcome i.exposure c.age28 c.age29 i.urban_rural ///
        if sex == 0 & (Q_posneg == 0 | Q_posneg == .) ///
        [pweight=Qwts], iterate(50) vce(robust) or 
    matrix logist_tbl = r(table)' 
    * grabs 2nd and 3rd OR, won't need 3rd row for diab. 
    matselrc logist_tbl log_tbl, r(2,3) c(1)  
    quiet replace test1 = log_tbl[1,1] in `w'
    quiet replace test2 = log_tbl[2,1] in `w'    
    disp `w'    
}

forval w = 1/5 {    
    replace test1 = `w' in `w'
    replace test2 = `w' in `w'
}
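
One thing that may be worth checking (an assumption, not something the post confirms): `in` refers to the current sort order, so if any command inside the loop reorders the data, "in 1" points at a different observation on each pass. The sketch below imitates that with a random sort as a stand-in for the resampling step, and writes results by a fixed identifier instead. The variable obsid is hypothetical.

```stata
sysuse auto, clear

* Hypothetical identifier that records the original order
generate long obsid = _n
generate test1 = .

forvalues w = 1/5 {
    * Stand-in for any step that may reorder the data
    generate double u = runiform()
    sort u
    drop u

    * Address the row by its identifier, not by its current position
    quietly replace test1 = `w' if obsid == `w'
}

sort obsid
list obsid test1 in 1/5
```

For a large number of iterations, collecting the results with postfile or in a matrix avoids touching the dataset at all.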

↧

🙋 Equivalent of mat_capp in Mata

March 15, 2019, 2:12 pm

I am looking for the equivalent of the mat_capp/mat_rapp commands in Mata.
Specifically, I need to perform the same operations on string matrices, which is not possible in Stata's ado language.
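
A small sketch of what appending looks like in Mata, assuming a plain positional append is what is needed (any matching on row or column names done by the user-written commands is not reproduced here). The comma operator joins columns and the backslash operator joins rows, and both work on string matrices. The matrix names are made up.

```stata
mata:
A = ("a", "b" \ "c", "d")      // 2 x 2 string matrix
B = ("e" \ "f")                // 2 x 1 string matrix
R = ("g", "h")                 // 1 x 2 string matrix

C = (A, B)                     // append B as a new column
D = (A \ R)                    // append R as a new row

C
D
end
```

The pieces must conform: the same number of rows for a column join, and the same number of columns for a row join.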

↧

Subgroup analysis using mixlogit for DCE

March 15, 2019, 2:31 pm

Hi, does anyone know how to test whether two subgroup models are significantly different when they are estimated with mixlogit?

When using clogit to look at 2 subgroups, it appears that we can use the suest command to test if the parameters are significantly different in the 2 subgroups. But the suest command doesn't work with mixlogits. Is there a similar test that can be used for mixlogit estimations?

↧

Store a new variable with unique observations and missing otherwise

March 15, 2019, 2:38 pm

Hello,

Can you please help me with the following issue: I want to compute returns (ret_eom) based on two conditions, i.e. for each monthly observation (mo_dates) in my sample and for each decile based on the credit rating (decile_cr_monthly).

I computed the returns using the following command:
bysort mo_dates decile_cr_monthly : egen ptf_ret_crat=mean(ret_eom)

Naturally, the variable generated ptf_ret_crat has the same number of observations as the original sample (containing many duplicates).
Is there a way to generate a new variable satisfying these conditions and having only the unique observations?

I was able to identify the number of unique observations using the following code:

by ptf_ret_crat, sort: gen nvals = _n == 1
replace nvals = sum(nvals)
replace nvals = nvals[_N]

So for example, if I have 12 months and 10 deciles, the new variable should have 120 observations (which is also what nvals retrieves).
Is there a way to store a new variable containing only these unique observations, while sorting them based on the date and rating?

Thank you for your help!
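
A possible approach, sketched on made-up numbers: tag one observation per month and decile with egen's tag() function and copy the group mean only there, leaving missing values elsewhere. The variable names first and ptf_ret_unique are hypothetical.

```stata
* Made-up data: 2 months, 2 deciles
clear
input mo_dates decile_cr_monthly ret_eom
1 1 .01
1 1 .03
1 2 .02
2 1 .05
2 1 .07
2 2 .04
end

bysort mo_dates decile_cr_monthly: egen ptf_ret_crat = mean(ret_eom)

* first is 1 for exactly one observation in each month-decile cell
egen byte first = tag(mo_dates decile_cr_monthly)
generate ptf_ret_unique = ptf_ret_crat if first

sort mo_dates decile_cr_monthly
list
```

If a dataset with one row per month and decile is acceptable, collapse (mean) ret_eom, by(mo_dates decile_cr_monthly) gets there directly.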

↧

Extract different parts from one variable

March 15, 2019, 4:16 pm

Dear experts,

I have a table like this:

Year Violation Year

2010 2005,2006,2007,2008,2009,2010

Both "Year" and "Violation Year" are string variables. How can I extract the multiple years in "Violation Year" separately to get the result below?

Year Violation Year

2010 2005
2010 2006
2010 2007
2010 2008
2010 2009
2010 2010

Thank you so much!
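
One way this could be done, as a sketch: split the string on commas and reshape to long form. The variable names year, violation_year, vy, id and seq are assumptions, since the post shows only column headings.

```stata
clear
input str4 year str40 violation_year
"2010" "2005,2006,2007,2008,2009,2010"
end

* Identifier for the original row, needed by reshape
generate long id = _n

* One new string variable per comma-separated piece: vy1, vy2, ...
split violation_year, parse(",") generate(vy)
drop violation_year

* One row per piece; drop the empty slots of rows with fewer years
reshape long vy, i(id) j(seq)
drop if vy == ""

list year vy
```

Adding destring year vy, replace afterwards would turn both columns into numbers.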

↧

conditional ATE in treatment effects estimation

March 15, 2019, 6:43 pm

How can one obtain the average treatment effect (ATE) or the average treatment effect on the treated (ATET) in "teffects" estimation (for example, using the IPWRA estimator) conditional on one of the binary variables in the outcome or treatment models? That is, I would like to compute the ATE for male and female subjects separately.

↧

foreach issue

March 15, 2019, 7:17 pm

Hi Stata users,

I have a data set and code as attached.

I want to create new columns with a prefix nmedi for the list of variables using a condition. But the code does not work. Could you help me to identify the problem? Thank you.

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double maindemographicid long(visitdate2 visitdate3 visitdate4 visitdate5 visitdate6 visitdate7 visitdate11 visitdate12 visitdate13 visitdate14) float(medi medi_start_date medi_end_date)
12 19403 19415 19436 19492 19583 19795     . 20159 . 20549 0 . .
15 19424 19430 19450 19520 19613 19795 19963 20255 .     . 0 . .
16 19424 19430 19458 19520 19613 19786     . 20150 . 20521 0 . .
17 19459 19466 19494 19543 19627 19823     .     . .     . 0 . .
18 19508 19513 19536 19599 19697 19872     .     . .     . 0 . .
end
format %d visitdate2
format %d visitdate3
format %d visitdate4
format %d visitdate5
format %d visitdate6
format %d visitdate7
format %d visitdate11
format %d visitdate12
format %d visitdate13
format %d visitdate14
format %td medi_start_date
format %td medi_end_date


foreach each var of varlist visitdate2 - visitdate14 {

gen nmedi_`var' = 1 if inlist(medi, 1,2,3,4,9) & medi_start_date < `var'  & `var' < medi_end_date

replace nmedi_`var' = 2 if inlist(medi, 5, 6, 7, 8) & medi_start_date < `var' & `var' < medi_end_date

}

------------------ copy up to and including the previous line ------------------
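
For comparison, a sketch of the loop with the syntax `foreach var of varlist ...`, that is, without the extra word "each", which is the most likely reason the loop fails. The three data rows below are made up so that the conditions are true somewhere; they are not the posted data.

```stata
* Made-up rows, not the posted data
clear
input long(visitdate2 visitdate3) float(medi medi_start_date medi_end_date)
19403 19415 1 19400 19410
19424 19430 5 19400 19440
19459 19466 0     .     .
end

* The local macro name (var) follows foreach directly
foreach var of varlist visitdate2-visitdate3 {
    generate nmedi_`var' = 1 if inlist(medi, 1, 2, 3, 4, 9) & medi_start_date < `var' & `var' < medi_end_date
    replace nmedi_`var' = 2 if inlist(medi, 5, 6, 7, 8) & medi_start_date < `var' & `var' < medi_end_date
}

list
```

Note that a range such as visitdate2-visitdate14 picks up whatever variables lie between the two in dataset order, so it is worth checking that order with describe.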

↧

runby/rangestat for the loop problem?

March 15, 2019, 8:04 pm

Dear All, Is it possible to use runby/rangestat (or others) to speed up the following loop? Note, you have to ssc install tuples. (The real problem is more complicated than the example!)

Code:

sysuse auto, clear
tuples headroom trunk length price rep78 weight length turn displacement gear_ratio mpg
forval i = 1/`ntuples' {
  logit foreign `tuple`i''
}

↧

Breusch-Pagan LM test prob-value 1.0000

March 15, 2019, 8:53 pm

Dear forum fellows,

I would appreciate your expert opinion on a problem I am facing.
I have 9 independent variables, 13 countries, 25 years and unbalanced panel data.
When I use the LM test to decide between pooled OLS and a random-effects model, I get a prob-value of 1.0000. How should I deal with this? I have also found that the OLS and random-effects results are identical.

I look forward to your comments, and would appreciate an explanation of how I should proceed.

These are the test results:

xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

bond2gdp[code,t] = Xb + u[code] + e[code,t]

Estimated results:
         |       Var     sd = sqrt(Var)
---------+-----------------------------
bond2gdp |   7.93203       2.816386
       e |  3.409645       1.846523
       u |         0              0

Test: Var(u) = 0
chibar2(01) = 0.00
Prob > chibar2 = 1.0000

Thanking you

↧

query regarding log difference

March 15, 2019, 8:54 pm

I have constructed a variable that I need to convert into a growth variable (that is, the percentage change in that variable), so I have to take the log difference. My data are an unbalanced panel, and I have three questions:
1. Does the log difference have the same interpretation here?
2. What if I take the simple log of the variable instead? After differencing, most of the coefficients are insignificant and their signs are the opposite of what was predicted.
3. Is there any other way to create a growth variable?

Regards
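
A small sketch of how the two growth measures can be computed in an unbalanced panel, on made-up numbers. The names id, year and x are hypothetical. After xtset, the D. and L. operators respect gaps, so the log difference is missing wherever the previous year is absent rather than being taken across the gap.

```stata
* Made-up unbalanced panel: unit 1 has no observation for 2002
clear
input id year x
1 2000 100
1 2001 110
1 2003 130
2 2000  50
2 2001  55
end

xtset id year

generate lnx    = ln(x)
generate dlnx   = D.lnx              // log difference
generate growth = (x - L.x) / L.x    // simple percentage change

list, sepby(id)
```

For small changes the two measures are close (here about 0.095 against 0.10 for 2001), so which one to use is mostly a matter of interpretation.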

↧

Adding up coefficient estimates and the standard errors after running the regression

March 15, 2019, 9:03 pm

Hi,

I have run a regression with interaction variables, as shown below, and obtained output where δ1=0.179, δ2=-0.046, δ3=-0.118 and δ4=-0.158. I would like to add up the coefficients (which can be done manually). However, how do I obtain the clustered standard error of the sum? Is there a function or command in Stata that adds up the coefficients and their standard errors? I know that lincom can be used. But what about multiple regressions (controlling for different variables), each of which requires adding the same coefficients?

Thank you.
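
A sketch of how lincom could be repeated over several specifications, using the auto data as a stand-in because the actual model is not shown. The cluster identifier cl and the two sets of controls are made up for illustration. lincom uses the full covariance matrix of the last estimates, so the reported standard error of the sum reflects the clustering.

```stata
sysuse auto, clear

* Hypothetical cluster identifier: 15 groups of about 5 cars
generate cl = ceil(_n/5)

* Same linear combination after each specification
foreach controls in "weight" "weight length" {
    quietly regress price c.mpg##i.foreign `controls', vce(cluster cl)
    display as text "Controls: `controls'"
    lincom mpg + 1.foreign#c.mpg
}
```

The coefficient names that lincom expects can be checked by adding the coeflegend option to the regress command.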

↧

Finding the minimum of difference between all observations by group

March 15, 2019, 9:37 pm

Hi, everybody,

I am trying to calculate the minimum of differences between all observations by group.

For example, I have this data,

obs	var	group
1	1	1
2	3	1
3	6	1
4	2	2
5	1	2
6	2	2
7	3	2

and I want,

obs	var	group	Want
1	1	1	2
2	3	1	2
3	6	1	3
4	2	2	0
5	1	2	1
6	2	2	0
7	3	2	1

Observation 1 belongs to group 1. The difference between obs 1 and obs 2 is 2, and between obs 1 and obs 3 it is 5, so the minimum difference for obs 1 is 2. I am thinking of using a loop: within each group, for observation i, calculate the differences between i and every other observation and return the minimum, then move on to the next i until the last observation in the group. But I am not sure how to write this in Stata code. Please help, thank you so much!

Kevin
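
A sketch that avoids the loop, assuming "difference" means the absolute difference: once the values are sorted within each group, the closest other value is always one of the two neighbours, so only two gaps need comparing. min() ignores a missing argument, which takes care of the first and last observation in each group.

```stata
clear
input obs var group
1 1 1
2 3 1
3 6 1
4 2 2
5 1 2
6 2 2
7 3 2
end

* Sorted by var within group: compare with the neighbour below and above
bysort group (var): generate want = min(var - var[_n-1], var[_n+1] - var)

sort obs
list, sepby(group)
```

On the example data this reproduces the Want column shown above. A group with a single observation would get a missing value.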

↧

A series of problems

March 16, 2019, 1:50 am

(a) Run the regression with all years
(c) Test if the coefficients on on-base percentage and slugging are equal. (d) Rerun the regression for each year separately saving the coefficients on OBP and SLG for each year.
I am a little confused about the regress command: can it only regress on one variable at a time? It did not work when I tried to regress with all the variable names separated by spaces.
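
For illustration, a sketch of the three steps on the auto data, since the baseball data are not shown: mpg and weight stand in for OBP and SLG, and foreign stands in for year. regress takes the dependent variable first and then any number of regressors separated by spaces.

```stata
sysuse auto, clear

* Dependent variable first, then the regressors
regress price mpg weight length

* Wald test that two coefficients are equal
test mpg = weight

* One regression per group, saving the two coefficients
* (clear replaces the data in memory with the results)
statsby b_mpg=_b[mpg] b_weight=_b[weight], by(foreign) clear: regress price mpg weight length
list
```

If regress still fails with several variables, the exact command and error message would help, as a misspelled or string variable is a common cause.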

↧

Independent component Analysis (ICA)

March 16, 2019, 3:20 am

Dear all,
I'd like to work on financial data, estimating an arbitrage pricing model with the ICA algorithm. I need your help doing this in Stata. Thanks.

↧

sensitivity analysis

March 16, 2019, 5:11 am

Hello everyone,
I do not know how to do a sensitivity test or analysis in Stata.
Could anyone explain to me how to do it?
Thanks

↧

Difference in differences with changing sample

March 16, 2019, 12:45 pm

Dear All,

I am working on a quasi-experimental study with a large unbalanced panel dataset. This is the specification that I use.

Y_it = β_1 Post_it + δ_i + γ_t + u_it, where Post_it is the value of treatment for individual i at week t, and δ_i and γ_t are individual and time fixed-effect parameters that are estimated.

Here is my concern. When t = 1, I have only 10,000 individuals in the sample, and the number gradually increases to 50,000 over time. That is to say, many individuals only have observations in later periods of the sample. Is there an issue if I use all observations to estimate the equation? The value of the dependent variable decays over time, and therefore using calendar-time fixed effects might not be enough. Do you have any suggestions?

↧

Multi level modelling cross level analysis

March 16, 2019, 12:58 pm

I want to examine the effect of education (xvar) on prejudice (yvar), both level-1 variables, moderated by the current regime type in the country (xvar), a level-2 variable.

I am using the most recent data from the World Values Survey to examine the effects of the interaction. My problem is the following:

In order to assign every country a regime score, I have used the Freedom House reports from 2012/13 to generate a new "regimetype" variable, and assigned every observation in the data set the appropriate score (from 1 to 7). My problem is then: how do I aggregate the scores from each observation to make it a group-level variable? Or is it possible just to run the mixed command like
- mixed prejudice(yvar) c.education(xvar)##c.regimetype(xvar2)||regimetype :

I hope the question makes sense.
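
A sketch on simulated data of how such a cross-level interaction might be specified. It assumes that the grouping variable is the country identifier and not regimetype, and that regimetype takes one value per country, in which case it already is a group-level variable and needs no aggregation. All variable names and numbers are made up.

```stata
clear
set seed 12345

* 20 hypothetical countries, each with one regime score and one random intercept
set obs 20
generate country    = _n
generate regimetype = ceil(7*runiform())
generate u          = rnormal()

* 30 respondents per country; regimetype is constant within country
expand 30
generate education = rnormal()
generate prejudice = 1 - 0.3*education + 0.1*regimetype + u + rnormal()

* Random intercept for country, cross-level interaction in the fixed part
mixed prejudice c.education##c.regimetype || country:
```

Adding education after the colon (|| country: education) would also let the education slope vary across countries.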

↧

Covariance matrix of residuals in VAR model

March 16, 2019, 1:32 pm

I am estimating a VAR model in Stata. The model is as follows:

Code:

var potgap defl interest, noconstant lags(1/4)

I then output the covariance matrix of the residuals as:

Code:

matrix list e(Sigma)

So far, no problem. But then I try to calculate the covariance matrix more manually using the following code:

Code:

predict res1, residuals equation(#1)
predict res2, residuals equation(#2)
predict res3, residuals equation(#3)
mat accum cov = res1 res2 res3, noconstant deviations
mat cov = cov/274
mat list cov

When doing this, I get a slightly different matrix. The matrix is not very different, but the entries differ in the 2nd or 3rd decimal place.

I have tried changing the divisor in case Stata were using a different degrees-of-freedom correction, but this just makes the results even more incorrect.

Does anyone know what is causing the difference? Thanks in advance!
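
One possible explanation, offered as a guess that has not been checked against the posted data: in a model without a constant the residuals need not average zero, so the deviations option removes their means before the cross products are formed, while e(Sigma) may be based on the raw residuals divided by the number of observations. The sketch below, on simulated series with made-up names, forms the cross products without deviations so the two matrices can be compared.

```stata
clear
set seed 2019
set obs 200
generate t = _n
tsset t

* Simulated series with nonzero means
generate y1 = rnormal() + 1
generate y2 = rnormal() + 0.5*y1

var y1 y2, noconstant lags(1/2)
matrix list e(Sigma)

predict r1, residuals equation(#1)
predict r2, residuals equation(#2)

* Raw cross products (no deviations), divided by the estimation sample size
matrix accum S = r1 r2, noconstant
matrix S = S / e(N)
matrix list S
```

Using e(N) instead of a typed number also guards against a divisor that does not match the estimation sample.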

↧

Merging Datasets -- Variable Responses Deleted

March 16, 2019, 1:55 pm

Hi all,

I am merging a number of different datasets and am having trouble with the merge command "deleting" cell values.

My starting panel dataset looks like this:

Code:

county year var1 var2
  1      1   25   30
  1      2   35   22
  1      3   40   50
  2      1   34   30
  2      2   45   44
  2      3   56   21
  3      1   23   12

I wish to merge it with a dataset that looks like this (i.e., adding two additional variables to the dataset):

Code:

county year var3 var4
  1      1   27   28
  1      2   34   32
  1      3   43   42
  2      1   32   33
  2      2   43   24
  2      3   67   23

However, when I use the merge command Stata "deletes" the original variables (var1 and var2) and turns them into missing values.

The dataset ends up looking like this:

Code:

county year var1 var2 var3 var4
  1      1   .    .   27   28
  1      2   .    .   34   32
  1      3   .    .   43   42
  2      1   .    .   32   33
  2      2   .    .   43   24
  2      3   .    .   67   23

Here's the basic code I'm using. Sometimes the problem doesn't occur for the first dataset I'm merging and then the deletion happens for the remaining datasets I'm attempting to merge.

Code:

merge m:1 county year using "X.dta"

Any help would be much appreciated. Thank you!
Matt
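
A sketch of a check that may help narrow this down, on cut-down versions of the two example tables. It assumes that county and year together identify one row in each file, hence 1:1 instead of m:1. merge does not blank out variables of matched observations, so rows with missing var1 and var2 are worth inspecting in _merge: if they come from the using file only, the keys did not match (for example, different county codes or storage types).

```stata
* Using data: the file that adds var3 and var4
clear
input county year var3 var4
1 1 27 28
1 2 34 32
end
tempfile using_data
save `using_data'

* Master data
clear
input county year var1 var2
1 1 25 30
1 2 35 22
1 3 40 50
end

merge 1:1 county year using `using_data'

* 1 = master only, 2 = using only, 3 = matched
tabulate _merge
list
```

When merging several files in a row, the _merge variable has to be dropped or renamed (or nogenerate used) before the next merge.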

↧