Repeated measures from survey-based scenarios

October 13, 2019, 8:33 am

Hi everyone,

This is my first post, so I hope I am asking the question correctly and this hasn't been addressed elsewhere on the forum :-)

My dataset (in wide format) comes from a survey of 500 people where participants were asked scenario-based questions and had to give binary answers. For example:

Q1 (baseline): "If you share you earn $100, if you do not share you earn $50. Do you share (1) or not share (0)?"
Q2 (incentive treatment): "If you share you earn $200, if you do not share you earn $50. Do you share (1) or not share (0)?"
Q3 (social norms treatment): "If you share you earn $200, if you do not share you earn $50. Most people in your situation would share. Do you share (1) or not share (0)?"

My outcome of interest is the proportion of people who share. Using Q1 as a baseline, I'd like to know whether my incentive or social norms treatments increase sharing, and which one has the stronger effect.

Any suggestion will be greatly appreciated.

Thank you,

Will

↧

how to update the list of global macro within foreach loop?

October 13, 2019, 10:37 am

Hello,
I am trying to run the code below, and my ultimate goal is to draw a scatter plot of AUC vs. Alpha.
The AUC values seem to be okay, but when I run 'list Alpha' after my code, every value stored in Alpha is 5, whereas I want Alpha to hold the same values used for `a1' (foreach a1 of numlist 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.5 5.0).

How can I resolve this issue? I am new to Stata so I have lots of questions.

Thank you,

STATA Code:

gen Alpha = .
gen AUC = .
local i = 1

foreach a1 of numlist 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.5 5.0 {

python: a = float(Macro.getLocal("a1"))

// predict using the best value for alpha
python: mnb = MultinomialNB(alpha = a, class_prior = None, fit_prior = True)

// calculate probability of each class on the test set
// '[:, 1]' at the end extracts the probability for each pharmacy to be under compliance
python: Y_mnb_score = mnb.fit(X_train, np.ravel(Y_train)).predict_proba(X_test)[:, 1]

// make test_compliance python variable
python: test_compliance = Y_test['compliance']

// transfer the python variables Y_mnb_score and test_compliance as global STATA variables
python: Data.setObsTotal(len(Y_mnb_score))
python: Data.addVarFloat('mnbScore')
python: Data.store(var = 'mnbScore', obs = None, val = Y_mnb_score)

python: Data.setObsTotal(len(test_compliance))
python: Data.addVarFloat('testCompliance')
python: Data.store(var = 'testCompliance', obs = None, val = test_compliance)

roctab testCompliance mnbScore
replace AUC = r(area) in `i'
replace Alpha = `a1' // this line is causing a problem

drop testCompliance mnbScore
local i = `++i'

}

scatter AUC Alpha xline(bestAlpha, lcolor(blue)) //ultimate goal
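
Note: a likely cause, offered as a reading of the code above rather than a tested fix. The AUC line uses "in `i'" and so writes to a single observation, while the Alpha line has no "in" or "if" qualifier, so Stata assigns the current value to every observation and the last pass (5.0) wins. Adding "in `i'" to the Alpha line should keep one value per row. The plain-Python sketch below only mimics those two behaviours with lists; fake_auc is a hypothetical stand-in for the roctab result and is not part of the original code.

```python
# Mimic Stata's `replace x = v` (all rows) versus `replace x = v in i` (one row).
alphas = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.5, 5.0]  # numlist from the post
n_obs = len(alphas)


def fake_auc(alpha):
    """Hypothetical stand-in for r(area); any function of alpha will do."""
    return 0.5 + alpha / 20


alpha_all_rows = [None] * n_obs  # what the posted loop produces
alpha_one_row = [None] * n_obs   # what adding an `in` qualifier would produce
auc = [None] * n_obs

for i, a1 in enumerate(alphas):
    auc[i] = fake_auc(a1)          # replace AUC = r(area) in `i'
    alpha_all_rows = [a1] * n_obs  # replace Alpha = `a1'        (no qualifier)
    alpha_one_row[i] = a1          # replace Alpha = `a1' in `i'

print(alpha_all_rows)  # the same value repeated in every row
print(alpha_one_row)   # one alpha per row
```

The second list is the one that would line up with AUC in a scatter plot.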

↧

Lasso with Instrumental Variables - tests of weak instruments

October 13, 2019, 10:48 am

I am using lasso in Stata 16 to make inferences from a model with two endogenous variables (college quality and college quality squared, where college quality is measured as a continuous index). I have many possible instruments, but most of them are weak (as indicated by the Kleibergen-Paap Wald rk F statistic reported from variations of models estimated using ivreg2). From reading the Stata Lasso manual (and learning a bit about lasso), it seems that lasso might be a good option for selecting instruments. However, I would still like to know generally "how strong" the selected instruments are. poivregress and xpoivregress do not return the same weak identification tests as ivreg2. Can anyone help me understand how to assess instrument strength using lasso? Thank you!

↧

drawing vertical line on a scatter plot

October 13, 2019, 11:00 am

I am trying to run the following Stata command:

scatter AUC Alpha xline(bestAlpha, lcolor(blue))

, where bestAlpha is a global macro storing the value of 0.0.

When I just execute the command:

scatter AUC Alpha

everything works fine, but when I try to add a vertical line with 'xline(bestAlpha, lcolor(blue))' Stata returns an error, and I am not sure why.

Just to provide more detail, when I do:

list bestAlpha

Stata gives me the list of length 124 with a value (=0) assigned at index 1, and the rest of the entries are just missing values

when I do:

display bestAlpha

however, Stata gives out its correct value 0.

What should I do to fix my error? The error message is:

Invalid varlist

Thank you,

↧

Fuzzy match inconsistent identifiers in a panel

October 13, 2019, 11:11 am

Dear Statalist,

I've been working on what I feel should be a common problem, but I can't find any threads on it. I'm trying to create consistent identifiers for a panel dataset of cities whose names have slight spelling differences over time, using some kind of fuzzy matching algorithm. A very simple example is below; in my real data, I have many hundreds of place names spelled differently over 30 years, so manual fixes are infeasible. I know there are not multiple observations of the same place for the same year. The end result I'm looking for is a third variable, a numeric place_id, that would consistently identify Chicago and New York in the example below. If anyone has experienced this problem before and could give a general overview of their workflow, it would be much appreciated; I don't necessarily need specific code. I've been using reclink from SSC, but I'm open to anything. Let me know if I've left out important details or can give more background. Thanks!

Code:

clear
input str20 place_name year
"chicago" 1990
"chicag" 1991
"chicago" 1992
"new york" 1990
"new york city" 1991
"new york c" 1992
end
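
Note: one possible workflow, sketched in Python on the six example rows only. It is a greedy grouping by string similarity, not reclink and not a recommendation for the full data. The function name assign_place_ids and the 0.7 threshold are assumptions chosen for this illustration; real data would need the threshold tuned and the resulting groups inspected by hand.

```python
from difflib import SequenceMatcher


def assign_place_ids(names, threshold=0.7):
    """Greedy clustering: a name joins the first cluster whose representative
    is at least `threshold` similar; otherwise it starts a new cluster."""
    representatives = []  # first spelling seen for each cluster
    ids = []
    for name in names:
        for place_id, rep in enumerate(representatives, start=1):
            if SequenceMatcher(None, name, rep).ratio() >= threshold:
                ids.append(place_id)
                break
        else:
            # No existing cluster is close enough: open a new one.
            representatives.append(name)
            ids.append(len(representatives))
    return ids


names = ["chicago", "chicag", "chicago", "new york", "new york city", "new york c"]
place_id = assign_place_ids(names)
print(list(zip(names, place_id)))
```

Because each name is compared only with the first spelling in a cluster, the result depends on row order; sorting by year first makes the earliest spelling the reference.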

↧

IV regression with fixed effects; Warning: variance matrix is non symmetric or highly singular

October 13, 2019, 11:12 am

Dear Statalist experts

I am running an IV regression with a dummy dependent variable and primary school fixed effects in Stata 16. The regression has the form: ivregress 2sls y (endogenousvar = instrumentalvar) $controls i.primaryschool, first cluster(primaryschool). I get the error Warning: variance matrix is non symmetric or highly singular. I tried to run the regressions with vce(robust), with cluster(primaryschool) and without specifying the standard errors but I always get the same error. Reading an older post in statalist I understood that the cause of the problem might be because of having many primary schools where only one observation attends. When I run the non-IV version of the same regression though I do not have this problem and the regression is run as normal. It is really important for my model to use both the IV and the fixed effects, any suggestion on how I can overcome this will be really appreciated.

Thanks a lot in advance
Konstantina

↧

Maximize "positivity threshold" of symmetric ROC curve given a set of input values for diagnostic testing

October 13, 2019, 11:16 am

Statalist folks,

I am trying to code a function into Stata and need some assistance.

I have a diagnostic test with a symmetric ROC, parameterized as follows: (β=0 as the curve is symmetrical)

Δ=DOR+βS

Δ= logit(s(z)) - logit(1-c(z))

S= logit(s(z)) + logit(1-c(z))

DOR = Diagnostic Odds Ratio | z is the positivity threshold | c(z) is the specificity with threshold z | s(z) is the sensitivity at threshold z

I am trying to find a positivity threshold, z, that will maximize the test's value below:

I am trying to figure out a way to find the S and C (sensitivity and specificity) to maximize a value quantity, X:

X = max ( E2 - E1, E2- E3 , 0 )

E1 = 0
E2 = 0.7
E3 = -0.3 * (1-C) + [(0.15 * S) - (0.45 (1-C)) ] * 0.25

Thanks very much for help!

↧

Finding intersection of two lines with two y axes

October 13, 2019, 12:11 pm

I am trying to find the intersection of two lines (of course I am just looking for the x coordinate that they share) but I have two y axes. I was able to use aaplot to get the equations for each line, but I am not sure where to go from here.

↧

Geometric mean with zeros and negative values

October 13, 2019, 12:50 pm

Hi,

I'm struggling with the geometric mean computation in the following case.
I need to create composite indexes based on the geometric (row) mean of multiple variables. The indexes are composed of a different number of variables, and the variables have different distribution.
I created a syntax following these steps:
1) standardization of the variables by generating a "modified z-scores" based on median absolute deviation (to minimize the impact of extreme values);
2) log transformation: store the sign of the values before the logarithmic transformation and log transform abs(`var'), adding 1 so it returns zeros when `var' == 0
3) exponentiate the arithmetic rowmean of the log transformed variables: store its sign, exponentiate it, subtract 1, and restore its sign.

This syntax is:

//Step1 - standardization: compute "modified z-scores" (based on median absolute deviation to minimize the impact of extreme values)

Code:

foreach var of varlist v* {
 qui su `var', det
 gen double `var'_zsco = ((`var'-`r(p50)')/`r(p50)')* 0.6745
}

//Step 2 - logarithmic transformation

Code:

foreach var of varlist *zsco {
//store the sign of the values before the logarithmic transformation
  gen s_`var' = .
  replace s_`var' =  -1 if `var' < 0 & `var' != .
  replace s_`var' =   1 if `var' > 0 & `var' != .
  replace s_`var' =   1 if `var' == 0 & `var' != .  /*to avoid missing values for (zsco==0)*/

//logarithmic transformation of `var', adding 1 so it returns zeros when `var' == 0
  gen double i_`var' = ln(1+(abs(`var')))*s_`var'
}

//Step 3 - compute the arithmetic rowmean of the ln transformed variables and exponentiate it

Code:

egen double i_Mean = rmean(i_*)

foreach var of varlist i_Mean {
//store the sign of the values of var
  gen s_`var' = .
  replace s_`var' =  -1 if `var' < 0 & `var' != .
  replace s_`var' =   1 if `var' > 0 & `var' != .
  replace s_`var' =   1 if `var' == 0 & `var' != .
// exponentiate the arithmetic mean
  gen double exp_`var' = (exp(abs(`var')))-1
//restore the sign of var values
  replace exp_`var' = s_`var'*exp_`var'
}

I created an independent check for rows with positive z scores only (as the gmean() function for egen in egenmore (SSC) ignores zeros and negatives).
Taking for granted that step 1 is irrelevant for the actual problem, I simulated steps 2 and 3 on a previous example provided by Nick (https://www.statalist.org/forums/for...62#post1360962)

The values are very close to what my syntax generates, but they are not an exact match (I get a .9948 correlation), and I just can't find where my mistake is.

All the values I get from my own Steps 2 and 3 are slightly higher than the expected values.

//Generating example data

Code:

clear
set obs 10
set seed 2803
forval j = 1/5 {
      gen y`j' = ceil(100 * (runiform()^2))
}

list
     +-------------------------+
     | y1   y2   y3    y4   y5 |
     |-------------------------|
  1. | 86   63   45     8    1 |
  2. | 12   40   73   100    4 |
  3. | 60    1   74    61    4 |
  4. |  2    1    4     2   54 |
  5. | 12    1   22    22    4 |
     |-------------------------|
  6. |  1    7   15    84   14 |
  7. |  4    1   12    94    7 |
  8. | 40    2   15     2   89 |
  9. | 16   34   25     7    6 |
 10. | 15    6    3    44    6 |
     +-------------------------+

//Generating expected gmean values

Code:

gen double M1 = y1

quietly forval j = 2/5 {
    replace M1 = M1 * y`j'
}

replace M1 = exp(log(M1)/5)

list

//independent check 2 proposed by Nick

Code:

matrix test = (86, 63, 45, 8, 1)
gen test = test[1, _n]
means test

egen gmean = mean(ln(test))
replace gmean = exp(gmean)


means test
    Variable |    Type             Obs        Mean       [95% Conf. Interval]
-------------+---------------------------------------------------------------
        test | Arithmetic            5        40.6       -4.225618   85.42562
             |  Geometric            5    18.11458        1.794746   182.8326
             |   Harmonic            5    4.256322               .          .
-----------------------------------------------------------------------------
Missing values in confidence intervals for harmonic mean indicate
that confidence interval is undefined for corresponding variables.
Consult Reference Manual for details.

//Applying my syntax
//Step 2 - log transformation

Code:

foreach var of varlist y* {
//store the sign of the values before the log transformation
  gen s_`var' = .
  replace s_`var' =  -1 if `var' < 0 & `var' != .
  replace s_`var' =   1 if `var' > 0 & `var' != .
  replace s_`var' =   1 if `var' == 0 & `var' != .  /*to avoid missing values when var == 0*/

//log transformation of `var', adding 1 so it returns zeros when `var' == 0
gen double i_`var' = ln(1+(abs(`var')))*s_`var'
}

//Step 3 - compute the arithmetic rowmean of the ln transformed variables and exponentiate it

Code:

egen double i_Mean = rmean(i_*)

foreach var of varlist i_Mean {
//store the sign of the values of var
  gen s_`var' = .
  replace s_`var' =  -1 if `var' < 0  & `var' != .
  replace s_`var' =   1 if `var' > 0  & `var' != .
  replace s_`var' =   1 if `var' == 0 & `var' != .   /*to avoid missing values when var == 0*/
// exponentiate the arithmetic mean
  gen double exp_`var' = exp(abs(`var'))-1
//restore the sign of var values
  replace exp_`var' = s_`var'*exp_`var'
}


list y1 y2 y3 y4 y5 M1 exp_i_Mean

     +-------------------------------------------------+
     | y1   y2   y3    y4   y5          M1   exp_i_M~n |
     |-------------------------------------------------|
  1. | 86   63   45     8    1   18.114581   20.515226 |
  2. | 12   40   73   100    4   26.873536    27.83036 |
  3. | 60    1   74    61    4   16.104771    18.52345 |
  4. |  2    1    4     2   54   3.8663641   4.4817729 |
  5. | 12    1   22    22    4   7.4682237   8.2785434 |
     |-------------------------------------------------|
  6. |  1    7   15    84   14   10.430841   11.669224 |
  7. |  4    1   12    94    7   7.9413333    8.975884 |
  8. | 40    2   15     2   89   11.639123   12.966184 |
  9. | 16   34   25     7    6   14.169602    14.40053 |
 10. | 15    6    3    44    6   9.3453063    9.713163 |
     +-------------------------------------------------+

Any help figuring out where my mistake is would be much appreciated!
Best,
Martin
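
Note: a possible explanation, checked here only against the first row of the example. Steps 2 and 3 average ln(1 + y) and then subtract 1, which gives the geometric mean of (1 + y) minus 1 rather than the geometric mean of y. For positive values the former is at least as large as the latter, which would account for exp_i_Mean sitting slightly above M1 in every row. The Python sketch below just repeats the two calculations on row 1; the variable names are illustrative.

```python
import math

row = [86, 63, 45, 8, 1]  # first row of the example data

# Geometric mean: exp of the mean of ln(y); this is what M1 and -means- compute.
gmean = math.exp(sum(math.log(y) for y in row) / len(row))

# Posted steps 2-3: exp of the mean of ln(1 + y), minus 1.
shifted = math.exp(sum(math.log(1 + y) for y in row) / len(row)) - 1

print(round(gmean, 6), round(shifted, 6))
```

The two printed numbers can be compared with M1 and exp_i_Mean in row 1 of the listing. Adding 1 before taking logs and subtracting 1 afterwards do not cancel, so the two quantities coincide only in special cases such as all values being equal.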

↧

spatial autoregression (SAR) rho coefficient

October 13, 2019, 1:12 pm

Hi everyone,

I am running a spatial autoregressive model (SAR) with stata commands spregress and spmatrix (for my W). I am lagging only the dependent variable. For both row and minmax normalization, I am getting a rho coefficient outside of the acceptable range (12.9); with spectral normalization, rho is “normal” (within -1 to 1). Can someone please explain this to me? I don’t want to just use spectral normalization because it “looks right”.

Thanks a bunch!

↧

PPML index

October 13, 2019, 4:29 pm

Hi,

I am using PPML for my gravity model.

One of my independent variables is an index. How can I interpret the value of its coefficient?

Regards,

Daniel

↧

Plotting confidence intervals for simple exponential smoothing forecast results

October 13, 2019, 6:24 pm

Hello, everyone!

I am currently trying to visualise the results of a simple exponential smoothing forecast.

Plotting a simple time series line graph is not a problem.

But how can I draw confidence intervals like the ones in the attached picture (image not preserved)?

Any help will be much appreciated!

↧

Error with levelsof but not levels command

October 13, 2019, 7:17 pm

Hi everyone,
Has anyone experienced errors with levelsof not present with levels commands? This curious case occurs with the following effort to find the value for _n at which rho is maximized or MAE is minimized under specific conditions. The `variables' list has "ihap irlx isad" and `panelrange' is "1/3". The following levels command works:

if "`Ecriterion'"=="rho" | "`Ecriterion'"=="mae" {
qui gen long obsno=_n
foreach var of varlist `variables' {
foreach id of numlist `panelrange' {
capture noisily {
quietly summarize SP_`Ecriterion'_`var'_ID if SP_id_`var'_ID==`id' & SP_d_`var'_ID==`diff', meanonly
if "`Ecriterion'"=="rho" loc a="max"
else if "`Ecriterion'"=="mae" loc a="min"
levels obsno if SP_`Ecriterion'_`var'_ID == r(`a') & SP_id_`var'_ID==`id' & SP_d_`var'_ID==`diff'
loc emax`var'`id'=SP_e_`var'_ID[`r(levels)']
}
}
}
}

The code is not perfect and could use a clean, but it works just fine. I have confirmed this by checking the contents of the `emax`var'`id'' macros. However, curiously, if the "levels" command is replaced with "levelsof", then the following errors result:

SP_e_ihap_ID not found
SP_e_ihap_ID not found
SP_e_ihap_ID not found
SP_e_irlx_ID not found
SP_e_irlx_ID not found
SP_e_irlx_ID not found
SP_e_isad_ID not found
SP_e_isad_ID not found
SP_e_isad_ID not found

This is very strange, I think, because the two commands are often interchangeable (and meant to be so)? If anyone has experienced anything similar or can diagnose the problem, please let me know! The gtools version of levelsof also produces the same errors.

Thanks for any input and time you can offer!
Mike

↧

How to loop over a merge of inconsistent variables over time

October 13, 2019, 7:21 pm

Dear StataList.

I need to merge two sets of data (partner + respondent) per wave for many waves. The complication is that while the control variables appear in each wave (though some begin in wave 2, not wave 1), some variables (of interest) appear irregularly and inconsistently (every 3 or 4 years). How can I deal with this merge and apply a loop for all waves? I read over https://www.statalist.org/forums/new-content/51 and @Cox and Kantor suggested there may be better options than the "if, then" command (which I attempted using without success below).
Note: this is a very large dataset (000s of variables) ...

Code:

*partner data
local p = substr("abcdefghijklmnopqrstuv",`wave',1)
if wave == d g j n {
use waveid age sex empstat educ inc marstat nlpreg workhr relb relimp relat using "c:/data/Combined_a170c.dta", clear
}else { if wave == a b c e f h i k l m o p q     // all waves a-q, excl d, g, j, n
use waveid age sex empstat educ inc marstat nlpreg workhr using "c:/data/Combined_a170c.dta", clear
}
rename `p'* p_* // replace wave with partner data prefix
rename waveid hhidsort hhpxid
save "c:/data/temp", replace

*respondent data
local p = substr("abcdefghijklmnopqrstuv",`wave',1)
if wave == d g j n {
use waveid hhid age sex empstat educ inc marstat nlpreg workhr relb relimp relat using "c:/data/Combined_a170c.dta", clear
}else { if wave == a b c e f h i k l m o p q     // all waves a-q, excl d, g, j, n
use waveid hhid age sex empstat educ inc marstat nlpreg workhr using `readdatadir'/Combined_q170c.dta, clear
}
rename `p'* *
drop if hhid==""
sort hhid

merge 1:1 hhid using "c:/data/temp", replace

// I cannot figure out how to then loop this for all other waves?

↧

how to calculate cumulative observation for a certain time interval, like 10 years?

October 13, 2019, 8:44 pm

Hi guys,
I am new here.
I have a dataset that records pairs of states and whether they had a militarized dispute in a given year. For each year, I want to calculate how many militarized disputes the pair has had in the previous ten years (including that year).
I found this thread: https://www.statalist.org/forums/for...year-intervals. However, the rangestat command gives a somewhat weird result. Here is my command: rangestat (sum) y, interval(year 0 10) by(id). I have attached the weird result (image not preserved). If anyone could tell me what is going wrong or how to fix it, I would appreciate it greatly!
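
Note: as far as I understand rangestat, interval(year 0 10) sums over the current year and the ten years after it, whereas "the previous ten years including this year" corresponds to a window from year - 9 to year, which would be interval(year -9 0). The Python sketch below spells out that backward-looking window on a made-up dyad; the function name and the data are hypothetical.

```python
def disputes_last_ten_years(records):
    """records: list of (year, disputes) pairs for one dyad.
    Returns {year: total disputes in the ten-year window ending at that year}."""
    totals = {}
    for year, _ in records:
        totals[year] = sum(d for y, d in records if year - 9 <= y <= year)
    return totals


# Hypothetical dyad: one dispute in each of 1990, 1995 and 2001.
records = [(y, 1 if y in (1990, 1995, 2001) else 0) for y in range(1990, 2006)]
totals = disputes_last_ten_years(records)
print(totals[1999], totals[2000], totals[2004])
```

In this example the 1990 dispute counts toward 1999 but has dropped out of the window by 2000.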

↧

Collinarity in Fracreg analysis and collin command

October 13, 2019, 8:47 pm

Dear statalisters,

I am writing you because I am using fractional logistic regressions for my analysis and I need to calculate VIFs, but I read in a paper that "fractional logit regression, and negative binomial regression do not allow the estimation of VIF scores. Thus, we report the VIF scores obtained from estimating the models with ordinary least squares (OLS).".
Now, I was using the command

Code:

 collin

(I use Stata 15) with the list of variables, but after reading that quote I am not sure whether I am doing it right. Could you help me, please? Am I right to use the collin command? How does collin work?
Thank you so much for any help.

Alejandro

↧

Exporting sts list data?

October 13, 2019, 9:31 pm

Hi,

Does anyone know how to export sts list data using putdocx or other means -- without having to either manually copy it or create a whole new dataset? I'm trying to automate this process.

For example:
sts list, at(1 12 60 120) by(variable)

Thanks

↧

Labelling values by importing information from .csv file

October 13, 2019, 10:57 pm

Hello,
We collected survey data on over 12,000 respondents across several hundred districts. The districts, however, are in their district id format- a 3 digit number ranging from 000 to 729. Excerpt below:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long UID int district
331785 557
331798 556
331799 566
331745 556
331811 564
331818 560
331803 549
331825 553
331824 556
331829 556
331832 566
331836 577
331837 567
331822 556
331839 573
331813 577
278950 546
278952 575
278965 575
278966 575
278971 575
278974 563
278982 575
278987 576
278990 547
278991 577
278979 562
279010 577
279012 576
305539 577
305507 562
end

I have, in a CSV, the corresponding district names for each of these district IDs. How can I encode these? Is there a way other than la define and typing all the information into a do file?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int district_id str26 district_name
  1 "Nicobar"                
  2 "North Middle Andaman"   
  3 "South Andaman"          
  4 "Anantapur"              
  5 "Chittoor"               
  6 "East Godavari"          
  7 "Guntur"                 
  8 "Kadapa"                 
  9 "Krishna"                
 10 "Kurnool"                
 11 "Nellore"                
 12 "Prakasam"               
 13 "Srikakulam"             
 14 "Visakhapatnam"          
 15 "Vizianagaram"           
 16 "West Godavari"          
 17 "Anjaw"                  
 18 "Central Siang"          
 19 "Changlang"              
 20 "Dibang Valley"          
 21 "East Kameng"            
 22 "East Siang"             
 23 "Kamle"                  
 24 "Kra Daadi"              
 25 "Kurung Kumey"           
 26 "Lepa Rada"              
 27 "Lohit"                  
 28 "Longding"               
 29 "Lower Dibang Valley"    
 30 "Lower Siang"            
 31 "Lower Subansiri"        
 32 "Namsai"                 
 33 "Pakke Kessang"          
 34 "Papum Pare"             
 35 "Shi Yomi"               
 36 "Tawang"                 
 37 "Tirap"                  
 38 "Upper Siang"            
 39 "Upper Subansiri"        
 40 "West Kameng"            
 41 "West Siang"             
 42 "Baksa"                  
 43 "Barpeta"                
 44 "Biswanath"              
 45 "Bongaigaon"             
 46 "Cachar"                 
 47 "Charaideo"              
 48 "Chirang"                
 49 "Darrang"                
 50 "Dhemaji"                
 51 "Dhubri"                 
 52 "Dibrugarh"              
 53 "Dima Hasao"             
 54 "Goalpara"               
 55 "Golaghat"               
 56 "Hailakandi"             
 57 "Hojai"                  
 58 "Jorhat"                 
 59 "Kamrup"                 
 60 "Kamrup Metropolitan"    
 61 "Karbi Anglong"          
 62 "Karimganj"              
 63 "Kokrajhar"              
 64 "Lakhimpur"              
 65 "Majuli"                 
 66 "Morigaon"               
 67 "Nagaon"                 
 68 "Nalbari"                
 69 "Sivasagar"              
 70 "Sonitpur"               
 71 "South Salmara-Mankachar"
 72 "Tinsukia"               
 73 "Udalguri"               
 74 "West Karbi Anglong"     
 75 "Araria"                 
 76 "Arwal"                  
 77 "Aurangabad"             
 78 "Banka"                  
 79 "Begusarai"              
 80 "Bhagalpur"              
 81 "Bhojpur"                
 82 "Buxar"                  
 83 "Darbhanga"              
 84 "East Champaran"         
 85 "Gaya"                   
 86 "Gopalganj"              
 87 "Jamui"                  
 88 "Jehanabad"              
 89 "Kaimur"                 
 90 "Katihar"                
 91 "Khagaria"               
 92 "Kishanganj"             
 93 "Lakhisarai"             
 94 "Madhepura"              
 95 "Madhubani"              
 96 "Munger"                 
 97 "Muzaffarpur"            
 98 "Nalanda"                
 99 "Nawada"                 
100 "Patna"                  
end

Thanks, would really appreciate help on this.
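
Note: one way to avoid typing the labels by hand is to generate the label define commands from the CSV. The sketch below does this in Python for three rows of the lookup table; the label name district_lbl, the function name and the assumed column headers (district_id, district_name) are illustrative, and the generated lines would still need to be run in Stata as a do-file.

```python
import csv
import io

# A few rows from the lookup table above, as they might appear in the CSV.
csv_text = """district_id,district_name
1,Nicobar
2,North Middle Andaman
3,South Andaman
"""


def label_define_lines(csv_file, label_name="district_lbl"):
    """Turn (id, name) rows into Stata -label define- commands."""
    lines = []
    for row in csv.DictReader(csv_file):
        code = int(row["district_id"])
        name = row["district_name"].strip()
        lines.append(f'label define {label_name} {code} "{name}", add')
    # Attach the value label to the survey variable named `district`.
    lines.append(f"label values district {label_name}")
    return lines


do_file_lines = label_define_lines(io.StringIO(csv_text))
print("\n".join(do_file_lines))
```

An alternative that stays inside Stata would be to import the CSV and merge the names onto the survey data by district id, at the cost of carrying a string variable instead of a labelled numeric one.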

↧

Problems with destring CIK numbers

October 14, 2019, 1:53 am

Hi guys,

In a dataset, I have rows with fyear (fiscal year) and CIK numbers, which are company identifiers from Compustat.
See the code below.

The problem is that the CIK numbers are stored as strings (str7). I need to change CIK to a numeric variable in order to merge this dataset with my other dataset, where the CIK variable is numeric. I used gen nummericCIK = real(CIK), but what Stata then does is remove all the 0's in the CIK number. It is fine that Stata removes the leading 0's, because my CIK numbers do not start with 0, but it is wrong that it removes the 0's in the rest of the number.

For example, the first CIK number is 0912057. STATA should remove the first 0 here, but not the second zero.

I tried "replace CIK = subinstr(CIK, "0", 1)" and this works for removing the 0's; however, if I then try to destring the variable, Stata keeps giving me the error that the variable contains nonnumeric characters.

Anyone who knows what to do?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int fyear str7 CIK
1993 "0912057"
1993 "0032377"
1993 "0950131"
1993 "0353944"
1993 "0038777"
1993 "0912057"
1993 "0912057"
1993 "0868016"
1993 "0950124"
1993 "0950123"
1993 "0950152"
1993 "0060302"
1993 "0051296"
1993 "0950131"
1993 "0891618"
1993 "0808450"
1993 "0096935"
1993 "0889810"
1993 "0912057"
1993 "0950131"
1993 "0912057"
1993 "0034501"
1993 "0898430"
end
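
Note: converting a digit string to a number should only drop leading zeros; interior zeros disappear only if every "0" character is replaced in the string first. The Python sketch below contrasts the three operations on the first CIK. It illustrates the string arithmetic only and is not a claim about what produced the result in Stata, where the display format of the new variable is also worth checking.

```python
cik = "0912057"  # first CIK in the example data

as_number = int(cik)                   # numeric conversion drops the leading zero only
zeros_stripped = cik.replace("0", "")  # replacing every "0" also removes interior zeros
leading_only = cik.lstrip("0")         # strips leading zeros, keeps the rest as a string

print(as_number, zeros_stripped, leading_only)
```

Only the middle operation loses the second zero, so a blanket substring replacement is the step to avoid before converting.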

↧

Question on stcompet

October 14, 2019, 2:14 am

Dear Statalisters,

I have a question on the command stcompet.

I am considering the entry into a marriage or a dissolution from cohabitation as competing risks.

In order to do that, I stset my observations by considering marriage (uniontype==2) as the failure.

Then, I use stcompet to set the dissolution (compet1(1) ) as the competing risk.

This is the code:

stset time, failure(uniontype==2) id(pidnew)
drop cif*
stcompet cif=ci , compet1(1) by(wave5) level(90)

I would like to understand how censored cases are treated.

1) Are they considered in the computation?
2) Should I specify a different competing risk (e.g. compet2(0)) for the censored individuals?

Potential code:

stset time, failure(uniontype==2) id(pidnew)
drop cif*
stcompet cif=ci , compet1(1) compet2(0) by(wave5) level(90)

I would like to know if you have suggestions on the correct way to proceed, since I get slightly different curves depending on whether or not I set the censored cases as a competing risk.

Thank you and best,
Lydia

↧