
Dominance Analysis - comparing between (not within) models and relative importance determination for system GMM

Hello,

My post concerns the application of dominance analysis using the community-contributed -domin- command. I learned about it here.

Issue #1

I am using panel data on 56 countries over five years. I had initially intended to apply dominance analysis to a model that compares the relative importance of two independent variables - I'll call them Most Interesting Variable 1 (MIV1) and Most Interesting Variable 2 (MIV2). However, because MIV1 and MIV2 are highly correlated, including them in the same regression is problematic.

So I have decided to split the model in two. In other words, I run a model as shown in Eq1 and a model as shown in Eq2, where Xjt is a vector of country-level macroeconomic factors. The regressions are run using OLS. Country and year fixed effects are included, and standard errors are clustered at the country level. The only difference between the two models is that Eq1 includes MIV1 while Eq2 includes MIV2. The panel is balanced in both cases.

Yjt = β0 + β1 MIV1jt + β2 Xjt + δt + αj + ujt   (Eq1)

Yjt = β0 + β1 MIV2jt + β2 Xjt + δt + αj + ujt   (Eq2)

I will then run -domin- for each model (I plan on renting online computing space, because the inclusion of fixed effects will probably make this take a very long time). I am not interested in the relative importance within each model for its own sake, but rather in the relative importance of MIV1 and MIV2, which sit in separate models. I was therefore planning to compare the standardised general dominance statistic for MIV1 from Eq1 with the standardised general dominance statistic for MIV2 from Eq2, and to use their respective contributions to the within R-squared of each model to inform an assessment of their relative importance.

Question: Is it appropriate to compare the standardised general dominance statistics of two variables from different regressions, where the regressions are identical in every respect except that one includes MIV1 and the other includes MIV2? If not, and one simply obtains coefficients on MIV1 and MIV2 as outputs from estimating two separate models, are there commonly accepted methods for comparing the relative importance of coefficients across models? Simply comparing the size and statistical significance of MIV1 and MIV2, or the within R-squareds of each model, seems a bit naive.
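For concreteness, this is roughly what I have in mind (a sketch only; Y, X1-X3 and country are placeholder names, and I am assuming that -domin-'s all() option is the right way to hold the fixed-effect dummies in every subset model and that it accepts factor-variable notation):

Code:
* Eq1: general dominance of MIV1 relative to the controls, FE dummies in all subsets
domin Y MIV1 X1 X2 X3, reg(regress, vce(cluster country)) fitstat(e(r2)) all(i.country i.year)
* Eq2: the same specification with MIV2
domin Y MIV2 X1 X2 X3, reg(regress, vce(cluster country)) fitstat(e(r2)) all(i.country i.year)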

Issue #2:

My dependent variable Yjt is serially correlated, which means the fixed effects OLS estimates are biased.

Question: If it is okay to use standardised general dominance statistics to compare the independent variables of interest between two models, and if one's research question is about determining relative importance between two independent variables using general dominance statistics, does the bias matter that much if it is 'consistent' between the two models?

Issue #3

Because Yjt is serially correlated, in addition to the static specification I'd also like to run a specification that includes the lag of the dependent variable. For this, I have determined that system GMM using -xtabond2- is most appropriate given my short T and N>T. However, to my understanding, this rules out dominance analysis as an option for assessing the relative importance of MIV1 and MIV2, because -xtabond2- does not produce an appropriate fit statistic.

Question: Is there a way that an -xtabond2- regression can be dominance analysed? If not, is there a procedure for comparing the relative importance of independent variables across two (almost) identical models in a dynamic panel setting? The short T and moderate N of my data constrain my options of dynamic panel data methods.

Thank you for taking the time to read this.

Sam




Delimiting Mata code with mata: ... end vs. mata { ... }

Dear Matalisters,

When writing Mata code in a do-file, are there important differences between the following two ways of delimiting the Mata content?

Method 1: mata: ... end
Code:
mata:
  a
  b
end
Method 2: mata { ... }

Code:
mata {
  a
  b
}
Method 2 can be included in loops and Stata programs; are there any downsides?
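For context, a minimal sketch of the kind of use I have in mind for Method 2, relying on the braces form behaving as described above:

Code:
program define demo
    mata {
        x = 1 + 2
        st_numscalar("answer", x)   // pass the Mata result back to Stata
    }
    display scalar(answer)
end
demo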

Thanks,
BL

How to generate/replace a variable that takes values based on some conditions?

Hi all, please consider the following example

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str3 studentid float acadyr str1 postcode float(open replaceyear) str1(replacepcd replacepcd_exp)
"123" 2010 "B" 0    . "" ""
"123" 2011 "A" 1 2012 "" "B"
"123" 2012 "B" 0    . "" ""
"123" 2013 "C" 1 2014 "" "S"
"123" 2014 "S" 0    . "" ""
"123" 2015 "D" 1 2010 "" "B"
"124" 2012 "S" 0    . "" ""
"124" 2013 "C" 1 2012 "" "S"
"124" 2015 "C" 1 2012 "" "S"
"124" 2016 "C" 1 2012 "" "S"
"124" 2017 "S" 0    . "" ""
"126" 2012 "S" 0    . "" ""
"126" 2014 "B" 0    . "" ""
"126" 2015 "C" 1 2016 "" "B"
"126" 2016 "B" 0    . "" ""
"126" 2017 "A" 1 2016 "" "B"
"126" 2018 "A" 1 2016 "" "B"
end
For each postcode that has opened, I have identified a year whose postcode I will use as the replacement. E.g., student 123 is in postcode A in 2011. Since A is a newly opened postcode, I want to replace it for continuity. Accordingly, I have identified 2012 as the year whose information I will use. Student 123 was in postcode B in 2012, hence I want to replace A with B in 123's 2011 observation.

What I had tried initially, which was wrong, was:

Code:
sort studentid acadyr
by studentid: gen replacepcd=postcode if replaceyear==acadyr
This generated a variable with all missing values, because within a single observation replaceyear never equals acadyr. Had there been a single replaceyear for each student, I could have spread that year across all of the student's observations and then found the match. However, since there are multiple replaceyears, that is not an option.

Would appreciate any suggestion.
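For reference, one approach that might work is a self-merge (a sketch, assuming there is exactly one observation per student per academic year, so the lookup is unique on studentid and year):

Code:
* Build a lookup of each student's postcode by year, keyed on replaceyear
preserve
keep studentid acadyr postcode
rename (acadyr postcode) (replaceyear replacepcd_new)
tempfile lookup
save `lookup'
restore
* Attach the postcode observed in the identified replacement year
merge m:1 studentid replaceyear using `lookup', keep(master match) nogen
replace replacepcd = replacepcd_new if open==1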


Triple Difference with Poisson PPML Interpretation of Interaction

Hi all,

I would like to conduct a triple difference estimation. My dependent variable is a count variable. I am using the user-written command ppmlhdfe:

Code:
ppmlhdfe count_appr 1.post#1.treated_occupation#1.treated_state, cluster(state1) abs(i.firsttwodig_occ_ind#i.yearmonth i.state1#i.firsttwodig_occ_ind i.state1#i.yearmonth)

count_appr | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------------------+----------------------------------------------------------------
post#treated_occupation#treated_state |
                  1 1 1  |  -.2446718   .3410114    -0.72   0.473    -.9130419    .4236983
I have read a lot of papers on the interpretation of coefficients on interaction terms, including this paper by Shang et al. (2017): https://onlinelibrary.wiley.com/doi/...?saml_referrer. It says that in the case of a standard interaction term between a continuous and a binary variable, the coefficient on the interaction term can be directly interpreted as a difference in semi-elasticities. They say this is not the case for the interaction between two binary variables.

The results shown here are the results from the triple interaction term; the interaction between three binary variables.

My main question is: how can I interpret this coefficient in a meaningful way? Does the coefficient, as shown, mean anything by itself? How could I calculate an interaction effect, and if I do, will it correspond to the average treatment effect on the treated? Would the margins command be of any use here? (I know the marginal effect of an interaction term does not exist, and interaction terms, in the sense of Ai and Norton, 2003, are difficult...)
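For reference, my current reading (an assumption on my part, not a settled interpretation): since PPML is log-linear, the exponentiated triple-difference coefficient would be a ratio of ratios of ratios of expected counts, e.g.:

Code:
* exp(b) for the triple interaction; with the output above, exp(-.2446718) = 0.783,
* i.e. roughly a 22% lower expected count relative to the double-difference benchmark
display exp(_b[1.post#1.treated_occupation#1.treated_state])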

Error with MAHAPICK: "matrix has missing values"

Hello!

I have been trying to use the mahapick command to match one control observation to each treated observation using Mahalanobis distance. Currently, I am trying to match on more than 50 different variables, each representing the percentage of observations within one US state/territory. Most of the observations are therefore 0 by design, but I think this captures the difference in operating areas very well. However, when I run mahapick on these variables, it gives me the error "matrix has missing values." When I use only one or two state variables it works well, but any more than that produces the error.

If I understand the command correctly, it calculates the distance between variables and synthesizes them to find the "closest" control observation for each treated observation. I do not see what the problem is, as the distance should always be calculable even if both observations are 0. I also checked all of my variables, and none of them have missing observations; the dataset is quite clean. If anyone has experience with this command, I would appreciate any help I can get! Thank you in advance!
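One diagnostic I have not yet tried (a sketch; state_share_* is a placeholder for my 50+ state variables, and the assumption is that the error stems from a singular covariance matrix, e.g. because the shares sum to one or some states are constant at 0):

Code:
* Flag collinear or constant variables before matching
_rmcoll state_share_*, forcedrop
display "`r(varlist)'"    // the reduced varlist that could be passed to mahapick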

Best regards,
Samuel

Stata programs for drawing observations from an existing dataset

Hello,

I'm seeking some general tips on writing a Stata routine; hopefully this is the right place.

I'm working on a simulation program using cross-sectional data on households. The idea is to feed the program input parameters (for example, if X1=1234, X2=123 and X3=12, pull 1234, 123 and 12 observations from my data and classify them as groups 1, 2 and 3). I need the program to be flexible enough that sometimes I can draw from a sub-sample of my entire dataset (for example, for group 2, only draw from households with a female head of household).

Currently, I'm using things like "gsort variable" and a lot of transforms to achieve this, and it's mostly working. However, given how many "if" statements I have, it's quickly becoming very hard to debug issues and work through the logic. I've tried using "subsample", but unfortunately couldn't get it to work, as it gives me an error if X1=0 (I could parse through the possible combinations and use a lot of "if" statements again, but then the code wouldn't really be easier to work with). Would someone have some tips/examples for this type of program? I've (hopefully correctly) copied my code below; it's quite a doozy! This is just the random-draw part and sits inside a larger program.
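For illustration, this is the kind of structure I am aiming for (a sketch; draw_group, group and female_head are made-up names, and the counts are the example parameters above):

Code:
* Hypothetical helper: tag `n' not-yet-assigned observations satisfying `cond' as group `g'
program define draw_group
    args g n cond
    tempvar u rank
    gen double `u' = runiform() if group >= . & (`cond')
    sort `u'                              // missings (non-candidates) sort last
    gen long `rank' = _n if `u' < .
    replace group = `g' if `rank' <= `n'  // does nothing if `n'==0
end

gen group = .
set seed 1234
draw_group 1 1234 "1"                     // group 1: draw from everyone
draw_group 2 123  "female_head == 1"      // group 2: only female-headed households
draw_group 3 12   "1"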

Thanks in advance for any tips/ideas!

Best regards,
HH


Code:
*********************************************************************
* Unstar below if you want to run this do-file only
clear
set seed 1234
local iso="COD"           // Country iso code
local simNb=4             // Simulation number
local MC_iterations=1     // Monte Carlo iterations
qui use "${PathWorkfiles}/`iso'_`simNb'_temp-import-shock-factors.dta", clear

* Setting global variables for targeting
global targeting_fcs 1
*Enter "0" to turn off targeting by FCS
*Enter "1" for FCS targeting
global lasso_PMT 0
*Enter "1" for a lasso based scorecard method of determining who gets assistance
*Enter "0" to turn this off
global targeting_error 0
global error_rate 0.20

*********************************************************************
/* NOTE: This do file assigns or removes assistance from a pool of households that are considered eligible for assistance (variable Targeting=1). This is done with three prioritization procedures:

1. Random prioritization from the eligible pool (variable "Targeting"):  ${targeting_fcs}=0 AND global ${lasso_PMT}=0
2. Ideal prioritization based on FCS: ${targeting_fcs}=1 AND ${lasso_PMT}=0
3. Scorecard prioritization based on LASSO procedure: ${lasso_PMT}=1 AND ${targeting_fcs}=0

*/
********************************************************************************

if "`it'"=="" {
    loc it=1
}
********************************************************************************
** Baseline assistance Monte Carlo
gl spillover_sh 0.21 // Filipski et al. 2022 - Most conservative estimate within a review of LEWIE studies in 13 countries (https://doi.org/10.1111/agec.12687)
loc seed2=`it'+1000
set seed `seed2'

if ${targeting_fcs}==1 & ${lasso_PMT}==1 {
    di in red "You cannot target FCS and use a PMT targeting approach at the same time, shutting down"
    exit
}

********************************************************************************

gen sample_pop=_N
cap drop s_pop
egen s_pop=total(HHWeight)
    label var s_pop "Shocked population value"
********************************************************************************
* This one is problematic, because it artificially reduces the number of people assisted
*gl capassist = 6 //Maximum number of household members assisted
cap gen HHSize_capped=HHSize
*    replace HHSize_capped=${capassist} if HHSize>=${capassist}
*    gen HHWeight_capped=HHWeight/HHSize*HHSize_capped // to use HHWeight for capping
********************************************************************************
    if base_BeneficiariesNbUnique>s_pop {
        di in red "More beneficiaries than population in Simulation ${SimID} ISO=`iso' "
        exit
    }
*    
********************************************************************************
capture egen IndNb=total(HHWeight) // population. Whenever there is 'Ind' the variable indicates individuals, if 'HH' the variable indicates households
capture gen SampleHHNb=_N

gen base_BeneficiariesNbCombo=cond(base_BeneficiariesNbUnique>=base_BeneficiariesNbIK+base_BeneficiariesNbCbt,0,-(base_BeneficiariesNbUnique-(base_BeneficiariesNbIK+base_BeneficiariesNbCbt))) // overlapping beneficiaries for base
gen sample_base_BenHHUnique    = round(base_BeneficiariesNbUnique* SampleHHNb/IndNb)
gen sample_base_BenHHIK        = round(base_BeneficiariesNbIK    * SampleHHNb/IndNb)
gen sample_base_BenHHCbt    = round(base_BeneficiariesNbCbt    * SampleHHNb/IndNb)
gen sample_base_BenHHCombo    = round(base_BeneficiariesNbCombo    * SampleHHNb/IndNb)

gen BeneficiariesNbCombo=cond(BeneficiariesNbUnique2>=BeneficiariesNbIK+BeneficiariesNbCbt,0,-(BeneficiariesNbUnique2-(BeneficiariesNbIK+BeneficiariesNbCbt))) // overlapping beneficiaries for other sims
gen sample_sim_BenHHUnique    = round(BeneficiariesNbUnique2* SampleHHNb/IndNb)
gen sample_sim_BenHHIK        = round(BeneficiariesNbIK        * SampleHHNb/IndNb)
gen sample_sim_BenHHCbt        = round(BeneficiariesNbCbt    * SampleHHNb/IndNb)
gen sample_sim_BenHHCombo    = round(BeneficiariesNbCombo    * SampleHHNb/IndNb)

***

************************************************
*** Defining beneficiary households for base ***
************************************************
* Prioritization/1. Random prioritization from the eligible pool
if ${targeting_fcs}==0 & ${lasso_PMT}==0 {
*set seed 1234
gen RNB_base=runiform() if Targeting==1             // this will change at each iteration of the Montecarlo
gsort RNB_base
gen Cumulative_base=_n                     // this is needed to randomly assign assistance to the number of households as indicated by the variables sample_BenHH*

** Both IK and CBT recipients
capture gen treated_sample_base_BenHHCombo=0
replace treated_sample_base_BenHHCombo=cond(Cumulative_base<=sample_base_BenHHCombo,1,0) if Targeting==1 & sample_base_BenHHCombo>0

** IK recipients
capture gen treated_sample_base_BenHHIK=0
replace treated_sample_base_BenHHIK = 1 if Targeting==1 & treated_sample_base_BenHHCombo==1
replace treated_sample_base_BenHHIK = cond(Cumulative_base<=sample_base_BenHHIK,1,0) if Targeting==1 & treated_sample_base_BenHHCombo==0
    
** CBT recipients
capture gen treated_sample_base_BenHHCbt=0
replace treated_sample_base_BenHHCbt=1 if Targeting==1 & treated_sample_base_BenHHCombo==1
replace treated_sample_base_BenHHCbt=cond(Cumulative_base<=sample_base_BenHHCbt+sample_base_BenHHIK-sample_base_BenHHCombo,1,0) if Targeting==1 & treated_sample_base_BenHHCombo==0 & treated_sample_base_BenHHIK==0

** Creating treated_base variable
gen treated_base=.
replace treated_base=0 if Targeting==0                                         // non-eligible
replace treated_base=1 if Targeting==1                                         // non-treated
replace treated_base=2 if Targeting==1 & treated_sample_base_BenHHCombo==1     // CBT+IK
replace treated_base=3 if Targeting==1 & treated_base!=2 & treated_sample_base_BenHHIK==1 & treated_sample_base_BenHHCbt==0        // IK
replace treated_base=4 if Targeting==1 & treated_base!=2 & treated_sample_base_BenHHIK==0 & treated_sample_base_BenHHCbt==1        // CBT
}
*

* Prioritization/2. Ideal targeting based on FCS
if ${targeting_fcs}==1 & ${lasso_PMT}==0 {
gsort -Targeting FCS
gen Cumulative_base=_n


** Both IK and CBT recipients
capture gen treated_sample_base_BenHHCombo=0
replace treated_sample_base_BenHHCombo=cond(Cumulative_base<=sample_base_BenHHCombo,1,0) if Targeting==1 & sample_base_BenHHCombo>0

** IK recipients
capture gen treated_sample_base_BenHHIK=0
replace treated_sample_base_BenHHIK = 1 if Targeting==1 & treated_sample_base_BenHHCombo==1
replace treated_sample_base_BenHHIK = cond(Cumulative_base<=sample_base_BenHHIK,1,0) if Targeting==1 & treated_sample_base_BenHHCombo==0
    
** CBT recipients
capture gen treated_sample_base_BenHHCbt=0
replace treated_sample_base_BenHHCbt=1 if Targeting==1 & treated_sample_base_BenHHCombo==1
replace treated_sample_base_BenHHCbt=cond(Cumulative_base<=sample_base_BenHHCbt+sample_base_BenHHIK-sample_base_BenHHCombo,1,0) if Targeting==1 & treated_sample_base_BenHHCombo==0 & treated_sample_base_BenHHIK==0
*
** Creating treated_base variable
gen treated_base=.
replace treated_base=0 if Targeting==0                                         // non-eligible
replace treated_base=1 if Targeting==1                                         // non-treated
replace treated_base=2 if Targeting==1 & treated_sample_base_BenHHCombo==1     // CBT+IK
replace treated_base=3 if Targeting==1 & treated_base!=2 & treated_sample_base_BenHHIK==1 & treated_sample_base_BenHHCbt==0        // IK
replace treated_base=4 if Targeting==1 & treated_base!=2 & treated_sample_base_BenHHIK==0 & treated_sample_base_BenHHCbt==1        // CBT
}
*
* Prioritization/3. Scorecard prioritization based on LASSO procedure
if ${targeting_fcs}==0 & ${lasso_PMT}==1 {
    di in red "not yet implemented"
    exit
}

* Exclusion error at base
if (${lasso_PMT}==1 | ${targeting_fcs}==1) & ${targeting_error}==1 {

gen eligible_replacement_base=0

forval i= 2/4 {
    gen temp_treated=1 if treated_base==`i'
    egen float Cumulative_treated = total(temp_treated)
    set seed 1234
    gen RNG_exclusion_error=runiform() if temp_treated==1
    gsort RNG_exclusion_error
    replace eligible_replacement_base=-1 if RNG_exclusion_error<=${error_rate} & temp_treated==1
    drop temp_treated Cumulative_treated RNG_exclusion_error
    egen float dropped_treated_`i'=total(abs(eligible_replacement_base)) if treated_base==`i' // this is the number of those that don't get assistance for the exclusion error by transfer modality
    sum dropped_treated_`i'
    scalar scalar_dropped_number_`i'=r(max)
    gen dropout_`i'=scalar_dropped_number_`i'
    }
gen dropin_2=dropout_2
egen float dropin_3=rowtotal(dropout_2 dropout_3)
egen float dropin_4=rowtotal(dropout_2 dropout_3 dropout_4)
    *
gen float dropout=dropin_4 // total number of households in the error

gen temp_nontreated=1 if treated_base==1
*gen temp_nontreated=1 if treated_base<=1 // if we want to relax the condition above and allow the inclusion error also to non-eligible households

egen float Cumulative_nontreated = total(temp_nontreated)
set seed 1234
gen RNG_inclusion_error=runiform() if temp_nontreated==1
gsort RNG_inclusion_error
gen Progressive_nontreated=_n
replace eligible_replacement_base=1 if Progressive_nontreated<=dropout & temp_nontreated==1

gsort -eligible_replacement_base
replace treated_base=2 if eligible_replacement_base==1 & treated_base==1 & _n<=dropin_2 & dropin_2!=.
replace treated_base=3 if eligible_replacement_base==1 & treated_base==1 & _n<=dropin_3 & dropin_3!=.
replace treated_base=4 if eligible_replacement_base==1 & treated_base==1 & _n<=dropin_4 & dropin_4!=.

replace treated_base=1 if eligible_replacement_base==-1

drop temp_nontreated dropped_treated_2 dropped_treated_3 dropped_treated_4 dropout dropout_2 dropin_2 dropout_3 dropin_3 dropout_4 dropin_4 Progressive_nontreated RNG_inclusion_error Cumulative_nontreated

tab eligible_replacement_base treated_base
}
*

tabstat HHWeight, statistics( sum ) by(treated_base)


*************************************************************
*** Defining beneficiary Households for other simulations ***
*************************************************************

** Creating a variable for HH receiving assistance (treated)
capture gen treated_sample_sim_BenHHCombo=treated_sample_base_BenHHCombo
capture gen treated_sample_sim_BenHHIK=treated_sample_base_BenHHIK
capture gen treated_sample_sim_BenHHCbt=treated_sample_base_BenHHCbt

/*
***Step1. check if the number of beneficiaries in the simulation as compared to the baseline increases (scale-up), decreases (scale-down) or stays put:

Scale-up     aka Increased number of beneficiaries:      1
No change     aka Same number of beneficiaries:          0
Scale-down     aka Decreased number of beneficiaries:    -1

*/

gen cash_benef=.
gen ik_benef=.
gen combo_benef=.

*define values
replace cash_benef=1     if BeneficiariesNbCbt>base_BeneficiariesNbCbt
replace cash_benef=0     if BeneficiariesNbCbt==base_BeneficiariesNbCbt
replace cash_benef=-1     if BeneficiariesNbCbt<base_BeneficiariesNbCbt
replace ik_benef=1         if BeneficiariesNbIK>base_BeneficiariesNbIK
replace ik_benef=0         if BeneficiariesNbIK==base_BeneficiariesNbIK
replace ik_benef=-1     if BeneficiariesNbIK<base_BeneficiariesNbIK
replace combo_benef=1     if BeneficiariesNbCombo>base_BeneficiariesNbCombo
replace combo_benef=0     if BeneficiariesNbCombo==base_BeneficiariesNbCombo
replace combo_benef=-1     if BeneficiariesNbCombo<base_BeneficiariesNbCombo

***Step2. generate treated_sim, initialize as treated_base
gen treated_sim=treated_base
*gen treated_sim=treated_base  // here we assume that the same beneficiaries assisted at the base continue to receive assistance, meaning no full retargeting applies

***Step3. First we need to deal with reductions, as this moves people back to eligible

*************************
* ASSISTANCE SCALE-DOWN *
*************************
* Prioritization/1. Random prioritization from the eligible pool
if ${targeting_fcs}==0 & ${lasso_PMT}==0 {
    if combo_benef==-1 {
        set seed 1234
        gen RNG=runiform() if treated_sim==2
        sort RNG
        replace treated_sim=1 if treated_sim==2 & (_n<=abs(sample_sim_BenHHCombo-sample_base_BenHHCombo))
        drop RNG
    }
    *    
    if ik_benef==-1 {
        set seed 1234
        gen RNG=runiform() if treated_sim==3
        sort RNG
        replace treated_sim=1 if treated_sim==3 & (_n<=abs(sample_sim_BenHHIK-sample_base_BenHHIK))
        drop RNG
    }
    *
    if cash_benef==-1 {
        set seed 1234
        gen RNG=runiform() if treated_sim==4
        sort RNG
        replace treated_sim=1 if treated_sim==4 & (_n<=abs(sample_sim_BenHHCbt-sample_base_BenHHCbt))
        drop RNG
    }
}
*
* Prioritization/2. Ideal targeting based on FCS
if ${targeting_fcs}==1 & ${lasso_PMT}==0 {
    if combo_benef==-1 {
        gsort treated_sim -FCS
        gen temp=cond(treated_sim==2,1,0)
        gsort -temp -FCS
        gen Cumulative_base_fcs_scaledown=0
        replace Cumulative_base_fcs_scaledown=_n if treated_sim==2
        replace treated_sim=1 if treated_base==2 & (Cumulative_base_fcs_scaledown<=sample_base_BenHHCombo-sample_sim_BenHHCombo)
        drop temp Cumulative_base_fcs_scaledown
    }
    *    
    if ik_benef==-1 {
        gsort treated_sim -FCS
        gen temp=cond(treated_sim==3,1,0)
        gsort -temp -FCS
        gen Cumulative_base_fcs_scaledown=0
        replace Cumulative_base_fcs_scaledown=_n if treated_sim==3
        replace treated_sim=1 if treated_base==3 & (Cumulative_base_fcs_scaledown<=sample_base_BenHHIK-sample_sim_BenHHIK)
        drop temp Cumulative_base_fcs_scaledown
    }
    *
    if cash_benef==-1 {
        gsort treated_sim -FCS
        gen temp=cond(treated_sim==4,1,0)
        gsort -temp -FCS
        gen Cumulative_base_fcs_scaledown=0
        replace Cumulative_base_fcs_scaledown=_n if treated_sim==4
        replace treated_sim=1 if treated_base==4 & (Cumulative_base_fcs_scaledown<=sample_base_BenHHCbt-sample_sim_BenHHCbt)
        drop temp Cumulative_base_fcs_scaledown
    }

}
*
* Prioritization/3. Scorecard prioritization based on LASSO procedure
if ${targeting_fcs}==0 & ${lasso_PMT}==1 {
    di in red "not yet implemented"
    exit
}
*

tabstat HHWeight, statistics( sum ) by(treated_base)
tabstat HHWeight, statistics( sum ) by(treated_sim)

*
*******************************************

***Step4. Next apply additively random increases (here only drawing from the updated eligible pool)

***********************
* ASSISTANCE SCALE-UP *
***********************

* Prioritization/1. Random prioritization from the eligible pool
if ${targeting_fcs}==0 & ${lasso_PMT}==0 {

    if ik_benef==1 {
        set seed 1234
        gen RNG=runiform() if treated_sim==1
        sort RNG
        replace treated_sim=3 if treated_sim==1 & (_n<=abs(sample_sim_BenHHIK-sample_base_BenHHIK)) // we assign the additional assistance to those not assisted at baseline (i.e. treated_sim==1 which is equal to treated_base==1 as per line 134)
        drop RNG
    }
    if cash_benef==1 {
        set seed 1234
        gen RNG=runiform() if treated_sim==1
        sort RNG
        replace treated_sim=4 if treated_sim==1 & (_n<=abs(sample_sim_BenHHCbt-sample_base_BenHHCbt))
        drop RNG
    }
    if combo_benef==1 {
        set seed 1234
        gen RNG=runiform() if treated_sim==1
        sort RNG
        replace treated_sim=2 if treated_sim==1 & (_n<=abs(sample_sim_BenHHCombo-sample_base_BenHHCombo))
        drop RNG
    }
tab treated_base treated_sim
}
*
* Prioritization/2. Ideal targeting based on FCS
if ${targeting_fcs}==1 & ${lasso_PMT}==0 {
    *calculate total number of people that need to be drawn (all assistance types)
    gen fcs_draw_IK=(sample_sim_BenHHIK-sample_base_BenHHIK)
    replace fcs_draw_IK=0 if fcs_draw_IK<0
    gen fcs_draw_Cbt=(sample_sim_BenHHCbt-sample_base_BenHHCbt)
    replace fcs_draw_Cbt=0 if fcs_draw_Cbt<0
    gen fcs_draw_Combo=(sample_sim_BenHHCombo-sample_base_BenHHCombo)
    replace fcs_draw_Combo=0 if fcs_draw_Combo<0
    egen fcs_draw_tot=rowtotal(fcs_draw_Combo fcs_draw_Cbt fcs_draw_IK)
    sum fcs_draw_tot
    scalar sfcs_draw_tot=r(mean)
    *first draw a full pool
    gen low_fcs=0
    gen FCS_target=FCS if treated_sim==1
    sort FCS_target
    replace low_fcs=1 if _n<=sfcs_draw_tot
    *break the sort
    set seed 1234
    gen RNG=runiform() if low_fcs==1
    sort RNG
    drop RNG
    
if ik_benef==1 {
    replace treated_sim=3 if treated_sim==1 & (_n<=fcs_draw_IK)
}
if cash_benef==1 {
    replace treated_sim=4 if treated_sim==1 & (_n<=fcs_draw_Cbt+fcs_draw_IK)
}
if combo_benef==1 {
    replace treated_sim=2 if treated_sim==1 & (_n<=fcs_draw_Combo+fcs_draw_Cbt+fcs_draw_IK)

}
tab treated_base treated_sim
}

* Prioritization/3. Scorecard prioritization based on LASSO procedure
if ${targeting_fcs}==0 & ${lasso_PMT}==1 {
    *calculate total number of people that need to be drawn (all assistance types)
    gen fcs_draw_IK=(sample_sim_BenHHIK-sample_base_BenHHIK)
    replace fcs_draw_IK=0 if fcs_draw_IK<0
    gen fcs_draw_Cbt=(sample_sim_BenHHCbt-sample_base_BenHHCbt)
    replace fcs_draw_Cbt=0 if fcs_draw_Cbt<0
    gen fcs_draw_Combo=(sample_sim_BenHHCombo-sample_base_BenHHCombo)
    replace fcs_draw_Combo=0 if fcs_draw_Combo<0
    egen fcs_draw_tot=rowtotal(fcs_draw_IK fcs_draw_Cbt fcs_draw_Combo)
    sum fcs_draw_tot
    scalar sfcs_draw_tot=r(mean)
    
    *applies lasso regression to FCS (as a comparison point to current FCS)
    lasso2 FCS lasso_*
    lasso2, lic(ebic) postres
    reg FCS `e(selected)', cluster(Adm2Name)
    predict latent_fcs
    
    ***generate potential list of beneficiaries
    gen low_fcs_hat=0
    gen latent_fcs_target=latent_fcs if treated_sim==1
    sort latent_fcs_target
    replace low_fcs_hat=1 if _n<=sfcs_draw_tot
    *break the sort
    set seed 1234
    gen RNG=runiform() if low_fcs_hat==1
    sort RNG
    drop RNG
    
if ik_benef==1 {
    replace treated_sim=3 if treated_sim==1 & (_n<=fcs_draw_IK)
}
if cash_benef==1 {
    replace treated_sim=4 if treated_sim==1 & (_n<=fcs_draw_Cbt+fcs_draw_IK)
}
if combo_benef==1 {
    replace treated_sim=2 if treated_sim==1 & (_n<=fcs_draw_Combo+fcs_draw_Cbt+fcs_draw_IK)
}
}

* Exclusion error at simulation
if (${lasso_PMT}==1 | ${targeting_fcs}==1) & ${targeting_error}==1 {

gen eligible_replacement_sim=0

    forval i= 2/4 {
        gen temp_treated=1 if treated_sim==`i'
        egen float Cumulative_treated = total(temp_treated)
        set seed 1234
        gen RNG_exclusion_error=runiform() if temp_treated==1
        gsort RNG_exclusion_error
        replace eligible_replacement_sim=-1 if RNG_exclusion_error<=${error_rate} & temp_treated==1
        drop temp_treated Cumulative_treated RNG_exclusion_error
        egen float dropped_treated_`i'=total(abs(eligible_replacement_sim)) if treated_sim==`i' // this is the number of those that don't get assistance for the exclusion error by transfer modality
        sum dropped_treated_`i'
        scalar scalar_dropped_number_`i'=r(max)
        gen dropout_`i'=scalar_dropped_number_`i'
        *drop dropped_treated_`i'
        }
gen dropin_2=dropout_2
egen float dropin_3=rowtotal(dropout_2 dropout_3)
egen float dropin_4=rowtotal(dropout_2 dropout_3 dropout_4)
    *
gen float dropout=dropin_4 // total number of households in the error

gen temp_nontreated=1 if treated_sim==1
*gen temp_nontreated=1 if treated_sim<=1 // if we want to relax the condition above and allow the inclusion error also to non-eligible households

egen float Cumulative_nontreated = total(temp_nontreated)
set seed 1234
gen RNG_inclusion_error=runiform() if temp_nontreated==1
gsort RNG_inclusion_error
gen Progressive_nontreated=_n
replace eligible_replacement_sim=1 if Progressive_nontreated<=dropout & temp_nontreated==1

gsort -eligible_replacement_sim
replace treated_sim=2 if eligible_replacement_sim==1 & treated_sim==1 & _n<=dropin_2 & dropin_2!=.
replace treated_sim=3 if eligible_replacement_sim==1 & treated_sim==1 & _n<=dropin_3 & dropin_3!=.
replace treated_sim=4 if eligible_replacement_sim==1 & treated_sim==1 & _n<=dropin_4 & dropin_4!=.

replace treated_sim=1 if eligible_replacement_sim==-1

drop temp_nontreated dropped_treated_2 dropped_treated_3 dropped_treated_4 dropout dropout_2 dropin_2 dropout_3 dropin_3 dropout_4 dropin_4 Progressive_nontreated RNG_inclusion_error Cumulative_nontreated

tab eligible_replacement_sim treated_sim
}
*

***************************END of targeting error section***********************


cap drop cash_benef
cap drop ik_benef
cap drop combo_benef

capture label define treated_lab 0 "non-eligible" 1 "non-treated" 2 "CBT+IK" 3 "IK" 4 "CBT"
label values treated_base treated_lab
label values treated_sim treated_lab


*tabstat HHWeight, statistics( sum ) by(treated_base)
*tabstat HHWeight, statistics( sum ) by(treated_sim)


*************************************************************
*** Defining Transfer value                                 ***
*************************************************************

*** Base
rename PCAssistanceValueIK_1M sim_PCAssistanceValueIK_1M
rename PCAssistanceValueCbt_1M sim_PCAssistanceValueCbt_1M

gen base_HHAssistanceValueIK_1M     = base_PCAssistanceValueIK_1M     * HHSize_capped if (treated_base==2 | treated_base==3)
gen base_HHAssistanceValueCbt_1M     = base_PCAssistanceValueCbt_1M     * HHSize_capped if (treated_base==2 | treated_base==4)
egen base_HHAssistanceValueTot_1M     = rowtotal(base_HHAssistanceValueCbt_1M base_HHAssistanceValueIK_1M)

gen sim_HHAssistanceValueIK_1M         = sim_PCAssistanceValueIK_1M     * HHSize_capped if (treated_sim==2 | treated_sim==3)
gen sim_HHAssistanceValueCbt_1M     = sim_PCAssistanceValueCbt_1M     * HHSize_capped if (treated_sim==2 | treated_sim==4)

***Step1. initialize changes to income at 0
gen delta_HHAssistanceValueIK_1M=0
gen delta_HHAssistanceValueCbt_1M=0

***Step2. For those who were on assistance and then lost assistance, remove base transfer value
replace delta_HHAssistanceValueIK_1M=-base_HHAssistanceValueIK_1M if treated_sim==1 & (treated_base==2 | treated_base==3)
replace delta_HHAssistanceValueCbt_1M=-base_HHAssistanceValueCbt_1M if treated_sim==1 & (treated_base==2 | treated_base==4)

***Step3. For those who have been added to assistance, add simulated transfer value
replace delta_HHAssistanceValueIK_1M=sim_HHAssistanceValueIK_1M if treated_base==1 & (treated_sim==2 | treated_sim==3)
replace delta_HHAssistanceValueCbt_1M=sim_HHAssistanceValueCbt_1M if treated_base==1 & (treated_sim==2 | treated_sim==4)

***Step4. Those who stay on assistance, we need to adjust assistance levels (if no changes, then the levels shouldn't change)
replace delta_HHAssistanceValueIK_1M=sim_HHAssistanceValueIK_1M-base_HHAssistanceValueIK_1M if (treated_base==2 | treated_base==3) & (treated_sim==2 | treated_sim==3)
replace delta_HHAssistanceValueCbt_1M=sim_HHAssistanceValueCbt_1M-base_HHAssistanceValueCbt_1M if (treated_base==2 | treated_base==4) & (treated_sim==2 | treated_sim==4)

egen delta_HHAssistanceValueTot_1M     = rowtotal(delta_HHAssistanceValueCbt_1M delta_HHAssistanceValueIK_1M)


*************************************************************
*** Defining spillover effects                               ***
*************************************************************

cap drop base_transfer_spill_sample
gen base_spill_HHValueIK_1M = base_PCAssistanceValueIK_1M*HHSize_capped if (treated_base==2 | treated_base==3)
gen base_spill_HHValueCbt_1M = base_PCAssistanceValueCbt_1M*HHSize_capped if (treated_base==2 | treated_base==4)
egen base_spill_HHValueTot_1M = rowtotal(base_spill_HHValueCbt_1M base_spill_HHValueIK_1M)
bys Adm1Code: egen base_transfer_spill_sample=mean(base_spill_HHValueTot_1M)
replace base_transfer_spill_sample=base_transfer_spill_sample*(${spillover_sh}) // this one is > 0 and quantifies the spillover effects that have already materialized in the income (HSI) at the base
recode base_transfer_spill_sample (.=0)

cap drop sim_transfer_spill_sample
gen sim_spill_HHValueIK_1M = sim_PCAssistanceValueIK_1M*HHSize_capped if (treated_sim==2 | treated_sim==3)
gen sim_spill_HHValueCbt_1M = sim_PCAssistanceValueCbt_1M*HHSize_capped if (treated_sim==2 | treated_sim==4)
egen sim_spill_HHValueTot_1M = rowtotal(sim_spill_HHValueCbt_1M sim_spill_HHValueIK_1M)
gen delta_spill_HHValueTot_1M=sim_spill_HHValueTot_1M-base_spill_HHValueTot_1M
bys Adm1Code: egen delta_transfer_spill_sample=mean(delta_spill_HHValueTot_1M)
replace delta_transfer_spill_sample=delta_transfer_spill_sample*(${spillover_sh}) // this one is <= 0
recode delta_transfer_spill_sample (.=0)

gen HSI_S=0
replace HSI_S=HSI+delta_HHAssistanceValueTot_1M+delta_transfer_spill_sample  // this removes part of the spillover effects as a result of budget cuts
replace HSI_S=0 if HSI_S<0

gen r_inc=HSI_S/HSI

drop r_inc 

Why do the winsor and winsor2 commands add new missing values?

I want to winsorize my data, so I use the winsor and winsor2 commands. But after winsorizing, there are more missing values than before. How can this happen?

Here is my result: [screenshot attached]



Adding two sets of significance levels for tables using estout

Hello everyone,

I’m using estout from SSC.
Stata ver. 18


I would like to create a table where I can have two sets of significance "stars" for two different p-values and use this table in a LaTeX environment.
One set of significance "stars" would be for the differences between two categories within the same gender group (***), and another for the differences between genders (+++).

I've been searching for solutions but haven't had much luck so far. I would appreciate any tips on how to reach this result.
I've been using esttab to generate all of my tables.

I'm attaching an example of how my code looks so far by using the bplong dataset.

My goal would be to add another comparison between the two genders with when == 1 and between those with when == 2.
So, the difference between male and female with when == 1, and the difference between male and female with when == 2, adding significance levels with +, ++, +++ while keeping the current ones (*, **, ***).


Code:
. sysuse bplong
(Fictional blood-pressure data)

. 
. eststo m1: mean bp if sex == 0 & when == 1

Mean estimation                             Number of obs = 60

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
          bp |   159.2667   1.473469      156.3183    162.2151
--------------------------------------------------------------

. 
. eststo m2: mean bp if sex == 0 & when == 2

Mean estimation                             Number of obs = 60

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
          bp |   155.5167   1.967891      151.5789    159.4544
--------------------------------------------------------------

. 
. eststo m3: mean bp if sex == 1 & when == 1

Mean estimation                             Number of obs = 60

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
          bp |   153.6333    1.38596        150.86    156.4066
--------------------------------------------------------------

. 
. eststo m4: mean bp if sex == 1 & when == 2

Mean estimation                             Number of obs = 60

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
          bp |      147.2   1.515979      144.1665    150.2335
--------------------------------------------------------------

. 
. eststo m5: mean bp if sex == 0

Mean estimation                            Number of obs = 120

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
          bp |   157.3917   1.236031      154.9442    159.8391
--------------------------------------------------------------

. 
. mat pval= r(table)["pvalue", 1...]

. 
. estadd mat pval=pval

added matrix:
               e(pval) :  1 x 1

. 
. eststo m6: mean bp if sex == 1

Mean estimation                            Number of obs = 120

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
          bp |   150.4167   1.064357      148.3091    152.5242
--------------------------------------------------------------

. 
. mat pval= r(table)["pvalue", 1...]

. 
. estadd mat pval=pval

added matrix:
               e(pval) :  1 x 1

. 
. esttab , main(b) aux(sd)

------------------------------------------------------------------------------------------------------------
                      (1)             (2)             (3)             (4)             (5)             (6)   
                     Mean            Mean            Mean            Mean            Mean            Mean   
------------------------------------------------------------------------------------------------------------
bp                  159.3***        155.5***        153.6***        147.2***        157.4***        150.4***
                  (11.41)         (15.24)         (10.74)         (11.74)         (13.54)         (11.66)   
------------------------------------------------------------------------------------------------------------
N                      60              60              60              60             120             120   
------------------------------------------------------------------------------------------------------------
b coefficients; sd in parentheses
* p<0.05, ** p<0.01, *** p<0.001
.
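While I keep looking, here is a rough workaround I have been considering (a sketch only; a t-test is assumed to be an acceptable way to get the gender-difference p-value, and gstars is a made-up name): build the second star set by hand and attach it with estadd local, displaying it via stats():

Code:
* p-value for the male-female difference at when == 1
quietly ttest bp if when == 1, by(sex)
local p = r(p)
local gstars = cond(`p'<.001,"+++",cond(`p'<.01,"++",cond(`p'<.05,"+","")))
estadd local gstars "`gstars'" : m1
esttab m1, main(b) aux(sd) stats(N gstars, labels("N" "Gender diff."))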


Coarsened Exact Matching (CEM): assigning to treated and control groups

Hey Stata List,

I am exploring differences in the administration of school discipline across ethnic groups. I have a dichotomous DV (1 = yes, 0 = no) and a categorical IV (eight categories). I would like to create seven treated groups and one control group (pertaining to White ethnicity) to see if disproportionality in school suspension exists between White and non-White school children.

How do I create treated and control groups based on the data snippet below:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float W1ExcludeYP int W1ethgrpYP
0 1
0 4
1 2
. 1
0 1
0 1
0 3
0 1
1 1
0 1
0 1
0 1
0 6
0 1
0 1
0 1
0 1
0 3
0 3
0 1
0 1
. 4
0 5
0 1
0 1
0 1
0 3
0 1
0 5
0 1
0 1
0 1
0 6
1 1
1 4
0 1
0 1
1 1
0 1
0 1
0 1
0 1
0 1
0 1
0 1
0 1
0 1
0 1
0 8
0 1
0 1
0 1
0 1
0 1
. 1
0 4
0 6
0 1
0 1
0 1
. 4
0 5
0 1
1 6
0 1
0 1
. 7
. 1
0 4
0 5
0 1
0 1
0 7
0 1
0 1
0 1
. 8
1 1
0 1
0 2
0 1
0 1
0 1
0 4
0 1
0 1
0 1
0 1
1 3
1 1
0 1
0 1
0 1
1 4
0 1
0 1
0 4
0 1
. 5
0 3
end
label values W1ethgrpYP W1ethgrpYP
label def W1ethgrpYP 1 "White", modify
label def W1ethgrpYP 2 "Mixed", modify
label def W1ethgrpYP 3 "Indian", modify
label def W1ethgrpYP 4 "Pakistani", modify
label def W1ethgrpYP 5 "Bangladeshi", modify
label def W1ethgrpYP 6 "Black Caribbean", modify
label def W1ethgrpYP 7 "Black African", modify
label def W1ethgrpYP 8 "Other", modify
(Listed 100 out of 15,770 observations; use the count() option to list more.)
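For reference, one way I thought the seven treated groups could be set up (a sketch; treat_`g' is a made-up name, and each dummy is 1 for one non-White group, 0 for White, and missing otherwise, so that each could be passed in turn to cem's treatment() option):

Code:
forvalues g = 2/8 {
    gen byte treat_`g' = 1 if W1ethgrpYP == `g'
    replace treat_`g'  = 0 if W1ethgrpYP == 1   // White as control
}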
Thank you!

Partial-out variables with respect to multiple levels of fixed-effects for Poisson regression (Frisch-Waugh-Lovell for PPML)

Dear community,

I need to produce a jackknife variance estimator for a Poisson regression that implements a double-difference (difference-in-differences) design.

This is the command I run:

Code:
glm ///
Naloxone /// Outcome on interest
HN /// Treatment dummy
i.t c.t##i.mt /// Double difference with flexible trends
LogPoliceRate /// Controls
trpf physicianexam /// Controls
pharmacistverification requireid /// Controls
T_GS_HasLaw pdmp doctorshopping painclinic /// Controls
, ///
family(poisson) ///
vce(jackknife, cluster(st) idcluster(ST)) //

Jackknife fails to compute, i.e., all replications are marked with red crosses. I suspect this is a computational issue driven by the large number of fixed effects. Therefore, I'd like to partial out the fixed effects (i.t c.t##i.mt) and run PPML without them. Section 3.4 of "Cluster-robust inference: A guide to empirical practice" by James G. MacKinnon and others suggests doing so for the jackknife variance in the context of linear regression.

For linear models, the Stata package hdfe trivialises this task:
http://scorreia.com/demo/hdfe.html
However, for Poisson regression, the implementation is unclear.

Stata's PPMLHDFE
http://scorreia.com/help/ppmlhdfe.html
as explained here
http://arxiv.org/abs/1903.01690
talks about Frisch-Waugh-Lovell for PPML, but I do not know how to implement it exactly.

For concreteness, consider the following:

Code:
sysuse auto, clear
* Benchmark | FWL Theorem for linear model
reghdfe ///
price ///
weight ///
length ///
, ///
a(turn trunk)
 
* Demean variables
hdfe ///
price ///
weight ///
length ///
, ///
a(turn trunk) ///
 gen(RESID_)
 
* Same point estimates as in reghdfe without fixed effects | very nice
reg ///
RESID_price ///
RESID_weight ///
RESID_length ///
, ///
nocons
 
* Then the question is how to get residuals to estimate the following
glm ///
RESID_price ///
RESID_weight ///
RESID_length ///
, ///
family(poisson) //
 
* so that it gives the same point estimates as this
glm ///
price ///
weight ///
length ///
i.turn i.trunk ///
, ///
family(poisson) //
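As a sanity check on the analogy, my understanding (an assumption to be verified) is that ppmlhdfe itself reproduces the dummy-variable Poisson fit when the fixed effects are absorbed:

Code:
* Point estimates on weight and length should match the glm with i.turn i.trunk
ppmlhdfe price weight length, absorb(turn trunk)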
Thank you, and I hope everyone is having a fantastic day.

Warm regards,
Sergey Alexeev
https://alexeev.pw/

Summarize descriptive statistics with t-value, stars (all levels)

Dear Statalist, I am using the command below in my research. How can I export the collect preview, with stars for all significance levels, using asdoc?
See the command below:

Code:
label define CATH 0 "Non-Catholic" 1 "Catholic"
label values CATH CATH
label define PROT 0 "Non-Prot" 1 "Prot"
label values PROT PROT
drawnorm x1-x26
label var x1 IPOV
label var x2 IPOC
label var x3 lnIPOV
label var x4 lnIPOC
label var x5 PD
label var x6 IND
label var x7 MAS
label var x8 UAI
label var x9 LTO
label var x10 MO
label var x11 GDPg
label var x12 VALUE
label var x13 SAVING
label var x14 PRIVAT
label var x15 INVEST
label var x16 BANK
label var x17 LDGP_1
label var x18 MCP_1
label var x19 TURN_1
label var x20 PCW
label var x21 ESG
label var x22 BUR
label var x23 COR
label var x24 DEM
label var x25 LORD
label var x26 GST


collect results
foreach x of varlist x* {
ttest `x', by(CATH)
collect get mean=(r(mu_1)), tags(var[`x'] CATH[0])
collect get mean=(r(mu_2)), tags(var[`x'] CATH[1])
collect get t=(r(t)) p=(r(p)), tags(var[`x'] CATH[_hide])
ttest `x', by(PROT)
collect get mean=(r(mu_1)), tags(var[`x'] PROT[0])
collect get mean=(r(mu_2)), tags(var[`x'] PROT[1])
collect get t=(r(t)) p=(r(p)), tags(var[`x'] PROT[_hide])
}
* define the rules for significance labels
collect stars p .1 "*" .01 "**" .001 "***" , attach(t) dimension

collect layout (var) (CATH#result#stars PROT#result#stars)

* select results to show in table
collect style autolevels result mean t, clear
* change display order of grouping variables
collect style autolevels CATH 1 0 _hide
collect style autolevels PROT 1 0 _hide
* label the test statistic
collect label levels result t "Test t", modify
* hide column labels for the mean
collect style header result[mean], level(hide)
* left align the stars labels
collect style cell stars[label], halign(left)
* other misc style choices
collect style cell result[mean t], nformat(%9.3f)
collect style cell border_block[corner row-header], ///
border(right, pattern(none))
collect style column, dups(center)
collect preview

I get this table: [screenshot attached]. I think the table does not report all significance levels (0.01; 0.05; 0.1).


The table should look like the one below, where we can see all the significance levels: [screenshot attached]. Can someone help?
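One thing I noticed while re-reading my code (an assumption on my part): my collect stars rule uses the cutoffs .1, .01 and .001 and skips .05, which may be why a level seems missing. Following the ascending-cutoff pattern in the Stata documentation, the rule for 0.01/0.05/0.1 would be:

Code:
collect stars p 0.01 "***" 0.05 "**" 0.1 "*", attach(t) dimension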


Waiting for some help.
Best regards



Problem using ivreg2 for the 2SLS test

Hi, please, I'm using ivreg2 for 2SLS.

This is the command I used:

ivreg2 EQUITY FSIZE OP_CF SD_OCF TAX BM LEV MA DPP RPP PSIZE DR Gov_score Sustain_Perf Sust_Commit i.id i.year (CSO = Instrumental ) , first robust endog(CSO)

At the end of the results, Stata shows:


Warning: estimated covariance matrix of moment conditions not of full rank.
overidentification statistic not reported, and standard errors and
model tests should be interpreted with caution. Possible causes:
singleton dummy variable (dummy with one 1 and N-1 0s or vice versa)
partial option may address problem.


I'm not sure if there is a problem in my steps.
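In case it is relevant, this is the variant I was considering next, following the warning's own hint (a sketch, untested; it assumes the singleton dummies come from i.id and that variables listed in partial() must also appear among the regressors):

Code:
ivreg2 EQUITY FSIZE OP_CF SD_OCF TAX BM LEV MA DPP RPP PSIZE DR Gov_score Sustain_Perf Sust_Commit i.id i.year (CSO = Instrumental), first robust endog(CSO) partial(i.id)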

Cluster-robust Wald test of linear hypothesis

Hello,

I am estimating a DiD-model of student gpa on hours worked with treatment taking place at the industry level in 2021. My dataset ranges from 2010-2021. To account for autocorrelation within industry I cluster the standard errors at the industry-level (with 48 clusters). I estimate the following equation:
Code:
reghdfe gpa treat2021, absorb(year industry) vce(cluster industry)
Where treat2021 is my treatment dummy. I want to perform a test for parallel trends by estimating a second equation with a (placebo) treatment indicator in every year (leaving one out):
Code:
reghdfe gpa treat2011-treat2021, absorb(year industry) vce(cluster industry)
I then use a Wald-test:
Code:
testparm treat2011-treat2020
However, this gives me nonsensical results. I very strongly reject the null hypothesis (p=0.000), even though each parameter is very insignificant on its own. If I instead use normal heteroskedasticity-robust standard errors, the standard errors on the parameters become only slightly smaller, but now I get a p-value of 0.202 in my Wald test. Wouldn't I expect that smaller standard errors lead to, if anything, a lower p-value? And in any case, the change seems very dramatic. Can this be because the standard Wald test is not cluster-robust? And if so, does there exist an equivalent cluster-robust test for linear hypotheses in Stata?
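For completeness, the alternative I am considering (a sketch, assuming a wild cluster bootstrap is appropriate here; boottest is from SSC and, to my understanding, supports joint hypotheses after reghdfe):

Code:
* Joint wild-cluster-bootstrap test of all placebo coefficients
boottest (treat2011) (treat2012) (treat2013) (treat2014) (treat2015) ///
    (treat2016) (treat2017) (treat2018) (treat2019) (treat2020), reps(9999)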

Peers of peers dataset

Dear all,
using an individuals-nested-in-schools dataset, I am trying to create a dataset with the peers of peers who meet the following condition: the primary school (ks2) peers of the secondary school (ks4) peers, who attended a different primary school than that of the individual of interest.

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(id ks2perf) str1(ks4_school_id ks2_school_id)
1 14 "a" "b"
2 11 "a" "b"
3  9 "a" "c"
4 17 "a" "c"
5 22 "a" "c"
6  1 "d" "c"
7 18 "d" "b"
end
In this toy example, case 6 is a relevant peer of peer for peers 3, 4 and 5, which in turn are the relevant peers for individuals 1 and 2.
How do I create an extra column that indicates whether the peers-of-peers condition is met? I think this requires creating additional datasets with different variable names and then merging them, but my attempts do not seem to work.
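For reference, the furthest I got is a two-step joinby sketch (peer_id, pofp_id and pofp_ok are made-up names; one may also want to exclude peers of peers who are themselves direct ks4 peers):

Code:
* Lookup 1: pupils keyed by ks4 school; Lookup 2: pupils keyed by ks2 school
preserve
keep id ks2_school_id ks4_school_id
rename (id ks2_school_id) (peer_id peer_ks2_school_id)
tempfile ks4peers
save `ks4peers'
restore
preserve
keep id ks2_school_id
rename (id ks2_school_id) (pofp_id peer_ks2_school_id)
tempfile ks2peers
save `ks2peers'
restore

joinby ks4_school_id using `ks4peers'        // all ks4 schoolmate pairs
drop if peer_id == id
joinby peer_ks2_school_id using `ks2peers'   // ks2 schoolmates of each ks4 peer
drop if pofp_id == id | pofp_id == peer_id
* condition: the peer of peer attended a different ks2 school than the individual
gen byte pofp_ok = peer_ks2_school_id != ks2_school_id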
Any help is much appreciated, Nic

Sign and zero Restricted VAR model

Hello, I want to identify the fiscal shocks in my dataset using an agnostic sign-restriction approach that sets a minimum number of restrictions on impulse responses, while controlling for other macroeconomic shocks (monetary shock, business shock, etc.). I have reviewed the literature and I know this can be done by implementing a sign- and zero-restricted VAR model, but I cannot find the commands to do that in Stata. I have checked the Stata manuals for time series and VAR, but no help. Can you please tell me if you know a manual or where I can find these commands?

Thank you in advance!!!

How to combine a primary diagnosis variable and a secondary diagnosis variable in Stata when using ICD-10 codes

What Stata command is used to combine a primary diagnosis variable and a secondary diagnosis variable when using ICD-10 codes?

What is wrong with my code using "if" in a foreach loop? Why does "{ required" show up?

Hello. I typed the following:

Code:
count
foreach i of local r(N){

foreach x of numlist 2000/2009{

if year_sign<=`x'{
replace year`x'=1}
}
}
Then "{ required" shows up.
I don't know why. Could you help me and point me to a reference, like a Stata help file?

Here is the data:

Code:
clear
input str5(ISO1 ISO2) float year str6 number int base_treaty float(year_sign yearsign2000 yearsign2001 yearsign2002 yearsign2003 yearsign2004 yearsign2005 yearsign2006 yearsign2007 yearsign2008 yearsign2009)
"AGO" "BDI" 1991 "3" 3 1991 0 0 0 0 0 0 0 0 0 0
"AGO" "BEN" 1991 "3" 3 1991 0 0 0 0 0 0 0 0 0 0
"AGO" "BFA" 1991 "3" 3 1991 0 0 0 0 0 0 0 0 0 0
"AGO" "BWA" 1991 "3" 3 1991 0 0 0 0 0 0 0 0 0 0
"AGO" "CAF" 1991 "3" 3 1991 0 0 0 0 0 0 0 0 0 0
"AGO" "CIV" 1991 "3" 3 1991 0 0 0 0 0 0 0 0 0 0
"AGO" "CMR" 1991 "3" 3 1991 0 0 0 0 0 0 0 0 0 0
"AGO" "COD" 1991 "3" 3 1991 0 0 0 0 0 0 0 0 0 0
"AGO" "COG" 1991 "3" 3 1991 0 0 0 0 0 0 0 0 0 0
"AGO" "COM" 1991 "3" 3 1991 0 0 0 0 0 0 0 0 0 0
end
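For reference, this is what I am ultimately trying to achieve (a sketch of my intent; it assumes the dummies to fill are the yearsign20XX variables shown above and uses an if qualifier rather than the if command):

Code:
forvalues x = 2000/2009 {
    replace yearsign`x' = 1 if year_sign <= `x'
}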


Calculating 5-minute frequency stock prices using last-tick interpolation

Dear all,

I have a dataset that includes the variables serial, year, month, day, second, and price. I want to calculate 5-minute frequency prices using last-tick interpolation and construct 5-minute log-returns. Below is a sample of the data. Could you help me with the code?

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte serial int year byte(month day) int second float price
1 2005 1 3   2   121.56
1 2005 1 3   4   121.56
1 2005 1 3  34   121.56
1 2005 1 3  35  121.535
1 2005 1 3  80   121.51
1 2005 1 3  83   121.52
1 2005 1 3  91   121.52
1 2005 1 3  96   121.54
1 2005 1 3 112   121.55
1 2005 1 3 114   121.55
1 2005 1 3 147   121.56
1 2005 1 3 162   121.55
1 2005 1 3 172   121.54
1 2005 1 3 180   121.55
1 2005 1 3 182   121.56
1 2005 1 3 185   121.55
1 2005 1 3 209   121.59
1 2005 1 3 223   121.59
1 2005 1 3 224   121.59
1 2005 1 3 250   121.61
1 2005 1 3 251   121.62
1 2005 1 3 254   121.63
1 2005 1 3 262   121.64
1 2005 1 3 263   121.64
1 2005 1 3 264   121.64
1 2005 1 3 266   121.65
1 2005 1 3 282   121.62
1 2005 1 3 283   121.62
1 2005 1 3 285    121.6
1 2005 1 3 290   121.62
1 2005 1 3 296   121.62
1 2005 1 3 297   121.64
1 2005 1 3 304   121.63
1 2005 1 3 314   121.62
1 2005 1 3 331   121.66
1 2005 1 3 332   121.66
1 2005 1 3 333   121.66
1 2005 1 3 334   121.67
1 2005 1 3 335   121.65
1 2005 1 3 336   121.65
1 2005 1 3 337   121.68
1 2005 1 3 363   121.67
1 2005 1 3 365 121.6702
1 2005 1 3 366   121.65
1 2005 1 3 404   121.65
1 2005 1 3 410   121.67
1 2005 1 3 411    121.7
1 2005 1 3 412    121.7
1 2005 1 3 413   121.71
1 2005 1 3 415   121.71
1 2005 1 3 416   121.71
1 2005 1 3 418   121.71
1 2005 1 3 419   121.71
1 2005 1 3 421   121.71
1 2005 1 3 426   121.72
1 2005 1 3 446   121.71
1 2005 1 3 457   121.71
1 2005 1 3 471   121.71
1 2005 1 3 473   121.71
1 2005 1 3 476   121.71
1 2005 1 3 484   121.72
1 2005 1 3 499   121.72
1 2005 1 3 521   121.72
1 2005 1 3 531   121.71
1 2005 1 3 532   121.71
1 2005 1 3 534    121.7
1 2005 1 3 573   121.72
1 2005 1 3 574   121.72
1 2005 1 3 588   121.71
1 2005 1 3 597   121.69
1 2005 1 3 682   121.65
1 2005 1 3 694   121.65
1 2005 1 3 710   121.67
1 2005 1 3 712   121.67
1 2005 1 3 713   121.67
1 2005 1 3 739   121.69
1 2005 1 3 744   121.69
1 2005 1 3 745   121.67
1 2005 1 3 746   121.66
1 2005 1 3 777   121.71
1 2005 1 3 781   121.66
1 2005 1 3 801   121.65
1 2005 1 3 811   121.65
1 2005 1 3 812  121.655
1 2005 1 3 815   121.64
1 2005 1 3 816   121.68
1 2005 1 3 825   121.65
1 2005 1 3 842   121.62
1 2005 1 3 844   121.62
1 2005 1 3 849   121.64
1 2005 1 3 851   121.66
1 2005 1 3 857   121.66
1 2005 1 3 858   121.67
1 2005 1 3 864   121.66
1 2005 1 3 868   121.64
1 2005 1 3 896   121.65
1 2005 1 3 923   121.66
1 2005 1 3 961   121.66
1 2005 1 3 964   121.66
1 2005 1 3 994   121.66
end
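To show what I mean, here is how far my own thinking goes (a sketch; it assumes second counts seconds within the trading day, and it simply keeps the last tick in each 5-minute bin, ignoring empty bins, which true last-tick interpolation would fill by carrying the previous price forward):

Code:
gen long bin5 = floor(second/300)                               // 5-minute bin within the day
bysort serial year month day bin5 (second): keep if _n == _N    // last tick in each bin
bysort serial (year month day bin5): gen logret5 = ln(price) - ln(price[_n-1])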

How to assign the same variable value to the same individual in different observations

Hello. The following "dataset 1" is the dataset I am using. I want to replace year_withdrawal with 2007 in observation 1, replace year_sign with 1993 in observation 2, and replace withdrawalnumber with 243 in observation 1. The final goal is to change "dataset 1" into "dataset 2".

dataset 1:

Code:
clear
input str5(ISO1 ISO2) float year str6 number int base_treaty float(year_sign year_withdrawal withdrawalnumber)
"AGO" "BDI" 1993 "243" 243 1993 . .
"AGO" "BDI" 2007 "243_4" 243 . 2007 243
"AGO" "COD" 1993 "243" 243 1993 . .
"AGO" "COD" 2007 "243_4" 243 . 2007 243
"AGO" "COM" 1993 "243" 243 1993 . .
"AGO" "COM" 2007 "243_4" 243 . 2007 243
"AGO" "DJI" 1993 "243" 243 1993 . .
"AGO" "DJI" 2007 "243_4" 243 . 2007 243
"AGO" "EGY" 1998 "243+1" 243 1998 . .
"AGO" "EGY" 2007 "243_4" 243 . 2007 243
end


dataset 2:

Code:
clear
input str5(ISO1 ISO2) float year str6 number int base_treaty float(year_sign year_withdrawal withdrawalnumber)
"AGO" "BDI" 1993 "243" 243 1993 2007 243
"AGO" "BDI" 2007 "243_4" 243 1993 2007 243
"AGO" "COD" 1993 "243" 243 1993 2007 243
"AGO" "COD" 2007 "243_4" 243 1993 2007 243
"AGO" "COM" 1993 "243" 243 1993 2007 243
"AGO" "COM" 2007 "243_4" 243 1993 2007 243
"AGO" "DJI" 1993 "243" 243 1993 2007 243
"AGO" "DJI" 2007 "243_4" 243 1993 2007 243
"AGO" "EGY" 1998 "243+1" 243 1998 2007 243
"AGO" "EGY" 2007 "243_4" 243 1998 2007 243
end


ISO1: country name
ISO2: country name
number: different versions of base_treaty
base_treaty: number of the treaty
year_sign: the year when ISO1 and ISO2 signed the treaty
year_withdrawal: the year when ISO1 and ISO2 withdrew from the treaty
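For reference, one approach I considered (a sketch, assuming each ISO1-ISO2 pair has at most one non-missing value of each variable, so that spreading it within the pair is unambiguous; missings sort last, so element [1] is the non-missing value):

Code:
foreach v in year_sign year_withdrawal withdrawalnumber {
    bysort ISO1 ISO2 (`v'): replace `v' = `v'[1] if missing(`v')
}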

xline should only cover a certain range in an event plot

Dear community,

I recently started exploring the fascinating package did_imputation by Borusyak et al. (2021) for a diff-in-diff imputation design (here is a reference). Now, I want to add a horizontal line to my event plot that only starts at the treatment time. Unfortunately, I haven't found a way to implement this yet. I checked yline, but it covers the whole range of the plot. I also checked addplot, but it does not seem to be allowed when using the event_plot function. I also tried pipes and scatteri, but these also seem not to be allowed.
I come from R, and I am quite unsure why there isn't an easy option to add a custom line/plot to an existing one. Can someone please help me?

I am using the example code by Borusyak et al. (2021):

Code:
// from https://github.com/borusyak/did_imputation/blob/main/five_estimators_example.do

clear all
timer clear
set seed 10
global T = 15
global I = 300

set obs `=$I*$T'
gen i = int((_n-1)/$T )+1                     // unit id
gen t = mod((_n-1),$T )+1                    // calendar period
tsset i t

// Randomly generate treatment rollout years uniformly across Ei=10..16 (note that periods t>=16 would not be useful since all units are treated by then)
gen Ei = ceil(runiform()*7)+$T -6 if t==1    // year when unit is first treated
bys i (t): replace Ei = Ei[1]
gen K = t-Ei                                 // "relative time", i.e. the number periods since treated (could be missing if never-treated)
gen D = K>=0 & Ei!=.                         // treatment indicator

// Generate the outcome with parallel trends and heterogeneous treatment effects
gen tau = cond(D==1, (t-12.5), 0)             // heterogeneous treatment effects (in this case vary over calendar periods)
gen eps = rnormal()                            // error term
gen Y = i + 3*t + tau*D + eps                 // the outcome (FEs play no role since all methods control for them)
//save five_estimators_data, replace

// Estimation with did_imputation of Borusyak et al. (2021)
did_imputation Y i t Ei, allhorizons pretrend(5) shift(1) autosample
ereturn list
event_plot, default_look graph_opt(xtitle("Periods since the event") ytitle("Average causal effect") ///
    title("Borusyak et al. (2021) imputation estimator") xlabel(-5(1)5))

estimates store bjs // storing the estimates for later
This is what I get: [screenshot: event plot without the reference line]

And this is what I want to have: [screenshot: the same plot with a horizontal line starting at the treatment time]
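One idea I have not managed to verify (a sketch, untested): running addplot by Ben Jann (SSC) as a stand-alone command after the graph is drawn, rather than as an event_plot option:

Code:
* Add a dashed zero line only over the post-treatment range 0..5
addplot: (function y = 0, range(0 5) lpattern(dash)), norescaling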

Thank you in advance!