Channel: Statalist

a complex matching problem

Hi all, I have a problem that I have been stuck on for quite a while.
I have two data sets: case and control. They have exactly the same variables but contain different firms. There are a few things I need to do.

1. Match each case firm with the control firms that have the same industrycode2 and whose IR lies within 0.9 to 1.1 times the case firm's IR in a given year.
2. Once matching is complete, compute the median IR of the control firms matched to each case firm, for the given year and for the year after.
Here is a small example.
The case data set:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float year str9 firmid float(industrycode2 priva_year) double IR
1998 "101101628" 3300 2003   .5110084414482117
1999 "101101628" 3300 2003   .3630363345146179
2000 "101101628" 3300 2003  .38560914993286133
2001 "101101628" 3300 2003  .33924877643585205
2002 "101101628" 3300 2003  .33550626039505005
2003 "101101628" 3300 2003  .20422077178955078
2004 "101101628" 3300 2003  .11267662048339844
2005 "101101628" 3300 2003  .09437507390975952
2006 "101101628" 3300 2003  .08572352677583695
2007 "101101628" 3300 2003  .11519040167331696
1998 "101105573" 1300 1999  .07120344787836075
1999 "101105573" 1300 1999  .10620743781328201
2000 "101105573" 1300 1999  .16636085510253906
2001 "101105573" 1300 1999  .07258064299821854
2002 "101105573" 1300 1999 .021276595070958138
1999 "101113645" 4000 2002  .34291747212409973
2002 "101113645" 4000 2002   .7364082932472229
2003 "101113645" 4000 2002   .9800625443458557
2004 "101113645" 4000 2002   .8390606045722961
2005 "101113645" 4000 2002  .37066158652305603
2006 "101113645" 4000 2002  .31881824135780334
2007 "101113645" 4000 2002   .4274105727672577
end
The control data set:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float year str9 firmid float industrycode2 double IR
1998 "160236012" 3300  .1932249516248703
1999 "160236012" 3300 .24407915771007538
2000 "160236012" 3300 .18691548705101013
2001 "160236012" 3300  .1931752860546112
2002 "160236012" 3300 .13328635692596436
2003 "160236012" 3300  .1913904845714569
2004 "160236012" 3300 .16301876306533813
2005 "160236012" 3300 .15310858190059662
2006 "160236012" 3300 .14023752510547638
2007 "160236012" 3300  .1261952966451645
1998 "180963807" 3300  .2148050218820572
1999 "180963807" 3300 .24316874146461487
2000 "180963807" 3300  2.407111167907715
2001 "180963807" 3300 .34913963079452515
2003 "180963807" 3300 .19960474967956543
2004 "180963807" 3300   .110478475689888
2002 "715910825" 3300  .1362578570842743
2003 "715910825" 3300 .19080890715122223
2004 "715910825" 3300 .14886681735515594
2005 "715910825" 3300 .08703862130641937
2007 "715910825" 3300 .27025702595710754
end
I then ran the code below to complete step 1:
Code:
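clear
use "E:\case.dta"
// keep only the privatization-year observation of each case firm
keep if year == priva_year
// bounds for the IR match: 0.9 to 1.1 times the case firm's IR
gen lower_IR = IR*0.9
gen upper_IR = IR*1.1
// matching: same industry, control IR within [lower_IR, upper_IR]
rangejoin IR lower_IR upper_IR using "E:\control.dta", by(industrycode2) all
// keep only control observations from the same year as the case firm
keep if year == year_U
save "E:\Research\privatization\match.dta", replace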
Here is an example of the matching result:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float year str9 firmid float(industrycode2 priva_year) double IR str9 firmid_U float year_U double IR_U
2003 "101101628" 3300 2003 .20422077178955078 "715910825" 2003 .19080890715122223
2003 "101101628" 3300 2003 .20422077178955078 "160236012" 2003  .1913904845714569
2003 "101101628" 3300 2003 .20422077178955078 "180963807" 2003 .19960474967956543
end
Now I need to proceed to step 2: compute the median IR of firms "715910825", "160236012" and "180963807" (the firms matched with case firm "101101628"), not only in 2003 but also in 2004 (one year after priva_year).
I tried merge, but it does not work, because one control firm can be matched to several case firms. If a control firm has fewer observations than the number of case firms matched to it, some of the matching links are lost, so I cannot compute the median this way.
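
For what it is worth, here is a minimal, untested sketch of how step 2 might be approached. It assumes that match.dta is the file saved in step 1, that control.dta has at most one observation per firmid and year, and that rangejoin named the control firm's identifier firmid_U as in the example above. The names case_firmid and median_IR are made up for illustration. The idea is to build one row per matched pair and target year, and then pull the control IR with an m:1 merge, so that a control firm shared by several case firms does not cause a problem.

```stata
* Sketch only: paths and variable names follow the post;
* case_firmid and median_IR are hypothetical names.
use "E:\Research\privatization\match.dta", clear

* keep one row per matched (case firm, control firm) pair
keep firmid priva_year firmid_U
duplicates drop
rename firmid case_firmid
rename firmid_U firmid

* two rows per pair: priva_year and priva_year + 1
expand 2
bysort case_firmid firmid: gen year = priva_year + _n - 1

* several pairs may point at the same control firm-year, hence m:1;
* this assumes firmid and year uniquely identify rows in control.dta
merge m:1 firmid year using "E:\control.dta", keep(master match) keepusing(IR) nogenerate

* median IR of the matched control firms, per case firm and year
collapse (median) median_IR = IR, by(case_firmid priva_year year)
list, sepby(case_firmid)
```

Control firms with no observation in a target year stay in the data with a missing IR, and collapse ignores them when it computes the median. For the three matches shown above, I would expect a median of about .1914 for 2003 and .1489 for 2004, going by the control data listed earlier.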

Thanks in advance for any help or suggestions.
