Channel: Statalist

Creating group ID for observed and non-observed groups (egen group)

I am using Stata 13 and working with two datasets, one of which contains tax information and the other does not. For the dataset without tax information, I want to assign tax information to the observations using group averages taken from the dataset that has it. To do this, I first need to create the same groups in dataset 1 and in dataset 2. The characteristics I use to define the groups (simplified example) are: marital status (married), gender (sex) and number of children (children). I used egen with the group() function:

egen g=group(sex married children)

Say this command generates 22 different groups, numbered 1-22, one for each uniquely observed combination of these three characteristics. There are 24 possible combinations of sex, married, and number of children, so two combinations do not occur in dataset 1. Then I compute a group-level tax rate (the group median, "taxmed" below) for these 22 observed combinations in dataset 1:

egen taxmed = median(tax) if !missing(tax), by(g)

After creating groups for dataset 2, I merge the datasets by group and assign the groups in dataset 2 the tax rate from dataset 1. Here is the problem: there might also be 22 uniquely observed combinations of the characteristics in dataset 2, but they might not be the SAME 22. For example, group 20 in dataset 1 might be married, male with 3 children, while group 20 in dataset 2 is married, male with 1 child, and I do not want to assign these to the same tax group.

The group() function simply numbers the observed combinations within each dataset. I am looking for a command or other method that creates a group number for every possible combination rather than every observed combination of these variables, so that the same number always refers to the same combination in both datasets. Otherwise, the code becomes very long if I do this by hand:

gen g=.
replace g=1 if sex==0 & married==0 & children==0
replace g=2 if sex==0 & married==0 & children==1
replace g=3 if sex==0 & married==0 & children==2
replace g=4 if sex==0 & married==1 & children==0
... etc for every possible combination.

Is there a shorter version that is less prone to mistakes/accidental omission of a possible combination?
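
To make the goal concrete, here is a minimal sketch of one arithmetic way to number every possible combination, so that the number depends only on the values and not on which combinations happen to occur. It assumes sex and married are coded 0/1 and that children is an integer from 0 to 5, which would give the 2 x 2 x 6 = 24 cells mentioned above. That range is an assumption, and the toy data are made up purely so the lines run on their own:

```stata
* Toy data only, so the sketch is runnable; use dataset 1 or 2 instead.
clear
input sex married children
0 0 0
0 0 2
0 1 0
1 1 5
end

* Guard the coding assumptions; these stop with an error on missing
* or out-of-range values rather than silently mis-numbering.
assert inlist(sex, 0, 1) & inlist(married, 0, 1)
assert inrange(children, 0, 5) & children == floor(children)

* Fixed ID in 1..24: children varies fastest, then married, then sex.
* 6 = number of children categories (assumed), 12 = 2 * 6.
generate g = 1 + children + 6*married + 12*sex

list sex married children g
```

Run in both datasets in place of the egen line, this would give the same g to the same combination everywhere. With a different number of children categories the multipliers 6 and 12 would have to change accordingly.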

Thank you in advance for taking the time to read my question and for any advice you can give!

Best regards,
Cortnie
