Channel: Statalist

How do I merge data files with multiple identifiers that are different?

Dear Statalisters,

I am a new Stata user working on a socioeconomic dataset of about 5,000 households, and I need help merging some of the data. I am trying to merge 5 files, each of which is uniquely identified by a pair of variables taken jointly. One identifier in each pair (hhno, the household id) is common to all 5 files, but the second identifier differs from file to file. The files and their identifiers are as follows:
No   File             Identifiers
1    Land data        hhno  plotno
2    Finance data     hhno  finc_no
3    Asset data       hhno  dasset
4    Tools data       hhno  ftool
5    Extension data   hhno  org_id
I am doing plot-level analysis, so hhno is duplicated. Except for the m:m merge, the stricter merge types predictably return the error “variable not found”, referring in each case to the file-specific identifier. As the m:m merge output (the matches) does not look convincing, I tried -joinby-, whose output does not look convincing either: observations that I expect to match in some cases don’t. Next, I tried combining the 2 identifiers in each file into a single id using the -egen, group()- command, but I doubt whether that is reasonable or appropriate for my purpose.
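To make the attempts above concrete, here is a minimal sketch of the kind of commands involved. The file names land.dta and finance.dta are placeholders assumed for illustration, the variable land_id is a made-up name, and the exact commands used may have differed.

```stata
* Hypothetical file names; only the identifier names come from the list above.
use land, clear

* Check that the pair (hhno, plotno) uniquely identifies rows in the land file
isid hhno plotno

* A merge keyed on both land identifiers stops with "variable not found",
* because plotno does not exist in the finance file
capture noisily merge 1:1 hhno plotno using finance

* m:m on hhno runs, but pairs rows within a household by their sort order
use land, clear
merge m:m hhno using finance

* joinby forms every plot-by-finance-item pairing within each household
use land, clear
joinby hhno using finance

* egen, group() numbers each distinct (hhno, plotno) pair; the new id
* (land_id, a hypothetical name) is specific to this file
use land, clear
egen long land_id = group(hhno plotno)
```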

If I could be sure that combining each file’s pair of identifiers into 1 unique id with -egen, group()- would keep the match centered on hhno, I could proceed with the merging, but neither my reading nor my experimenting gives me that impression. What I want to do seems basic, but I am stuck, and I look forward to your kind assistance in guiding me on how to merge the data.

Thank you!
Francis
