Channel: Statalist

Identifying duplicates which are not system duplicates

Hello

In my dataset, I have, let's say:
ID Name Birthday
9559 Jose Tadeu Silva 1960-08-25
9560 José Tadeu Silva 1960-08-25
9561 Maria dos Santos 1960-08-25
What is the best way to identify duplicates in my dataset?

If I run duplicates on Name, these records are not flagged, because one of them may or may not have the "é", or there may be a double space between names (as in the table). If I use Birthday instead, I get false duplicates, since different people can share a birthday.
I tried editing the duplicates .ado file to add a variable that might help in "duplicates list", but it did not help at all. Any ideas?

edit: the double space is not shown here, but suppose it exists
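
To make the problem concrete, here is a minimal sketch of the kind of matching described above: build a cleaned copy of Name (accents stripped, repeated spaces collapsed, case ignored) and look for duplicates on that cleaned name together with Birthday. It is written in Python rather than Stata, purely as an illustration; the function name match_key and the placement of the double space in record 9559 are assumptions, not part of the real dataset.

```python
import unicodedata
from collections import defaultdict


def match_key(name: str) -> str:
    """Return a comparison key: no accents, single spaces, case-insensitive."""
    # Decompose accented letters ("é" becomes "e" plus a combining accent),
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # split() with no arguments collapses any run of whitespace.
    return " ".join(no_accents.casefold().split())


# Example rows from the post; the double space in 9559 is assumed.
records = [
    (9559, "Jose  Tadeu Silva", "1960-08-25"),
    (9560, "José Tadeu Silva", "1960-08-25"),
    (9561, "Maria dos Santos", "1960-08-25"),
]

# Group IDs by (cleaned name, birthday) so a shared birthday alone is not enough.
groups = defaultdict(list)
for rec_id, name, birthday in records:
    groups[(match_key(name), birthday)].append(rec_id)

duplicates = {key: ids for key, ids in groups.items() if len(ids) > 1}
print(duplicates)
# {('jose tadeu silva', '1960-08-25'): [9559, 9560]}
```

With this key, 9559 and 9560 fall in the same group, while 9561 does not despite sharing the birthday. The original Name is left untouched; only the derived key is used for comparison.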

Thanks,
George
