Hello
What is the best way to identify duplicates in my dataset?
In my dataset, I have records like these:
ID | Name | Birthday |
9559 | Jose Tadeu Silva | 1960-08-25 |
9560 | José Tadeu Silva | 1960-08-25 |
9561 | Maria dos Santos | 1960-08-25 |
If I run duplicates on Name, these records are not flagged, because one of them may (or may not) contain the accented "é", or may have a double space between names (as in the table). If I run it on Birthday instead, I get false duplicates: Maria dos Santos (ID 9561) has the same birthday as the other two.
I tried editing the duplicates.ado file to add a variable that might help with "duplicates list", but it didn't help at all. Any ideas?
Edit: the double space does not show up in the table above, but assume it is there.
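One common approach, rather than editing duplicates.ado, is to build a normalized "name key" (accents stripped, runs of whitespace collapsed, case folded) and look for duplicates on that key together with Birthday. The sketch below illustrates the idea in Python on the three example rows; the helper names name_key and tag_duplicates are hypothetical, and folding case is an extra assumption beyond the accent and spacing issues described above. The tag value mimics what duplicates tag reports (the number of other records in the same group, 0 if unique). In Stata 14 or later, the same key could probably be built with ustrnormalize(), ustrregexra(), stritrim() and ustrlower() before running duplicates tag on the new variable plus Birthday.

```python
import unicodedata
from collections import defaultdict


def name_key(name: str) -> str:
    """Hypothetical helper: strip accents, collapse whitespace, fold case."""
    # NFKD splits "é" into "e" plus a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, keeping only the base letters.
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # split() with no argument splits on any run of whitespace, so joining
    # with a single space collapses double spaces and trims the ends.
    return " ".join(no_accents.split()).casefold()


def tag_duplicates(records):
    """Hypothetical helper: map each ID to the number of OTHER records
    sharing its (name key, birthday) pair, like `duplicates tag`."""
    groups = defaultdict(list)
    for rec_id, name, birthday in records:
        groups[(name_key(name), birthday)].append(rec_id)
    tags = {}
    for ids in groups.values():
        for rec_id in ids:
            tags[rec_id] = len(ids) - 1
    return tags


records = [
    (9559, "Jose Tadeu Silva", "1960-08-25"),
    (9560, "José  Tadeu Silva", "1960-08-25"),  # accent and double space
    (9561, "Maria dos Santos", "1960-08-25"),
]

print(tag_duplicates(records))  # {9559: 1, 9560: 1, 9561: 0}
```

Here 9559 and 9560 end up in the same group while 9561 stays unique even though all three share a birthday. Note that NFKD only removes accents that decompose into a base letter plus a combining mark; letters such as "ø" have no such decomposition and would need an explicit mapping.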
Thanks,
George