Hello,
I am trying to run a loop that does the following:
Within each year, each observation (lookuprow) looks through all the other rows and finds the observation (matchedrow) that best matches lookuprow's revenue and assets, where "best" means the lowest average absolute percentage difference across the two variables, with the limitation that the match must be within 1% for each variable. The primarykey of matchedrow is then written to a column (matchedkey) for lookuprow. If lookuprow's primarykey is missing and matchedrow's primarykey is also missing, the next-closest match's primarykey is used instead, and this continues until matchedkey is filled or until the 1% limit for either variable is exceeded, whichever occurs first.
To summarize, sorting is by Year, and a search is performed for each row within Year based on the combined asset and revenue criteria. Once the closest match is found, the primary key of matchedrow is written into matchedkey for lookuprow, unless both lookuprow's and matchedrow's primary keys are missing. If that is the case, the process continues with the next-closest match until the asset and revenue limit is exceeded.
So basically, I have five variables (Primarykey, Year, Revenue, Assets, and Matchedkey), and I am trying to figure out a loop that would perform the process suggested in the title. Matchedkey is the only output of the process. Any help would be appreciated.
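To make the rule concrete, here is a rough sketch in plain Python of how I picture the process. It is only an illustration under some assumptions of mine: the function names (pct_diff, match_within_year), the dictionary field names, and the use of None for a missing key are all made up for the example; percentage differences are measured relative to lookuprow's own values; and rows with missing or zero revenue or assets are simply skipped.

```python
from collections import defaultdict


def pct_diff(base, other):
    """Absolute percentage difference of `other` relative to `base`."""
    return abs(other - base) / abs(base)


def match_within_year(rows, tol=0.01):
    """Return one matched key per row (None where nothing qualifies).

    `rows` is a list of dicts with the hypothetical fields
    primarykey, year, revenue, assets. `tol` is the 1% limit.
    """
    # Group row positions by year so searches never cross years.
    by_year = defaultdict(list)
    for i, row in enumerate(rows):
        by_year[row["year"]].append(i)

    matched = [None] * len(rows)
    for idxs in by_year.values():
        for i in idxs:
            look = rows[i]
            # Assumption: a percentage difference is undefined for
            # missing or zero values, so such rows get no match.
            if not look["revenue"] or not look["assets"]:
                continue
            candidates = []
            for j in idxs:
                cand = rows[j]
                if j == i or cand["revenue"] is None or cand["assets"] is None:
                    continue
                d_rev = pct_diff(look["revenue"], cand["revenue"])
                d_ast = pct_diff(look["assets"], cand["assets"])
                # Both variables must individually be within the limit.
                if d_rev <= tol and d_ast <= tol:
                    candidates.append(((d_rev + d_ast) / 2, j))
            # Walk candidates from closest to farthest.
            for _, j in sorted(candidates):
                key = rows[j]["primarykey"]
                # Skip only when both keys are missing; try the next one.
                if look["primarykey"] is None and key is None:
                    continue
                matched[i] = key
                break
    return matched


rows = [
    {"primarykey": "A", "year": 2020, "revenue": 100.0, "assets": 200.0},
    {"primarykey": None, "year": 2020, "revenue": 100.5, "assets": 200.5},
    {"primarykey": None, "year": 2020, "revenue": 100.6, "assets": 200.6},
    {"primarykey": "D", "year": 2021, "revenue": 100.0, "assets": 200.0},
    {"primarykey": "E", "year": 2020, "revenue": 100.2, "assets": 200.2},
    {"primarykey": "F", "year": 2020, "revenue": 102.0, "assets": 200.0},
]
print(match_within_year(rows))
```

In this toy data the second and third rows are each other's closest match, but both have missing keys, so the search moves on to the next-closest row with a key. The row from 2021 and the row whose revenue is about 2% away from the others end up with no match. This compares every pair of rows within a year, so it may be slow for very large years.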
Thanks,
Michael