Hello everyone,
I have run into a problem and could not find a solution in past threads, so I am starting a new thread to ask for advice.
I have a panel data set on the stock holdings of institutional investors at each point in time. The sample period runs from 2002Q1 to 2007Q2. For each stock held by investor i, I want to do two things. First, I want to fill in the time gaps. Second, if the last period of a stock is not 2007Q2, I want to add one extra period for that stock held by investor i. The variables are: manager number (mgrno), CUSIP (cusip), date, and shares (shares held at time t).
For example:
mgrno | cusip    | date    | shares |
110   | 00184A10 | 2002m3  | 49825  |
110   | 00184A10 | 2002m6  | 56325  |
110   | 00184A10 | 2002m12 | 56625  |
110   | 00184A10 | 2003m3  | 56625  |
110   | 00206R10 | 2005m12 | 28111  |
110   | 00206R10 | 2006m3  | 27711  |
110   | 00206R10 | 2006m12 | 17691  |
110   | 00206R10 | 2007m3  | 23423  |
500   | 26101810 | 2003m6  | 158060 |
500   | 26101810 | 2003m9  | 57760  |
500   | 26101810 | 2003m12 | 18710  |
500   | 26101810 | 2004m3  | 18310  |
500   | 26101810 | 2004m6  | 21210  |
500   | 26157010 | 2007m3  | 3700   |
500   | 26157010 | 2007m6  | 3700   |
For stock 00184A10 held by investor 110, the holding period runs from 2002m3 to 2003m3. I want to fill the time gap between 2002m6 and 2002m12, which is 2002m9. I also want to add one extra period after 2003m3, because the last observation falls before the end of the sample period.
The expected result (partial) will be:
mgrno | cusip    | date    | shares |
110   | 00184A10 | 2002m3  | 49825  |
110   | 00184A10 | 2002m6  | 56325  |
110   | 00184A10 | 2002m9  | 0      |
110   | 00184A10 | 2002m12 | 56625  |
110   | 00184A10 | 2003m3  | 56625  |
110   | 00184A10 | 2003m6  | 0      |
500   | 26157010 | 2007m3  | 3700   |
500   | 26157010 | 2007m6  | 3700   |
The second example is stock 26157010 held by investor 500. Since there is no time gap between the two observations and the last period is 2007m6, nothing needs to be done for this stock.
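The two steps described above (filling gaps, then adding one period after the last observation) can be sketched without declaring a panel at all, by using expand within each mgrno-cusip pair. This is only a sketch under assumptions: date is taken to be a Stata monthly date (%tm) that always falls on a quarter-end month, there is at most one observation per mgrno, cusip and quarter, and the names year, month, qdate, nrows and obsid are hypothetical helpers that are not in the original data.

```stata
* Toy data copied from the example (two of the four manager-stock pairs).
* year and month are hypothetical helper columns used only to build date.
clear
input long mgrno str8 cusip int year byte month long shares
110 "00184A10" 2002  3 49825
110 "00184A10" 2002  6 56325
110 "00184A10" 2002 12 56625
110 "00184A10" 2003  3 56625
500 "26157010" 2007  3  3700
500 "26157010" 2007  6  3700
end
gen date = ym(year, month)
format date %tm
drop year month

* Quarter index: consecutive reporting dates differ by exactly 1
gen qdate = qofd(dofm(date))
format qdate %tq

* Rows each observation should become: itself plus the quarters that are
* missing before the next observation of the same manager-stock pair
bysort mgrno cusip (qdate): gen nrows = qdate[_n+1] - qdate

* Last observation of a pair: one extra quarter unless it is already 2007q2
by mgrno cusip: replace nrows = cond(qdate < tq(2007q2), 2, 1) if _n == _N

* Duplicate rows, then shift the copies forward and zero out their shares
gen long obsid = _n
expand nrows
bysort obsid: replace qdate = qdate + _n - 1
by obsid: replace shares = 0 if _n > 1

* Rebuild the monthly date as the last month of each quarter
replace date = mofd(dofq(qdate)) + 2
drop nrows obsid
sort mgrno cusip qdate
list mgrno cusip date shares, sepby(mgrno cusip)
```

Because the work is done with by-groups on mgrno and cusip directly, this route does not need a composite identifier or tsset. Whether it is fast enough on the full data set has not been checked.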
I have tried the tsfill command, but I could not declare the data as a panel data set. The reason is that at time t, stock x can be held by many investors, so there are several observations for a given stock at each date. mgrno and cusip therefore have to be combined into a composite categorical variable that uniquely identifies each investor-stock pair. I also tried the command: egen both = group(mgrno cusip), label. However, there are too many observations in my data set (13,148,727 observations), so the software could not generate the result I wanted. I have searched for relevant material for a while but have not found anything useful, perhaps because of my limited experience. I would be grateful for any suggestions. Thank you.
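For reference, the tsfill route described above might look like the sketch below. It rests on several assumptions: date is a Stata monthly date (%tm) on quarter-end months, cusip is a string variable and mgrno is numeric, and the label option is what makes egen group() hard to run on this many pairs (this is a guess, not something confirmed). The names pairid, qdate and filled are hypothetical. The sketch covers only the gap filling; the extra period after the last observation is not handled here.

```stata
* Toy data copied from the example; year and month are hypothetical helpers
clear
input long mgrno str8 cusip int year byte month long shares
110 "00184A10" 2002  3 49825
110 "00184A10" 2002  6 56325
110 "00184A10" 2002 12 56625
110 "00184A10" 2003  3 56625
500 "26157010" 2007  3  3700
500 "26157010" 2007  6  3700
end
gen date = ym(year, month)
format date %tm
drop year month

* Numeric panel identifier. The label option is left out so that Stata
* does not have to build one value label per manager-stock pair.
egen long pairid = group(mgrno cusip)

* Quarterly time variable, so that the step between periods is 1
gen qdate = qofd(dofm(date))
format qdate %tq

tsset pairid qdate
tsfill

* Rows created by tsfill are missing on everything except pairid and qdate
gen byte filled = missing(mgrno)

* Numeric missing sorts last, so the first row of each pair holds mgrno
bysort pairid (mgrno): replace mgrno = mgrno[1] if filled

* The empty string sorts first, so the last row of each pair holds cusip
bysort pairid (cusip): replace cusip = cusip[_N] if filled

replace shares = 0 if filled
replace date = mofd(dofq(qdate)) + 2 if filled
sort pairid qdate
list mgrno cusip date shares, sepby(pairid)
```

Converting to a quarterly date before tsset matters here: with the monthly date and the default step of one month, tsfill would add every missing month rather than every missing quarter.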
Kind regards,
Chihhao
References
1. tsfill
2. composite categorical variables