Channel: Statalist

I have a panel data set with two ID variables and I want to fill in/expand observations with respect to a time variable.

Hello everyone,

I have run into a problem and couldn't find a solution in past threads, so I am posting a new thread to ask for advice.

I have a panel data set on the holdings of institutional investors at given points in time. The sample period runs from 2002Q1 to 2007Q2. First, for each stock held by investor i, I want to fill in the time gaps. Second, if the last observed period of a stock is not 2007Q2, I want to add one extra period for that stock held by investor i. The variables are: manager number (mgrno), CUSIP (cusip), date, and shares (shares held at time t).

For example:
mgrno cusip date shares
110 00184A10 2002m3 49825
110 00184A10 2002m6 56325
110 00184A10 2002m12 56625
110 00184A10 2003m3 56625
110 00206R10 2005m12 28111
110 00206R10 2006m3 27711
110 00206R10 2006m12 17691
110 00206R10 2007m3 23423
500 26101810 2003m6 158060
500 26101810 2003m9 57760
500 26101810 2003m12 18710
500 26101810 2004m3 18310
500 26101810 2004m6 21210
500 26157010 2007m3 3700
500 26157010 2007m6 3700
For stock 00184A10 held by investor 110, the holding period runs from 2002m3 to 2003m3. I want to fill the time gap between 2002m6 and 2002m12, which is 2002m9. I also want to add one extra period after 2003m3, since the last observation falls before the end of the sample period (2007Q2).

The expected result (partial) will be:
mgrno cusip date shares
110 00184A10 2002m3 49825
110 00184A10 2002m6 56325
110 00184A10 2002m9 0
110 00184A10 2002m12 56625
110 00184A10 2003m3 56625
110 00184A10 2003m6 0
The second example:
500 26157010 2007m3 3700
500 26157010 2007m6 3700
Since there is no time gap between the 2 observations and the last period is 2007m6, there is no need to do anything to this stock held by investor 500.
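
To pin down the rule, here is a minimal sketch of the intended transformation, written in Python rather than Stata purely for illustration. It assumes dates are (year, month) pairs that always fall on months 3, 6, 9, or 12, and that filled-in and appended periods get shares = 0, as in the expected result above. The names fill_holdings, next_quarter, and LAST_PERIOD are hypothetical and exist only for this example.

```python
from collections import defaultdict

# End of the sample period (2007Q2), written as (year, month). Assumed from the post.
LAST_PERIOD = (2007, 6)


def next_quarter(period):
    """Return the (year, month) three months after the given period."""
    year, month = period
    month += 3
    if month > 12:
        year, month = year + 1, month - 12
    return (year, month)


def fill_holdings(rows, last_period=LAST_PERIOD):
    """Fill quarterly gaps per (mgrno, cusip) and add one trailing period.

    rows: iterable of (mgrno, cusip, (year, month), shares).
    Missing quarters, and the one extra quarter after the last holding,
    are emitted with shares = 0. Nothing is added past last_period.
    """
    by_pair = defaultdict(dict)
    for mgrno, cusip, period, shares in rows:
        by_pair[(mgrno, cusip)][period] = shares

    out = []
    for mgrno, cusip in sorted(by_pair):
        held = by_pair[(mgrno, cusip)]
        first, last = min(held), max(held)
        # One quarter beyond the last holding, capped at the sample end.
        stop = min(next_quarter(last), last_period)
        period = first
        while period <= stop:
            out.append((mgrno, cusip, period, held.get(period, 0)))
            period = next_quarter(period)
    return out


sample = [
    (110, "00184A10", (2002, 3), 49825),
    (110, "00184A10", (2002, 6), 56325),
    (110, "00184A10", (2002, 12), 56625),
    (110, "00184A10", (2003, 3), 56625),
    (500, "26157010", (2007, 3), 3700),
    (500, "26157010", (2007, 6), 3700),
]

for row in fill_holdings(sample):
    print(row)
```

Grouping by the (mgrno, cusip) pair is what makes each series unique here; the same idea would apply to whatever composite identifier is used in Stata.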

I have tried the tsfill command, but I couldn't declare the dataset as a panel dataset. The reason is that a given stock can be held by many investors at time t, so there are several observations per stock at each time point. mgrno and cusip therefore have to be combined into a composite categorical variable in order to uniquely identify each investor-stock pair. I also tried the command: egen both = group(mgrno cusip), label. However, my dataset has too many observations (13,148,727), so Stata could not produce the result I wanted. I have been searching for relevant material for a while but haven't found anything useful, perhaps because of my limited experience. I would be grateful for any suggestions. Thank you.

Kind regards,

Chihhao

