Dear Stata users,
I am using Stata 16 on Windows 10 and I'm working on a quarterly dataset of over 10,000 companies.
Code:
xtset
panel variable: gvkey (unbalanced)
time variable: fyearq_, 1996q2 to 2008q2, but with gaps
delta: 1 quarter
I was looking at a variable for the average assets of a company in a given quarter, and I noticed something strange. For my work the variable has to be created like this:
'Average assets = ((Total assets) + (lagged Total assets)) / 2'.
The strange thing that occurred is that the variable "Average assets" differs depending on whether I use l1.[Total assets] or a previously generated variable for "lagged Total assets". I provide sample data and the code I used. I will explain at the end why I didn't create new, more straightforward variable names.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double gvkey float fyearq_ double atq2 float(t2_atq_L1 t2_avg_assets t3_avg_assets)
1004 146 449.645 . . .
1004 147 468.55 449.645 459.0975 459.0975
1004 148 523.852 468.55 496.201 496.201
1004 149 529.584 523.852 526.718 526.718
1004 150 542.819 529.584 536.2015 536.2015
1004 151 587.136 542.819 564.9775 564.9775
1004 152 662.345 587.136 624.7405 624.7405
1004 153 670.559 662.345 666.452 666.452
1004 154 707.695 670.559 689.127 689.127
1004 155 737.416 707.695 722.5555 722.5555
1004 156 708.218 737.416 722.817 722.817
1004 157 726.63 708.218 717.424 717.424
1004 158 718.913 726.63 722.7715 722.7715
1004 159 747.043 718.913 732.978 732.978
1004 160 753.755 747.043 750.399 750.399
1004 161 740.998 753.755 747.3765 747.3765
1004 162 747.543 740.998 744.2705 744.2705
1004 163 772.941 747.543 760.242 760.242
1004 164 754.718 772.941 763.8295 763.8295
1004 165 701.854 754.718 728.286 728.286
1004 166 758.503 701.854 730.1785 730.1785
1004 167 714.208 758.503 736.3555 736.3555
1004 168 690.681 714.208 702.4445 702.4445
1004 169 710.199 690.681 700.44 700.44
1004 170 722.944 710.199 716.5715 716.5715
1004 171 727.776 722.944 725.36 725.36
1004 172 723.019 727.776 725.3975 725.3975
1004 173 686.621 723.019 704.82 704.82
1004 174 676.345 686.621 681.483 681.483
1004 175 666.178 676.345 671.2615 671.2615
end
format %tq fyearq_
Now to really explain the issue, here is the code I used and the output. The variable for "Total assets" is atq2.
Code:
. gen t2_avg_assets=((atq2)+(l1.atq2))/2
(15,545 missing values generated)
. gen t2_atq_L1 = l1.atq2
(14,933 missing values generated)
. gen t3_avg_assets=((atq2)+(t2_atq_L1))/2
(15,545 missing values generated)
. * t2_avg_assets and t3_avg_assets should be same, but they aren't:
. compare t2_avg_assets t3_avg_assets
                                        ---------- difference ----------
                            count       minimum      average     maximum
------------------------------------------------------------------------
t2_avg_~s<t3_avg_~s         14814     -.0078125    -.0000578   -2.33e-10
t2_avg_~s=t3_avg_~s        217381
t2_avg_~s>t3_avg_~s         14735      2.33e-10     .0000563    .0039063
                       ----------
jointly defined            246930     -.0078125    -1.06e-07    .0039063
jointly missing             15545
                       ----------
total                      262475
First I create the 'Average assets' variable using the lag operator L. Then I create a one-quarter lag of atq2, again using the lag operator L, and store it as a variable. Finally I create a second 'Average assets' variable, but instead of the lag operator L I use the stored lagged variable. To me the two variables should be identical, but the compare command shows that they aren't.
So my question is: How are these two 'Average assets' variables not identical?
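To see which observations are affected, the steps above can be reproduced on a toy panel. This is only a minimal sketch: it uses the first three rows of the sample data instead of my full dataset, and the variable diff and the describe call are additions for illustration that were not part of my original code. On so few rows the list command may well show nothing.

```stata
* Toy panel: first three rows of the -dataex- sample above
clear
input double gvkey float fyearq_ double atq2
1004 146 449.645
1004 147 468.55
1004 148 523.852
end
format %tq fyearq_
xtset gvkey fyearq_

* Same three steps as in my original code
gen t2_avg_assets = (atq2 + L1.atq2) / 2
gen t2_atq_L1     = L1.atq2
gen t3_avg_assets = (atq2 + t2_atq_L1) / 2

* diff is a hypothetical helper; stored as double so it is not rounded again
gen double diff = t2_avg_assets - t3_avg_assets

* Show only rows where both averages exist and differ
list gvkey fyearq_ atq2 t2_atq_L1 t2_avg_assets t3_avg_assets diff ///
    if diff != 0 & !missing(diff)

* Storage types of the variables involved
describe atq2 t2_atq_L1 t2_avg_assets t3_avg_assets
```

On the full dataset the same list and describe lines could be run directly after the three gen commands.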
In preparation for this post I created variables with easier-to-understand names. But in doing so, another question emerged.
Code:
. gen assetstotalqtly = atq2
(741 missing values generated)
. gen assetstotalqtly_L1 = l1.assetstotalqtly
(14,933 missing values generated)
. gen averageassets = ((assetstotalqtly)+(assetstotalqtly_L1))/2
(15,545 missing values generated)
. gen test_averageassets = ((assetstotalqtly)+(l1.assetstotalqtly))/2
(15,545 missing values generated)
. compare averageassets test_averageassets
                                        ---------- difference ----------
                            count       minimum      average     maximum
------------------------------------------------------------------------
average~s=test_av~s        246930
                       ----------
jointly defined            246930             0            0           0
jointly missing             15545
                       ----------
total                      262475
. compare assetstotalqtly atq2
                                        ---------- difference ----------
                            count       minimum      average     maximum
------------------------------------------------------------------------
assetst~y<atq2             123741       -.00625    -.0000155   -5.96e-11
assetst~y=atq2              12729
assetst~y>atq2             125264      2.61e-11     .0000156      .00625
                       ----------
jointly defined            261734       -.00625     1.46e-07      .00625
jointly missing               741
                       ----------
total                      262475
How are assetstotalqtly and atq2 not identical when I created the first by telling Stata it is equal to the latter? And why doesn't the issue described above occur with the new variables?
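For this second question, a closer look at the stored values might help. The following is a minimal sketch under my own assumptions: it uses two values of atq2 from the sample data, and widening the display format and calling describe are only my guess at where to look, not something I have confirmed as the cause.

```stata
* Two atq2 values taken from the -dataex- sample above
clear
input double atq2
449.645
468.55
end

* Copy made exactly as in my original code
gen assetstotalqtly = atq2

* Show more digits than the default display format
format atq2 assetstotalqtly %18.0g
list atq2 assetstotalqtly

* Storage types of the original and the copy
describe atq2 assetstotalqtly
```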
I hope I described everything well enough; if not, feel free to let me know. Thank you in advance!