Channel: Statalist

Testing for structural break for time series data

Hello everyone,

I have time-series data in which I regress air transport traffic on GDP, the oil price, and FX rates, with annual observations for each variable from 2000 to 2019. I wanted to know whether there was a structural break in the time series for GDP, so I ran the following command:
Code:
regress historicalpassengers gdp, vce(robust)
where I obtained the following output:

Code:
 
Linear regression                               Number of obs =       20
                                                F(1, 18)      =  1742.79
                                                Prob > F      =   0.0000
                                                R-squared     =   0.9836
                                                Root MSE      =  3.4e+06

------------------------------------------------------------------------------
             |               Robust
historical~s |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |   379.8068   9.097881    41.75   0.000     360.6928    398.9207
       _cons |  -2.81e+07    2504966   -11.23   0.000    -3.34e+07   -2.29e+07
------------------------------------------------------------------------------
I then ran
Code:
estat sbsingle
and got the following:

Code:
. estat sbsingle
insufficient observations at the specified trim level
r(198);
I am wondering whether what I did was right. Thank you.
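In case it helps later readers: -estat sbsingle- trims a fraction of the sample at each end when searching for a break date, and with only 20 observations the default trimming can leave too few observations in the candidate subsamples. A sketch of something sometimes worth trying, not a guaranteed fix (the year variable name and the trim value are assumptions):

Code:
* the data must be tsset before estat sbsingle will run
tsset year
regress historicalpassengers gdp
* widen the trimming so each candidate subsample has more observations
estat sbsingle, trim(25)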

Replacing duplicate values on a variable with missing values, with regard to another variable

Hi, I need help replacing duplicate entries on a variable with missing values (a dot) instead of dropping the observations altogether. I am currently using Stata 14. The data set looks as follows:
Code:
ID variable1 variable2
001 . .
001 . .
001 . .
002 500 .
002 500 .
002 500 .
003 453 148
003 453 .
003 453 .
004 529 .
004 529 156
004 529 .
005 514 .
005 514 .
005 514 163
006 453 .
006 453 148
006 453 .
In this case, there are observations where variable1 is duplicated within the same ID, while none of the values of variable2 is a duplicate (those values were generated with the mean command, so the same value can appear under different IDs). Is it possible to set variable1 to missing on all but the first entry, instead of dropping the observations? Dropping duplicate observations on variable1 other than the first would risk dropping non-missing values of variable2. I am looking for a way to make my data look like this:
Code:
ID variable1 variable2
001 . .
001 . .
001 . .
002 500 .
002 . .
002 . .
003 453 148
003 . .
003 . .
004 529 .
004 . 156
004 . .
005 514 .
005 . .
005 . 163
006 453 .
006 . 148
006 . .
As a note, it would be even better if there were a condition on variable2: instead of blanking out everything after the first observation in each ID, keep the value of variable1 on the row where there is a corresponding non-missing observation on variable2, and set it to missing elsewhere. This is not a strict requirement, though.
Code:
ID variable1 variable2
001 . .
001 . .
001 . .
002 500 .
002 . .
002 . .
003 453 148
003 . .
003 . .
004 . .
004 529 156
004 . .
005 . .
005 . .
005 514 163
006 . .
006 453 148
006 . .
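A sketch of both layouts, assuming the variables are literally named ID, variable1, and variable2 (run one variant or the other on the original data, not both in sequence):

Code:
* Variant 1: keep variable1 only on the first row of each ID
bysort ID: replace variable1 = . if _n != 1

* Variant 2 (the conditional layout): keep variable1 on the row where
* variable2 is non-missing, falling back to the first row for IDs whose
* variable2 is always missing -- assumes at most one non-missing
* variable2 per ID, as in the example
egen byte anyv2 = max(!missing(variable2)), by(ID)
bysort ID: replace variable1 = . if (anyv2 & missing(variable2)) | (!anyv2 & _n != 1)
drop anyv2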

Difference-in-difference research design and multi-year lead/lag time

Hi all!


I have panel data (gvkey = firm ID, fyear = time). I create a Law indicator for states that adopted a certain law at different times (staggered adoption). It is a dummy taking the value of one after the firm (gvkey) becomes subject to this law during my sample period, and zero otherwise. Here is the model I use: reghdfe Dependent LawDummy Controls, absorb() cluster()

Since the dependent variable is constructed from data over the past several years (an estimation period spanning the previous five years might hinder correct inference from the comparison between the pre- and post-periods), I want to use one of the following two methods to allow enough lead/lag time in my difference-in-differences research design.

First approach: create a dummy variable equal to one for lead1-lead4 and zero for lag1-lag4, with no missing observations within these 8 years, so that I can compare lead1/2/3/4 against lag1/2/3/4.

Another approach to address this concern is to exclude observations from the four years immediately following the law's adoption year.

Does anyone know how to write the code for these analyses? And which approach is better? Thank you in advance for your help!
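For what it's worth, both approaches start from the firm's adoption year, which can be recovered from the dummy itself. A sketch (the lead/lag convention and the empty absorb()/cluster() arguments are assumptions):

Code:
* adoption year = first fiscal year in which LawDummy switches to 1
bysort gvkey: egen adopt_year = min(cond(LawDummy == 1, fyear, .))
gen event_time = fyear - adopt_year

* Approach 1: dummy contrasting the 4 years from adoption onward with
* the 4 years before it; observations outside the window are excluded
gen byte post4 = inrange(event_time, 0, 3) if inrange(event_time, -4, 3)

* Approach 2: drop the 4 years immediately after adoption
* reghdfe Dependent LawDummy Controls if !inrange(event_time, 0, 3), absorb() cluster()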


gsem converges with 1 or no latent variable, but not 2

I am encountering a convergence problem with -gsem- when running two-stage probit regressions with unobservables.

To simplify, I fit a basic model without unobserved heterogeneity as the following, which converges without problem:
Code:
gsem (y <- x, probit) (lagged_y <- z, probit)
When I add a common random effect to both regressions, the model still converges:
Code:
gsem (y <- x M[id], probit) (lagged_y <- z M[id], probit)
However, when I specify unique random effects for each regression (because I need to test the correlation between the unobservables across the regressions), the model no longer converges, even when I add estimation options such as -difficult-, -dnumerical-, -iterate(#)-, or -startgrid()-:
Code:
gsem (y <- x M1[id], probit) (lagged_y <- z M2[id], probit)
I wonder why adding one more latent variable makes convergence impossible, given that the model converges perfectly well with one or zero latent variables. Also, are there other ways to test the correlation between the unobservables across the regressions? Thank you for your advice.
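One tactic that sometimes helps in this situation (a sketch, not a guaranteed cure) is to fit the simpler common-effect model first and feed its solution to the harder model as starting values via -from()-:

Code:
* fit the model that does converge and store its parameter vector
gsem (y <- x M[id], probit) (lagged_y <- z M[id], probit)
matrix b0 = e(b)
* use those estimates as starting values; skip ignores parameters
* that do not appear in the new model
gsem (y <- x M1[id], probit) (lagged_y <- z M2[id], probit), from(b0, skip)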

Difference between reghdfe vs xtreg vs cluster2 (in terms of r-squared)

Dear all,

First of all, I apologize if my question duplicates some earlier ones.
I have seen posts about the difference between reghdfe and xtreg, fe, but I have a specific question about the r-squared.
When I post a question here I usually upload example data using dataex, but this question is more about the underlying mechanics of each command, so I skip dataex this time.

My question is: is it natural to get a different r-squared from (reghdfe or cluster2) than from xtreg?
Specifically, I have a country level yearly data set.
I ran:

Code:
reghdfe DV IVs, absorb(country year) vce(cluster country year)
cluster2 DV IVs country_dummy* year_dummy*, fcluster(country) tcluster(year)
xtreg DV IVs i.year, fe
In the simplified code above:
reghdfe has country and year FE with country- and year-clustered standard errors;
cluster2 has country and year FE with country- and year-clustered standard errors (I used cluster2.ado from Prof. Mitchell Petersen's Programming Advice website);
xtreg has only country and year FE.

Looking at the results, reghdfe and cluster2 give me the same r-squared, around 0.95, whereas xtreg gives me 0.67.

To sum up:
1. Is it natural to have a very high r-squared in some cases? From my perspective, an r-squared this high seems unrealistic.
2. Do xtreg (with two-way FEs) and cluster2 (or reghdfe) generate different r-squareds?

I appreciate all of your comments in advance.

IOTF in Stata (z-scores for BMI)

Dear Everyone,

I am using the zanthro() function in Stata to compute BMI z-scores from WHO, CDC, etc. I also want to calculate the IOTF z-scores; however, I can't find these in the zanthro() package. Can any of you help me?

I've used this command to calculate the z-scores for WHO: egen zbmi_1yFU = zanthro(bmi_1yFU, ba, WHO), xvar(age1yFU_yr) gen(sex) genc(male=1 female=2), which works. So basically I am just searching for the corresponding command for IOTF.

I hope you can help.

Many thanks in advance,

Best wishes,
Johan

Propensity score matching

Dear Stata users,

I am trying to balance differences in duration of diabetes within each age-of-diagnosis group (matching within each group on diabetes duration to within 2 years) so as to allow mean comparison (annual change in the x variable) across these groups. I have looked at the Stata manuals but am not sure how to construct the analysis. I would really appreciate any guidance or help.

Data looks as below:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long ID float diabetes_duration byte gender double age float(age_ds_gr change_x) byte dup
 1 26 0 60.8 1 -1.6655158 0
 2 14 1   55 2  .19934218 0
 3  1 1 57.3 3          . 2
 5  6 0 47.1 2 -.29966295 0
 6  5 1 56.7 3  -5.175216 0
 7 11 0   60 2   .2080694 0
 8  6 0   55 2   -.058939 0
 9 15 0 69.9 3 -1.2625084 0
10 22 0 70.4 2 -1.0328121 0
11 14 1 52.7 1  -.7431965 0
12 17 1 72.2 3 -2.0354166 0
13  8 0   61 3  -.3058444 0
18 28 0 59.8 1  2.2740536 0
19 17 1 64.3 2   .9247662 0
26 21 0 56.2 1 -2.9573734 0
27 25 1 74.1 2  -.9741772 0
end

Thank you very much

Oyun

Survival Data declaration

Hi,

I have a quick question on setting up survival data using the -stset- command. I have a survival data set that observes individuals from age 30 to a maximum age of 50. During this period a person can be promoted once (variable promo); this variable takes the value 1 when a person gets promoted.

I initially used the following command to set the survival data:

command 1 - stset age, f(promo) id(id)

By default, Stata sets the origin time to 0.

Since I don't observe people before they are 30, I changed the stset command to the following:

command 2- stset age, id(staff_id) failure(promo) origin(time 30)

I compared the failure function using -sts list- after each setting, and the results were quite different.

So my questions are:
1. Why does setting the origin significantly change the failure function?
2. I tried changing the earliest entry time using enter(time 30) without changing the origin time, and that still produced the same failure function as command 1. Why does Stata still assume that the origin time is zero?
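A sketch of the distinction between the two options, using the variable names from the post:

Code:
* origin(time 30) makes analysis time t = age - 30, so the risk clock
* starts at age 30; this changes the time scale of the failure function
stset age, id(staff_id) failure(promo) origin(time 30)

* enter(time 30) only marks when subjects come under observation;
* analysis time is still t = age, so the origin stays at 0 and the
* failure function matches command 1
stset age, id(staff_id) failure(promo) enter(time 30)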

Thanks a lot.
Danula

xtlogit model not running

Hi there,

I am attempting to run an xtlogit model on panel data. There are 8,937 observations with a binary dependent variable (0,1), where 0 has n=802 participants and 1 has n=114 participants. There are 6 independent variables, one of which is binary (gender, coded 0/1) and the rest are continuous. My variable of interest is standardised test z-scores (std_Comm_) over 3 years.

My script is as follows:

xtlogit var1 c.age_ i.gender c.birthweight c.familyincome_ c.maternalhighestqualification c.std_Comm_, i(id_new) or

When I run the model, I get this error message:

Fitting comparison model:

Iteration 0: log likelihood = -1371.7743
Iteration 1: log likelihood = -1326.8958
Iteration 2: log likelihood = -1325.9652
Iteration 3: log likelihood = -1325.9647
Iteration 4: log likelihood = -1325.9647

Fitting full model:

tau = 0.0 log likelihood = -1325.9647
tau = 0.1 log likelihood = -1287.0885
tau = 0.2 log likelihood = -1245.3778
tau = 0.3 log likelihood = -1200.9184
tau = 0.4 log likelihood = -1153.703
tau = 0.5 log likelihood = -1103.5098
tau = 0.6 log likelihood = -1049.7339
tau = 0.7 log likelihood = -991.08284
tau = 0.8 log likelihood = -925.17559

Iteration 0: log likelihood = -991.07902
cannot compute an improvement -- discontinuous region encountered
r(430);

I have tried the following approaches to get the model to run:

The assists
Checked the gradient of the variables
Check the collinearity metrics
Tried a number of maximization algorithms (all coming back with the same answer).
Changed the tolerance levels for acceptable levels of concave.
Edited the var_1 variable type.
Tried a one at a time analysis for each predictor.
Changed the z scores to t scores (small values appear to affect location of the maximum likelihood but I have never had this issue)

When I run the model with logit and with xtprobit I can get results; however, I would like to use xtlogit in line with my other analyses.
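Two things that are sometimes worth trying with this error (a sketch, not a guaranteed fix): raising the number of quadrature points in -xtlogit-, or fitting the same random-intercept logit via -melogit-:

Code:
* more adaptive quadrature points than the default
xtlogit var1 c.age_ i.gender c.birthweight c.familyincome_ ///
    c.maternalhighestqualification c.std_Comm_, i(id_new) intpoints(30) or

* equivalent mixed-effects formulation
melogit var1 c.age_ i.gender c.birthweight c.familyincome_ ///
    c.maternalhighestqualification c.std_Comm_ || id_new:, or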

Any advice would be appreciated,
Eleanor

Fama-French portfolio sorting

Dear Statalist,

I am new to Stata and trying to do portfolio sorting, but I still do not know exactly how. I have panel data with the company ID (code), the year (year), size (size), and book-to-market (bvmv).

What I want to do is the following:
1. At the end of June each year, sort stocks into two size groups (big stocks are those in the top 90% of size, and small stocks are those in the bottom 10%).

2. Sort stocks into three book-to-market (bvmv) groups (breakpoints are the 30th and 70th percentiles).

3. Intersect the independent 2x3 sorts on size and book-to-market to produce six portfolios, SG, SN, SV, BG, BN, and BV, where S and B indicate small or big and G, N, and V indicate growth (low bvmv), neutral, and value (high bvmv).

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str48 name str12 isincode float(month yearnew size bvmv)
"WINGTECH TECHNOLOGY 'A'" "CNE000000M72" 9 1999 .006493479 .01559449
"SANXIANG IMPRESSION 'A'" "CNE000000T00" 9 1999 .009729746 .0126169
"TONGHUA DONGBAO PHARM. 'A'" "CNE000000H87" 9 1999 .01910826 .009465886
"CHONGQING WANLI NEW EN. 'A'" "CNE000000G96" 9 1999 .013541578 .01363718
"SHANGHAI HUAYI GP.'A'" "CNE0000006G6" 9 1999 .02795306 .013451143
"SHANGHAI LAIYIFEN 'A'" "CNE100002GH3" 9 1999 0 .013634946
"HUADIAN ENERGY 'A'" "CNE000000KX7" 9 1999 .023364574 .01195663
"SICHUAN MINJIANG 'A'" "CNE000000VJ3" 9 1999 .00168346 .01300116
"SHANGHAI WORTH GDN.'A'" "CNE1000021G3" 9 1999 .02755903 .02190554
"FAW CAR 'A'" "CNE000000R85" 9 1999 .0041666296 .01226845
"CITIC GUOAN INFO.IND.'A'" "CNE000000TD0" 9 1999 .00763358 .016245203
"SHN.GT.OCEAN SHIP.'B'" "CNE000001ML6" 9 1999 .012832209 .01527738
"SHANGHAI MALING AQUARIUS 'A'" "CNE000000RS2" 9 1999 .008752727 .014097102
"SHAI.FUDAN FWD.S & T 'A'" "CNE0000006S1" 9 1999 .0136752 .02004609
"SHN.YAN TIAN POR.HDG.'A'" "CNE000000SF7" 9 1999 .05673762 .015145889
"KUNWU JIUDING INVESTMENT HOLDINGS 'A'" "CNE000000PP2" 9 1999 .005128241 .012104352
"SHENZHEN AIRPORT 'A'" "CNE000000VK1" 9 1999 .020979 .012811908
"CHINA SECURITY 'A'" "CNE0000001Y0" 9 1999 .013761492 .01303691
"SHENZHEN HOPEWIND ELEC. 'A'" "CNE100002WM0" 9 1999 .0034188 .012733237
"SHANXI GUOXIN ENERGY 'B'" "CNE000000BM9" 9 1999 .035714254 .017803922
"JIANGSU WUZHONG INDL.'A'" "CNE000000YP4" 9 1999 0 .011289109
"QINGDAO HISENSE ELECTRONICS 'A'" "CNE000000PF3" 9 1999 .008896747 .011306063
"FOUNDER TECH.GP. 'A'" "CNE0000001S2" 9 1999 .004405282 .019085057
"CPT TECH.(GROUP) 'A'" "CNE0000002D2" 9 1999 .05197505 .014956822
"LAO FENG XIANG 'B'" "CNE0000004K3" 9 1999 0 .014583333
"TIBET RHDPHAR.HLDG. 'A'" "CNE000000ZW7" 9 1999 .005360342 .01412058
"GUI ZHOU TYRE 'A'" "CNE000000JH2" 9 1999 .012121201 .021861605
"ZHEJIANG JUHUA 'A'" "CNE000000WQ6" 9 1999 .008230445 .015338186
"SHN.CHINA BICYCLE 'A'" "CNE0000002Q4" 9 1999 .01271859 .013213408
"ANHUI GUOFENG PLSTC.'A'" "CNE000000XF7" 9 1999 .019607875 .01240107
"JIANGSU CHINESE ONLINE LOGISTICS 'A'" "CNE000000065" 9 1999 .003726675 .01076432
"SHANGHAI BELLING 'A'" "CNE000000XB6" 9 1999 .005630652 .015540547
"GAC CHANGFENG MOTOR 'A'" "CNE000001J76" 9 1999 .07482991 .02165474
"SHAI.CHLOR-ALKALI CHM. 'B'" "CNE0000004C0" 9 1999 0 .02034723
"TIANJIN BENEFO TEJING ELECTRIC 'A'" "CNE000001832" 9 1999 .015834361 .014274321
"ZHONGTIAN FINL.GP.'A'" "CNE000000FL2" 9 1999 0 .01415284
"BOHAI LEASING 'A'" "CNE0000009B1" 9 1999 0 .015557416
"XIAMEN XIANGYU 'A'" "CNE000000QN5" 9 1999 .04761907 .014939445
"CHANGHONG MEILING 'A'" "CNE000000BT4" 9 1999 .02234635 .014281092
"CHIN.REFORM HLTH. MAN.&. SSGP.'A'" "CNE000000255" 9 1999 .008316001 .010069065
"FUJIAN YONGAN FOREST.'A'" "CNE000000CS4" 9 1999 .01220936 .012546004
"SHENZHEN KAIFA TECH.'A'" "CNE000000FK4" 9 1999 0 .015677534
"CHONGQING SANXIA PS. 'A'" "CNE000000305" 9 1999 .0272277 .0143041
"PANDA FINL.HDG.'A'" "CNE0000018S6" 9 1999 .004366808 .014253124
"TONGHUA GOLDEN-HORSE PHARM.IND.'A'" "CNE000000735" 9 1999 .01974617 .012933527
"YANZHOU COAL MINING 'A'" "CNE000000WV6" 9 1999 .02170278 .013380827
"CSSC OFFS.& MAR.ENGR.GP. 'A'" "CNE000000BP2" 9 1999 .015520995 .013881355
"NANFANG BLACK SESAME GROUP 'A'" "CNE000000909" 9 1999 .007886466 .017195478
"TIANJIN TEDA 'A'" "CNE0000005D5" 9 1999 .005540161 .012039186
"JOINTO ENERGY INV. 'A'" "CNE000000FT5" 9 1999 .0044576833 .015465764
"SHENZHEN SEA STAR TECH. 'A'" "CNE1000000L7" 9 1999 .0010111454 .013600056
"CASIN REAL ESTATE DEVELOPMENT GROUP 'A'" "CNE0000007R1" 9 1999 .0151515 .010764802
"SHANGHAI HUAYI GROUP 'B'" "CNE0000004L1" 9 1999 0 .013507895
"ZANGGE HOLDING 'A'" "CNE000000L08" 9 1999 .08087092 .017323121
"TIBET AIM PHARM.'A'" "CNE100002C39" 9 1999 .001335144 .006744788
"SHANDONG ALUMINIUM IND. 'A'" "CNE000000ZJ4" 9 1999 .006265648 .009445054
"ZHANGZIDAO GROUP 'A'" "CNE000001NR1" 9 1999 .012106468 .0101811
"SHANGHAI LINGANG HOLDINGS 'B'" "CNE000000GW7" 9 1999 .11111108 .01986111
"HACI 'A'" "CNE000001MM4" 9 1999 .029411836 .01318302
"JIANGMEN SUG.CANE CHM. FAC.(GP.) 'A'" "CNE0000005H6" 9 1999 .016229726 .01377474
"GUANGZHOU GUANGRI STOCK 'A'" "CNE000000JS9" 9 1999 .009174303 .012333922
"MAANSHAN IRON & STL. 'A'" "CNE000000DD4" 9 1999 .02017289 .010490164
"HEBEI JINNIU CHM.IND.'A'" "CNE000000KR9" 9 1999 .02731511 .013249496
"CHINA WU YI 'A'" "CNE000000SD2" 9 1999 .003584226 .012237124
"SHAI.JINJIANG INTL. TRAVEL 'B'" "CNE000000HF0" 9 1999 .008148975 .013311288
"CHGC.DEPT.JITUAN SOE.'A'" "CNE000000GD7" 9 1999 .005633797 .01470124
"CHINA FANGDA GROUP 'B'" "CNE000000JD1" 9 1999 .007692301 .02074768
"HENAN SHUANGHUI INV.& DEV.'A'" "CNE000000XM3" 9 1999 .004694943 .011183902
"SHN.ZHONGJIN LINGNAN NONFEMET 'A'" "CNE000000FS7" 9 1999 .0079365 .012563717
"SHENYANG HUITIA THERMAL PWR.'A'" "CNE0000007K6" 9 1999 .01674279 .015495613
"CHANGCHAI 'A'" "CNE000000GT3" 9 1999 .01992033 .011439384
"SHANGHAI LINGANG HOLDINGS 'A'" "CNE000000C74" 9 1999 .02264802 .017265145
"SHANDONG INTCO MEDICAL PRODUCTS 'A'" "CNE100003456" 9 1999 .004769509 .010715269
"SHANDONG JINJIANG SCI.& CH.'A'" "CNE000001C57" 9 1999 .02159825 .010865006
"GZH.DEV.GPIN.'A'" "CNE000000SB6" 9 1999 .011464927 .012776517
"HENAN YINGE INDL.INV. 'A'" "CNE000000PT4" 9 1999 0 .01135509
"SHAI.YOUNG SUN INV. 'B'" "CNE000000J02" 9 1999 0 .018862223
"MAOYE COMMERCIAL 'A'" "CNE000000FJ6" 9 1999 .01759536 .01677732
"SHANDONG TONGDA NEW MATERIALS 'A'" "CNE100001DL4" 9 1999 .009302316 .014205392
"SHANGHAI COOLTECH POWER 'A'" "CNE100000YD9" 9 1999 .010899172 .01524961
"WUXI LITTLE SWAN 'A'" "CN:WSP" 9 1999 .011857696 .014462519
"LIAOHE JINMA OILFIELD 'A'" "CN0009139006" 9 1999 .03846153 .016373795
"ZHONGSHAN PUB.UTILS.GP. 'A'" "CNE0000006B7" 9 1999 .003891047 .019764414
"WUHAN ZHONGYUAN HUADIAN SCTC.'A'" "CNE100000GP0" 9 1999 .05038755 .012336186
"ANSHAN NO.1 CON.MACH.'A'" "CNE000001J43" 9 1999 .00468859 .016678596
"BEIJING ELECTRONIC ZONE INV.& DEV.'A'" "CNE000000974" 9 1999 .012970156 .016052367
"TIANMA MICROELS.'A'" "CNE000000HT1" 9 1999 .01070668 .011334972
"SICHUAN HUATI LIGHTING TECH.'A'" "CNE100002WT5" 9 1999 .0019493623 .017990904
"AEROSPACE HI-TECH HLDG. GP. 'A'" "CNE000000Y86" 9 1999 0 .011800613
"LANZHOU HUANGHE ENTER. 'A'" "CNE000000ZD7" 9 1999 .00491644 .011046248
"XIAMEN ITG GROUP 'A'" "CNE000000MN4" 9 1999 .03773593 .01435472
"CHINA NAT.ACCORD MDC.'B'" "CNE0000009M8" 9 1999 0 .022249887
"HUAYI COMPR. 'A'" "CNE000000KM0" 9 1999 .006968579 .012515024
"GZH.PER.RVR.IND.DEV.'A'" "CNE000000BN7" 9 1999 0 .01151179
"VANFUND URB.INVDV. 'A'" "CNE0000008Y5" 9 1999 .006067984 .006347805
"SHANDONG SWAN CTN.ILMH. STK.'A'" "CNE100002748" 9 1999 .024590205 .015767446
"QINGHAI HUZHU BARLEY WINE 'A'" "CNE1000019X2" 9 1999 .007911422 .016542897
"QINHUANGDAO PORT 'A'" "CNE100002QX9" 9 1999 0 .021875
"PETROLEUM LONG CHAMP 'A'" "CNE000000N06" 9 1999 .006034539 .015018919
"FUJIAN FURI ELTN. 'A'" "CNE000000Z36" 9 1999 .016726445 .012993315
end
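A sketch of the 2x3 sort using the dataex variable names and the breakpoints stated in the post (10th percentile of size; 30th and 70th percentiles of bvmv); restricting to the end-of-June observation each year is left aside:

Code:
* yearly breakpoints
bysort yearnew: egen size_p10 = pctile(size), p(10)
bysort yearnew: egen bm_p30   = pctile(bvmv), p(30)
bysort yearnew: egen bm_p70   = pctile(bvmv), p(70)
* size group: S = bottom 10%, B = the rest
gen str1 sgrp = cond(size <= size_p10, "S", "B")
* value group: G / N / V by the 30th and 70th percentiles of bvmv
gen str1 vgrp = cond(bvmv <= bm_p30, "G", cond(bvmv <= bm_p70, "N", "V"))
* the six portfolios: SG SN SV BG BN BV
gen str2 port = sgrp + vgrp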



Any help would be appreciated !

Thanks in advance,


Wishlist for asdocx (A more versatile and flexible version of asdoc)

I am developing a new version of asdoc that will be more versatile and flexible in creating tables and exporting them to MS Word, Excel, LaTeX, and some other formats. I call this new version asdocx. I have already received valuable input from the asdoc community and hope to receive more here. I would appreciate it if you could list the items on your wish list for asdocx.

Saving R-squared, t-statistics, and other results in a regression loop

Hello Statalist community,

I just started using Stata recently, as I need to do some statistical analysis for my thesis project. I researched my question but couldn't quite find an answer to the problem I am facing; I hope somebody here can help me.
I am working with a dataset of 5,357 mutual funds (identified by wficn) with monthly return observations (mret_Rf) ranging from 12/1961 to 12/2019. To prevent survivorship bias, dead funds are included, as are funds launched later; hence, not all funds have observations for the entire period. Furthermore, each observation contains the values of the variables Mkt_RF, SMB, and HML, which are the explanatory variables in the regression analysis.
I want to run a regression as a loop for each of the funds and save the results for every regression for each fund including:

-Number of obs
- The Rsquared value
- Prob > F
- The coefficients
- The Std. Err.
- The t-statistic
- P>|t|

I have written the following loop to run the regression in question; unfortunately, this code only saves the coefficients (as shown below), which is good, but I also need the other values.
I would be really grateful if somebody could help me out here- or indicate if what I am intending to do is simply not possible.

Thank you very much for your help in advance!

Best,
Lennart

Code used:
Code:
tempfile all_results
statsby, by(wficn) saving(`all_results'): regress mret_Rf Mkt_RF SMB HML
use `all_results', clear
Results of this code (just some of them as example):
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double wficn float(_b_Mkt_RF _b_SMB _b_HML _b_cons)
100001  .009468214  -.0018029775   .0006833521   -.0012695126
100003  .010128003    .003318744   .0012590317  -.00022445657
100004  .010225002  -.0010207522   -.002516641    -.001314697
100009  .010970508    .004605711  -.0035189984    .0004628504
100010  .008183149   -.001309696     .00228036    -.001595757
100012  .009820702   -.002918876   .0018024922    .0015416763
100016  .009509544  -.0014319367   -.002432401   -.0007382804
100017  .010104744   -.001968148   .0030095594   -.0021064063
100019  .009635613    .005213343   .0009328977    .0008780413
100030  .011012082    .002016491   -.003541881   -.0021347988
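For later readers: -statsby- can save more than the coefficients, since its exp_list accepts _se and any e() results, and t-statistics and p-values can be rebuilt afterwards. A sketch along those lines:

Code:
tempfile all_results
statsby _b _se nobs = e(N) df = e(df_r) r2 = e(r2) F = e(F), ///
    by(wficn) saving(`all_results'): regress mret_Rf Mkt_RF SMB HML
use `all_results', clear
* t-statistic and two-sided p-value for one coefficient as an example
gen t_Mkt_RF = _b_Mkt_RF / _se_Mkt_RF
gen p_Mkt_RF = 2 * ttail(df, abs(t_Mkt_RF))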

Is it normal to have a negative pseudo R-Squared (McFadden’s pseudo R-squared) in tobit regression?

Dear Stata users,

First of all, thanks for reading my question.
I have run a tobit model and found a negative pseudo R-squared value (-0.342).
Is this normal, and how should I interpret it?

Kind regards

How to draw a distribution graph when the x axis is a continuous variable and the y axis is categorical

Dear experts
I am new to this forum and want to thank you in advance for any help.
Variable1 is a yes/no categorical variable; variable2 is a continuous variable with 10,000 values. I want to estimate the rate of "yes" at each value of variable2 and make the curve smooth. I used "twoway fpfitci variable1 variable2" and it doesn't work.
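One possible culprit is variable1 not being numeric 0/1. A sketch assuming it is a string holding "yes"/"no" (adjust the names and values to match the data):

Code:
* code the outcome as 0/1, then smooth the share of "yes" over variable2
gen byte yes = variable1 == "yes" if !missing(variable1)
twoway lpolyci yes variable2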
Thank you
Kai-Lun

Panel data xtreg question

Hello teachers and professors,

I have a question. I downloaded a lot of data from Datastream, and after organizing it into panel data it has a lot of missing values. When I use -reg- it runs, but when I use -xtreg- it doesn't: Stata always replies "no observations".

What can I do?


Code:
clear
input int time float id double(a1 a2 a3)
17203 1 . . .
17231 1 . . .
17262 1 . . .
17292 1 . . .
17323 1 . . .
17353 1 . . .
17384 1 . . .
17415 1 . . .
17445 1 . . .
17476 1 . . .
17506 1 . . .
17537 1 . . .
17568 1 . . .
17597 1 . . .
17628 1 . . .
17658 1 . . .
17689 1 . . .
17719 1 . . .
17750 1 . . .
17781 1 . . .
17811 1 . . .
17842 1 . . .
17872 1 . . .
17903 1 . . .
17934 1 . . .
17962 1 . . .
17993 1 . . .
18023 1 . . .
18054 1 . . .
18084 1 . . .
18115 1 . . .
18146 1 . . .
18176 1 . . .
18207 1 . . .
18237 1 . . .
18268 1 . . .
18299 1 . . .
18327 1 . . .
18358 1 . . .
18388 1 . . .
18419 1 . . .
18449 1 . . .
18480 1 253.5 90.22 .
18511 1 281.9968 100.36 .
18541 1 312.8999 111.36 .
18572 1 414.5999 147.56 .
18602 1 375.5999 133.68 .
18633 1 355.5 126.52 .
18664 1 289.2 102.93 .
18692 1 239.7 85.31 .
18723 1 229.5 131.51 33.6
18753 1 210.6 120.68 30.8
18784 1 180.15 103.23 9.9
18814 1 158.7 90.94 8.7
18845 1 165.9 95.07 9.1
18876 1 173.4 99.37 9.5
18906 1 179.4 102.8 7.1
18937 1 209.7 120.17 8.2
18967 1 209.1 119.82 10.4
18998 1 209.829 171.67 10.4
19029 1 240.0001 196.35 11.9
19058 1 219 179.17 169.8
19089 1 229.2 204.04 177.7
19119 1 204.6 185.82 158.6
19150 1 135 122.61 104.7
19180 1 96.3 87.46 74.7
19211 1 87.6 79.56 .
19242 1 77.7 70.57 .
19272 1 81.3 73.84 .
19303 1 76.8 69.75 .
19333 1 86.4 78.47 .
19364 1 70.2 63.76 .
19395 1 45.6 41.42 .
19423 1 33.9 30.79 .
19454 1 28.491 25.88 .
19484 1 28.491 25.88 .
19515 1 28.491 25.88 .
19545 1 28.491 25.88 .
19576 1 28.491 25.88 .
19607 1 28.491 25.88 .
19637 1 28.491 42.55 .
19668 1 28.491 42.55 .
19698 1 28.491 42.55 .
19729 1 28.491 40.92 .
19760 1 28.491 40.92 .
19788 1 28.491 40.92 .
19819 1 28.491 40.92 .
19849 1 28.491 40.92 .
19880 1 7.5 8.88 .
19910 1 4.5 5.33 .
19941 1 3.9 4.62 .
19972 1 2.4 2.84 .
20002 1 2.1 2.49 .
20033 1 2.1 2.49 .
20063 1 1.2 1.42 .
20094 1 1.35 1.6 .
20125 1 .75 .89 .
20153 1 .9 1.07 .
20184 1 1.2 1.42 .
20214 1 .9 1.07 .
end
format %dCY/N/D time
xtset id time
xtreg a1 a2 a3, fe robust
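Before blaming -xtreg-, it may help to check how many rows actually have all three variables non-missing, since the regression drops any row with a missing value. A quick sketch:

Code:
* rows that survive listwise deletion across a1, a2, a3
count if !missing(a1, a2, a3)
* and how many such rows each panel contributes
bysort id: egen complete_rows = total(!missing(a1, a2, a3))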

Change in a variable's value, for the same individual, across years of data

Hi

Apologies if this query is very simplistic. I am using UKHLS data on individuals to look at changes in wellbeing resulting from commuting duration. To do so, I wish to obtain a sample of people whose reported commute duration (jbttwt) changes across waves (years) of the data by, say, 5 minutes or more. I have the variables:
pidp: person identifier

jbttwt: commuting time in minutes (note: when appending my data I removed the existing wave prefix from all variable names, so there are now multiple recorded values of jbttwt for one pidp, reflecting different waves)

wave: a variable (created during the appending) that captures which wave a row of data is from, i.e., which year's jbttwt it refers to

What command(s) can I use to create a variable that captures each person's commute change between waves, in order to select a sample of only those who experience a change of, say, 5 minutes or more?
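One possible sketch, assuming wave is numbered consecutively so that the lag operator refers to the previous wave:

Code:
xtset pidp wave
* change in commute time relative to the previous wave
gen d_commute = jbttwt - L.jbttwt
gen byte big_change = abs(d_commute) >= 5 if !missing(d_commute)
* flag people who ever experience such a change, to select the sample
egen byte ever_big = max(big_change), by(pidp)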

Many thanks for any help.

Converting date of birth to age

Hello,
It's the first time I am using this forum. I have previously converted date of birth (dob) to age successfully; most of the time the dob was a string. I now have a dob variable of type "numeric daily date (int)", displayed as 5/15/2009 (for May 15, 2009), 6/8/82 (for June 8, 1982), etc. I have tried to convert this dob to a Stata date (via string format) before converting to age, but am stuck. Grateful for your help. Thanks. Daya
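Since the variable is already a numeric daily date, no string conversion should be needed; the slashes are just the display format. A sketch (ref_date is an assumed variable holding the date at which age is measured; for age today one could use date(c(current_date), "DMY") instead):

Code:
* age in completed years, approximating year length by 365.25 days
gen age = floor((ref_date - dob) / 365.25)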

Matching subjects across multiple variables and within variable

I have a dataset with several variables that I'm being asked to match to come up with some kind of "highly probable duplicate" score. For example, I have name, age, and address variables. In a situation where the name and address match perfectly but the age does not, I would suspect two different people. However, if the ages are within a year, or match exactly, then I would assume they are the same person and flag one observation as a duplicate.

One difficulty is that the name variable is an agglutinated string with FIRST LAST MIDDLE SALUTATION, etc., all possibly crammed into one value. Some are as short as one "word", others as long as seven; it just depends on the data collector and the respondent.

To add to this, the order is not set. So some subjects list FIRST LAST while others list LAST FIRST. It's this problem that I want to try to tackle first.

Does anyone have any suggestions on how to go about this? My instinct is to create two new variables for the first and second words in the name field, but then I'm unclear how to run any kind of -duplicates- function on the variables such that it would criss-cross them and flag the people who appear to be the same.

Name1     Name2      Duplicate
Abe       Lincoln    Yes
Ada       Lovelace   Yes
Lincoln   Abe        Yes
Earheart  Amelia     Yes
Hamilton  Alexander  No
Earheart  Amelia     Yes
Amelia    Earheart   Yes
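For the two-word case in the table, one sketch is to put the words in a canonical alphabetical order within each pair so that FIRST LAST and LAST FIRST collide, then let -duplicates- do the flagging (Name1/Name2 as in the table):

Code:
* order the two name words alphabetically
gen n1 = cond(Name1 <= Name2, Name1, Name2)
gen n2 = cond(Name1 <= Name2, Name2, Name1)
* dupflag > 0 marks rows that share the same unordered name pair
duplicates tag n1 n2, gen(dupflag)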

Secondly, when I get to the addresses, I'd like to develop some fuzzy matching so that misspellings and other data-entry variations can be sorted out.

Perhaps this kind of thing already exists, either in Stata or in another program? Any guidance or suggestions are most graciously welcome.

Multicollinearity in regression analysis

I need to test for multicollinearity in my data set. I made the correlation matrix and notice correlation between some of my independent variables, but the VIF analysis indicates that there is no multicollinearity (VIF = 1.85).
I upload the results of the correlation matrix and the VIF.
My interpretation is that there is no multicollinearity, because the correlations and the VIF values are low.
Are there other tests to identify the presence of multicollinearity in Stata?
I read that eigenvalue analysis and standard-error analysis might be useful, but I don't know how to interpret the results of these analyses.
Thank you for your attention and for your help to a Statalist neophyte.


Regrouping variables

Hello everyone !

We are working on a Stata project where we have to estimate this regression: ln(y) = β0 + β1X + β2 ln(population) + γ Geographie
where "Geographie" contains these three variables: elevat_range_msa, ruggedness_msa, and heating_dd, which are in our data set.
How do I create Geographie? I hope this question is not too elementary.
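If the goal is just to include those three controls under a common label, one sketch is a macro rather than a new variable, since γ multiplies a set of regressors (ln_y and ln_pop are assumed to be built from raw variables named y and population):

Code:
global Geographie elevat_range_msa ruggedness_msa heating_dd
generate ln_y   = ln(y)
generate ln_pop = ln(population)
regress ln_y X ln_pop $Geographie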

Thank you for your help !