Channel: Statalist

Confidence Interval after npregress does not include the observed margin

My issue is that after running npregress and then margins, the confidence interval at age 60 does not include the observed margin, so in the marginsplot the point estimate falls outside its own confidence interval. I am not sure how to interpret this or what might be causing it. Any help would be greatly appreciated.

Here is the code I use:

Code:
npregress sin_rhis age_male sin_houseval sin_income sin_wealth i.ruralurban i.year if age_male >= 55 & married == 1, vce(bootstrap, reps(20) seed(12345))

margins, at(age_male=(55(5)85)) vce(bootstrap, reps(20) seed(12345))
marginsplot
This produces the following output:

Code:
Bandwidth
--------------------------------------------------------
                     Mean         Effect
--------------------------------------------------------
age_male            3.582705      4.174322
sin_house~1         .3650728      .4253577
sin_income          .4923634      .473668
sin_wealth          1.893532      2.206214
ruralurban          .5            .5
year                .5            .5
-------------------------------------------------------

Local-linear regression                           Number of obs          =      1,930
Continuous kernel : epanechnikov                  E(Kernel obs)          =      1,067
Discrete kernel   : liracine                      R-squared              =      0.4303
Bandwidth         : cross validation
---------------------------------------------------------------------------------------------------------------------------
                                Observed      Bootstrap                                           Percentile
sin_his                         Estimate      Std. Err.         z             P>|z|          [95% Conf. Interval] 
----------------------------------------------------------------------------------------------------------------------------
Mean
sin_his                        4.170998       .091423         45.62           0.000          4.021538       4.39628
----------------------------------------------------------------------------------------------------------------------------
Effect
age_male                       .0031461       .0151447        0.21            0.835         -0.0161455     .043843   
sin_houseval                   .5560063       .2042004        2.72            0.006         .1498142       .955308
sin_income                     .9126953       .139012         6.57            0.000         .5755258       1.06808

ruralurban
(1 vs 0)                       .5643983       .2591127        2.18            0.029         .2289176       1.19650

year
(2006 vs 2004)                   .1407        .0932195        1.12            0.261        -.0276539       .28113
(2008 vs 2004)                .1373666        .1297748        1.06            0.290        -.0908567       .34794
(2010 vs 2004)                .1324337        .1848764        0.72            0.474        -.1323977      .444480
(2012 vs 2004)                .0155138        .2434178        0.06            0.949        -.2770047      .568149
(2014 vs 2004)                .0978844        .2750368        0.36            0.722        -.2280371      .715432
-----------------------------------------------------------------------------------------------------------------------------
Note: Effect estimates are average derivatives for continuous covariates and averages of
          contrasts for factor covariates


Predictive margins                                  Number of obs   =   1,930
                                                    Replications    =   20

Expression            :   mean function, predict()

1._at                 :    age_male = 55
2._at                 :    age_male = 60
3._at                 :    age_male = 65
4._at                 :    age_male = 70
5._at                 :    age_male = 75
6._at                 :    age_male = 80
7._at                 :    age_male = 85

------------------------------------------------------------------------------------------------------------
         Observed      Bootstrap                                           Percentile
         Estimate      Std. Err.         z             P>|z|          [95% Conf. Interval] 
------------------------------------------------------------------------------------------------------------
_at
1        2.571337      1.483309         1.73           0.083         -1.884745    4.613622
2        1.640986      .2610298         6.29           0.000          2.683658    3.85435
3        3.950541      .2305949        17.13           0.000          3.612271    5.578298
4        4.32619       .1695885        25.51           0.000          4.056366    4.799602
5        4.238917      .1592986        26.61           0.000          4.062757    4.607807
6        3.677414      .2698087        13.63           0.000          3.135992    4.281009
7        3.385679      .4588806         7.38           0.000          2.642715    4.627514
------------------------------------------------------------------------------------------------------------
And this is the marginsplot:

[marginsplot graph from the original post not reproduced here]
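For reference, one easy check (an assumption on my part, not something established by the output above): percentile bootstrap intervals built from only 20 replications can be quite erratic, so re-running the same commands with more replications would show whether the age-60 interval stabilizes around its point estimate. A sketch, with 200 replications chosen arbitrarily:

Code:
npregress sin_rhis age_male sin_houseval sin_income sin_wealth i.ruralurban i.year if age_male >= 55 & married == 1, vce(bootstrap, reps(200) seed(12345))
margins, at(age_male=(55(5)85)) vce(bootstrap, reps(200) seed(12345))
marginsplot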

Help Creating Canonical RD plot using cmogram with Two Cutoffs

Dear all,

I am using Stata 16 on a Mac. I used the command cmogram score demvoteshare if demvoteshare < 1, cut(0.5) scatter line(0.5) lowess, which works fine, except that I have two cutoffs, 0.5 and 0.6. Is there a way to incorporate both of these cutoffs, or do I have to write the code for each cutoff separately?
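For reference, a minimal sketch of the separate-call route (the sample restriction is copied from the post; whether one plot per cutoff is acceptable for the design is a separate judgment):

Code:
cmogram score demvoteshare if demvoteshare < 1, cut(0.5) scatter line(0.5) lowess
cmogram score demvoteshare if demvoteshare < 1, cut(0.6) scatter line(0.6) lowess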


Thank you in advance for your help

Jason Browen

Does the hierarchical logistic model increase the validity of the pscore?

Hi! I have a question about predicting propensity scores. Based on what I have learned, when the treatment variable is dichotomous, I should use a logit model to predict the propensity score. I am wondering: if the dataset is nested, should I use a hierarchical (multilevel) logistic model to predict the propensity score instead?

For example, the data are nested within schools. Which model should I use?

. logit w2seextrc $idlist ,or

. xtmelogit w2seextrc $idlist || schids: ,or var
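For reference, a sketch of how each candidate model would yield a propensity score (variable names taken from the post; which model is appropriate is the substantive question):

Code:
* single-level logit: predicted probability of treatment
logit w2seextrc $idlist, or
predict ps_logit, pr

* multilevel logit with a random intercept for schools
xtmelogit w2seextrc $idlist || schids:, or var
predict ps_ml, mu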

Also, can I output the PSM results from Stata directly, or do I have to make the APA-style tables by hand?

Your advice is highly appreciated!!

marginal effect of firthlogit

Dear Statalists,
I tried to calculate marginal effects after firthlogit. I used "margins, expression(invlogit(predict(xb)))" following the advice in #4 of this thread. (I may have understood it wrong.)

Among the independent variables, soe, fie, and tradecom are binary; the others are continuous. But when I tried to calculate the marginal effects, an error occurred: "factor variables may not contain noninteger values". However, limpint is continuous.
Does this mean that this way of calculating only suits factor variables? How can I calculate marginal effects for all the variables?
Below is the code and error. Any comments would be appreciated.
Many thanks,
K



Code:
 firthlogit odistarter     limpint soe fie tradecom process_share lexpint  
  
  initial:       penalized log likelihood = -49394.935
  rescale:       penalized log likelihood = -49394.935
  Iteration 0:   penalized log likelihood = -49394.935  
  Iteration 1:   penalized log likelihood = -47937.173  
  Iteration 2:   penalized log likelihood = -47677.303  
  Iteration 3:   penalized log likelihood = -47676.333  
  Iteration 4:   penalized log likelihood = -47676.333  
  
                                                  Number of obs     =  2,188,615
                                                  Wald chi2(6)      =    3255.91
  Penalized log likelihood = -47676.333           Prob > chi2       =     0.0000
  
  ------------------------------------------------------------------------------
    odistarter |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
  -------------+----------------------------------------------------------------
       limpint |   5.879801   .1970857    29.83   0.000      5.49352    6.266082
           soe |   .1002127   .0401549     2.50   0.013     .0215107    .1789148
           fie |  -1.241626   .0339263   -36.60   0.000    -1.308121   -1.175132
      tradecom |   -.560522   .0309983   -18.08   0.000    -.6212775   -.4997664
  process_sh~e |  -.2379967   .0389793    -6.11   0.000    -.3143946   -.1615987
       lexpint |   6.334411   .2019787    31.36   0.000      5.93854    6.730282
         _cons |  -6.197648    .027174  -228.07   0.000    -6.250908   -6.144388
  ------------------------------------------------------------------------------
  
  . margins limpint soe fie tradecom process_share lexpint, expression(invlogit(
  > predict(xb)))
  limpint:  factor variables may not contain noninteger values
  r(452);
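For reference, the error arises because variables listed directly after margins are treated as factor variables, which fails for the continuous limpint. A hedged sketch of one alternative, asking for numerical derivatives of the same expression instead (average marginal effects), is below; if firthlogit accepts factor-variable notation (worth checking its help file), declaring i.soe i.fie i.tradecom in the estimation would additionally let margins report discrete-change effects for the binaries.

Code:
* average marginal effects of each covariate on Pr(odistarter = 1), via numerical differentiation
margins, expression(invlogit(predict(xb))) dydx(limpint soe fie tradecom process_share lexpint)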

Req.: Replication of heckmancopula for heckprobit Stata command

Dear Stata Users,

Greetings! Recently, I came across a Stata command, heckmancopula, developed by Takuya Hasebe (Copula-Based Maximum-Likelihood Estimation of Sample-Selection Models, SJ, Vol. 13, issue 3, pages 547-573, https://journals.sagepub.com/doi/10....867X1301300307). I am sure it would help a lot of researchers in related work. I am also looking to apply it in my current work, and I understand how it works relative to the traditional Heckman model (e.g., heckmancopula `y' `x1', select(`xs') is equivalent to heckman `y' `x1', select(`xs')).


However, I am curious whether this command can be used in place of the heckprobit command, where both the selection and the outcome equations are probit models?

It would be really nice if anybody can provide an answer to my above question.

Thank You.

Regards,
Aswini Mishra

Granger-Causality test - Panel data - Anderson-Hsiao estimator

Hello

I would like to follow the Anderson-Hsiao framework for a panel data analysis, and I am having issues with the steps and coding involved.
I wish to perform a Granger causality test using the Anderson-Hsiao estimates as instruments. My dependent variable is "GDP", my explanatory variables are "pci" and "ForeigndirectinvestmentnetBo", and my control variables are "BroadmoneyofGDPFMLBLB", "Popula", and "External".

I computed the Anderson-Hsiao 2SLS estimator including all of my variables:
Code:
 ivregress 2sls diff_GDP diff_pci diff_Foreigni diff_Broadmoney diff_Pop diff_Externald (L.D.GDP=L2.D.GDP),robust cluster(country)
I tested for Granger-causality using GDP and pci
Code:
pvar pci GDP, lags(5)
pvargranger

Code:
**Declare panel data
 egen country=group(Entity)
 list Entity (country) in 1/10, sepby(Entity)
 gen year = real(Year)
  xtset country
  xtset country year,yearly
 gen GDP = real(gdp) // destring the gdp variable

 gen  Popula =real(PopulationgrowthannualSP)
  gen  External =real(ExternaldebtstockstotalDOD)
 

**Index
 pca rescoups resmaxintensity resongoing resTerrorismfatalitiesGTD20 resconflict respostconflict
 rotate
 predict pci

**including zeros in missing cells
foreach x of varlist pci{
  replace `x' = 0 if(`x' == .)
}


** First differencing all columns
gen diff_pci = D.pci
gen diff_GDP = D.GDP
gen diff_Foreigni = D.ForeigndirectinvestmentnetBo
gen diff_Broadmoney= D.BroadmoneyofGDPFMLBLB
gen diff_Pop= D.Popula
gen diff_Externald = D.External

**Second lagged difference of GDP as instrument
ivregress 2sls diff_GDP diff_pci diff_Foreigni diff_Broadmoney diff_Pop diff_Externald (L.D.GDP=L2.D.GDP),robust cluster(country)

Now, what I would like to do is perform the Granger-causality test on GDP and pci, including my explanatory and control variables in the test, and instrumenting with the Anderson-Hsiao 2SLS estimator.

Can someone assist me with the code for the above?

My data:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(GDP pci) double ForeigndirectinvestmentnetBo float(Popula External) double BroadmoneyofGDPFMLBLB
  4.400002  -.4595704 -3942928.3  2.675145 2.72485e+10 72.796361
  .8000006  -.4595704  4353889.3 2.5664566 28153911296  61.77114
-1.2000005   1.466991   38651266  2.460316 28489965568  49.11131
 1.8000023   1.466991          . 2.3503273 27351246848 51.941995
-2.1000009   1.466991          . 2.2216294 26274664448 50.101458
 -.8999966   3.393549          . 2.0708578 30241927168 45.318672
  3.799995   3.393549          . 1.9098215 33051430912 37.169446
 4.0999985   3.393549          .  1.753176 33653880832 33.005836
 1.0999999  3.3935504          . 1.6152833 30902562816 36.081434
  5.100004  3.3935494          . 1.5008857 30692708352 42.376822
 3.2000015   3.393549          . 1.4162657 28209283072 42.208981
       3.8   1.466991          .  1.358408 25467371520 37.832606
         3   1.466991          . 1.3098426 22752567296 56.850473
       5.6   1.466991          . 1.2750655 23045429248 62.726626
       7.2   1.466991          . 1.2759147 23778635776 62.816935
       4.3   1.466991          . 1.3178462 22426644480 59.265476
       5.9   1.466991 -1.101e+09 1.3899006 17092404224 53.827594
       1.7   1.466991 -1.762e+09  1.471123  5910806016  57.28216
       3.4   1.466991 -1.540e+09 1.5513825  6134513152 64.092886
       2.4   1.466991 -2.321e+09 1.6361927  6246398976 62.986986
       1.6   1.466991 -2.533e+09 1.7220932  7420896256 73.158861
       3.6   1.466991 -2.081e+09 1.8050187  7260318208  69.05566
       2.9   1.466991 -2.037e+09 1.8833138  6064672256 68.060812
       3.4   1.466991 -1.542e+09 1.9514152  5515632128 67.953132
       2.8   1.466991 -1.964e+09 2.0027277  5245580800 71.729813
       3.8   1.466991 -1.521e+09 2.0335925  5521129472 79.309932
       3.7   1.466991  6.389e+08 2.0453873  4671369728 82.001074
       3.2   1.466991 -1.592e+09 2.0513546  5463160832 78.884977
 .04162146   3.393549 -2.000e+08  3.434427  7288778240         .
 -3.450099   3.393549  3.357e+08  3.378481  8591895552         .
  .9913593   1.466991 -6.645e+08  3.324456  9000344576         .
 -5.838281   3.393549 -2.880e+08  3.280312 10059207680         .
 -23.98342   3.393549 -3.021e+08  3.246642 10571384832         .
 1.3393635   3.393549 -1.703e+08 3.2261446 11292755968         .
        15   1.466991 -4.724e+08  3.216859 11502043136  36.48713
  13.54437   1.466991 -1.806e+08  3.214234 10545818624 27.577574
  7.274277   1.466991 -4.117e+08   3.21732  9948273664 22.615484
 4.6911464   3.393549 -1.114e+09 3.2289414 10784075776 24.437043
 2.1814897   3.393549 -2.471e+09  3.249247 10673242112 22.849355
  3.054624   3.393549 -8.786e+08 3.2772036  9763465216 17.280556
 4.2059984   3.393549 -2.145e+09  3.301198  8776916992 21.135714
 13.665687   1.466991 -1.643e+09  3.329257  9109536768 16.049883
   2.98985  -.4595705 -3.481e+09  3.378811  9099850752 13.508367
 10.952862   1.466991 -1.414e+09  3.453014  9786029056 13.553646
 15.028915  -.4595705  1.523e+09  3.537557 12223970304 13.299819
 11.547683  -.4595705  2.283e+08 3.6195745  9890493440 16.273326
 14.010018   1.466991  1.805e+09 3.6806355 11931686912 20.446871
  11.16614  -.4595705  8.907e+08  3.710531 15501571072 31.520504
  .8587126   1.466991 -2.199e+09  3.703878 20172423168  45.60817
   4.85922  -.4595705  4.568e+09  3.671462 26599589888 34.774751
  3.471981  -.4595705  5.116e+09 3.6341586 33964863488 34.978872
  8.542148  -.4595705  2.351e+09  3.597774 4.42591e+10 31.517732
   4.95459  -.4595705  8.042e+09 3.5519505 5.49465e+10 33.331549
  4.822626  -.4595705 -2.771e+09  3.497493 5.69355e+10 35.675553
  .9435756  -.4595705 -1.081e+10 3.4388506 56272310272  40.94466
 -2.580097  -.4595705  4.525e+08  3.378273 57167384576 39.162307
-2.8541605  -.4595704  -62096190 3.1007655  1070152448 21.740975
  8.976134  -.4595704  -62376777  3.235059  1119735936 25.115607
 4.2257996  -.4595704 -1.208e+08  3.377159  1149708032 26.467866
  2.957711 -.45957035  -77571727  3.479003  1198996352 29.622098
  5.836172  -.4595704 -1403787.2  3.500774  1270610560 26.490936
 2.0204005  -.4595704  -13648840  3.426061  1382104448 28.441371
  6.045198 -.45957035  -13329502  3.293614  1396512512 22.873627
  4.324284  -.4595704  -13595869  3.145324  1377641344  23.18018
  5.734688  -.4595704  -13876006 3.0283754  1411356800 22.128608
  3.961012  -.4595704  -32748440  2.958613  1433932288 19.507034
  5.341449  -.4595704  -37820420 2.9517546  1488489216 22.995933
  5.859992  -.4595704  -56153557  2.983995  1399684864  26.22559
  5.330411  -.4595704  -41553069   3.02378  1470316544 24.597384
   4.64447  -.4595704  -12158025  3.043083  1608563072 20.009685
  3.444045  -.4595704  -44416700  3.039675  1485978752 22.943317
  4.429629  -.4595704  -65154249  3.005342  1612808832 19.179256
 1.7115777  -.4595704  -53428819 2.9523835  1551746432 21.128573
  3.947014  -.4595704  -54929324  2.897545   648364672 25.330737
  5.986516  -.4595704 -2.613e+08 2.8540854   883287872 28.304637
  4.889899  -.4595704 -1.738e+08 2.8235555   973593408 31.798127
  2.329301  -.4595704 -1.031e+08  2.809302  1309266304  32.73525
 2.1101992  -.4595704 -1.947e+08 2.8060865  1581966848 34.882235
 2.9627504  -.4595704 -1.014e+08  2.803718  1851264384 35.722533
  4.816478  -.4595704 -2.412e+08  2.797649  1675225728 34.007497
  7.189716 -.45957035 -3.017e+08  2.790718  2003816448 36.709118
  6.351832  -.4595704 -3.879e+08 2.7821815  2041492864 40.952857
 2.0958083  -.4595704 -1.170e+08  2.771714  2178021376 42.541423
   3.96486  -.4595704 -1.143e+08 2.7614095  2316211200 41.105935
 13.059406  -.4595704  -42186013   3.63436   549840384 28.924689
  6.772822  -.4595704  -88526216  3.340334   552920448 21.920211
  7.458709  -.4595704   16719789 3.0285375   612559232 27.456387
 2.9170704  -.4595704   11423290  2.768483   606348608 28.332171
  1.916107  -.4595704  2.964e+08 2.5741794   663983872 21.037856
  3.627916  -.4595704   23615781 2.4672565   700397760 20.921851
   7.03041  -.4595704  -29507180  2.419145   717153216  20.47826
    5.8298  -.4595704  -72235497  2.391314   626530368 19.794538
  8.325891  -.4595704  -96006497 2.3444872   574727360 22.389681
  .4436635  -.4595704  -91820639 2.2719958   532169056 28.263056
  9.667241  -.4595704  -35161354  2.161983   510191232  28.50095
 1.9876958  -.4595704  -54918579 2.0328965   452434624 24.816209
 .25057387  -.4595704  3.490e+08 1.8893803   400352896 39.399228
  6.069531  -.4595704 -3.651e+08  1.773421   491350080 45.248542
  4.625895  -.4595704 -2.120e+08  1.730526   514126464 47.878314
  2.705822  -.4595704 -4.296e+08  1.778885   515796288 46.884142
end


Gravity model - fixed effects problems

Hello Stata users,

I'm using a gravity model to estimate the effect of being part of ASEAN, captured by a dummy, on trade flows.

My basic regression, which works perfectly, is: reg lFlow lGDP_exp lGDP_imp lDistw Comlang Contig Curcol Evercol GATT ASEAN.

After that I want to include fixed effects by country pair and country-year.
First, for the country-pair fixed effects, my commands are:
egen countrypair = group(Exporter Importer)
xtset countrypair year
xtreg lFlow lGDP_exp lGDP_imp lDistw Comlang Contig Curcol Evercol GATT ASEAN, fe
which gives me a positive coefficient for ASEAN, which is intuitively normal.

The problem is when I want to include fixed effects for country-years.
I have:
egen exporters_year = group(Exporter year)
egen importers_year = group(Importer year)
reghdfe lFlow lGDP_exp lGDP_imp lDistw Comlang Contig Curcol Evercol GATT ASEAN, absorb(exporters_year importers_year)
and then I get a negative coefficient for my ASEAN dummy. How can the coefficient have a different sign under the two fixed-effects specifications? Are my commands incorrect? Or do I need to combine all the fixed effects in one equation rather than running them separately? When I run the regression with all my fixed effects together, the coefficient is again positive.
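For reference, a sketch of the combined specification (one reading of "mix all fixed effects in one equation", not a recommendation): with pair and country-year fixed effects absorbed together, the time-invariant pair variables (distance, contiguity, common language, colonial ties) and the country-level GDP terms are absorbed as well, leaving only pair-year variables such as GATT and ASEAN.

Code:
reghdfe lFlow GATT ASEAN, absorb(countrypair exporters_year importers_year) vce(cluster countrypair)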

Thank you in advance for your help,

Ben.

Looping over cell-range with import excel

Dear All,

This is my first post on the forum.

My task is to import a number of Excel files into Stata, clean them, merge them, and make them ready for regression analysis.
For each file, I need to retrieve data from 3 specific tables (out of many tables in the file) and merge them together.

However, I have the problem that the cell range changes, both the starting and the ending cell, for each Excel file. The columns should be constant from table to table, or at least within each table, so I can already write them into the respective cell ranges, but the starting and ending rows are unknown and change for each table.

The common characteristic across the Excel files is that the starting cell of each of the 3 tables is 5 rows below the cells containing "Table 1", "Table 4", and "Table M1" respectively, and each table ends at the first subsequent row containing the phrase "Average of forecasts".

What the code should look like is something like this:

1) loop over the files in the folder (a local holding the directory listing)
2) for each file, import it into Stata, scan the rows, and (perhaps in a local) record the specific start/end rows for each of the 3 tables, using the criteria listed above (I have tried with substr)
3) run import excel for each of the cell ranges separately (as each table needs to be manipulated separately before they are merged together)


local myfilelist : dir "/Users/adrianomariani/Desktop/Research Assistant scheme/Forecasts excel" files "*.xlsx"

foreach filename of local myfilelist {

import excel using `filename', clear

gen start1=_n+5 if substr(A,1,7)=="Table 1"
gen end1=_n-1 if substr(A,1,7)=="Average" & _n<(start1+80) // using start1+80 because there are many 'Average' rows and I want the first one after start1; similarly for the others
gen start2=_n+5 if substr(A,1,7)=="Table 4"
gen end2=_n-1 if substr(A,1,7)=="Average" & _n<(start2+80)
gen start3=_n+5 if substr(A,1,8)=="Table M1"
gen end3=_n-1 if substr(A,1,7)=="Average" & _n<(start3+80)

local table1s dis(start1)
local table1e dis(min(end1))
local table4s dis(start2)
local table4e dis(min(end2))
local tableM1s dis(start3)
local tableM1e dis(min(end3))

import excel using `filename', cellrange(A`table1s':D`table1e') clear

......

import excel using `filename', cellrange(A`table4s':D`table4e') clear

.....


import excel using `filename', cellrange(A`tableM1s':H`tableM1e') clear // the columns ('A,H' here, 'A,D' in the other two) are constant; only the rows vary

.....

}

When I try to run this (even without the loop over files, i.e. on just one file), import excel reports that the cell range is out of range, so there must be a problem with how I have defined it.
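One likely culprit (an assumption): a line such as local table1s dis(start1) stores the literal text dis(start1) rather than a row number, so the constructed cellrange() is indeed invalid. A rough sketch of capturing the row numbers instead, shown for the first table only and assuming the sheet's data start in row 1 so that observation numbers line up with Excel rows:

Code:
import excel using `filename', clear
gen start1 = _n + 5 if substr(A, 1, 7) == "Table 1"   // 5 rows below the "Table 1" header
gen endrow = _n - 1 if substr(A, 1, 7) == "Average"   // candidate end rows
summarize start1, meanonly
local table1s = r(min)
summarize endrow if endrow >= `table1s', meanonly     // first "Average" row after the start
local table1e = r(min)
import excel using `filename', cellrange(A`table1s':D`table1e') clear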

Thanks in advance for the help


Fixed Effects with non-stationary variables and panel data

Hello,

I have a technical question about a fixed-effects model estimated with xtreg ... , fe vce(robust) and the presence of non-stationary variables in panel data. The model uses log variables, and with xtunitroot ht and xtunitroot llc I have found evidence that the variables are non-stationary.

What would you suggest to resolve this issue? Normally I would first-difference the variables and test to confirm that the transformed variables are stationary. Would you still run a fixed-effects regression with first-differenced variables?
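For reference, a sketch of the first-difference route under assumed variable names (lny for the logged outcome, lnx for a logged regressor, with the panel already xtset); whether a fixed-effects regression in differences is the right model is the substantive question:

Code:
gen d_lny = D.lny
gen d_lnx = D.lnx
xtunitroot fisher d_lny, dfuller lags(1)   // Fisher-type test tolerates the unbalancedness that differencing creates
xtreg d_lny d_lnx, fe vce(robust)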

P.S. New to the site.

Thank you for your help.


Data input: Please help!

How do I answer these questions using Stata?

1. Describe your sample i.e. provide sample characteristics, sample size etc
2. Description of the type of data used (e.g. nominal, scale range of indicators)

Which method do I need for my research project?

Hi guys,

Currently I'm working on a research project, or at least trying to find out whether it is feasible. That's why I want to "play around" a bit in Stata and see whether my data are suitable. So my question to you is: what statistical method do I need for my plan?

There is a questionnaire that asks different people every year about their socioeconomic background, beliefs, and positions on various topics. Let's say that in 2012 the people who think the government is doing a great job are, on average, 36 years old, middle-class, and educated. Now I want to compare the average person who liked the government in 2012 with the average person who liked it in 2016. Perhaps, for some reason, the government did less well over those four years and the average changed: in 2016 the average person who approves of the government is 45 years old, upper-class, and better educated.

That means I need to characterize the "average person" who likes the government in 2012 and in 2016. Can anyone tell me what this kind of method is called and how it is carried out in Stata?

PS: this is just an example; I'm not actually going to do research on who likes the government. I just wanted to explain my point in a simple way.

Thanks

Generate variables with ratios of subgroup values

Hello! I got stuck solving a problem which must be very simple to address, apologies if I am overlooking something very basic!

I am trying to generate variables that present the ratio of subgroup means for different years.

The dataset is structured as follows: I have observations for several years and, for each year, five income groups, i.e. income quintiles (incq). For each income group I have already created the mean annual expenditure (meanex) for different items, e.g. electricity; this value is the same for every observation in that income group. Now I'd like to generate a new variable containing the ratio of mean spending of the first income quintile to mean spending of the fifth income quintile in each year (meanex if incq==1 divided by meanex if incq==5). My ultimate aim is to plot that ratio by year to see how inequality in expenditure changes over time.

Here is a simplified example of my dataset:

Code:
clear

input id year incq meanex
1 1 1 2.4
2 1 1 2.4
3 1 2 3.1
4 1 2 3.1
5 1 3 4.2
6 1 3 4.2
7 1 4 4.8
8 1 4 4.8
9 1 5 5.1
10 1 5 5.1
11 2 1 1.1
12 2 1 1.1
13 2 2 1.7
14 2 2 1.7
15 2 3 2.1
16 2 3 2.1
17 2 4 2.5
18 2 4 2.5
19 2 5 3.6
20 2 5 3.6
21 3 1 1.6
22 3 1 1.6
23 3 2 2.3
24 3 2 2.3
25 3 3 2.8
26 3 3 2.8
27 3 4 3.7
28 3 4 3.7
29 3 5 6.8
30 3 5 6.8
end
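For reference, a minimal sketch of one way to do this with the example data above (the cond()-inside-egen trick is just one option; variable names are those from the post):

Code:
* mean expenditure of the first and fifth quintile, copied to every observation of the same year
egen meanex_q1 = max(cond(incq == 1, meanex, .)), by(year)
egen meanex_q5 = max(cond(incq == 5, meanex, .)), by(year)
gen ratio_q1_q5 = meanex_q1 / meanex_q5

* plot one observation per year
egen yeartag = tag(year)
twoway line ratio_q1_q5 year if yeartag, sort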
Many thanks in advance for your help!

RDD rdrobust problem

Dear all,

I am researching the effect of grade retention on exam results (which range from 0 to 20) and I am using an RDD to estimate the LATE. However, when I use the rdrobust command, it states that it is not possible to compute the local polynomial bandwidth and that I should run rdbwselect for more information.

After running rdbwselect, Stata says: "Invertibility problem in the computation of preliminary bandwidth below the threshold. Invertibility problem in the computation of preliminary bandwidth above the threshold. Invertibility problem in the computation of bias bandwidth (b) below the threshold. Invertibility problem in the computation of bias bandwidth (b) above the threshold. Invertibility problem in the computation of loc. poly. bandwidth (h) below the threshold. Invertibility problem in the computation of loc. poly. bandwidth (h) above the threshold."

I think the problem is that both my running variable and my outcome variable are discrete (each takes only the integer values 0, 1, 2, ..., 18, 19, 20). Is there any way I can solve this?

Kind regards,
Charles Calmeyn

Mediation analysis with binary outcome

Hello everyone!
I built a gsem model in the SEM Builder with one continuous independent variable, one continuous mediator, and one binary outcome. I ran the model and obtained the regression coefficients; however, the command to estimate "Direct and indirect effects" did not run and returned the message "invalid subcommand teffects".
How can I proceed with this analysis?
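For reference: estat teffects is a postestimation command for sem, not gsem, which is why the subcommand is rejected. A hedged sketch of computing the indirect effect by hand with nlcom, under assumed equation and variable names (y = binary outcome, m = mediator, x = predictor; substitute the names shown in your gsem output):

Code:
* indirect effect (product of the x->m and m->y paths, on the latent logit scale)
nlcom _b[m:x] * _b[y:m]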

Time series: possible with multiple countries over multiple years?

Hello, I am trying to set up data to run a time series analysis over a decade of data. When setting up the "year" variable to be accepted for a time series analysis, I received the error message "repeated time values in sample". This is correct in the sense that there is a 2001 observation for every country, a 2002 observation for every country, and so on. Is there a way to work with this data so that I can run a time series analysis on each country's values and then a larger aggregate analysis across all the countries?
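For reference, a short sketch of declaring the data as a panel rather than a single time series (variable names are assumptions): tsset on year alone fails precisely because each year appears once per country, whereas xtset with a panel identifier accepts repeated years across countries.

Code:
egen countryid = group(country)   // numeric id from a string country variable
xtset countryid year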

type mismatch error when trying to change value of cell

I am trying to change the value of a data cell using if qualifiers, but I keep getting a type mismatch error. I have tried both of the ways below, thinking maybe I could just change the string value and then go back and convert it to a float. Any ideas as to how to avoid the error?

replace tier2 = 1 if tier2 = "Tier 1"
type mismatch
r(109);

. replace tier2 = "1" if tier2 = "Tier 1"
type mismatch
r(109);
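For reference, a sketch of the two issues as I read them (assuming tier2 is a string variable holding values such as "Tier 1"): equality inside an if qualifier needs == rather than =, and a string variable cannot be assigned the numeric value 1.

Code:
* fix the operator and keep the replacement value as a string
replace tier2 = "1" if tier2 == "Tier 1"

* or build a numeric copy, coding each tier explicitly
gen tier2_num = .
replace tier2_num = 1 if tier2 == "Tier 1"
replace tier2_num = 2 if tier2 == "Tier 2"   // assumed category; adjust to the actual tiers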

New versions of bspline and polyspline on SSC

Thanks as always to Kit Baum, new versions of the packages bspline and polyspline are now available for download from SSC. In Stata, use the ssc command to do this, or adoupdate if you already have old versions of these packages.

The packages bspline and polyspline are my suite of packages for generating unrestricted spline bases. The bspline package is the comprehensive version, and the polyspline package is an easy-to-use front end for bspline. The new versions (described below, as on my website) have been updated to Stata Version 16 and now use data frames, which should make them faster. Users of older versions of Stata will still be able to download the Stata Version 10 versions of bspline and polyspline, or even the Stata Version 6 version of bspline, by typing, in Stata,

net from "http://www.rogernewsonresources.org.uk/"

and selecting the Stata version and/or packages required.

Best wishes

Roger

--------------------------------------------------------------------------
package bspline from http://www.rogernewsonresources.org.uk/stata16
--------------------------------------------------------------------------

TITLE
bspline: Create a basis of B-splines or reference splines

DESCRIPTION/AUTHOR(S)
The bspline package contains 3 commands, bspline, frencurv
and flexcurv. bspline generates a basis of B-splines in the
X-variate based on a list of knots, for use in the design
matrix of a regression model. frencurv generates a basis of
reference splines, for use in the design matrix of a
regression model, with the property that the parameters
fitted will be values of the spline at a list of reference
points. flexcurv is an easy-to-use version of frencurv, and
generates reference splines with regularly-spaced knots, or
with knots interpolated between the reference points.
frencurv and flexcurv have the additional option of
generating an incomplete basis of reference splines, which
can be completed by the addition of the standard constant
variable used in regression models. The splines are either
given the names in the newvarlist (if present), or (more
usually) generated as a list of numbered variables, prefixed
by the generate() option. Usually (but not always), the
regression command is called using the noconst option.

Author: Roger Newson
Distribution-date: 03 April 2020
Stata-version: 16


INSTALLATION FILES (click here to install)
bspline.ado
bspline.sthlp
frencurv.ado
frencurv.sthlp
flexcurv.ado
flexcurv.sthlp

ANCILLARY FILES (click here to get)
bspline.pdf
--------------------------------------------------------------------------
(click here to return to the previous screen)


---------------------------------------------------------------------------
package polyspline from http://www.rogernewsonresources.org.uk/stata16
---------------------------------------------------------------------------

TITLE
polyspline: Generate sensible bases for polynomials and other splines

DESCRIPTION/AUTHOR(S)
The polyspline package inputs an X-variable and a list of
reference points on the X-axis, and generates a basis of
reference splines (one per reference point) for a polynomial
or other unrestricted spline. This basis can be included in
the list of covariates of a regression model. The estimated
parameters will then be values of the spline at the reference
points, or differences between values of the spline at the
reference points and the value of the spline at a base
reference point. polyspline is an easy-to-use front end for
the SSC package bspline, which must be installed for
polyspline to work.

Author: Roger Newson
Distribution-date: 03april2020
Stata-version: 16


INSTALLATION FILES (click here to install)
polyspline.ado
polyspline.sthlp
---------------------------------------------------------------------------
(click here to return to the previous screen)



Validity and Reliability of Instrumental Variable

Hi,

I run a conditional logistic model.

Dependent Variable: Financial status (a binary variable)
Independent Variables: Financial ratios, Corporate Governance Variables

The reviewer suggested checking for endogeneity between the dependent variable and the corporate governance variables. For that, I chose one variable, board size, as an instrumental variable, since it has the least correlation with financial distress. I multiplied it by the percentage of board independence (bind) to get the number of independent board members.

correl bsize status
g number_ind_board_mem = bsize*bind
correl bsize status number_ind_board_mem

ivprobit status sta wcta tlta dual ins aind cind nind (number_ind_board_mem = bsize), twostep

Is there any test, beyond simple correlation, to check the validity and reliability of the chosen instrument?


Regards,
Sumaira

Here are the results:

. correl bsize status
(obs=1,336)

             |    bsize   status
-------------+------------------
       bsize |   1.0000
      status |  -0.0182   1.0000


. g number_ind_board_mem = bsize*bind

. correl bsize status number_ind_board_mem
(obs=1,336)

             |    bsize   status number~m
-------------+---------------------------
       bsize |   1.0000
      status |  -0.0182   1.0000
number_ind~m |   0.6164  -0.0680   1.0000

. ivprobit status sta wcta tlta dual ins aind cind nind (number_ind_board_mem = bsize), twostep

Checking reduced-form model...

Two-step probit with endogenous regressors        Number of obs   =      1,336
                                                  Wald chi2(9)    =     156.85
                                                  Prob > chi2     =     0.0000

--------------------------------------------------------------------------------
               |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
number_ind_b~m |  -.0013245   .0004092    -3.24   0.001    -.0021265   -.0005225
           sta |  -.1339088   .0417514    -3.21   0.001      -.21574   -.0520777
          wcta |  -2.155826   .2275308    -9.47   0.000    -2.601778   -1.709874
          tlta |  -.1369385   .2056887    -0.67   0.506     -.540081     .266204
          dual |  -.6820512   .1706843    -4.00   0.000    -1.016586    -.347516
           ins |  -.0065843   .0014534    -4.53   0.000    -.0094328   -.0037357
          aind |  -.0010084   .0050894    -0.20   0.843    -.0109834    .0089666
          cind |   .0002136   .0034307     0.06   0.950    -.0065104    .0069377
          nind |  -.0057245   .0025953    -2.21   0.027    -.0108112   -.0006378
         _cons |    2.34557   .4646972     5.05   0.000      1.43478     3.25636
--------------------------------------------------------------------------------
Instrumented:  number_ind_board_mem
Instruments:   sta wcta tlta dual ins aind cind nind bsize
--------------------------------------------------------------------------------
Wald test of exogeneity: chi2(1) = 0.72          Prob > chi2 = 0.3974
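For reference, a small additional check that is easy to run (this speaks only to instrument relevance; with a single just-identifying instrument, validity/exogeneity itself cannot be tested): estimate the first stage by hand and test the excluded instrument.

Code:
regress number_ind_board_mem bsize sta wcta tlta dual ins aind cind nind
test bsize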

Combining Heckman with 2SLS

Dear Statalisters,

I am interested in measuring the performance of imitative products. These products might be produced by (i) improving the existing product in the focal product category, (ii) transferring product technology from a different category, or (iii) combining two or more products. I am interested in determining which route outperforms the others, and I have an IV for the second and third cases, the first being the baseline.

The first problem is that only high-quality products are imitated, so there may be selection bias. In the first draft of the paper I used only the sample of imitative products and controlled for the quality of the original products. Initially I was running the following regression:

Code:
xtivreg performance (transfer combination= z1 z2) quality_of_original other_controls i.time, fe vce(cluster firm)

However, reviewers suggested that I should include all the observations and use a Heckman two-stage approach to account for selection bias. My first question: is Heckman necessary if I am only interested in imitated products?

Assuming that Heckman is appropriate, can I run the following regression?
Code:
probit imitated z3 
predict imitated_hat, p
xtivreg  performance quality_of_original other_controls i.time (imitated##transfer imitated##combination= imitated_hat##z1  imitated_hat##z2 ),  fe vce(cluster firm)
Any help will be greatly appreciated,

Best regards,
Erdem

How to include Fixed Effects in a Diff-in-diff specification?

I'm doing a difference-in-differences analysis of whether the EU Industrial Emissions Directive (implemented in 2011) had an effect on exports, with UK exports as my treatment group and Australia as my control group. I have panel data from 1990-2018. This is my equation:


Exports = β0 + β1·y2011 + β2·Country + β3·(y2011 × Country) + control variables + ε

where y2011 = 1 for observations from 2011 onwards, 0 otherwise. Country=1 for UK, 0 otherwise.

I attempted to add fixed effects to the regression but encountered several issues.
1) When I added (,fe) to the end of my xtreg command, it knocked out my 'country' dummy variable due to collinearity.
2) When I added i.y2011 and i.country to include time and country fixed effects respectively via dummy variables, instead of using the (,fe) within estimator, I found that the coefficients I got (apart from β0) were exactly the same as in the regression without the fixed effects.

I am unsure whether I need a different equation to take these fixed effects into account. I saw online that adding fixed effects changes the form of the equation, but I am not sure how to rewrite my current equation accordingly or how to implement it in Stata. Any help with this is greatly appreciated.
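For reference, a sketch of the standard two-way fixed-effects formulation (variable names are assumptions based on the post): with country and year fixed effects included, the stand-alone Country and y2011 dummies are absorbed, and only the interaction remains identified.

Code:
* interaction: 1 only for UK observations from 2011 onwards
gen did = y2011 * country
* country fixed effects via -fe- (panel must be xtset on the country id), year fixed effects via i.year
xtreg exports did i.year, fe vce(robust)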