Channel: Statalist

Choosing a quadratic or cubic model, depending on significance level

Good evening everyone,

I am stuck with an uneasy choice between a quadratic and a cubic specification in a Mincer equation, studying the effect of education and potential experience on wages. The quadratic experience term is significant and predicts diminishing returns; the cubic term is insignificant on its own, but it supports a positive premium to experience obtained later. Jointly, both terms are significant. Intuitively, I would like to support the cubic specification, but I am not sure whether that is econometrically valid (my professor says one should stick with the last significant exponent). I am truly mixed up.
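A minimal sketch of how the two specifications are usually adjudicated, with a joint test and an information-criterion comparison; the variable names lwage, educ and exper are hypothetical, not from the post:

Code:
* hypothetical names: lwage (log wage), educ (schooling), exper (potential experience)
regress lwage educ c.exper##c.exper##c.exper
test c.exper#c.exper c.exper#c.exper#c.exper   // joint test of the quadratic and cubic terms
estimates store cubic

regress lwage educ c.exper##c.exper
estimates store quadratic

estimates stats quadratic cubic                // AIC/BIC comparison of the two specifications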

Difference in Difference

Apologies, I recently changed my project. I am now attempting to run the following difference-in-differences regression:

Price(market i) - Price(market j) = Mobile + Paved Roads + Population + Fuel Price + Rainfall + Price Type + Commodity Type

Mobile is a dummy variable. I want it to equal 1 if both market i and market j have a value of 1 for the mobile observation.

Paved roads, population, fuel price and rainfall are observed explanatory variables (control variables).

I include dummy variables for price type and commodity type to see whether the price differentials are affected by the type of commodity (perishable or non-perishable) and the type of price (retail or trader).

Code:
set more off

*Setting up qdate
gen qdate = quarterly(string(quarter)+"q"+string(year), "QY")
format qdate %tq

*Setting up panel variable
egen panel_price = group(commodityid pricetype)

*reshape from long to wide for regress
reshape wide price pavedroadskm population fuelprice rainfall, i(commodity country pricetype mobile qdate) j(marketid)

*market differences
foreach v in price {
gen delta_`v' = `v'2 - `v'1
}

gen mtreatment=0
foreach w in treatment {
replace mtreatment=1 if 'w'1=1 = 'w'2=1
}


global ylist delta_price
global xlist pavedroadskm population fuelprice rainfall
reg $ylist $mtreatment $xlist
The error I keep receiving is:

Code:
'w'1 invalid name
r(198);
I have tried many variations but am unable to fix this.
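For what it is worth, the message usually points to the quote characters: Stata local macros are referenced with a left backtick and a right single quote (`w'), not two straight quotes ('w'), and equality tests need ==, not =. A sketch of how that loop might look; it assumes mobile is moved out of i() and into the reshape list so that mobile1 and mobile2 exist after reshaping (treatment is not a variable in the posted data):

Code:
* sketch only: assumes the reshape was run as
*   reshape wide price pavedroadskm population fuelprice rainfall mobile, ///
*       i(commodity country pricetype qdate) j(marketid)
gen mtreatment = 0
foreach w in mobile {
    replace mtreatment = 1 if `w'1 == 1 & `w'2 == 1
}

* mtreatment is a variable, not a global macro, so it enters the regression directly
reg $ylist mtreatment $xlist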

I attach a copy of my data (using 'dataex')

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str5 country byte(countryid marketid quarter) int year str2 unit str3 currency str7 commodity byte commodityid str8 pricetype float(price rainfall pavedroadskm population fuelprice) byte mobile
"Benin" 1 1 1 2000 "KG" "XOF" "Millet" 1 "Producer"    110  .8320871 24.66  82.87595  364.8 0
"Benin" 1 1 2 2000 "KG" "XOF" "Millet" 1 "Producer"    130  9.266488 24.66  83.04096 363.38 0
"Benin" 1 1 3 2000 "KG" "XOF" "Millet" 1 "Producer"    115 17.017979 24.66   83.2063 364.59 0
"Benin" 1 1 4 2000 "KG" "XOF" "Millet" 1 "Producer"    110 2.7170014 24.66  83.37196 360.17 0
"Benin" 1 1 1 2001 "KG" "XOF" "Millet" 1 "Producer" 131.81  .4435913 27.12  85.38624 361.77 0
"Benin" 1 1 2 2001 "KG" "XOF" "Millet" 1 "Producer"    135 8.8764305 27.12  85.53993 358.57 0
"Benin" 1 1 3 2001 "KG" "XOF" "Millet" 1 "Producer"    170 15.918653 27.12   85.6939 356.98 0
"Benin" 1 1 4 2001 "KG" "XOF" "Millet" 1 "Producer" 172.22 1.1938106 27.12  85.84815 363.98 0
"Benin" 1 1 1 2002 "KG" "XOF" "Millet" 1 "Producer"    175 1.2990117 33.87  88.00756 353.83 0
"Benin" 1 1 2 2002 "KG" "XOF" "Millet" 1 "Producer"    175   9.72025 33.87  89.09885 356.81 0
"Benin" 1 1 3 2002 "KG" "XOF" "Millet" 1 "Producer"    110 17.741055 33.87  90.20368 366.09 0
"Benin" 1 1 4 2002 "KG" "XOF" "Millet" 1 "Producer"    115  3.802082 33.87   91.3222 372.07 0
"Benin" 1 1 1 2003 "KG" "XOF" "Millet" 1 "Producer"    115 1.2800548 37.36  90.72687 376.51 1
"Benin" 1 1 2 2003 "KG" "XOF" "Millet" 1 "Producer" 111.11  12.08556 37.36  91.16417 370.82 1
"Benin" 1 1 3 2003 "KG" "XOF" "Millet" 1 "Producer"    135  21.25432 37.36  91.60358 380.46 1
"Benin" 1 1 4 2003 "KG" "XOF" "Millet" 1 "Producer" 131.22 3.5291195 37.36  92.04511 394.77 1
"Benin" 1 1 1 2004 "KG" "XOF" "Millet" 1 "Producer" 142.19   .776113 37.49  93.52702 402.65 1
"Benin" 1 1 2 2004 "KG" "XOF" "Millet" 1 "Producer"    175 11.494287 37.49  94.02841 400.95 1
"Benin" 1 1 3 2004 "KG" "XOF" "Millet" 1 "Producer" 152.22    17.888 37.49  94.53249 405.85 1
"Benin" 1 1 4 2004 "KG" "XOF" "Millet" 1 "Producer" 142.22  2.598993 37.49  95.03928 412.34 1
"Benin" 1 1 1 2005 "KG" "XOF" "Millet" 1 "Producer"    135 1.9511178 49.82  96.38049 408.16 1
"Benin" 1 1 2 2005 "KG" "XOF" "Millet" 1 "Producer"    115  9.160417 49.82  96.83106 412.34 1
"Benin" 1 1 3 2005 "KG" "XOF" "Millet" 1 "Producer" 141.11 16.668852 49.82  97.28374 418.93 1
"Benin" 1 1 4 2005 "KG" "XOF" "Millet" 1 "Producer" 111.11 2.5000436 49.82  97.73855 419.24 1
"Benin" 1 1 1 2006 "KG" "XOF" "Millet" 1 "Producer" 138.11  1.093254 49.93  99.26842 441.15 1
"Benin" 1 1 2 2006 "KG" "XOF" "Millet" 1 "Producer" 211.22  8.828572 49.93  99.96013 447.75 1
"Benin" 1 1 3 2006 "KG" "XOF" "Millet" 1 "Producer" 152.22  17.47267 49.93 100.65665 473.89 1
"Benin" 1 1 4 2006 "KG" "XOF" "Millet" 1 "Producer"    150  3.420969 49.93 101.35802 462.73 1
"Benin" 1 1 1 2007 "KG" "XOF" "Millet" 1 "Producer"    155  .8597299 50.04 102.18684 478.21 1
"Benin" 1 1 2 2007 "KG" "XOF" "Millet" 1 "Producer" 111.11 12.250764 50.04 102.92442 494.21 1
"Benin" 1 1 3 2007 "KG" "XOF" "Millet" 1 "Producer"    145 19.497534 50.04 103.66733 527.63 1
"Benin" 1 1 4 2007 "KG" "XOF" "Millet" 1 "Producer" 115.89  2.830138 50.04  104.4156 546.86 1
"Benin" 1 1 1 2008 "KG" "XOF" "Millet" 1 "Producer" 111.11  1.162299  50.2 105.16927  566.8 1
"Benin" 1 1 2 2008 "KG" "XOF" "Millet" 1 "Producer" 141.11 11.977502  50.2 105.92838 556.57 1
"Benin" 1 1 3 2008 "KG" "XOF" "Millet" 1 "Producer"    175  22.16582  50.2 106.69298 546.53 1
"Benin" 1 1 4 2008 "KG" "XOF" "Millet" 1 "Producer" 132.22  3.137085  50.2  107.4631 526.01 1
"Benin" 1 1 1 2009 "KG" "XOF" "Millet" 1 "Producer"    175  1.074003 49.94 108.23875 517.47 1
"Benin" 1 1 2 2009 "KG" "XOF" "Millet" 1 "Producer" 141.11  10.02601 49.94 109.02002 507.23 1
"Benin" 1 1 3 2009 "KG" "XOF" "Millet" 1 "Producer" 138.11 18.241352 49.94 109.80693 489.96 1
"Benin" 1 1 4 2009 "KG" "XOF" "Millet" 1 "Producer" 188.11  4.314962 49.94 110.59952 512.07 1
"Benin" 1 1 1 2010 "KG" "XOF" "Millet" 1 "Producer" 111.11 1.2736073 49.96 111.39782 529.63 1
"Benin" 1 1 2 2010 "KG" "XOF" "Millet" 1 "Producer" 142.22  8.645645 49.96  112.2019 547.79 1
"Benin" 1 1 3 2010 "KG" "XOF" "Millet" 1 "Producer" 111.22 17.182041 49.96 113.01176 560.93 1
"Benin" 1 1 4 2010 "KG" "XOF" "Millet" 1 "Producer"    175  3.729227 49.96 113.82748 586.01 1
"Benin" 1 1 1 2011 "KG" "XOF" "Millet" 1 "Producer" 110.28 1.3433656 50.13  114.6491 600.18 1
"Benin" 1 1 2 2011 "KG" "XOF" "Millet" 1 "Producer"    155  6.857734 50.13 115.37757  626.9 1
"Benin" 1 1 3 2011 "KG" "XOF" "Millet" 1 "Producer" 118.11 13.282522 50.13 116.11068 642.18 1
"Benin" 1 1 4 2011 "KG" "XOF" "Millet" 1 "Producer"    150 3.2788765 50.13 116.84845  648.4 1
"Benin" 1 1 1 2012 "KG" "XOF" "Millet" 1 "Producer" 113.42   .695286 56.75  117.5909 628.81 1
"Benin" 1 1 2 2012 "KG" "XOF" "Millet" 1 "Producer"    180  8.839492 56.75 118.33807  635.1 1
"Benin" 1 1 3 2012 "KG" "XOF" "Millet" 1 "Producer" 152.22 12.117025 56.75    119.09 609.33 1
"Benin" 1 1 4 2012 "KG" "XOF" "Millet" 1 "Producer"    175 2.5099454 56.75  119.8467 602.89 1
"Benin" 1 1 1 2013 "KG" "XOF" "Millet" 1 "Producer" 108.11 1.4844406 58.59  120.6082 590.34 1
"Benin" 1 1 2 2013 "KG" "XOF" "Millet" 1 "Producer" 131.22 11.528778 58.59 121.37454 578.05 1
"Benin" 1 1 3 2013 "KG" "XOF" "Millet" 1 "Producer" 131.22 13.377524 58.59 122.14576 572.61 1
"Benin" 1 1 4 2013 "KG" "XOF" "Millet" 1 "Producer"    180   3.35759 58.59 122.92187 560.87 1
"Benin" 1 1 1 2014 "KG" "XOF" "Millet" 1 "Producer" 188.11 1.2139716 58.62 123.70292 549.37 1
"Benin" 1 1 2 2014 "KG" "XOF" "Millet" 1 "Producer" 111.11 10.257064 58.62  124.7477 530.71 1
"Benin" 1 1 3 2014 "KG" "XOF" "Millet" 1 "Producer" 132.22 18.031187 58.62 125.80133 515.76 1
"Benin" 1 1 4 2014 "KG" "XOF" "Millet" 1 "Producer" 201.11   3.15808 58.62 126.86385 507.54 1
"Benin" 1 1 1 2015 "KG" "XOF" "Millet" 1 "Producer" 131.22  2.468094 63.91 127.93534 484.21 1
"Benin" 1 1 2 2015 "KG" "XOF" "Millet" 1 "Producer" 171.22  6.944751 63.91 129.01588 485.38 1
"Benin" 1 1 3 2015 "KG" "XOF" "Millet" 1 "Producer" 131.11 15.586768 63.91 129.57907 440.46 1
"Benin" 1 1 4 2015 "KG" "XOF" "Millet" 1 "Producer" 151.11  3.730871 63.91 130.14473 453.95 1
"Benin" 1 1 1 2016 "KG" "XOF" "Millet" 1 "Producer" 132.22    3.4156 64.51 130.71284 443.93 1
"Benin" 1 1 2 2016 "KG" "XOF" "Millet" 1 "Producer" 171.22    8.3423 64.51 131.28345 454.59 1
"Benin" 1 1 3 2016 "KG" "XOF" "Millet" 1 "Producer" 208.08   18.4363 64.51 131.85654 434.13 1
"Benin" 1 1 4 2016 "KG" "XOF" "Millet" 1 "Producer"    205    3.9235 64.51 132.43213 424.55 1
"Benin" 1 1 1 2000 "KG" "XOF" "Millet" 1 "Retail"   271.22  .8320871 24.66  82.87595  364.8 0
"Benin" 1 1 2 2000 "KG" "XOF" "Millet" 1 "Retail"      285  9.266488 24.66  83.04096 363.38 0
"Benin" 1 1 3 2000 "KG" "XOF" "Millet" 1 "Retail"   248.11 17.017979 24.66   83.2063 364.59 0
"Benin" 1 1 4 2000 "KG" "XOF" "Millet" 1 "Retail"   232.22 2.7170014 24.66  83.37196 360.17 0
"Benin" 1 1 1 2001 "KG" "XOF" "Millet" 1 "Retail"   252.22  .4435913 27.12  85.38624 361.77 0
"Benin" 1 1 2 2001 "KG" "XOF" "Millet" 1 "Retail"   251.22 8.8764305 27.12  85.53993 358.57 0
"Benin" 1 1 3 2001 "KG" "XOF" "Millet" 1 "Retail"      285 15.918653 27.12   85.6939 356.98 0
"Benin" 1 1 4 2001 "KG" "XOF" "Millet" 1 "Retail"      285 1.1938106 27.12  85.84815 363.98 0
"Benin" 1 1 1 2002 "KG" "XOF" "Millet" 1 "Retail"      285 1.2990117 33.87  88.00756 353.83 0
"Benin" 1 1 2 2002 "KG" "XOF" "Millet" 1 "Retail"      285   9.72025 33.87  89.09885 356.81 0
"Benin" 1 1 3 2002 "KG" "XOF" "Millet" 1 "Retail"      214 17.741055 33.87  90.20368 366.09 0
"Benin" 1 1 4 2002 "KG" "XOF" "Millet" 1 "Retail"      210  3.802082 33.87   91.3222 372.07 0
"Benin" 1 1 1 2003 "KG" "XOF" "Millet" 1 "Retail"   208.11 1.2800548 37.36  90.72687 376.51 1
"Benin" 1 1 2 2003 "KG" "XOF" "Millet" 1 "Retail"      200  12.08556 37.36  91.16417 370.82 1
"Benin" 1 1 3 2003 "KG" "XOF" "Millet" 1 "Retail"   222.22  21.25432 37.36  91.60358 380.46 1
"Benin" 1 1 4 2003 "KG" "XOF" "Millet" 1 "Retail"      215 3.5291195 37.36  92.04511 394.77 1
"Benin" 1 1 1 2004 "KG" "XOF" "Millet" 1 "Retail"   221.22   .776113 37.49  93.52702 402.65 1
"Benin" 1 1 2 2004 "KG" "XOF" "Millet" 1 "Retail"   251.22 11.494287 37.49  94.02841 400.95 1
"Benin" 1 1 3 2004 "KG" "XOF" "Millet" 1 "Retail"   228.11    17.888 37.49  94.53249 405.85 1
"Benin" 1 1 4 2004 "KG" "XOF" "Millet" 1 "Retail"   218.11  2.598993 37.49  95.03928 412.34 1
"Benin" 1 1 1 2005 "KG" "XOF" "Millet" 1 "Retail"      205 1.9511178 49.82  96.38049 408.16 1
"Benin" 1 1 2 2005 "KG" "XOF" "Millet" 1 "Retail"      185  9.160417 49.82  96.83106 412.34 1
"Benin" 1 1 3 2005 "KG" "XOF" "Millet" 1 "Retail"      210 16.668852 49.82  97.28374 418.93 1
"Benin" 1 1 4 2005 "KG" "XOF" "Millet" 1 "Retail"   178.11 2.5000436 49.82  97.73855 419.24 1
"Benin" 1 1 1 2006 "KG" "XOF" "Millet" 1 "Retail"      205  1.093254 49.93  99.26842 441.15 1
"Benin" 1 1 2 2006 "KG" "XOF" "Millet" 1 "Retail"   277.11  8.828572 49.93  99.96013 447.75 1
"Benin" 1 1 3 2006 "KG" "XOF" "Millet" 1 "Retail"      215  17.47267 49.93 100.65665 473.89 1
"Benin" 1 1 4 2006 "KG" "XOF" "Millet" 1 "Retail"      210  3.420969 49.93 101.35802 462.73 1
"Benin" 1 1 1 2007 "KG" "XOF" "Millet" 1 "Retail"      215  .8597299 50.04 102.18684 478.21 1
"Benin" 1 1 2 2007 "KG" "XOF" "Millet" 1 "Retail"      170 12.250764 50.04 102.92442 494.21 1
"Benin" 1 1 3 2007 "KG" "XOF" "Millet" 1 "Retail"   201.11 19.497534 50.04 103.66733 527.63 1
"Benin" 1 1 4 2007 "KG" "XOF" "Millet" 1 "Retail"      170  2.830138 50.04  104.4156 546.86 1
end
Further, please note that the data example above is from before I ran my code. I get an error (input statement exceeds linesize limit) if I use dataex after running the code.

Many thanks in advance

Metan error "Effect size and confidence intervals invalid: order should be {effect size, lower ci limit, upper ci limit}"

Hi,

I'm using Stata 15 on macOS Sierra. I'm meta-analysing hazard ratios and 95% CIs, which I have log-transformed into the variables loghr, logLL and logUL.

The command: metan loghr logLL logUL, fixed by (SESmeasure) eform effect("Hazard Ratio") lcols(Author Year Country)

results in the error message "Effect size and confidence intervals invalid: order should be {effect size, lower ci limit, upper ci limit}"

Is there an obvious mistake? Would seeing the data help?

Many thanks
Liz
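A quick diagnostic sketch, using the variable names from the post: -metan- expects the three variables in the order effect size, lower limit, upper limit, so any row where the log hazard ratio falls outside [logLL, logUL], where the limits are reversed, or where a value is missing will trigger this message.

Code:
* list any problem rows (variable names as in the post)
list loghr logLL logUL if !(logLL <= loghr & loghr <= logUL)
count if missing(loghr, logLL, logUL)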

Fuzzy match and de-duplication of names

I have a list of names in a single variable. That list contains near-duplicates, such as "SMITH, JOHN Q" and "SMITH, JOHN" and "SMITH, J", that for my purposes can be assumed to be the same person. I want to de-duplicate based on a fuzzy match of names, ideally using a repeatable process, but I understand that some manual review is probably required.

Searching this forum turned up a lot of posts on fuzzy matches, like these posts about -matchit- by Julio Raffo:

https://www.statalist.org/forums/for...-e-fuzzy-match

https://www.statalist.org/forums/for...ring-variables

-matchit- gets me close to what I need when I match the file to itself. That is, when I run this:

Code:
* Example generated by -dataex-. To install: ssc install dataex
* assumes user-written commands -matchit- and -freqindex- are installed

clear
input str14 name byte nameid
"SMITH, JOHN Q." 1
"SMITH, JOHN"    2
"SMITH, J."      3
"DOE, JANE S."   4
"DOE, JANE"      5
"ROE, PHIL"      6
end

save dataex.dta

matchit nameid name using dataex.dta, idu(nameid) txtu(name)

// drop obs that match themselves perfectly
drop if similscore == 1
The result contains each pair of cases that come close to matching, but each pair appears twice: both name -> name1 and name1 -> name are present. I think I only want name -> name1, and then to use -merge- to get that back into the original data. I can't figure out how to remove the name1 -> name version of each matched pair.

It's possible that I've made this all way more complicated than it needs to be, so I'd love to see better solutions!
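A minimal sketch of one way to keep a single row per pair, assuming the two id columns in the -matchit- output are named nameid and nameid1, mirroring name and name1 as described in the post: keeping only the ordering where the first id is smaller drops both the self-matches and the reversed copy of each pair.

Code:
* keep one ordering of each matched pair (also drops self-matches, where the two ids are equal)
keep if nameid < nameid1
* the surviving pairs can then be merged back into the original data on nameid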

cihplot

I am working with a user-written program called cihplot. When I plot means and CIs of an outcome variable over time, it puts the outcome variable on the x-axis and the time variable on the y-axis, whereas I want those axes the other way around.
cihplot outcome_var, by(time_var)

I looked for a way to change this, but there seems to be no option for it.

Any suggestions?
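In case it is useful, a sketch of a plain -twoway- alternative that puts the time variable on the x-axis; it assumes the variable names from the post, and the confidence-interval command -ci means- from recent Stata versions (older versions use plain -ci-):

Code:
preserve
statsby mean=r(mean) lb=r(lb) ub=r(ub), by(time_var) clear: ci means outcome_var
twoway (rcap lb ub time_var) (scatter mean time_var), ///
    xtitle("Time") ytitle("Mean of outcome") legend(off)
restore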

Interaction Effects and Models

Hello all,

I have a question regarding interaction effects and models. I want to test the interaction between age (a continuous variable) and metabolic syndrome (a binary variable). I found an example that uses the following command to test the interaction between a binary and a continuous variable:

Code:
 regress low_MCS curmetsyn##c.agenew
and then using the following command to get the graph:

Code:
 predict fit
Code:
 twoway (line fit agenew if curmetsyn==0, sort) (line fit agenew if curmetsyn==1, sort lp(-)), ///
     legend(lab(1 "MetS -") lab(2 "MetS +") ring(0) pos(1))
...which results in graph 1 attached.

The thing is that I am using melogit with these variables, since this is a longitudinal analysis, and I want to know whether I should run the same kind of command to test the interaction effect; for instance:

Code:
 melogit low_MCS curmetsyn##c.agenew || newid:, nolog
However, when I predict fit after this command and run the same code above for the graph, I get the second graph attached (graph 2).
Can someone tell me which is the correct one? I would say the first one since it looks a lot more acceptable, but I am not sure.
Any feedback would be appreciated.
Thank you in advance.

[Graph 1 and Graph 2 attached]
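One possible explanation, offered tentatively: after -regress-, -predict- returns fitted values on the outcome scale, whereas after -melogit- the default prediction is a probability that also incorporates the estimated random intercepts, so the two graphs are not drawn on comparable scales. A sketch of a scale-consistent way to plot the interaction after the mixed model, with an illustrative age grid:

Code:
melogit low_MCS i.curmetsyn##c.agenew || newid:, nolog
margins curmetsyn, at(agenew=(20(10)80))   // the age grid is illustrative, not from the post
marginsplot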

Combining shapefiles in Stata

Dear Statalisters

I'm trying to draw maps of England and Wales at the regional level, i.e. the nine macro-regions of England plus Wales. I have two shapefiles: one for the regions of England, and one for the countries of Great Britain (England, Scotland and Wales), from which I want to extract the boundaries of Wales and add them to the English regions. (Full disclosure: there was a problem with the country shapefile, because it included a variable called "long" that shp2dta couldn't deal with, so I first modified the shapefile in R to remove this variable.) There is an earlier thread that discusses a similar problem, but unfortunately those steps are not working for me.

I'm following the steps outlined by Roberto Liebscher in the thread mentioned above, simply appending the Wales boundaries to the English boundaries. However, whenever I try to draw a map (using Maurizio Pisati's spmap), Stata "freezes" (no error message, just a Windows popup saying that the program has stopped responding). Drawing the two maps individually is not a problem. It is also possible to add Wales as a polygon to the base map of the English regions, but ultimately I would like to draw choropleth maps, so that is not a satisfactory option.

Here's the code I'm using (in Stata 15.0):

Code:
 shp2dta using Regions_December_2015_Full_Clipped_Boundaries_in_England, data(region_wow_shp) coor(region_wow_coor) genid(id) genc(c) replace
 shp2dta using countries, data(countryshp) coor(countrycoor) genid(id) genc(c) replace

 use countryshp, clear
 keep if ctry16nm == "Wales"
 rename ctry16nm rgn15nm
 rename ctry16cd rgn15cd
 drop bng_e bng_n
 replace id = 10 // IDs in region shapefile run from 1 to 9

 append using region_wow_shp
 sort id
 save regionshp, replace

 use countrycoor, clear
 keep if _ID == 3 // Extracting the Welsh coordinates
 replace _ID = 10 // Assign the same ID as above

 append using region_wow_coor
 sort _ID
 save regioncoor, replace

use regionshp, clear
spmap using regioncoor, id(id) // This is where the programme stops responding
Many thanks for your time! Any thoughts appreciated!

panel data: impact of a policy

Dear all,

I am doing research on the impact of the Belgian gun law on crime rates. I collected a strongly balanced panel:
1. crime rates for all 500 Belgian municipalities over a period of 11 years (2004-2014); the crime rates are my dependent variable
2. some control variables, such as the unemployment rate, collected for the same 500 municipalities over 2004-2014

In 2006, the government introduced a new national gun law (so the law has applied to every municipality since 2006). The purpose of my study is to evaluate the impact of that gun law on crime rates. My question is how I could measure that impact with Stata. I suppose that I need a dummy variable (1 for the years after 2006 and 0 for the years 2004-2006). Is this correct? And should I work with a fixed-effects model or a time-effects model?

Hopefully someone can help me.

Yours sincerely,

Robin
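A minimal sketch of one common setup, with hypothetical variable names; this is only the mechanics of the dummy and a fixed-effects regression, not a full research design:

Code:
* hypothetical names: municipality (id), year, crime_rate, unemployment
xtset municipality year
gen post = (year > 2006)                 // 1 in the years after the 2006 law, 0 in 2004-2006
xtreg crime_rate post unemployment, fe vce(cluster municipality)

* Note: because the law applies to every municipality at the same time, -post-
* is collinear with a full set of year dummies, so the policy dummy and year
* fixed effects cannot both be identified without some comparison group.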

Merging datasets with overlapping time periods

Dear all,

I'm trying to merge two datasets, both containing data from content analyses of one and the same talk show. One dataset contains data on visual aspects (video), the other data on the spoken word (sound). In the video dataset, the observations are by the second (each second is one observation), and the variables are dummy coded. In the sound dataset, each observation runs from a start point to an end point (an observation lasts between 1 and several seconds; one spoken statement in the talk show is one observation). I'd like to merge the two datasets so that I end up with one dataset based on the sound dataset, with the data from the video dataset assigned to each sound observation.

My idea for the merging was to expand the sound dataset to get observations on a per-second basis too, to merge the datasets using the running second as the identifier, and then to collapse the final dataset back to the statement ID.

However, I have time gaps and overlapping time periods among the observations in the sound dataset (the talk show participants sometimes didn't speak, or spoke at the same time). In some cases, the start point of the next statement is identical to the end point of the previous statement. Expanding the dataset based on the duration of the observation and just assigning a running second to the expanded observations therefore didn't do the trick for me (that was my approach so far).

This is an example for the sound dataset:
Code:
clear
input double(ID start end duration)
  18737   0   8  8
  18738   9  16  7
  18739  16  22  6
  18740  23  26  3
  18741  27  34  7
  18742  35  42  7
  18743  42  47  5
  18745  45  49  4
  18746  50  53  3
  18747  54  64 10
  18748  64  75 11
  18749  76  77  1
  18750  78  86  8
  18752  88  91  3
  18753  92  93  1
  18754  94  96  2
  18755  93 108 15
end

What I'd like to get in the end is:
ID start end duration X_1 X_2
18737 0 8 8 0 1
18738 9 16 7 1 1
18739 16 22 6 0 1
18740 23 26 3 1 0

X_1 and X_2 are two of the dummy coded variables from the video dataset.


Is there a way in Stata to merge these two datasets? Since I'm rather new to Stata, my approach of using a running second in both datasets as the identifier for the merge might be wrong or too complicated.

Thanks and best,
Isabella
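A sketch of the expand/merge/collapse idea, assuming the sound data is saved as sound.dta and the video data as video.dta with one row per second in a variable called second plus the dummies (the filenames and the variable second are assumptions). Overlapping statements are not a problem here, because several sound rows may map to the same second in an m:1 merge:

Code:
use sound, clear                           // hypothetical filename
expand end - start + 1                     // one row per second covered by each statement
bysort ID: gen second = start + _n - 1
merge m:1 second using video, keep(master match) nogenerate
collapse (first) start end duration (max) X_1 X_2, by(ID)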

Unconditional quantile regression using rifreg and genqreg

Dear all,

I am trying to run an unconditional quantile regression using the user-written 'rifreg' command by Firpo, Fortin and Lemieux (2009) and the 'genqreg' command written by Matthew J. Baker et al., respectively. But the results from the two commands are quite different. How can I understand this difference? What is the difference between these two commands?

Best,

Import delimited gets leading spaces wrong

I am using -import delimited- to import a file with whitespace separating values. As far as I can tell, any leading whitespace is interpreted as a delimiter and becomes a missing value. Is there any solution other than modifying the input file? Here is a simple demonstration:

Code:
. type test.raw
 1.00
2.00

. import delimited using test.raw,delimiter(whitespace,collapse)
(2 vars, 2 obs)

. list

     +---------+
     | v1   v2 |
     |---------|
  1. |  .    1 |
  2. |  2    . |
     +---------+

. version
version 14.2
Notice how the "1.00" in the first observation is preceded by a single space, which convinces Stata that there are two variables, and that the first is missing. I really don't want to change the format of the input data, and no other program seems to take this interpretation of whitespace used as a delimiter.
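One possible workaround, offered as a sketch rather than a definitive fix: free-format -infile- treats any run of whitespace, including leading blanks, as a separator rather than as an empty field, so for a file like this with one numeric value per line it reads both rows into a single variable.

Code:
infile v1 using test.raw, clear
list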

-predxcat- Adjusted mean value of continuous dependent variable for two categorical variables

Hi all,
I was trying to obtain adjusted mean values of yvar for all combinations of xvar1 (3 categories) and xvar2 (4 categories), adjusted for a covariate xcov (3 categories), and to plot bar graphs of the resulting adjusted means.
I came across the -predxcat- command for Stata, created by Joanne Garrett.
The code I used is:
Code:
predxcat yvar, xvar(xvar1 (xvar2)) adjust(xcov) graph bar
But this adjusts for xcov as if it were a continuous variable rather than a categorical one (with 3 levels).
The documentation for this command states that if xcov (the adjustment variable) has more than 2 categories, it must be entered as indicator variables in the adjust list. I did not understand what that means or how to define xcov with indicator variables when it takes the values 1-3.
Has anybody used this command? I like it because it produces really good graphs. Is there another way to obtain adjusted mean values of yvar [mean (SD) and 95% CI for the adjusted mean]?

thank you
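A sketch of what "indicator variables in the adjust list" might mean, using the variable names from the post (the -predxcat- syntax here is a guess, so check its help file), followed by an alternative with official commands that returns adjusted means with confidence intervals:

Code:
* indicator variables for the 3-level covariate, omitting one as the reference
tabulate xcov, generate(xcov_)
predxcat yvar, xvar(xvar1 (xvar2)) adjust(xcov_2 xcov_3) graph bar

* alternative with -margins-: adjusted means and 95% CIs for every xvar1 x xvar2 cell
regress yvar i.xvar1##i.xvar2 i.xcov
margins xvar1#xvar2
marginsplot, recast(bar)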

HELP - Understanding Panel Data/Fixed Effects through the Phillips Curve*

Hello,

I'm working on a project testing the strength of the Phillips curve, but I am very unfamiliar with Stata and panel data econometrics.
I'd like to test the relationship between unemployment and inflation (CPI) and see whether it is still statistically significant after the financial crisis, compared with before.
I've never conducted panel data analysis and have little experience with Stata.

I've gathered CPI and unemployment data from January 1983 to February 2017 for 10 provinces in Canada and cleaned the data in an Excel file.
My data:
10 provinces, dates (monthly starting Jan 1983), unemployment rate (monthly), and CPI (monthly)

I've attached a spreadsheet for reference.
https://docs.google.com/spreadsheets...it?usp=sharing

I've been advised to include dummies for the provinces, dummies for each year (to account for general time trends), and dummies for each month (to account for seasonality).
I've been able to create dummies for the provinces (using "tabulate prov, gen(p)"), but I am getting confused when creating dummies for the years and months:

1. I can't figure out how to create yearly dummies, because my data are entered monthly.
2. I believe I created monthly dummies (tabulate Date, gen(d)), but the result looks strange: the first dummy is labelled Date == 8401.0000.
How can I be sure I did this correctly, and if so, how would I interpret the results?


Am I better off creating all of those dummy variables as recommended (there would be a very large number), or should I use xtreg or areg?
I'm not familiar with xtreg or areg (I have never used them),
but when I used "xi: regress Unem CPI i.prov" the output looked quite clean.


What is the simplest way to remove the fixed effects and test whether the relationship between inflation and unemployment is still strong?
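A minimal sketch of the usual workflow, under the assumption that Date is stored as a number like 8401 meaning January 1984 (adjust the conversion to however the dates are actually coded); factor variables replace hand-made dummies, and -xtreg, fe- absorbs the province effects:

Code:
* turn the numeric date into a Stata monthly date (assumes e.g. 8401 = January 1984)
gen mdate = ym(1900 + floor(Date/100), mod(Date, 100))
format mdate %tm
gen year  = year(dofm(mdate))
gen month = month(dofm(mdate))

encode prov, gen(provid)        // if prov is a string; otherwise use prov directly
xtset provid mdate

* year dummies absorb general time trends, month dummies absorb seasonality
xtreg Unem CPI i.year i.month, fe vce(cluster provid)

* to compare the slope before and after the crisis (the cutoff date is illustrative)
gen post = mdate >= tm(2008m9)
xtreg Unem c.CPI##i.post i.month, fe vce(cluster provid)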

Sepscatter and qfit

Hello! I was previously trying to fit a qfit over the sepscatter command, but read somewhere that this is not possible because sepscatter is not a twoway plot type. I'm now trying to recreate the sepscatter output manually, but I can't figure out how to get the legend to show the year labels (2010, 2011, etc.). Does anyone know how to solve this?

My original sepscatter code:
sepscatter WTI RI, seperate(Year) /*
*/ title("Relationship Between WTI Spot Price" "and Relative Inventory Level") /*
*/ xtitle("Relative Inventory Level", height(6))ytitle("WTI Spot Price") /*
*/ legen(col(3) ring(0) position(1) region(lcolor(white)))/*
*/ ylabel(, angle(0)) /*
*/ ylabel(,nogrid)/*
*/ graphregion(color(white))

My new code:
graph twoway (scatter WTI RI if Year=="2010") /*
*/ (scatter WTI RI if Year=="2011")(qfit WTI RI, lcolor(black))/*
*/ (scatter WTI RI if Year=="2012") /*
*/(scatter WTI RI if Year=="2013") /*
*/ (scatter WTI RI if Year=="2014") /*
*/(scatter WTI RI if Year=="2015") /*
*/ (scatter WTI RI if Year=="2016") /*
*/(scatter WTI RI if Year=="2017") /*
*/ (scatter WTI RI if Year=="2018") /*
*/(scatter WTI RI if Year=="Now", mcolor(black) msize(medium-large)), /*
*/ title("Relationship Between WTI Spot Price" "and Relative Inventory Level") /*
*/ xtitle("Relative Inventory Level", height(6))ytitle("WTI Spot Price") /*
*/ legen(col(3) ring(0) position(1) order(1 2) region(lcolor(white))) /*
*/ note("Data Source: EIA Weekly Petroleum Status Report", position(7) ring(0) size(vsmall)) /*
*/ ylabel(, angle(0)) /*
*/ ylabel(,nogrid) /*
*/ graphregion(color(white))

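A sketch of one way to get year labels into the legend, shown with just the first few plots (the full command follows the same pattern): each scatter is named explicitly in legend(order()), and the qfit, which is plot 3 here, is left out of the legend.

Code:
graph twoway (scatter WTI RI if Year=="2010") ///
    (scatter WTI RI if Year=="2011") ///
    (qfit WTI RI, lcolor(black)) ///
    (scatter WTI RI if Year=="2012"), ///
    legend(order(1 "2010" 2 "2011" 4 "2012") col(3) ring(0) position(1) ///
        region(lcolor(white)))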

Calculation of hospital readmission rates using meqrlogit

Code:
clear
input float hospid byte indexevent float readmission
3014 1 0
3014 1 1
3014 1 0
3014 1 0
3014 1 0
3014 1 0
3014 1 0
3016 1 0
3016 1 0
3016 1 0
3016 1 0
3016 1 1
3016 1 0
3019 1 0
3019 1 0
3021 1 0
3021 1 0
3021 1 0
3021 1 0
3021 1 0
3023 1 0
3023 1 0
3023 1 1
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3023 1 1
3023 1 0
3023 1 1
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3023 1 1
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3023 1 0
3024 1 1
3024 1 1
3024 1 0
3024 1 0
3024 1 0
3024 1 0
3024 1 1
3024 1 0
3024 1 0
3024 1 0
3024 1 0
3024 1 1
3024 1 1
3024 1 0
3024 1 1
3024 1 1
3024 1 1
3024 1 1
3024 1 0
3024 1 1
3024 1 0
3024 1 1
3024 1 0
3024 1 0
3024 1 0
3024 1 0
3024 1 0
3024 1 0
3024 1 1
3024 1 0
3024 1 0
3024 1 0
3024 1 0
3024 1 0
3024 1 0
3024 1 0
3024 1 1
3024 1 0
3024 1 0
3024 1 0
3024 1 0
3024 1 0
3024 1 0
3024 1 0
end


Dear all,

I'm working on a paper calculating hospital-specific readmission rates using a national database with about 15 million observations. An example from dataex is pasted above. It contains a hospital ID variable (hospid), a variable for index hospitalization (indexevent), and a variable for 30-day readmission (readmission; 0 = no readmission, 1 = readmission within 30 days).

I calculated the hospital readmission rates using the following approach:

Code:
sort hospid
bysort hospid: egen hospital_volume=total(indexevent) /* This generates the total number of index hospitalizations at any given hospital */
bysort hospid: egen total_readmission=total(readmission) /* This generates the total number of readmissions at any given hospital */
gen HRR = total_readmission/hospital_volume /* This will generate the proportion of readmissions within 30 days for any given hospital */


However, when I try to get the same results using a hospital-level random intercept with meqrlogit, I get quite a different result. I'm using meqrlogit because melogit gives an error message, "initial values not feasible".

Code:
meqrlogit readmission || hospid:
predict HRR /* This should give the probability of readmission for each hospital */

The HRR computed manually as above and the one predicted from the random-intercept model are different. The difference is especially pronounced for hospitals with a 0% readmission rate when computed manually.
Can someone explain why the two approaches produce different results? I am trying to understand this because I will ultimately be using the random-intercept model to adjust for patient case mix, which I cannot do with the manual approach above.
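A tentative sketch for lining the two quantities up, together with the likely explanation: the model-based figures are empirical Bayes predictions of each hospital's intercept, which are shrunk toward the overall mean, so hospitals with few index admissions, and especially those with a raw rate of 0%, are pulled toward the grand mean by construction, whereas the raw proportion applies no shrinkage at all.

Code:
meqrlogit readmission || hospid:
predict phat, mu                               // predicted probability including the random intercept
bysort hospid: egen model_HRR = mean(phat)
bysort hospid: egen raw_HRR   = mean(readmission)
bysort hospid: gen  n_index   = _N
by hospid: list hospid n_index raw_HRR model_HRR if _n == 1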

Marginal effects on a base comparison group (Ordered Probit Model)

Hi
I am working with an ordered probit model to compare differences between regions (7 groups).
I am using:

Code:
fvset base 1 region

quie oprobit satisfaction_economic ///
    i.region##(i.activity_condition i.group_age i.male) [aw=factor_per]
With this code, I get coefficients that are the differences between the base region's coefficients and the coefficients I would obtain if I ran, for example:

Code:
 quie oprobit satisfaction_economic ///
    i.region##(i.activity_condition i.group_age i.male) [aw=factor_per] if region==7
I understand that these are comparisons with my base region.
So now I am trying to get marginal effects with that same comparison to base region 1. I have tried different commands, such as:

Code:
 margins region,  dydx(*) predict(outcome(1)) post 

margins region##(i.activity_condition i.group_age i.male), dydx(*) predict(outcome(1)) post

margins i.region##(i.activity_condition i.group_age i.male), dydx(*) predict(outcome(1)) post

But regardless of which base region is chosen, I get the same dy/dx values, and I think they are not comparisons with region 1: they are just the marginal effects, without taking the comparison with my base region into account.

How can I get marginal effects (for each variable's categories in each region) that are expressed as comparisons with base region 1?
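A tentative sketch using the contrast operator in -margins- (r. compares each level with the base category); see help margins contrast for the details and for variants such as pairwise comparisons:

Code:
quie oprobit satisfaction_economic ///
    i.region##(i.activity_condition i.group_age i.male) [aw=factor_per]

* average marginal effect of male on outcome 1, separately by region
margins region, dydx(male) predict(outcome(1))

* the same effects expressed as contrasts with the base region
margins r.region, dydx(male) predict(outcome(1))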

Bootstrap stuck.

Hello everyone,

Is there any way to speed up the bootstrapping process? I am replicating only 1,000 times, but it has been stuck after 5 replications for a long time. I can see Stata working, but with no progress. Please see the code below. The dots are where the estimation has been stuck for a long time (the last hour and a half). The gsem model and the nlcom run perfectly without bootstrapping.

Code:
 capture program drop boots
     program boots, rclass
         gsem (L -> wk*@1,  link(logit) fam(bin)) ///
               (z1 <- z0 treat, link(ident) fam(gaus)) ///
               (L <- hotash itihash z1 treat, link(logit) fam(bin)), diff
       return scalar product = _b[z1:treat]*_b[L:z1]
     end
 bootstrap r(product), reps(1000) seed(329250) : boots

(running boots on estimation sample)

Bootstrap replications (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
....
Any suggestion is highly appreciated.
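One thing that sometimes helps, offered only as a sketch: fit the model once on the full sample, store the coefficient vector, and pass it to every replication as starting values through gsem's from() option, so each bootstrap sample does not restart the maximisation from scratch.

Code:
gsem (L -> wk*@1,  link(logit) fam(bin)) ///
     (z1 <- z0 treat, link(ident) fam(gaus)) ///
     (L <- hotash itihash z1 treat, link(logit) fam(bin)), diff
matrix b0 = e(b)

capture program drop boots
program boots, rclass
    gsem (L -> wk*@1,  link(logit) fam(bin)) ///
         (z1 <- z0 treat, link(ident) fam(gaus)) ///
         (L <- hotash itihash z1 treat, link(logit) fam(bin)), diff from(b0)
    return scalar product = _b[z1:treat]*_b[L:z1]
end
bootstrap r(product), reps(1000) seed(329250) : boots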

Help with understanding complex interaction term

Hello all,

I am looking for help with the interpretation of a complicated interaction term.

I am using a fixed effects model.

The interaction term is:
Code:
c.log(corruption)#c.d.log(military expenditure as a % of GDP)
where corr = corruption, a continuous variable, log-transformed
where dlm = military expenditure as a % of GDP, a continuous variable, log-transformed and then first-differenced

Any help would be greatly appreciated.

Below is my output:

Code:
Fixed-effects (within) regression               Number of obs     =      1,602
Group variable: country_id                      Number of groups  =         86

R-sq:                                           Obs per group:
     within  = 0.2402                                         min =          4
     between = 0.1234                                         avg =       18.6
     overall = 0.0496                                         max =         20

                                                F(32,85)          =      15.45
corr(u_i, Xb)  = -0.9705                        Prob > F          =     0.0000

                            (Std. Err. adjusted for 86 clusters in country_id)
------------------------------------------------------------------------------
             |               Robust
        D.ly |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lpop |
         D1. |  -1.025697   .1174937    -8.73   0.000    -1.259306   -.7920882
             |
          lk |
         D1. |   .0254958   .0203384     1.25   0.213    -.0149425    .0659341
             |
          lm |
         D1. |   .0972852   .0506766     1.92   0.058    -.0034735    .1980439
             |
          ly |
         L1. |  -.0504294   .0122852    -4.10   0.000    -.0748556   -.0260032
             |
          lm |
         L1. |  -.0205839   .0075336    -2.73   0.008    -.0355628    -.005605
             |
          ic |   .0029911   .0011915     2.51   0.014     .0006221    .0053601
          ec |   .0021997   .0012717     1.73   0.087    -.0003287    .0047282
        corr |   .0017344   .0022978     0.75   0.452    -.0028342    .0063031
         mip |   .0012438   .0023464     0.53   0.597    -.0034214     .005909
          gs |   .0037059   .0008534     4.34   0.000     .0020091    .0054027
             |
 c.mip#c.dlm |   .0007858   .0063241     0.12   0.901    -.0117883    .0133598
             |
  c.gs#c.dlm |  -.0104506   .0049473    -2.11   0.038    -.0202871   -.0006141
             |
c.corr#c.dlm |  -.0138721   .0080675    -1.72   0.089    -.0299124    .0021682
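A tentative sketch of the interpretation: with interactions entered as c.corr#c.dlm, c.mip#c.dlm and c.gs#c.dlm, the marginal effect of dlm on D.ly is the main-effect coefficient plus each interaction coefficient times the value of the corresponding moderator, and -lincom- can evaluate it, with a standard error, at chosen values. The values below are purely illustrative, and the main-effect term is assumed to be the one labelled D1.lm in the output.

Code:
* marginal effect of dlm evaluated at corr = 2, mip = 1, gs = 10 (illustrative values)
lincom D1.lm + 2*c.corr#c.dlm + 1*c.mip#c.dlm + 10*c.gs#c.dlm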

Generating kernel density balancing plots (with different weights) after teffects ipw

Hello Statalisters,

I am using inverse probability weighting with the teffects command in Stata 15.1. However, rather than using the weights generated by Stata, I am following a recommendation in the literature (e.g.: http://onlinelibrary.wiley.com/doi/1.../sim.6607/full) to use stabilized weights, which help to account for the undue influence some observations can have in inverse probability weighting analyses.

Because of this different weighting scheme, I cannot use the tebalance density command to check covariate balance after estimating the propensity score. So I would like to generate my own kernel density plots, similar to those created by tebalance density, but I am running into problems. Before checking balance with my stabilized weights, I wanted to see whether I could replicate the tebalance density plots, to make sure I was implementing everything correctly. However, the plots I generate are similar but not identical to those produced by tebalance, and I am not sure what is wrong.

In the commands below, I am using iweights, based on the help file for teffects, which suggests that the kernel density plots for tebalance are implemented with this kind of weight. insample is an indicator variable restricting the sample, and avg_weekhrs is a balancing covariate indicating a parent's average hours of work per week.

Code:
teffects ipw (y) (x $controls) if insample==1
***predicting propensity score based on treatment=1
predict ps if insample==1, ps tlevel(1)
***generating inverse probability weights
gen ipw=1/ps if x==1
replace ipw=1/(1-ps) if x==0
****creating kernel density plots based on example in kdensity help file.
kdensity avg_weekhrs if insample==1 [iw=ipw], nograph generate(est fx) n(500)
kdensity avg_weekhrs if insample==1 & x==0 [iw=ipw], nograph generate(control) at(est)
kdensity avg_weekhrs if insample==1 & x==1 [iw=ipw], nograph generate(treat) at(est)
line control treat est, sort ytitle(density_test)
I am comparing the graph generated in the last line of code above to the weighted balance plot created by tebalance density:
Code:
teffects ipw (y) (x $controls) if insample==1
tebalance density avg_weekhrs
A careful look shows that the plots, while similar, are not identical, which gives me pause about using this technique to evaluate balance with the stabilized weights I am creating.

Does anyone have any insight or recommendations? I am pasting the two plots below.

Thank you,
Dan
[Two kernel density balance plots attached]
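One tentative thing to try: the separate -kdensity- calls each pick their own bandwidth, whereas a common bandwidth for the treated and control densities removes one source of discrepancy with the official plot; the value below is purely illustrative.

Code:
kdensity avg_weekhrs if insample==1 & x==0 [iw=ipw], nograph generate(control2) at(est) bwidth(2)
kdensity avg_weekhrs if insample==1 & x==1 [iw=ipw], nograph generate(treat2)   at(est) bwidth(2)
line control2 treat2 est, sort ytitle(density_test)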