Channel: Statalist

Combining foreach with import excel

Hi, thanks for the help in advance. I'm really pulling my hair out here: I'm having difficulty doing the following and can't figure out why it doesn't work:

Code:
local workbook "sheet A" "sheet B"

foreach x in `workbook' {
    display "`x'"
    import excel "file.xlsx", sheet("`x'") firstrow clear
}
display "completed"

It shows "sheet A" and then stops, having loaded only sheet A. It seems to be an issue with me not getting my quotation marks right, or a bad interaction with import excel. I've tried all sorts of permutations of single quotes, double quotes, compound quotes (`" "'), etc., all without being able to fix this. I am using Stata 12.
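For what it's worth, here is a minimal sketch of one quoting pattern that handles multi-word sheet names (assuming the same file and sheet names as above): wrap the whole list in compound double quotes and loop with foreach ... of local.

Code:
local workbook `" "sheet A" "sheet B" "'

foreach x of local workbook {
    display "`x'"
    import excel "file.xlsx", sheet("`x'") firstrow clear
    * each -import excel, clear- replaces the data in memory, so you would
    * normally save or process each sheet here before the next iteration
}
display "completed"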

Svy commands for Kappa and Pearson's Correlation Coefficient

Hello,
I am new to using survey commands. I have a survey data set and need to calculate Kappa statistics and Pearson correlation coefficients. Is there a way to do this using survey commands? If not, how should I approach this analysis?
Thank you kindly,
Elizabeth

Non-Convergence in PPML Gravity Model

Hi Stata users,

I'm trying to run a gravity model on a panel of 43 countries and 15 years using the World Input-Output Database (WIOD), but I am having some issues with convergence. When I estimate models by sector, with a number of standard gravity variables (i.e. distance, language, etc.), the model converges for some sectors but not for all. My preference for estimating my models via PPML is to use the glm command (its syntax is similar to reg, which I use for estimating in logarithms) with the Poisson distribution specified. I've also run the model with the ppml_panel_sg and poi2hdfe commands and found the same result.
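For concreteness, a minimal sketch of the kind of glm call described above (variable names are placeholders, not taken from the data in question):

Code:
* PPML: Poisson family with a log link and robust standard errors
glm trade ln_dist contig comlang colony i.exp_year i.imp_year, ///
    family(poisson) link(log) vce(robust)
* the -irls- option switches glm to iteratively reweighted least squares,
* which sometimes converges when the default maximizer does not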

I am aware of the issue identified by Joao Santos Silva and Silvana Tenreyro (https://www.sciencedirect.com/scienc...832?via%3Dihub): non-convergence can be caused by complete separation of the variables. I have tried the test they propose to identify and remove the separated problem variables, but it suggests that none of the variables are separated (collinear for tradeflow > 0). This test is also run by the ppml_panel_sg command; it did not identify any non-existence issues either, but the model still failed to converge.

Does anyone have any experience of the model still failing to converge? I've looked for other posts here and for academic papers on the problem but have had no joy. Alternatively, are there situations where the proposed test for separation doesn't identify the issue?

I use Stata 15 SE on a Windows computer.

Many thanks for your help!
Elliot

correcting heteroskedasticity but obtaining non-normally distributed residuals

Hi everyone,

I am working with panel data. I have approximately 200 observations in total. My time variable is birth decade (from 1880 to 1960) and my cross-sectional variable is province of birth.

My dependent variable is years of schooling and my main explanatory variable is migrant share. I had been trying to correct the non-normally distributed errors (I created transformations of the variables), and that solved the problem with the residuals.

The regression:
Code:
xi: reg yrsc i.region i.bdec logmigrantshare logurbanshare gapmalefemale cattle

Then I wanted to correct for heteroskedasticity, so I transformed yrsc to lnyrsc. The heteroskedasticity was gone, but the residuals are again not normally distributed.

At this point I do not know which is better: having heteroskedasticity or non-normally distributed residuals. The tests I used were estat imtest, white and sktest on the residuals.
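For reference, a minimal sketch of the diagnostic sequence described, reusing the regression above; note that robust standard errors (vce(robust)) are a common alternative to transforming the dependent variable when heteroskedasticity is the only concern:

Code:
xi: reg lnyrsc i.region i.bdec logmigrantshare logurbanshare gapmalefemale cattle
estat imtest, white          // White's test for heteroskedasticity
predict res_ln, residuals    // store the residuals
sktest res_ln                // skewness/kurtosis test for normality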



Could someone help me with this problem?
Thanks in advance



difference between alpha, gen() and using egen rowmean()?

Hi all,

I've been under the impression from the Stata manual entry for the alpha command that the generate option sums the values over the list items and divides the sum by the number of items. However, I get different results when I create a scale with alpha's generate option and when I manually create the scale by averaging the values over the scale items. I provide a sample of the data below. For example, when I use alpha's generate option, I get a questscore of 2.5 for id=111, whereas the egen rowmean() command produces a questscore of 3.

Here's what I'm doing. Am I missing something about the way the alpha, gen() command works?


Code:
clear
input int id long(quest1 quest2 quest3 quest4 quest5 quest6 quest7 quest8 quest9 quest10 quest11 quest12)
111 3 3 3 3 3 3 3 3 3 3 3 3
112 3 3 3 4 3 5 5 3 1 3 4 4
113 2 4 4 3 2 2 3 2 3 2 5 2
114 3 4 2 1 3 2 2 3 5 2 3 3
115 3 3 3 3 3 3 3 3 3 3 3 3
116 3 3 2 3 3 3 2 3 2 3 2 4
117 4 3 3 2 4 4 3 5 3 3 4 4
118 3 3 3 3 3 1 1 3 1 4 5 3
119 3 2 3 3 3 3 3 3 3 3 3 3
120 3 3 3 3 3 3 3 3 3 3 3 3
end

local q "quest1 quest2 quest3 quest4 quest5 quest6 quest7 quest8 quest9 quest10 quest11 quest12"
alpha `q',   gen(questscore) 

gen quest7r = 6-quest7 //this item is reverse coded in the original
gen quest11r = 6-quest11 //this item is reverse coded in the original
local q "quest1 quest2 quest3 quest4 quest5 quest6 quest7r quest8 quest9 quest10 quest11r quest12"
egen questscore_e = rowmean(`q')
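A hedged note that may explain part of the gap: as documented, alpha determines from the data the sign with which each item enters the generate() score, and items that correlate negatively with the rest of the scale are entered negated rather than recoded as 6 - x, so the resulting scale need not match a rowmean() of manually reverse-coded items. The item option shows the signs alpha chose:

Code:
* display per-item statistics, including the sign assigned to each item
alpha quest1-quest12, item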

Getting different marginal effects when over() was used

Hello,

I am running a mixed-effects model with a time variable (1, 2, 3, and 4) and a treatment variable (0 and 1), controlling for four variables.
Code:
mixed stai i.time##trt_grp v1 v2 v3 v4 || id:, var cov(un)
After this, I use the margins command to get marginal effects. However, I get slightly different numbers from the two commands below, and I wonder why.
What is the difference between the two commands?

Please help!

Code:
margins trt_grp#time
margins time, over(trt_grp)
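For reference, a hedged summary of the documented difference between the two calls (this is general margins behavior, not specific to mixed):

Code:
* fixes trt_grp and time at each combination for every observation in the
* estimation sample, then averages the predictions
margins trt_grp#time

* computes the time margins separately within each observed trt_grp subsample;
* trt_grp and the covariates v1-v4 stay at their observed values inside each
* subsample, so the populations being averaged over differ between groups
margins time, over(trt_grp)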

empty cells when estimate table command was used

Hello,

I am running a mixed model with the mixed command. I have 5 different outcomes, and I use estimates store to save all 5 models under different names (model1-model5).
To summarize the results, I use estimates table as shown below. However, it only shows the results for model1. The cells for model2-model5 are empty except for N, ll, aic, and bic, and I don't know why.
Code:
estimate table model1 model2 model3 model4 model5, star title("") stats(N ll aic bic)
Please help!


frames and backward compatibility of programs

Thought I would share this, for anyone trying to maintain backward compatibility in programs they write:

In trying to update a program to take advantage of frames, I find that one form in particular breaks under older versions of Stata.

In Stata 15 this breaks:
Code:
sysuse auto

if c(stata_version) >= 16 {
    frame copy default auto
    frame auto {
        generate gpm = 1/mpg
        summarize gpm
    }
}
else {
    display "What? Me worry?"
}
However, this will run:
Code:
sysuse auto

if c(stata_version) >= 16 {
    frame copy default auto
    frame auto: generate gpm = 1/mpg
    frame auto: summarize gpm
}
else {
    display "What? Me worry?"
}

Hypothesis confirmed or rejected? Analysis / empirics


Hi, I have an important question for an extensive term paper:
https://s17.directupload.net/images/190917/rllgfkdi.png

My hypothesis is that the negative effect of unemployment on mental health is greater for men than for women.

X1: Unemployment X2: Gender Y: Mental health

Can someone please tell me what I have to look at to decide whether the hypothesis is confirmed or not?
Our lecturer said that looking at the significance stars alone is not sufficient.

And I'm surprised that the Stata output shows no significance stars for sex (even though it is my X2 variable), only for the interaction.

I hope someone can help me or has some advice. I would really appreciate it.
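A minimal sketch of the kind of follow-up the lecturer seems to be asking for (placeholder names: mh for mental health, unemp and female coded 0/1; the actual model may differ): look at the gender-specific effects of unemployment and the interaction test, not only the stars.

Code:
regress mh i.unemp##i.female
* marginal effect of unemployment separately for men and women
margins female, dydx(unemp)
* test whether the two effects differ (the interaction coefficient)
test 1.unemp#1.female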

Merging two data sets with missing values

Hi,
I want to build a panel data set from two data sets that share msacode (Metropolitan Statistical Area code).
The first data set has msacode, year [2008-2016], and number of firms. Some Metropolitan Statistical Areas do not have any firms in any year, and some MSAs have data for only part of the period.

The second data set has msacode, year [2006-2019], and funding. It is in a similar condition: some MSAs have funding in some intervals, but many MSAs have no funding in any year.

I would like to end up with a panel data set with the variables msacode, year, number of firms, and funding.

Best,
Sang-Min

<<The first data set>>
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double msacode str50 Geographicarea str61 geographicarea int(year establishments)
10180 "Abilene, TX" "Abilene, TX Metro Area" 2008 1
10420 "Akron, OH" "Akron, OH Metro Area" 2008 2
10420 "" "Akron, OH Metro Area" 2009 2
10420 "" "Akron, OH Metro Area" 2010 3
10420 "" "Akron, OH Metro Area" 2011 3
10420 "" "Akron, OH Metro Area" 2012 2
10420 "" "Akron, OH Metro Area" 2013 2
10420 "" "Akron, OH Metro Area" 2014 3
10420 "" "Akron, OH Metro Area" 2015 3
10420 "" "Akron, OH Metro Area" 2016 6
10500 "Albany, GA" "" . .
10540 "" "Albany, OR Metro Area" 2012 1
10540 "" "Albany, OR Metro Area" 2013 1
10540 "" "Albany, OR Metro Area" 2014 1
10540 "" "Albany, OR Metro Area" 2015 1
10540 "" "Albany, OR Metro Area" 2016 1
10580 "Albany-Schenectady-Troy, NY" "Albany-Schenectady-Troy, NY Metro Area" 2008 7
10580 "" "Albany-Schenectady-Troy, NY Metro Area" 2009 7
10580 "" "Albany-Schenectady-Troy, NY Metro Area" 2010 7
10580 "" "Albany-Schenectady-Troy, NY Metro Area" 2011 7
10580 "" "Albany-Schenectady-Troy, NY Metro Area" 2012 9
10580 "" "Albany-Schenectady-Troy, NY Metro Area" 2013 10
10580 "" "Albany-Schenectady-Troy, NY Metro Area" 2014 8
10580 "" "Albany-Schenectady-Troy, NY Metro Area" 2015 12
10580 "" "Albany-Schenectady-Troy, NY Metro Area" 2016 13
10740 "Albuquerque, NM" "Albuquerque, NM Metro Area" 2008 10
10740 "" "Albuquerque, NM Metro Area" 2009 12
10740 "" "Albuquerque, NM Metro Area" 2010 12
10740 "" "Albuquerque, NM Metro Area" 2011 13
10740 "" "Albuquerque, NM Metro Area" 2012 12
10740 "" "Albuquerque, NM Metro Area" 2013 11
10740 "" "Albuquerque, NM Metro Area" 2014 8
10740 "" "Albuquerque, NM Metro Area" 2015 9
10740 "" "Albuquerque, NM Metro Area" 2016 6
10780 "Alexandria, LA" "" . .

<<The second dataset>>
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int year float msacode double funding
2010 10740 8861705
2011 10740 9383637
2012 10740 4780751
2013 10740 4242238
2014 10740 1678627
2015 10740 3666798
2016 10740 4098682
2017 10740 2981285
2018 10740 4156384
2019 10740 4074436
2007 11460 17495383
2008 11460 21596139
2009 11460 22826556
2010 11460 21573375
2011 11460 24112374
2012 11460 10951462
2013 11460 9837884
2014 11460 12398916
2015 11460 9769378
2016 11460 10503032
2017 11460 20154588
2018 11460 11617958
2019 11460 11644810
2007 12060 11144370
2008 12060 11689953
2009 12060 13302771
2010 12060 11676999
2011 12060 12570558
2012 12060 6208735
2013 12060 5711568
2014 12060 6229144
2015 12060 5675133
2016 12060 6127842
2017 12060 17673323
2018 12060 10624693
2019 12060 11085241
end
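A minimal sketch of one way to combine the two files (placeholder file names; it assumes each file has at most one row per msacode-year pair):

Code:
use firms.dta, clear                       // first data set: msacode, year, establishments
merge 1:1 msacode year using funding.dta   // second data set: msacode, year, funding
drop _merge
* rows present in only one file are kept, so missing firm counts or funding
* simply show up as missing values; to balance the panel over all
* msacode-year combinations you could additionally run:
* fillin msacode year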

Calculate cancer-specific survival

Hi all.
I am a cancer researcher.

I have some variables from data on patients that are from a specific date.
Variable 1: alive
Variable 2: dead by cancer, dead by other, dead by unknown.
I also have data on the date of death, and for the patients who are alive I have a common date for all of them, which is the date when I gathered the data.

My question is how to calculate cancer-specific death.

I have created ONE common variable, 'CauseofDeath', coded 1) alive, 2) dead by cancer, 3) dead by other, 4) dead by unknown.

Then I use the command
Code:
stset DateDeath, failure(Causeofdeath_n ==2) scale(365.25) origin(DateofDiagnosis) id(BlokstudieID)

Is this the correct way to do it?
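A hedged sketch of how this setup is often coded, assuming a (placeholder) variable DateLastContact holding the common data-collection date for patients who are still alive; deaths from other or unknown causes are treated as censored, which is the usual cause-specific (rather than competing-risks) approach:

Code:
* exit date: date of death if dead, common follow-up date if still alive
gen exitdate = cond(Causeofdeath_n == 1, DateLastContact, DateDeath)
format exitdate %td

* only death by cancer (code 2) counts as the event; everything else is censored
stset exitdate, failure(Causeofdeath_n == 2) origin(DateofDiagnosis) ///
    scale(365.25) id(BlokstudieID)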

Assistance with calculating Marginal Effects

Hello,

Stata 14.1 user here! I am currently writing about the effect that sovereign credit rating actions have on the ratings of banks.

Dependent variable:
BDNn - bank is downgraded n notches (n=0,1,2,3)

Independent variables:
SDN_1 - sovereign is downgraded 1 notch (0 or 1)
SDN_2 - sovereign is downgraded 2 notches (0 or 1)
SDN_3 - sovereign is downgraded 3 or more notches (0 or 1)
SUP_1 - sovereign is upgraded 1 notch (0 or 1)
SNWQ - sovereign is on negative watch (0 or 1)
SPWQ - sovereign is on positive watch (0 or 1)
Sovereign rating - control var (from 1 to 24)

There are more variables and multiple versions but this is the 'simpler' version of the model. My initial code is:

Code:
xtset ID_number Date
xtologit BDNn SDN_1 SDN_2 SDN_3 SUP_1 SNWQ SPWQ SovereignRating

What I want to know is: if SDN_1=1, what is the likelihood that BDNn equals 1, 2, or 3 or more notches?

From what I saw, I should use the margins command. In order to do that, it seems I should specify my dummy variables as categorical (factor) variables:

Code:
. xtologit BDNn i.SDN_1 i.SDN_2 i.SDN_3 i.SUP_1 i.SNWQ i.SPWQ SovereignRating

Fitting comparison model:

Iteration 0:   log likelihood = -4900.4703  
Iteration 1:   log likelihood = -4556.0509  
Iteration 2:   log likelihood = -4497.6561  
Iteration 3:   log likelihood = -4485.8624  
Iteration 4:   log likelihood = -4485.7416  
Iteration 5:   log likelihood = -4485.7374  
Iteration 6:   log likelihood = -4485.7371  
Iteration 7:   log likelihood = -4485.7371  

Refining starting values:

Grid node 0:   log likelihood =  -4577.397

Fitting full model:

Iteration 0:   log likelihood =  -4577.397  (not concave)
Iteration 1:   log likelihood = -4491.5689  
Iteration 2:   log likelihood = -4483.3882  
Iteration 3:   log likelihood = -4482.3499  
Iteration 4:   log likelihood = -4482.3486  
Iteration 5:   log likelihood = -4482.3486  

Random-effects ordered logistic regression      Number of obs     =      5,822
Group variable: ID_number                       Number of groups  =      2,016

Random effects u_i ~ Gaussian                   Obs per group:
                                                              min =          1
                                                              avg =        2.9
                                                              max =         21

Integration method: mvaghermite                 Integration pts.  =         12

                                                Wald chi2(7)      =     640.84
Log likelihood  = -4482.3486                    Prob > chi2       =     0.0000

---------------------------------------------------------------------------------
           BDNn |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
                |
        1.SDN_1 |   1.282189   .1026812    12.49   0.000     1.080937     1.48344
        1.SDN_2 |    1.55724   .1463606    10.64   0.000     1.270378    1.844101
        1.SDN_3 |   3.002522   .2455032    12.23   0.000     2.521345      3.4837
        1.SUP_1 |  -2.525022   .3708289    -6.81   0.000    -3.251833   -1.798211
         1.SNWQ |  -1.916259   .1896509   -10.10   0.000    -2.287968    -1.54455
         1.SPWQ |  -18.92281   17955.19    -0.00   0.999    -35210.45    35172.61
SovereignRating |   .0262496   .0090356     2.91   0.004     .0085401     .043959
----------------+----------------------------------------------------------------
          /cut1 |    .989602    .049952    19.81   0.000     .8916979    1.087506
          /cut2 |   2.977047   .0744732    39.97   0.000     2.831082    3.123012
          /cut3 |   4.301772   .1079608    39.85   0.000     4.090173    4.513372
----------------+----------------------------------------------------------------
      /sigma2_u |   .1130177   .0491958                      .0481531    .2652582
---------------------------------------------------------------------------------
LR test vs. ologit model: chibar2(01) = 6.78          Prob >= chibar2 = 0.0046
Is this the correct way to set it up? Now on to the margins code. I first ran margins, dydx(*):

Code:
.  margins, dydx(*)

Average marginal effects                        Number of obs     =      5,822
Model VCE    : OIM

dy/dx w.r.t. : 1.SDN_1 1.SDN_2 1.SDN_3 1.SUP_1 1.SNWQ 1.SPWQ SovereignRating
1._predict   : Pr(0.BDNn), predict(pr outcome(0))
2._predict   : Pr(1.BDNn), predict(pr outcome(1))
3._predict   : Pr(2.BDNn), predict(pr outcome(2))
4._predict   : Pr(3.BDNn), predict(pr outcome(3))

---------------------------------------------------------------------------------
                |            Delta-method
                |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
1.SDN_1         |
       _predict |
             1  |  -.2684081   .0218655   -12.28   0.000    -.3112637   -.2255526
             2  |   .1566434   .0108431    14.45   0.000     .1353912    .1778955
             3  |   .0712929   .0080412     8.87   0.000     .0555325    .0870534
             4  |   .0404718   .0055584     7.28   0.000     .0295776     .051366
----------------+----------------------------------------------------------------
1.SDN_2         |
       _predict |
             1  |  -.3235553   .0297191   -10.89   0.000    -.3818037   -.2653068
             2  |   .1673386   .0100742    16.61   0.000     .1475936    .1870836
             3  |   .0965402   .0130516     7.40   0.000     .0709596    .1221208
             4  |   .0596764   .0101186     5.90   0.000     .0398443    .0795086
----------------+----------------------------------------------------------------
1.SDN_3         |
       _predict |
             1  |  -.5485901   .0277538   -19.77   0.000    -.6029866   -.4941937
             2  |   .1052249   .0261592     4.02   0.000     .0539539    .1564959
             3  |   .2171439   .0180633    12.02   0.000     .1817404    .2525474
             4  |   .2262213   .0396679     5.70   0.000     .1484737     .303969
----------------+----------------------------------------------------------------
1.SUP_1         |
       _predict |
             1  |   .2702918   .0164749    16.41   0.000     .2380016     .302582
             2  |  -.2042009   .0137561   -14.84   0.000    -.2311623   -.1772394
             3  |   -.045458   .0032109   -14.16   0.000    -.0517512   -.0391648
             4  |  -.0206329   .0018991   -10.86   0.000    -.0243551   -.0169107
----------------+----------------------------------------------------------------
1.SNWQ          |
       _predict |
             1  |   .2471957   .0143714    17.20   0.000     .2190282    .2753631
             2  |  -.1850587   .0118024   -15.68   0.000    -.2081909   -.1619264
             3  |  -.0426171   .0030132   -14.14   0.000    -.0485229   -.0367114
             4  |  -.0195199   .0017978   -10.86   0.000    -.0230436   -.0159962
----------------+----------------------------------------------------------------
1.SPWQ          |
       _predict |
             1  |   .3063149   .0063064    48.57   0.000     .2939546    .3186752
             2  |  -.2350874   .0057033   -41.22   0.000    -.2462656   -.2239092
             3  |  -.0492198   .0027646   -17.80   0.000    -.0546383   -.0438014
             4  |  -.0220077   .0018719   -11.76   0.000    -.0256766   -.0183388
----------------+----------------------------------------------------------------
SovereignRating |
       _predict |
             1  |  -.0048165   .0016495    -2.92   0.004    -.0080495   -.0015835
             2  |   .0032462   .0011103     2.92   0.003       .00107    .0054224
             3  |   .0010353   .0003599     2.88   0.004     .0003299    .0017407
             4  |    .000535   .0001895     2.82   0.005     .0001636    .0009064
---------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
But as per https://www.statalist.org/forums/for...nt-interaction, the results may be meaningless. If not, how would these be interpreted?

Would margins, dydx(SDN_1) predict(pu0 outcome(1)) or margins, dydx(SDN_1), run separately for each independent variable, be better? Any other suggestions?
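One hedged possibility for the quantity described (the probability of each downgrade size when SDN_1 switches from 0 to 1), using the same outcome() predictions that appear in the output above:

Code:
* predicted probability of each BDNn outcome at SDN_1 = 0 and SDN_1 = 1
margins SDN_1, predict(pr outcome(1))
margins SDN_1, predict(pr outcome(2))
margins SDN_1, predict(pr outcome(3))
* or the discrete change in each probability directly
margins, dydx(SDN_1) predict(pr outcome(1))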

Thank you very much in advance!







Recoding a string variable into numeric (instead of destring) and subsequent mismatch of values

I have some questions about an issue I encountered when trying to change a string variable into a numeric (categorical) one. I am writing my question as a series of commands to show my steps.

My variable is called "confused" - with a 5-point Likert scale response for possible answers (1=Very unlikely, 2=Unlikely, 3=Neutral, 4=Likely, 5=Very likely). By chance, the category "Unlikely" appears not to have been selected by survey respondents for this particular variable, and there were 23 respondents who didn't complete the question.

Here is my variable:
. tab confused, missing

Consider a |
patient who |
you think |
might have |
delirium. How |
likely do the |
words li | Freq. Percent Cum.
--------------+-----------------------------------
| 23 10.13 10.13
Likely | 66 29.07 39.21
Neutral | 2 0.88 40.09
Very likely | 135 59.47 99.56
Very unlikely | 1 0.44 100.00
--------------+-----------------------------------
Total | 227 100.00

For analysis, I want to change the categorical Likert responses into numeric data with labels.

1) First, I tried destring:
. destring confused, replace

The error message was:
confused: contains nonnumeric characters; no replace

2) I wasn't sure why I got this message, so I tried encode:
encode confused, generate(confused2)

. d confused2

storage display value
variable name type format label variable label
------------------------------------------------------------------------------------------------------------------------------------------------------
confused2 long %13.0g confused2
Consider a patient who you think might have delirium. How likely do the words li

3) This looks good, but I would like to ensure that the correct categories have the correct numeric codes for future analysis. I prefer to have numbers as the primary data so that if I need to do things quickly with future commands, I can write <if confused ==1> instead of <if confused =="Very unlikely">.

Initially I thought:
. label define likertlbl 1 "Very unlikely" 2 "Unlikely" 3 "Neutral" 4 "Likely" 5 "Very likely"

. label values confused2 likertlbl

. tab confused2, missing

Consider a |
patient who |
you think |
might have |
delirium. How |
likely do the |
words li | Freq. Percent Cum.
--------------+-----------------------------------
Very unlikely | 66 29.07 29.07
Unlikely | 2 0.88 29.96
Neutral | 135 59.47 89.43
Likely | 1 0.44 89.87
. | 23 10.13 100.00
--------------+-----------------------------------
Total | 227 100.00


But the output "confused2" here doesn't match the original "confused" categories - the response "likely" should have had n=66 from the original data, but here it has been made to match the "very unlikely" response.

I think this is because the responses were not in hierarchical order in the "confused" variable as they appear alphabetical in the list. Therefore when I used the <label define> command, it assumed that everything was already in hierarchical order in "confused2" when it assigned labels.

FIRST QUESTION: Is my interpretation of this mismatch correct?

4) So, to try again, I performed the following commands

. generate confused3 = 1 if confused =="Very unlikely"
(226 missing values generated)

. replace confused3 = 2 if confused =="Unlikely"
(0 real changes made)

. replace confused3 = 3 if confused =="Neutral"
(2 real changes made)

. replace confused3 = 4 if confused =="Likely"
(66 real changes made)

. replace confused3 = 5 if confused =="Very likely"
(135 real changes made)

. tab confused3, missing

confused3 | Freq. Percent Cum.
------------+-----------------------------------
1 | 1 0.44 0.44
3 | 2 0.88 1.32
4 | 66 29.07 30.40
5 | 135 59.47 89.87
. | 23 10.13 100.00
------------+-----------------------------------
Total | 227 100.00

Now I have the correct "match" - I know there were 66 responses that selected "likely" where 4=likely.
I can now attach my labels like this:

. label values confused3 likertlbl

. tab confused3, missing

confused3 | Freq. Percent Cum.
--------------+-----------------------------------
Very unlikely | 1 0.44 0.44
Neutral | 2 0.88 1.32
Likely | 66 29.07 30.40
Very likely | 135 59.47 89.87
. | 23 10.13 100.00
--------------+-----------------------------------
Total | 227 100.00

So in the end, I have achieved what I wanted. However this was very labor intensive!

SECOND QUESTION: Is there an easier way to achieve what I wanted than going through the code in section 4?

Previously I have made the changes to numeric categories in Excel, so that when I import into Stata they are already numeric, and then I simply assign the value labels. I was hoping to do everything in Stata at once so that there is a clear log file of all the data cleaning and manipulation commands.
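One shortcut that may replace step 4 entirely (a sketch, not tested on this data): encode accepts a label() option, so the strings are mapped to the codes in a pre-defined value label rather than to alphabetical order.

Code:
* reuse the label from step 3 (-capture- in case it is already defined)
capture label define likertlbl 1 "Very unlikely" 2 "Unlikely" 3 "Neutral" 4 "Likely" 5 "Very likely"
encode confused, generate(confused4) label(likertlbl)
tab confused4, missing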

Apologies if these are simple questions and I have made it a long-winded scenario!

Any assistance appreciated.

Multiple tabulate for different years

Hello,

I am trying to put together a table showing the frequency of occurrence of 1 (it is a dummy variable, a) conditioned on another variable (status, which runs from 1 to 7; I also want one of the lines to not be conditioned at all). I would also like to group them together and have a column for each year (year). And to top it off, I need to use [aw=w].

Is there a simple way to put all this information in one table, or at least one table per year? So far I have done it manually, just using tabulate and checking the frequency of occurrence of 1 for each year and each value of status.
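A hedged sketch using the older table syntax (assuming a is the dummy and that the weighted mean of a 0/1 variable is the share of 1s wanted):

Code:
* weighted share of a==1 for each value of status (rows) by year (columns);
* the -row- option should add an unconditional total row
table status year [aw=w], contents(mean a) row format(%9.3f)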

Thank you in advance.

Industry Adjusted Measure

Hello,
Please guide me on the following comment.
I am using an agency cost measure as the dependent variable (selling plus administrative expenses over sales). I have received the following comment from a journal:

"I am sure that the agency cost varies significantly across industries. For example, the agency cost in the financial industry is certainly different from that in the construction industry or the high-tech industry. As such, I recommend that the authors compute an industry-adjusted agency cost measure for each firm. Specifically, the industry average of the agency cost proxy should be deducted from that agency cost proxy of each firm-year that operates in that industry. The firm-year should be excluded from the computation of the industry average (i.e., exclude the focal firm-year). This test can be performed on the main analysis and reported in an appendix"

How should I address this comment? In particular, how do I compute the industry average of the agency cost measure?
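A minimal sketch of the leave-one-out computation the reviewer describes (agencycost, industry, and year are placeholder names for your variables):

Code:
* industry-year sum and count of the agency cost proxy
bysort industry year: egen double ind_sum = total(agencycost)
bysort industry year: egen ind_n = count(agencycost)

* industry-year average excluding the focal firm-year
* (missing when the firm is the only one in its industry-year cell)
gen double ind_avg_excl = (ind_sum - agencycost) / (ind_n - 1) if ind_n > 1

* industry-adjusted agency cost
gen double agencycost_adj = agencycost - ind_avg_excl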

cmset choice models in Stata 16

I am trying to figure out the new choice-model (cmset) analysis for the results of a discrete choice experiment (DCE).

I have a panel data set with unlabeled alternatives: my respondents saw two options per choice set, there were no alternative-specific variables, and the attribute levels varied across choice sets. Each respondent saw 12 choice sets.

How do I run this command (cmset) with unlabeled choice tasks? I would like to use cmclogit and cmmixlogit.
So far I have tried:
cmset id scenario, noalternatives   (where scenario identifies the 12 choice sets)
cmset id gr-var, noalternatives

Also, is there an option for latent class analysis under cmset?

Thank you.
Kind regards,
Elena

New to Stata, need help

Hi, my final-year project involves analyzing bicycle crashes in my state. I'm new to Stata and would like some help. I've downloaded the Excel file but can't seem to filter males and females among bicycle users and non-bicycle users. I've attached it; what I basically need is a way to filter males and females among bicycle and non-bicycle users. This will not only help here, but will also let me filter other variables.
Thanks
Best regards, Muhammad Sarmad Latif
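A minimal sketch of the kind of filtering described (file and variable names, and their codings, are guesses, since the attachment is not visible here):

Code:
* read the spreadsheet, using the first row as variable names
import excel "bicycle_crashes.xlsx", firstrow clear

* cross-tabulate sex against bicycle involvement
tab sex bicycle_user

* keep a subset, e.g. female bicycle users, for further analysis
keep if sex == "Female" & bicycle_user == "Yes"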

annual cross-sectional regressions versus panel regression

Dear all,
In one article I read that the authors estimate "average coefficients from ten annual (2005–2014) cross-sectional regressions independently". Is this the same as running a panel regression? If not, how do the two differ?
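For illustration, a sketch of the two approaches with placeholder names: averaging coefficients from yearly cross-sectional regressions is the Fama-MacBeth approach, whereas a panel regression pools all years into a single estimation (usually with firm and/or year effects), so the two generally give different estimates and standard errors.

Code:
* one cross-sectional regression per year, then average the yearly coefficients
statsby _b, by(year) clear: regress y x1 x2
summarize _b_x1 _b_x2

* versus a single panel regression pooling all years
use paneldata, clear
xtset firm_id year
xtreg y x1 x2 i.year, fe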

Merge Issue

Hello Everyone,

Hope you are doing great. I am running my code, but right in the middle of it I need to merge in a file that contains weights calculated from interview frequencies. When I run the code, Stata says that I am using the old merge syntax. Please have a look at the data and suggest what I should do to avoid this problem. I have read the manual entry for merge but am still confused. I have attached the complete data from the weights file and an example of the data from the main file.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(wt06 wt12 wt24 wt36 cum36 wt06x wt12x wt24x wt36x cum36x)
      2.620274       5.240549    .14448807     -4.951572    .14448807     1.0909091     2.1818182 .6857143 -.8103896 .6857143
  5.498567e-17  1.0997134e-16     .9724288     1.9448576     .9724288 -4.516161e-17 -9.032323e-17 .6857143 1.3714286 .6857143
-1.3746417e-17 -2.7492835e-17 -6.65337e-18 1.4186095e-17 -6.65337e-18             0             0        0         0        0
end


Now here is the type of data from the main file

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double hhidpn byte hacohort float(age married phealth rgovt sgovt sitearnsemp ritearnsemp working)
1100032701 1 50 1   0         0     0 16832.605 15360.938 100
1100032701 1 53 1 100     12000     0 13345.703         0   0
1100032702 1 47 1   0         0     0 18127.424  14268.16 100
1100032702 1 50 1   0         0 12000         0 13345.703 100
1100067401 3 58 0   0         0     0         0 10145.454 100
1100067401 3 62 0   0     16672     0         0         0   0
1100067401 3 60 0   0     15400     0         0         0   0
1100121501 3 71 0   0     14400     0         0         0   0
1100121501 3 75 0   0     16800     0         0         0   0
1100121501 3 73 0   0     15400     0         0         0   0
1100149201 3 62 1   0         0 28000  5890.279         0   0
1100149201 3 58 1   0         0     0  45090.91         0   0
1100149201 3 60 1   0         0     0  45090.91         0   0
1100149202 3 58 1   0         0     0         0  47345.45 100
1100149202 3 60 1   0         0     0         0  48096.28 100
1100149202 3 62 1   0     28000     0         0  5890.279   0
1100181601 1 60 1   0      6000     0 2330.6687  2530.856   0
1100181602 1 55 1   0         0  6000  2848.595  1295.541 100
1100181602 1 57 1   0         0     .         .         0 100
1100188101 3 84 0   0     11680     0         0         0   0
1100188101 3 82 1 100     10640 11410         0         0   0
1100188101 3 80 1   0      8760 11780         0         0   0
1100188102 3 87 1   0     11410 10640         0         0   0
1100188102 3 86 1   0     11780  8760         0         0   0
1100188102 3 89 1   0     19100     0         0         0   0
1100193701 3 50 0 100      4800     0         0         0   .
1100213201 1 72 0 100      6600     0         0         0   0
1100213201 1 69 0 100      9600     0         0 1147.6398   0
1100213601 3 62 1 100     19600 22400         0         0   0
1100213601 3 60 1   0     18000 20100         0  16061.21   0
1100213601 3 58 1   0         0 17520         0  33818.18   0
1100213603 3 57 1   0     17520     0  33818.18         0   0
1100213603 3 61 1   0     22400 19600         0         0   0
1100213603 3 59 1   0     20100 18000  16061.21         0   0
1100218002 3 85 1 100     14000 14000         0         0   0
1100218002 3 83 1   0     12600 12600         0         0   0
1100218003 3 71 1 100     14000 14000         0         0   0
1100218003 3 69 1   0     12600 12600         0         0   0
1100235501 1 70 1   0     14400  7080         0         0   0
1100235502 1 70 1   0      7080 14400         0         0   0
1100252501 3 67 1   0     14700 16730         0         0   0
1100252501 3 71 1   0     15400 16800         0         0   0
1100252501 3 69 1   0     15220 17960         0         0   0
1100252502 3 69 1   0     17960 15220         0         0   0
1100252502 3 67 1   0     16730 14700         0         0   0
1100252502 3 71 1   0     16800 15400         0         0   .
1100257301 3 74 1   0      5600 22980         0         0   0
1100257301 3 76 1   0      6240 23010         0         0   0
1100257302 3 70 1   0     23010  6240         0         0   0
1100257302 3 68 1   0     22980  5600         0         0   0
1100280001 3 56 0   0         0     0         0         0 100
1100280001 3 59 0   0     27120     0         0  37866.08   0
1100280001 3 57 0   0         0     0         0   22482.1   0
1100296502 3 88 0   0         0     0         0         0   0
1100296502 3 90 0   0      9000     0         0         0   0
1100296502 3 86 0   0      8400     0         0         0   0
1100319401 3 74 1   0     21140     .         .         0   0
1100319401 3 73 1   0     21580     0         0         0   0
1100319401 3 76 1   0     21440     .         .         0   0
1100319402 3 72 1   0         0 21580         0         0   0
1100321301 3 69 0   0     14000     0         0         0   0
1100321301 3 68 0   0     13200     0         0         0   0
1100321301 3 71 0   0     14560     0         0         0   0
1100368301 3 49 1   0         0     0 25983.635  22861.09 100
1100368302 3 50 1   0         0     0  22861.09 25983.635 100
1100423401 1 52 1   0     12072     .         .         0   0
1100423401 1 59 1   0     14840     0 15781.817         0   0
1100423401 1 55 1   0     12600     0 13345.703         0   0
1100423402 1 52 1   0         0 12600         0 13345.703 100
1100423402 1 56 1   0         0 14840         0 15781.817 100
1100437901 1 70 0   0      6576     0         0         0   0
end

Here is my code, which shows how I am merging these two data sets:


Code:
clear all

local output "D:\Share\Output"
local programs "D:\Share\Programs"

local samples "under60_hosp_insured" // age60to64_hosp_insured over65_hosp_insured"
local fes "cohortXwaveXcountry"
local outcomes "oop_spend working ritearnsemp sitearnsemp hgovt hittot rhlthlm phealth fphealth"
local isocountry "Austria Belgium Czech Denmark France Germany Italy Spain Sweden Switzerland "

cap log close
log using "D:\Share\output\implied_effects_log.log", replace

foreach samp in `samples' {
foreach fe in `fes' {
foreach cont in `isocountry' {

mat drop _all
local colnames ""
local col_list_1 ""
local col_list_2 ""
local col_list_IE ""

* Keep only those hospitalized 
use "D:\Share\Data\SHARE_long.dta" if `samp'==1 & `cont'==1, clear

*Get implied effects weighting matrix to be used at the end of this do-file
preserve
do "D:\Share\Programs\SHARE Implied Effects Weights.do"
restore
merge using "D:\Share\output\implied_effects_matrix.dta"
drop _merge
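* NOTE (added for clarity, not part of the original do-file): -merge- with no
* key variable list is the old sequential (observation-by-observation) syntax,
* which newer Stata versions flag. If sequential matching is really what is
* wanted here, the modern equivalent would be something like
*   merge 1:1 _n using "D:\Share\output\implied_effects_matrix.dta", nogenerate
* otherwise, merging on an explicit key variable is the safer fix.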

mkmat wt06  wt12  wt24  wt36  cum36  if _n<=3
mkmat wt06x wt12x wt24x wt36x cum36x if _n<=3


* Generate event time dummies
drop if evt_time<-$window
drop if evt_time>$window

forv i = 0/$window {
    if (`i' < 0) {
        local j = abs(`i')
        gen evt_f`j' = (evt_time == `i')
    }
    else {
        gen evt_l`i' = (evt_time == `i')
    }
}

egen cohortXwave = group(hacohort wave)
egen cohortXwaveXcountry = group(hacohort wave isocountry)

* Define number of variables for "implied effects" matrix
local J = 0
foreach outcome of varlist `outcomes' {
    local J = `J' + 1
}
matrix define results_IEs = J(33,`J',.)

local j = 1
foreach v of varlist `outcomes' { 

    local controls ""
    if "`fe'"=="hhidpn" {
        xi i.wave
        drop _Iwave_11
        local controls "_I*"
        if regexm("`v'","_c")==1 {
            drop _Iwave_2 _Iwave_3 
        }
        
        drop _Iwave_2 _Iwave_3 
    }

    di "FE = `fe', Sample = `samp', rregi = `cont' , Var = `v'"
    *areg `v' evt_time evt_l* `controls' [pweight=rwtresp], absorb(`fes')  cluster(hhidpn)
    areg `v' evt_time evt_l* `controls' [pweight=rwtresp], absorb(`fes')  cluster(hhidpn)
    *areg `v' evt_time evt_l* `controls' i.country_fe  [pweight=rwtresp], absorb(`fes')  cluster(hhidpn)
    *areg `v' evt_time evt_l* `controls' dum*  [pweight=rwtresp], absorb(`fes')  cluster(hhidpn)
    *reg `v' evt_time evt_l* `controls' i.country_fe i.cohortXwave [pweight=rwtresp], cluster(hhidpn)
    *reghdfe `v' evt_time evt_l* `controls' [pweight=rwtresp], absorb(country_fe cohortXwave) cluster(hhidpn)
        
    *log close
    
    *Saves N, number of individuals, and effective sample size to matrix
    local N = e(N)
    local C = e(N_clust)
    local R= e(r2)
    
    * Save first four rows as N, unique individuals, weighted individuals, and R-squared
    di "`N' \ `C' \ `R' " 
    mat input N=(`N' \ `C' \ `R' )
    mat rown N="N" "Indiv" "R2"
    
    * Save coefficients and add to column
    matrix eb = e(b)
    matrix eb = (N\ eb')
    

    * Save variance-covariance matrix
    matrix var= (e(V))
    local colnames: coln var
    matrix list var            // YU ADDED THIS
    
    local n=0
    * Drop SE matrix from prior run
    cap mat drop se

    * Clean up matrices for output
    foreach col in `colnames'  {
        local n=`n'+1
        mat c`n'=var[`n'..., `n']
        local rownames: rown c`n'

        foreach w of local rownames  {
            local rw_c`n' `rw_c`n'' `w'_`col'
        }
        
        matrix rown c`n'= `rw_c`n''
        matrix coln c`n'= `v'
        matrix se=(nullmat(se)\ c`n')
        cap mat drop c`n' 
        local rw_c`n' ""
    }
    
    if regexm("`v'","_c")==1 {
        mat se=(N\se)
        matrix results_ses_2=(nullmat(results_ses_2), se)
        matrix results_coefs_2 = (nullmat(results_coefs_2), eb)
        local col_list_2 `col_list_2' `v'
    }    
        
     {
        mat se=(N\se)
        matrix results_ses_1=(nullmat(results_ses_1), se)
        matrix results_coefs_1 = (nullmat(results_coefs_1), eb)
        local col_list_1 `col_list_1' `v'
    }
    
    * Calculating implied effects:
    * (lincom takes the last estimates stored)
        
    *Using Earnings weights
    lincom wt06[1,1]*evt_l0 + wt06[2,1]*evt_l1 + wt06[3,1]*evt_l2
        matrix results_IEs[1,`j'] = r(estimate)
        matrix results_IEs[2,`j'] = r(se)
    lincom wt12[1,1]*evt_l0 + wt12[2,1]*evt_l1 + wt12[3,1]*evt_l2
        matrix results_IEs[3,`j'] = r(estimate)
        matrix results_IEs[4,`j'] = r(se)
    lincom wt24[1,1]*evt_l0 + wt24[2,1]*evt_l1 + wt24[3,1]*evt_l2
        matrix results_IEs[5,`j'] = r(estimate)
        matrix results_IEs[6,`j'] = r(se)    
    lincom wt36[1,1]*evt_l0 + wt36[2,1]*evt_l1 + wt36[3,1]*evt_l2
        matrix results_IEs[7,`j'] = r(estimate)
        matrix results_IEs[8,`j'] = r(se)
    lincom cum36[1,1]*evt_l0 + cum36[2,1]*evt_l1 + cum36[3,1]*evt_l2
        matrix results_IEs[9,`j'] = r(estimate)
        matrix results_IEs[10,`j'] = r(se)    
    test wt12[1,1]*evt_l0 + wt12[2,1]*evt_l1 + wt12[3,1]*evt_l2 = wt36[1,1]*evt_l0 + wt36[2,1]*evt_l1 + wt36[3,1]*evt_l2
        matrix results_IEs[11,`j'] = r(p)
    
    *Using OOP weights
    lincom wt06x[1,1]*evt_l0 + wt06x[2,1]*evt_l1 + wt06x[3,1]*evt_l2
        matrix results_IEs[12,`j'] = r(estimate)
        matrix results_IEs[13,`j'] = r(se)
    lincom wt12x[1,1]*evt_l0 + wt12x[2,1]*evt_l1 + wt12x[3,1]*evt_l2
        matrix results_IEs[14,`j'] = r(estimate)
        matrix results_IEs[15,`j'] = r(se)
    lincom wt24x[1,1]*evt_l0 + wt24x[2,1]*evt_l1 + wt24x[3,1]*evt_l2
        matrix results_IEs[16,`j'] = r(estimate)
        matrix results_IEs[17,`j'] = r(se)    
    lincom wt36x[1,1]*evt_l0 + wt36x[2,1]*evt_l1 + wt36x[3,1]*evt_l2
        matrix results_IEs[18,`j'] = r(estimate)
        matrix results_IEs[19,`j'] = r(se)
    lincom cum36x[1,1]*evt_l0 + cum36x[2,1]*evt_l1 + cum36x[3,1]*evt_l2
        matrix results_IEs[20,`j'] = r(estimate)
        matrix results_IEs[21,`j'] = r(se)
    test wt12x[1,1]*evt_l0 + wt12x[2,1]*evt_l1 + wt12x[3,1]*evt_l2 = wt36x[1,1]*evt_l0 + wt36x[2,1]*evt_l1 + wt36x[3,1]*evt_l2
        matrix results_IEs[22,`j'] = r(p)    
    
    *Using LFP weights
    lincom wt06x[1,1]*evt_l0 + wt06x[2,1]*evt_l1 + wt06x[3,1]*evt_l2
        matrix results_IEs[23,`j'] = r(estimate)
        matrix results_IEs[24,`j'] = r(se)
    lincom wt12x[1,1]*evt_l0 + wt12x[2,1]*evt_l1 + wt12x[3,1]*evt_l2
        matrix results_IEs[25,`j'] = r(estimate)
        matrix results_IEs[26,`j'] = r(se)
    lincom wt24x[1,1]*evt_l0 + wt24x[2,1]*evt_l1 + wt24x[3,1]*evt_l2
        matrix results_IEs[27,`j'] = r(estimate)
        matrix results_IEs[28,`j'] = r(se)    
    lincom wt36x[1,1]*evt_l0 + wt36x[2,1]*evt_l1 + wt36x[3,1]*evt_l2
        matrix results_IEs[29,`j'] = r(estimate)
        matrix results_IEs[30,`j'] = r(se)
    lincom cum36x[1,1]*evt_l0 + cum36x[2,1]*evt_l1 + cum36x[3,1]*evt_l2
        matrix results_IEs[31,`j'] = r(estimate)
        matrix results_IEs[32,`j'] = r(se)
    test wt12x[1,1]*evt_l0 + wt12x[2,1]*evt_l1 + wt12x[3,1]*evt_l2 = wt36x[1,1]*evt_l0 + wt36x[2,1]*evt_l1 + wt36x[3,1]*evt_l2
        matrix results_IEs[33,`j'] = r(p)    
    
    local col_list_IE `col_list_IE' `v'    
    local j = `j' + 1    
        
} // outcomes

* Labeling rows of implied effects table
* NOTE: 36a indicate the annual effect at 36 months
    * the 36m are the average annual effects that are presented in the paper
local r="b_6mEarn se_6mEarn b_12mEarn se_12mEarn b_24mEarn se_24mEarn b_36mEarn se_36mEarn b_cum36Earn se_cum36Earn p_Earn b_6mOOP se_6mOOP b_12mOOP se_12mOOP b_24mOOP se_24mOOP b_36mOOP se_36mOOP b_cum36OOP se_cum36OOP p_OOP b_6mLFP se_6mLFP b_12mLFP se_12mLFP b_24mLFP se_24mLFP b_36mLFP se_36mLFP b_cum36LFP se_cum36LFP p_LFP"
mat rown results_IEs=`r'
        
* Outputting and saving results
local types = "coefs ses"
foreach type of local types {
      drop _all         
     mat coln results_`type'_1=`col_list_1'
     svmat2 results_`type'_1, names(col) rnames(var) 

     order var
    
     outsheet using "`output'\SHARE_ES_`type'_`samp'_`cont'_$window.txt", replace
 } // end foreach type of local types

drop _all
mat coln results_IEs=`col_list_IE'
svmat2 results_IEs, names(col) rnames(var) full
order var
mat list results_IEs
outsheet using "`output'\SHARE_IEs_`samp'_`cont'_$window.txt", replace  
  
} // samples
} // fes
} //country

log close
