Channel: Statalist

Generate a row rather than Column

Hello Everyone,

May I ask if anyone has an idea how to generate a row of data in Stata? I have a row and need to generate another row with exactly the same values, but then change one variable's value in the new row.

I have, for example:

Code:
geoid2           geodisplaylabel               median~e            popula~n          year
603393500       UpperLake-Clearlake       44.7               11596             2009
But I need to split these two place names into two rows:

Code:
geoid2           geodisplaylabel                median~e            popula~n          year
603393500       Clearlake                         44.7               11596             2009
603393500       UpperLake                       44.7               11596             2009
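One hedged way to get there (a sketch assuming the label always joins exactly two names with a single hyphen) is to duplicate the row with expand and then split the label:

Code:
* duplicate each hyphenated row
expand 2 if strpos(geodisplaylabel, "-")
* number the copies within each duplicated pair
by geoid2 geodisplaylabel, sort: gen copy = _n
* break "UpperLake-Clearlake" into two parts
split geodisplaylabel, parse("-")
replace geodisplaylabel = geodisplaylabel1 if copy == 1
replace geodisplaylabel = geodisplaylabel2 if copy == 2 & geodisplaylabel2 != ""
drop geodisplaylabel? copy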
Thank you in advance,
Ali

Melogit: variance of random slope looks insignificant, but LR test says it's significant?

Can someone please help me with a multilevel modeling question? I've gotten conflicting advice from colleagues, so I thought I'd ask the experts here...

I'm estimating a mixed effects logit model with several variables, trying to determine which should have random slopes. The command I'm using has the form:

melogit Y X1 X2 || group: X1 X2

X1 appears to have a significant effect on Y, but X2 does not.

The random effects portion of my output looks like this:


                 |    Coef.   Std. Err.     [95% Conf. Interval]
-----------------+---------------------------------------------
group            |
         var(X1) |  .031886   .0129037      .0144256     .07048
         var(X2) | 1.981478   1.375139      .5084618   7.721827
      var(_cons) | 1.588888   .6971545      .6723764   3.754689
---------------------------------------------------------------
LR test vs. logistic model: chi2(3) = 461.64   Prob > chi2 = 0.0000

One colleague tells me that the random slope on X2 is not necessary, since the variance looks insignificant (and the variable's effect on Y is insignificant). Another colleague tells me a likelihood ratio test is actually necessary for random slopes. So I tested the model with X1 and X2 random slopes against a smaller model with a random slope on X1 only. (FWIW, X2's effect on Y is insignificant in the simpler model too.)

The LR test result (Prob > chi2 = 0.0003) suggests that the random slope on X2 is preferred, even though the variance LOOKED insignificant based on the larger model's output shown above.

Whose advice do I believe? Do I report the model with random slopes on X1 and X2 or the model with a random slope on X1 only? (And is there a source I can cite to explain this decision to reviewers?)
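The nested comparison described here can be run explicitly with stored estimates; a minimal sketch (model and variable names follow the command above):

Code:
melogit Y X1 X2 || group: X1 X2
estimates store full
melogit Y X1 X2 || group: X1
estimates store reduced
lrtest full reduced

Because the null hypothesis places var(X2) on the boundary of the parameter space, the reported p-value from such a test is conservative.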

Thanks!

Joe

Create loop based on multiple qualifiers and panel dataset

Hello everyone,

Although I've been trying for several hours, I'm afraid I can't solve this one by myself. To be more specific, I have a dataset with a rotational design for the period 2005-2015, where each year's sample consists of four subsamples: one selected for that specific year and three others that have been followed for 2, 3, and 4 years, respectively. Every subsample is dropped after a four-year follow-up. Hence, I have at most four years of observations per person (person_id) on the variable wstatus (working status), which may or may not change during this period.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float line int year long(hh_id person_id) byte wstatus
64965 2006 138460 13846001 2
64966 2007 138460 13846001 1
64967 2008 138460 13846001 1
64968 2009 138460 13846001 2
64969 2006 138460 13846002 4
64970 2007 138460 13846002 4
64971 2008 138460 13846002 4
64972 2009 138460 13846002 4
64973 2006 138470 13847001 6
64974 2007 138470 13847001 6
64975 2008 138470 13847001 6
64976 2009 138470 13847001 6
64977 2006 138470 13847002 7
64978 2007 138470 13847002 7
64979 2008 138470 13847002 7
64980 2009 138470 13847002 7
64981 2006 138470 13847003 5
64982 2007 138470 13847003 5
64983 2008 138470 13847003 2
64984 2009 138470 13847003 1
end


What I'm trying to do is construct a loop that generates a new variable, say trans, whose value depends on the change in wstatus between each pair of consecutive years. For person 13846001 in the example above, trans would be empty for 2006; for 2007 it would be based on comparing wstatus in 2006 and 2007; for 2008 on comparing 2007 and 2008; and for 2009 on comparing 2008 and 2009. I don't think the specific qualifiers matter much, but for the sake of this example let's say that if wstatus==1 at t and wstatus==1 at t+1 for person i, then trans==100 at t+1 for the same person. I have another 10 combinations to consider.

I did come up with something, but it obviously doesn't work, as it changes the values of all observations for each person:

Code:
gen trans = .
by person_id (year), sort: gen yid = _n
summarize yid, meanonly
forval i = 1/`r(max)' {
    by person_id: replace trans = 100 if wstatus[`i']==1 & wstatus[`i'+1]==1
}

I could really use your help!

Thank you in advance
Thanos

edit: I don't want to change the form of data from long to wide
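For what it's worth, subscripts like wstatus[`i'] inside by: refer to fixed observation numbers within each group, which is why every observation gets overwritten. The year-to-year comparison can be done directly, keeping the data long; a sketch covering the one combination named above:

Code:
gen trans = .
by person_id (year), sort: replace trans = 100 ///
    if wstatus == 1 & wstatus[_n-1] == 1 & year == year[_n-1] + 1
* add one replace line per remaining combination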

Reshaping data from wide to long

Hello! I am working with the following data, obtained from the World Bank's World Development Indicators. I have country-level data for the years 2005 and 2010 on 3 variables: var1, var2, and var3. Currently, the data is wide and looks like this:


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str52 countryname str3 countrycode str97 seriesname str11(yr2005 yr2010)
"Afghanistan"            "AFG" "var1" "365.5487336" "550.514974" 
"Afghanistan"            "AFG" "var2" "88.80699921" "88.35099792"
"Afghanistan"            "AFG" "var3" "16.69099998" "15.24300003"
"Albania"                "ALB" "var1" "3062.592504" "4094.360204"
"Albania"                "ALB" "var2" "74.65299988" "71.75900269"
"Albania"                "ALB" "var3" "53.83599854" "52.68199921"
"Algeria"                "DZA" "var1" "4273.312751" "4463.394675"
"Algeria"                "DZA" "var2" "76.07900238" "74.65899658"
"Algeria"                "DZA" "var3" "13.79599953" "15.49800014"
"American Samoa"         "ASM" "var1" "10644.58107" "10352.82276"
"American Samoa"         "ASM" "var2" ".."          ".."         
"American Samoa"         "ASM" "var3" ".."          ".."         
"Andorra"                "AND" "var1" "48831.92936" "39736.35406"
"Andorra"                "AND" "var2" ".."          ".."         
"Andorra"                "AND" "var3" ".."          ".."         
"Angola"                 "AGO" "var1" "2866.434694" "3585.905553"
"Angola"                 "AGO" "var2" "80.5039978"  "80.47399902"
"Angola"                 "AGO" "var3" "75.81999969" "76.22100067"
"Antigua and Barbuda"    "ATG" "var1" "12857.16429" "12174.6978" 
"Antigua and Barbuda"    "ATG" "var2" ".."          ".."         
"Antigua and Barbuda"    "ATG" "var3" ".."          ".."         
"Argentina"              "ARG" "var1" "8522.522732" "10276.2605" 
"Argentina"              "ARG" "var2" "81.88500214" "80.80200195"
"Argentina"              "ARG" "var3" "56.69800186" "53.70700073"
"Armenia"                "ARM" "var1" "2571.985756" "3218.381655"
"Armenia"                "ARM" "var2" "70.58200073" "76.43699646"
"Armenia"                "ARM" "var3" "53.12200165" "54.84700012"
end
I am trying to reshape this data so that I have a single column "yr" taking the values 2005 and 2010, with var1, var2, and var3 also reshaped into columns. In the end, ideally, the data would look something like this:
Countryname Countrycode yr var1 var2 var3
Afghanistan AFG 2005 365.549 88.807 16.691
Afghanistan AFG 2010 550.515 88.351 15.243
Albania ALB 2005 3062.59 74.653 53.835999
Albania ALB 2010 4094.36 71.759 52.681999
Algeria DZA 2005 4273.31 76.079 13.796
Algeria DZA 2010 4463.39 74.659 15.498
American Samoa ASM 2005 10644.6 .. ..
American Samoa ASM 2010 10352.8 .. ..
Andorra AND 2005 48831.9 .. ..
Andorra AND 2010 39736.4 .. ..
Angola AGO 2005 2866.43 80.504 75.82
Angola AGO 2010 3585.91 80.474 76.221001
Antigua and Barbuda ATG 2005 12857.2 .. ..
Antigua and Barbuda ATG 2010 12174.7 .. ..
Argentina ARG 2005 8522.52 81.885 56.698002
Argentina ARG 2010 10276.3 80.802 53.707001
Armenia ARM 2005 2571.99 70.582 53.122002
Armenia ARM 2010 3218.38 76.437 54.847
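A double reshape handles this; a minimal sketch assuming the variable names from the dataex above (the ".." codes are turned into numeric missing along the way):

Code:
* bring the year columns long
reshape long yr, i(countryname countrycode seriesname) j(year)
* ".." denotes missing in WDI extracts
replace yr = "" if yr == ".."
destring yr, replace
* spread the series names back out as columns
reshape wide yr, i(countryname countrycode year) j(seriesname) string
rename yrvar* var*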

I would appreciate any guidance or help in working on this! Thank you!

Removing insignificant variables

Hello,
As an example, I am trying to estimate the following regression and compare results when my dependent variable is non-routine vs. routine (that is, the share of employees doing non-routine vs. routine tasks). In reality, I also have 2 other dependent variables between which I want to compare results.
My question is the following: if some estimates come up as significant in the first regression but insignificant in the second (P > 0.1), should I remove those variables from the second regression to make it more efficient, or just leave them in?

Code:
. xtreg nonroutine using_computer lngva  price_computer total_internet_access sharedegre
> e sharehigher shareother, fe vce(robust)

Fixed-effects (within) regression               Number of obs     =        120
Group variable: industry1                       Number of groups  =         10

R-sq:                                           Obs per group:
     within  = 0.3276                                         min =         12
     between = 0.4580                                         avg =       12.0
     overall = 0.4408                                         max =         12

                                                F(7,9)            =      16.27
corr(u_i, Xb)  = 0.5375                         Prob > F          =     0.0002

                                   (Std. Err. adjusted for 10 clusters in industry1)
------------------------------------------------------------------------------------
                   |               Robust
           nonrout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
    using_computer |   .0014271   .0004206     3.39   0.008     .0004755    .0023786
             lngva |  -.0193869   .0317928    -0.61   0.557    -.0913072    .0525334
    price_computer |   .0014037   .0009901     1.42   0.190     -.000836    .0036434
total_internet_a~s |   .0041153   .0022304     1.85   0.098    -.0009303    .0091609
       sharedegree |   .0926562   .1112741     0.83   0.427    -.1590632    .3443756
       sharehigher |  -.2771514   .1359427    -2.04   0.072    -.5846752    .0303723
        shareother |   .1583427   .0836769     1.89   0.091    -.0309475    .3476329
             _cons |    .200577   .5024723     0.40   0.699    -.9360942    1.337248
-------------------+----------------------------------------------------------------
           sigma_u |   .1681399
           sigma_e |  .01373558
               rho |  .99337076   (fraction of variance due to u_i)
---------------------------------------------------------------------------------

VS

. xtreg routsem using_computer lngva  price_computer total_internet_access sharedegre
> e sharehigher shareother, fe vce(robust)

Fixed-effects (within) regression               Number of obs     =        120
Group variable: industry1                       Number of groups  =         10

R-sq:                                           Obs per group:
     within  = 0.0458                                         min =         12
     between = 0.7000                                         avg =       12.0
     overall = 0.6918                                         max =         12

                                                F(7,9)            =       9.83
corr(u_i, Xb)  = 0.7868                         Prob > F          =     0.0014

                                   (Std. Err. adjusted for 10 clusters in industry1)
------------------------------------------------------------------------------------
                   |               Robust
           routsem |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
    using_computer |  -.0003031   .0002802    -1.08   0.308    -.0009371    .0003308
             lngva |  -.0181579   .0289361    -0.63   0.546     -.083616    .0473002
    price_computer |   -.000896   .0002683    -3.34   0.009     -.001503    -.000289
total_internet_a~s |  -.0018722   .0009078    -2.06   0.069    -.0039258    .0001813
       sharedegree |  -.0601111    .083996    -0.72   0.492    -.2501232     .129901
       sharehigher |  -.0041789   .1033247    -0.04   0.969    -.2379157    .2295579
        shareother |  -.0227866   .1205998    -0.19   0.854    -.2956024    .2500292
             _cons |   .7114436   .3188556     2.23   0.053     -.009858    1.432745
-------------------+----------------------------------------------------------------
           sigma_u |   .1552167
           sigma_e |  .01099854
               rho |  .99500405   (fraction of variance due to u_i)
------------------------------------------------------------------------------------

. 
end of do-file
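Rather than dropping regressors one at a time, their joint contribution to the second model can be assessed with a single exclusion test; a sketch using the variables above:

Code:
xtreg routsem using_computer lngva price_computer total_internet_access sharedegree sharehigher shareother, fe vce(robust)
test sharedegree sharehigher shareother

If the joint test does not reject, that is a more defensible basis for a parsimonious specification than individual p-values.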
Also, I have performed the Hausman test to confirm that I should use a fixed-effects model. Are there any other tests I should consider to check for endogeneity, and should I think about instruments? I am just learning about panel data models.

Thank you very much.

Interaction term in fixed effects

Hello,

I would like to clarify the following uncertainty: I am analysing the impact of technology on different occupations, using panel data on 11 industries between 2006 and 2017. I have created a fixed-effects model in order to look for a common effect across all industries after accounting for individual effects; however, I would also like to look later into the differing effect of computer use for each industry. Hence, I decided to use the interaction of computer use with industry dummies for the second model.
However, when I use the latter regression, other variables that were significant in the first model now become insignificant. Can I still use the first regression to interpret the effect of the other variables, and the latter only for the industry-specific effects of computer use?
Or does that make my results incomparable?
Code:
. xtreg nonrout using_computer lngva  price_computer total_internet_access sharedegre
> e sharehigher shareother, fe vce(robust)

Fixed-effects (within) regression               Number of obs     =        120
Group variable: industry1                       Number of groups  =         10

R-sq:                                           Obs per group:
     within  = 0.3276                                         min =         12
     between = 0.4580                                         avg =       12.0
     overall = 0.4408                                         max =         12

                                                F(7,9)            =      16.27
corr(u_i, Xb)  = 0.5375                         Prob > F          =     0.0002

                                   (Std. Err. adjusted for 10 clusters in industry1)
------------------------------------------------------------------------------------
                   |               Robust
           nonrout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
    using_computer |   .0014271   .0004206     3.39   0.008     .0004755    .0023786
             lngva |  -.0193869   .0317928    -0.61   0.557    -.0913072    .0525334
    price_computer |   .0014037   .0009901     1.42   0.190     -.000836    .0036434
total_internet_a~s |   .0041153   .0022304     1.85   0.098    -.0009303    .0091609
       sharedegree |   .0926562   .1112741     0.83   0.427    -.1590632    .3443756
       sharehigher |  -.2771514   .1359427    -2.04   0.072    -.5846752    .0303723
        shareother |   .1583427   .0836769     1.89   0.091    -.0309475    .3476329
             _cons |    .200577   .5024723     0.40   0.699    -.9360942    1.337248
-------------------+----------------------------------------------------------------
           sigma_u |   .1681399
           sigma_e |  .01373558
               rho |  .99337076   (fraction of variance due to u_i)
------------------------------------------------------------------------------------


Now - with the interaction term:

. xtreg nonrout c.using_computer#i.industry1 lngva  price_computer total_internet_acc
> ess sharedegree sharehigher shareother, fe vce(robust)

Fixed-effects (within) regression               Number of obs     =        120
Group variable: industry1                       Number of groups  =         10

R-sq:                                           Obs per group:
     within  = 0.4646                                         min =         12
     between = 0.0007                                         avg =       12.0
     overall = 0.0008                                         max =         12

                                                F(6,9)            =          .
corr(u_i, Xb)  = -0.7915                        Prob > F          =          .

                                   (Std. Err. adjusted for 10 clusters in industry1)
------------------------------------------------------------------------------------
                   |               Robust
           nonrout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
         industry1#|
  c.using_computer |
    Accommodation  |  -.0019848   .0006782    -2.93   0.017    -.0035191   -.0004505
Administrative ..  |   .0011701   .0002696     4.34   0.002     .0005603    .0017799
     Construction  |   .0019143   .0003306     5.79   0.000     .0011663    .0026623
Financial and I..  |  -.0067784   .0034683    -1.95   0.082    -.0146244    .0010675
Information and..  |   .0032694   .0010298     3.17   0.011     .0009398     .005599
    Manufacturing  |   .0017933   .0013022     1.38   0.202    -.0011526    .0047392
Professional, S..  |  -.0004355   .0008249    -0.53   0.610    -.0023016    .0014306
      Real Estate  |   .0019032   .0002549     7.47   0.000     .0013266    .0024799
Transportation ..  |  -.0000783   .0002766    -0.28   0.784    -.0007041    .0005475
  Wholesale trade  |  -.0000112   .0014483    -0.01   0.994    -.0032876    .0032651
                   |
             lngva |   .0000949   .0355875     0.00   0.998    -.0804097    .0805995
    price_computer |   .0013431   .0009608     1.40   0.196    -.0008305    .0035166
total_internet_a~s |   .0037761   .0023165     1.63   0.138    -.0014641    .0090163
       sharedegree |   .0639118   .1022921     0.62   0.548     -.167489    .2953126
       sharehigher |  -.2078717   .1323905    -1.57   0.151    -.5073598    .0916164
        shareother |   .0901104   .0656804     1.37   0.203     -.058469    .2386898
             _cons |    .127795    .461287     0.28   0.788    -.9157087    1.171299
-------------------+----------------------------------------------------------------
           sigma_u |   .3108439
           sigma_e |  .01283018
               rho |  .99829925   (fraction of variance due to u_i)
------------------------------------------------------------------------------------

. 
.
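For what it's worth, a full factorial specification keeps both models in one regression and lets margins report the industry-specific slopes; a sketch, not a definitive specification:

Code:
xtreg nonrout c.using_computer##i.industry1 lngva price_computer total_internet_access sharedegree sharehigher shareother, fe vce(robust)
* industry-specific marginal effects of computer use
margins industry1, dydx(using_computer)

Under fe the i.industry1 main effects are absorbed by the fixed effects (omitted as collinear), so this is equivalent to the # specification shown above while making the per-industry slopes easy to read off.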

Randomized items

Dear Stata experts,

I have recently finalized data collection from an online platform with an embedded Qualtrics survey. My issue pertains to one particular matrix-type question with 10 sub-items as rows, which were randomized and assigned a Y/N value by the respondents (columns).

The randomization resulted in a different order of presentation of the same items to each respondent. Y/N values listed under the same column (such as one of the ten columns resulting from the split) therefore pertain to different items. The order of items displayed to each respondent is provided in a summary column (items separated by vertical bars).
I attempt to illustrate the structure of the data below:

Respondents no Q1_1 Q1_2 Q1_3 .... Q3_DO (display order)
1 Yes Yes No I find it relaxing | I am too lazy to do it | other
2 Yes No Yes I am too lazy to do it | Other | I find it relaxing
....

My goal is to organize the respondents' answers so that the Y/N values are assigned to a column with the corresponding item in a consistent way across all observations. This would result in using the current sub-items (listed in the DO column) as independent column labels.
I do not know how to approach this issue. I tried several approaches (first splitting the DO variable into multiple variables, concatenating the item to the related value by order of display, possibly reshaping the data based on such inputs) but unfortunately failed.
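One hedged way to realign such data (a sketch assuming Q3_DO holds the pipe-separated display order and Q1_1-Q1_10 hold the answers in displayed order, as in the illustration above):

Code:
* item1..item10: the item shown in position k
split Q3_DO, parse("|") gen(item)
* ans1..ans10: the answer given in position k
rename Q1_# ans#
gen long resp = _n
* one row per respondent-position, pairing each item text with its answer
reshape long item ans, i(resp) j(pos)
replace item = strtoname(strtrim(item))
drop pos
* one column per item, aligned across respondents
reshape wide ans, i(resp) j(item) string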
I would be very grateful for your help.
With kind regards
Agnieszka
Post-Doc researcher
CBS, Denmark

Heckman Panel Data

Hello everyone!
I have some doubts regarding estimation of the Heckman model in a panel data framework.
Does the heckman command work the same way with panel data as it does with cross-sectional data, or is there a difference?
If there are differences, I would like to apply the Heckman two-step estimator; in that case, how do I specify the type of effects (fixed/random) in the structural and selection equations?

interpretation of interaction of two logged variables in log-log model

$
0
0
Dear Stata Listers,

Could someone advise on how to calculate the marginal effect of one logged variable conditional on another logged variable? The model is as follows: ln(y) = b0 + b1 ln(X1) + b2 ln(X2) + b3 ln(X3) + b4 ln(X4) + b5 ln(X3)*ln(X4). The last term is an interaction between ln(X3) and ln(X4). Any advice on how to calculate the marginal effect of X3 on y given X4 would be appreciated.
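Since the elasticity of y with respect to X3 here equals the coefficient on ln(X3) plus the interaction coefficient times ln(X4), factor-variable notation plus margins yields the conditional effect directly; a minimal sketch with hypothetical variable names y and x1-x4:

Code:
gen lny = ln(y)
gen lnx1 = ln(x1)
gen lnx2 = ln(x2)
gen lnx3 = ln(x3)
gen lnx4 = ln(x4)
regress lny lnx1 lnx2 c.lnx3##c.lnx4
* elasticity of y with respect to X3 at chosen values of ln(X4)
margins, dydx(lnx3) at(lnx4 = (0 1 2))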

Thank you,
Julia

Trying Latex

extracting random effects

Dear Stata users,


I am wondering how to extract random effects after joint model (-stjm-).


The following codes were used but I am not sure whether they are correct or not.


Code:
stjm gfr age gender race sbp smoking, panel(id) survm(weibull) rfp(1) gh(15)

predict re_slope re_int, reffects

I would really appreciate it if anyone could help.


Thank you so much.
Oyun

"Scaled RSS evaluates to missing" problem

Dear Stata users,

I am running the nlsur command to estimate a system of two equations for the Mishkin test of capital market efficiency.

Code:
nlsur (F_Inc = {a0} + {a1}*Inc)(F_Ret = {beta1}*(F_Inc -{a0s}-{a1s}*Inc)), vce(cluster time)
My code works for 20 of the 21 market samples; however, there is a problem when I try to apply it to the Chilean sample. Stata returned the following message:

Code:
nlsur (F_Inc = {a0} + {a1}*Inc)(FSizeRet = {beta1}*(F_Inc -{a0s}-{a1s}*Inc)), vce(cluster time)
(obs = 1,456)

Calculating NLS estimates...
Iteration 0:  Residual SS =  1.13e+77
Iteration 1:  Residual SS =  1.13e+77
Iteration 2:  Residual SS =  1.13e+77
Iteration 3:  Residual SS =  1.13e+77
Calculating FGNLS estimates...
Iteration 0:  Scaled RSS =         .
Scaled RSS evaluates to missing; check model specification
r(498);

end of do-file
It seems the NLS estimates were calculated normally, but the process stopped when computing the FGNLS estimates. I cannot figure out the meaning of the error in this case and am not sure how to proceed. Is there any solution to this?
Any suggestion is very much appreciated!

Sincerely,

Khanh


P.S.: I attach a sample of my data below.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(unit_id time) float(F_Inc Inc F_Ret)
1 2001   47.1196   -7.9599  3.0190966e+36
1 2002    78.068   47.1196 -2.7425215e+36
1 2003   64.8466    78.068  3.0190966e+36
1 2004   77.0961   64.8466  3.0190966e+36
1 2005  132.5792   77.0961  1.7127544e+35
1 2006   77.1911  132.5792  1.7127544e+35
1 2007  169.3283   77.1911  1.7127544e+35
1 2008   327.939  169.3283  1.7127544e+35
1 2009   169.772   327.939  1.7127544e+35
1 2010   326.084   169.772 -2.3972388e+37
1 2011   202.933   326.084 -2.3972388e+37
1 2012   201.321   202.933 -2.3972388e+37
1 2013   183.651   201.321 -2.3972388e+37
1 2014   264.874   183.651 -2.3972388e+37
1 2015   261.009   264.874 -2.3972388e+37
1 2016   184.519   261.009  1.7127544e+35
2 2000   47637.8   41319.3  1.7127544e+35
2 2001   54112.1   47637.8 -2.7425215e+36
2 2002   59659.2   54112.1 -2.7425215e+36
2 2003  62503.45   59659.2 -2.7425215e+36
2 2004  75944.64  62503.45 -2.7425215e+36
2 2005  84622.09  75944.64 -2.7425215e+36
2 2006     97059  84622.09  1.7061676e+33
2 2007 109120.04     97059  1.9787994e+37
2 2008  123047.5 109120.04  1.9787994e+37
2 2009 103849.58  123047.5  1.9787994e+37
2 2010 111479.27 103849.58  1.9787994e+37
2 2011 121269.56 111479.27  1.9787994e+37
2 2012 116675.53 121269.56 -2.7425215e+36
2 2013 119422.48 116675.53 -2.7425215e+36
2 2014 129008.15 119422.48 -2.7425215e+36
2 2015 150575.67 129008.15 -2.7425215e+36
2 2016 139620.28 150575.67 -2.7425215e+36
3 2000   47637.8   41319.3              .
3 2001   54112.1   47637.8              .
3 2002   59659.2   54112.1              .
3 2003  62503.45   59659.2              .
3 2004  75944.64  62503.45              .
3 2005  84622.09  75944.64 -2.7425215e+36
3 2006     97059  84622.09  1.7061676e+33
3 2007 109120.04     97059  1.9787994e+37
3 2008  123047.5 109120.04  1.9787994e+37
3 2009 103849.58  123047.5  1.9787994e+37
3 2010 111479.27 103849.58  1.9787994e+37
3 2011 121269.56 111479.27  1.9787994e+37
3 2012 116675.53 121269.56 -2.7425215e+36
3 2013 119422.48 116675.53 -2.7425215e+36
3 2014 129008.15 119422.48 -2.7425215e+36
3 2015 150575.67 129008.15 -2.7425215e+36
3 2016 139620.28 150575.67 -2.7425215e+36
4 2000    7.3755   10.5494   7.567931e+27
4 2001    6.4257    7.3755   7.567931e+27
4 2002    8.2839    6.4257  -6.789422e+31
4 2003   11.6737    8.2839  -6.789422e+31
4 2004   16.1299   11.6737   6.714277e+36
4 2005   17.0245   16.1299   6.714277e+36
4 2006   36.4842   17.0245   6.714277e+36
4 2007   24.6518   36.4842   6.714277e+36
4 2008     15.04   24.6518   6.714277e+36
4 2009    17.053     15.04    3.51309e+32
4 2010    23.771    17.053   6.714277e+36
4 2011    34.029    23.771    3.51309e+32
4 2012     40.11    34.029   6.714277e+36
4 2013    40.236     40.11   6.714277e+36
4 2014    57.339    40.236   6.714277e+36
4 2015    12.057    57.339   6.714277e+36
4 2016    18.503    12.057   6.714277e+36
5 2003  3948.682  3745.293    4.60285e+25
5 2004  3194.007  3948.682    4.60285e+25
5 2005  3535.162  3194.007 -2.3013135e+35
5 2006  3845.955  3535.162 -2.3013135e+35
5 2007 11004.243  3845.955      .14318873
5 2008 10997.284 11004.243       .3665221
5 2009 15444.887 10997.284       .3239335
5 2010  7609.143 15444.887      .33626565
5 2011  8986.569  7609.143      .14254399
5 2012 11073.418  8986.569      .14318873
5 2013  8092.089 11073.418      .14318873
5 2014  4121.302  8092.089       .3044791
5 2015   1348.73  4121.302      .14318873
5 2016  9048.525   1348.73      .14318873
6 2008   292.658   339.923 -2.3013135e+35
6 2009  -421.543   292.658 -2.3013135e+35
6 2010    50.264  -421.543 -2.3013135e+35
6 2011   -96.404    50.264 -2.3013135e+35
6 2012  1275.186   -96.404     -153.16333
6 2013   195.431  1275.186   1.718264e+24
6 2014   133.388   195.431   1.718264e+24
6 2015   346.717   133.388   1.718264e+24
6 2016    898.59   346.717   1.718264e+24
8 2011   -42.059    53.148  3.0190966e+36
8 2012    -8.259   -42.059              .
8 2013     2.423    -8.259              .
8 2014   -97.576     2.423  3.0190966e+36
8 2015     8.434   -97.576  3.0190966e+36
8 2016    59.505     8.434  3.0190966e+36
9 2013    31.233   -22.339   1.718264e+24
9 2014   -19.447    31.233   1.718264e+24
9 2015   -31.542   -19.447   1.718264e+24
9 2016    -5.741   -31.542   1.718264e+24
end
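The F_Ret magnitudes in this sample (on the order of 1e+36) look like corrupted or mis-scaled values, which can make the weighting step blow up even when the unweighted NLS runs; a quick diagnostic sketch:

Code:
summarize F_Ret, detail
count if abs(F_Ret) > 1e10 & !missing(F_Ret)
list unit_id time F_Ret if abs(F_Ret) > 1e10 & !missing(F_Ret), sepby(unit_id)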

[Help] Marginal effect on Stata

Hi. My question arises in Stata (15.1) but also requires some statistical knowledge, which is why I am calling on you.
I'm doing a homework assignment, and I have three different models:
1: ln(y) = b0 + b1*x + error
2: y = b0 + b1*ln(x) + error
3: ln(y) = b0 + b1*ln(x) + error

I have a database with the variables x and y. I have to determine which of the 3 models assumes a decreasing marginal relationship between y and x.
How can I do this with Stata?

When I use:
Code:
* 1: log-lin
gen ln_y = ln(y)
reg ln_y x
mfx
* 2: lin-log
gen ln_x = ln(x)
reg y ln_x
mfx
* 3: log-log
reg ln_y ln_x
mfx
With the mfx command, I always get positive results (> 0), but shouldn't I get a result < 0 in order to have a decreasing marginal relationship?

Even when I derive the functions by hand, I get positive results. Do you have an idea that can keep me from failing this assignment?
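A positive marginal effect is compatible with a decreasing marginal relationship: in the log-log model, y = exp(b0)*x^b1, so dy/dx = b1*y/x, which falls as x grows whenever 0 < b1 < 1, even though it stays positive. A sketch of how one might inspect this (variable names assumed from the models above):

Code:
gen ln_y = ln(y)
gen ln_x = ln(x)
regress ln_y ln_x
* implied marginal effect of x on y at each observation: dy/dx = b1*yhat/x
predict lny_hat, xb
gen me_x = _b[ln_x] * exp(lny_hat) / x
sort x
list x me_x in 1/10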





Renaming Variables

I am using Stata 15. I have a large dataset with multiple columns and rows. I am interested in two variables, var1 and var2, each of which has multiple entries. I would like to rename var1 with the same name as var2. However, when I use rename var1 var2, I receive the error message "variable var2 already defined". When I use rename (var1 var2) (var3 var3), I get the following error message: "1 new name specified repeatedly in newname. You requested the name var3 be assigned to 2 existing variables: var1 and var2."

How can I make the two columns of data with the two different variable names both have the same name?

Thank you.

Using Loops to Convert Strings With Letters to Numeric Values

Hello Stata Users :-)

I have a dataset comprised of 71 variables containing responses to items on a satisfaction survey.

All of the variables are measured on a 5-point Likert scale. Most of the variables are stored as numbers, but many are stored as strings.

Missing values for variables stored as numbers show up as "."

Missing values for variables stored as strings show up as "X"

Question: How can I write a looping command to change "X" values contained in string variables to "." values contained in numeric variables?

The code below gives a "type mismatch" error:

Code:
foreach var of varlist q1-q71 {
    replace `var' = "." if `var'=="X"
}
My hunch is that I must use an "if" statement to separate out string from numeric variables, but I have little experience writing loops within "if" statements. Any help would be very much appreciated.
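One hedged approach: loop over the variables, touch only those that are strings, blank out the "X" codes, and destring. A sketch:

Code:
foreach var of varlist q1-q71 {
    capture confirm string variable `var'
    if !_rc {
        * "" destrings to numeric missing (.)
        replace `var' = "" if `var' == "X"
        destring `var', replace
    }
}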

Again, a sample of the data is below.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str1(q15 q16 q17 q18 q19)
"X" "4" "5" "4" "X"
"1" "5" "5" "2" "1"
"5" "5" "5" "5" "5"
"5" "5" "5" "5" "5"
"2" "4" "4" "4" "4"
"4" "4" "3" "2" "3"
"5" "2" "3" "1" "3"
"4" "5" "4" "5" "5"
"5" "5" "5" "4" "5"
"X" "2" "3" "2" "X"
end
Thanks so much,
Adam

Hausman and testparm

Hello everyone,
Just a little confused: I currently have panel data and am choosing between random effects and fixed effects. When I first run my regression without an i.Year variable, a Hausman test indicates that FE is better suited. I then use testparm to see whether I need to include i.Year, and the p-value is less than 0.05, which indicates that I should. Once I include i.Year in the regression, the Hausman test p-value changes completely, from 0.0013 to 0.9457, which would instead point toward random effects. My questions are:
1. Why does the Hausman p-value change so much?
2. Is it more logical to do the Hausman test with the i.Year variable or without it?
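For context, the comparison I am running looks roughly like this (y, x1, x2 are placeholders for my actual variables):

Code:
xtreg y x1 x2 i.Year, fe
estimates store fe
xtreg y x1 x2 i.Year, re
estimates store re
hausman fe re
testparm i.Year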

Many thanks
Pepito

Rounding with a macro

Hi,

To display a number rounded down to a user selected number of decimal places, I wrote some code that gave a surprising answer.

Attempt 1

local x = 0.20629048
local dp = 4
local x_round = round(`x', 10^-`dp')
display "this is x `x' this is x_round `x_round'"

The final line results in the following unexpected output - this is x .20629048 this is x_round .2063000000000001


From the following you would expect x_round above to be calculated as .2063

local x = 0.20629048
local dp = 4
display 10^-`dp'
display round(`x', .0001)


However, I can get around the problem by creating a new macro that evaluates the number of decimal places first.

Attempt 2

local x = 0.20629048
local dp = 4
local dp_value = 10^-`dp'
local x_round = round(`x', `dp_value')
display "this is x `x' this is x_round `x_round'"

The final line results in the following correct output - this is x .20629048 this is x_round .2063

Why does Attempt 1 fail?
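One check I tried, in case the two routes produce slightly different doubles (the %21x format shows the exact binary representation, so any last-bit difference between the computed power and the parsed literal would be visible):

Code:
display %21x 10^-4
display %21x .0001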

Thanks for taking the time to consider this,

Don Vicendese

Seeking an efficient way to establish auxiliary time scales referenced to data elements

I have been working for 2+ years with data generated from a custom app seeking improvement in treatment of Parkinson's disease. Data are merged from 3 streams and tsset with 1-minute intervals; every day is a panel. Data elements from the app include patient-reported symptoms; time, type, and quantity of dopamine and related drugs (Rytary and Requip); and other variables. The second stream is biometric data reported by the Apple Watch: heart rate, basal and active energy, steps, and the like. The third stream emanates from a Scilab model that estimates concentrations of drugs based on a state-space technique.

Drugs are supposed to be taken every 4 hours, at 02, 06, 10, 14, 18, and 22 hours, but actual times vary. I seek to establish subpanels based on a new variable, aux_axis, whose value is zero 60 minutes before each nonzero instance of time_Ry (the time each Rytary dose is taken) and reaches 180 at 120 minutes after the actual pill time.

My general approach has been to use subgroup techniques described in Mitchell, MN, "Data Management Using Stata", 2010, Chapter 7, with direct addressing of subscripts, but I haven't succeeded. Is there a better approach to this goal?
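For concreteness, the kind of construction I have been attempting (untested; every variable name except time_Ry is a placeholder — day identifies the panel and minute is the tsset time variable):

Code:
* flag each dose minute, then carry the dose time forward within the day
gen double dose_time = minute if time_Ry > 0 & !missing(time_Ry)
bysort day (minute): replace dose_time = dose_time[_n-1] if missing(dose_time)
* minutes elapsed since the most recent dose, within each day
gen aux_axis = minute - dose_time if !missing(dose_time)

This only handles the minutes after a dose; the 60-minute window before each dose would need a separate backward fill.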

sem vs gsem?

Hi everyone.
I want to use sem or gsem to evaluate the associations of different patterns of weight change 1 year after delivery and metabolic outcomes 6 years after delivery. My independent variable is categorical (4 levels). My outcome variables, total cholesterol and LDL-cholesterol, are continuous, as well as my mediating variable (BMI at 6 years). I want to adjust by some sociodemographic variables (some categorical and some continuous). My question is which type of model (sem vs gsem) is more appropriate considering that my independent variable is categorical and my dependent variable is continuous?
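For reference, the sort of specification I have in mind (untested, with placeholder variable names; i.pattern is my 4-level weight-change exposure, and I am assuming sem accepts factor variables for exogenous covariates in my Stata version):

Code:
sem (bmi6  <- i.pattern age i.educ) ///
    (tchol <- bmi6 i.pattern age i.educ) ///
    (ldl   <- bmi6 i.pattern age i.educ)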
Thanks

Generating a variable to count # of "ideal" in each category per observation

Dear all,

I have three variables (idealFruitVeg; idealfiber; idealssbweek). Each variable takes either the value 0 "Ideal" or 1 "Nonideal."

I would like to count how many observations are "ideal" on all three variables, on exactly two of the three, on exactly one, and on none of them.

I pasted the syntax for how I accomplished this in Stata, but I am assuming there is a more efficient way to do this because the syntax would be rather long if I was trying to do this with a large number of variables. I am new to Stata and am trying to learn best practices.

Thank you for your help and let me know if you have any clarifying questions. Also, please let me know if there is a better way to post questions in order to accurately convey what I want to accomplish and to allow others to help as easily as possible.



Code:
gen       RidealDiet = . 
replace RidealDiet = 3 if idealFruitVeg ==0 & idealfiber ==0 & idealssbweek ==0
replace RidealDiet = 2 if RidealDiet !=3 & idealFruitVeg==0 & idealfiber ==0   | RidealDiet !=3 & idealFruitVeg==0  & idealssbweek ==0 | RidealDiet !=3 & idealssbweek ==0  & idealfiber ==0
replace RidealDiet = 1 if RidealDiet !=3 & RidealDiet !=2 & idealFruitVeg==0 | RidealDiet !=3 & RidealDiet !=2 & idealssbweek ==0  | RidealDiet !=3 & RidealDiet !=2  & idealfiber ==0
replace RidealDiet = 0 if RidealDiet !=3 & RidealDiet !=2  & RidealDiet !=1 

label define RidealDiet 3 "3 components" 2 "2 components" 1 "1 component" 0 "0 components"
label value RidealDiet RidealDiet
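
For comparison, here is a shorter alternative I am considering (untested, and it assumes no missing values, since -rowtotal- treats missing as zero). Because 0 codes "ideal" and 1 codes "nonideal", the row total counts nonideal components, and subtracting from 3 gives the ideal count:

Code:
egen nonideal = rowtotal(idealFruitVeg idealfiber idealssbweek)
gen RidealDiet2 = 3 - nonideal
tab RidealDiet2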


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(idealFruitVeg idealfiber idealssbweek)
0 0 1
1 1 1
1 1 1
0 0 0
1 1 0
1 1 0
1 1 1
1 1 0
1 1 1
1 1 1
0 1 1
0 1 1
1 1 1
1 1 1
0 1 1
1 1 1
0 1 1
0 1 0
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 0
1 1 1
0 1 0
0 1 1
0 0 1
1 1 1
end
label values idealFruitVeg idealFruitVeg
label def idealFruitVeg 0 "ideal", modify
label def idealFruitVeg 1 "nonideal", modify
label values idealfiber idealfiber
label def idealfiber 0 "ideal", modify
label def idealfiber 1 "nonideal", modify
label values idealssbweek idealssbweek
label def idealssbweek 0 "ideal", modify
label def idealssbweek 1 "nonideal", modify