Channel: Statalist

Plot interaction effect after stcox with tvc?

Since the proportional-hazards assumption is violated, I use the tvc option to estimate my model, which surprisingly gives me more significant coefficients. However, when I attempted to plot the interaction effect, I was warned that stcurve is not available after the tvc option. Can anyone help by pointing out a user-written command that could get this done? Thanks a lot!

MICE for single imputation

Hello,

I am wondering if you can use multiple imputation by chained equations (MICE) to produce just a single imputed dataset in Stata. I think the command I am using for analysis is not compatible with multiple imputation (the command I am using is gllamm, a user-written command for multilevel models). So, is it reasonable to run just one imputation in MICE (see code below) and then run my analysis as I normally would (i.e., not using the mi estimate command)? I chose MICE for imputation because I have multiple variable types that need to be imputed (binary, continuous, and ordinal/categorical), and it seemed to be the correct imputation option given that variety of variable types.

I am open to other single-imputation suggestions if anyone has them.

Code:
mi set mlong;
xi: mi register imputed Lr2_number_adults_ LSize_Cat_ Lr_Do_you_own_;

xi: mi register regular Garden_Active_ i.Year Garden_ID LSite_Visit_Curr_or_Prior_ LSold_GID_ LPickups_ LUR_Curr_Yr_or_Prior_ LSOD_Curr_or_Prior_ LKGD_Curr_Or_Prior_ LCommunity_Garden_ LMarket_Garden_ LYr_Act_Prior_ LSoil_Test_Curr_or_Prior_ i.r_L_classes i.r_L_volunteer_3_max i.r_L_social_2_max;
xi: mi impute chained (logit) Lr_Do_you_own_ (regress) Lr2_number_adults_ (ologit) LSize_Cat_ = Garden_Active_ i.Year Garden_ID LCommunity_Garden_ LMarket_Garden_ LSite_Visit_Curr_or_Prior_ LSold_GID_ LPickups_ LUR_Curr_Yr_or_Prior_ LSOD_Curr_or_Prior_ LKGD_Curr_Or_Prior_ LYr_Act_Prior_ LSoil_Test_Curr_or_Prior_ i.r_L_classes i.r_L_volunteer_3_max i.r_L_social_2_max, add(1);

Many thanks,
Alyssa

Numbers instead of variable names

I downloaded a .dta file. All of the variable names are numbers. Is there anything I can do to fix this? Thank you.

How forum software deals with Unicode whitespace

The following dataex output contains several uchar(160):

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str16 A
"1     1  "
"1     2  "
"1     3  "
"1     4  "
"1     5  "
"1     6  "
"1     7  "
"1     8  "
"1     9  "
end
Do these Unicode whitespace characters survive?

Generating cross-sectional information from employment spells

Hi all,

I am currently working with a Polish data set and would like to generate a person's professional experience from their reported employment spells. I can only conduct cross-sectional analyses, so I need to generate their cumulated experience up until the last reported episode.

My main problem is that the spells overlap, as many people were simultaneously working in multiple jobs. I'll give an example below for the person with ID "89".

HTML Code:
ID    start_y  end_y
89    1964    1967    
89    1967    1968
89    1972    1974
89    1975    1986
89    1985    2013
89    1973    1990
89    1990    2011
89    1999    2001
89    2001    2002
89    1972    1974
89    2013    2015
89    2014    2015
89    2014    2015
I am very new to working with spell data, so I would really appreciate any tips. What I would ultimately like to do is to generate the cumulated experience without the overlaps, i.e. in this example from 1964 to 2015, and simultaneously ensure that there aren't any gaps.
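For what it's worth, the quantity described above is the total length of the union of the spells, which the standard merge-overlapping-intervals routine computes. A minimal Python sketch (outside Stata, just to make the logic concrete) using person 89's spells:

```python
def total_experience(spells):
    """Sum the years covered by the union of (start, end) spells,
    counting overlapping periods only once."""
    merged = []
    for start, end in sorted(spells):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous merged spell: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return sum(end - start for start, end in merged)

# Spells for person 89 from the example above:
spells = [(1964, 1967), (1967, 1968), (1972, 1974), (1975, 1986),
          (1985, 2013), (1973, 1990), (1990, 2011), (1999, 2001),
          (2001, 2002), (1972, 1974), (2013, 2015), (2014, 2015),
          (2014, 2015)]
print(total_experience(spells))
```

A useful by-product is that the merged list also reveals any gaps (here, 1968 to 1972), which addresses the no-gaps check as well.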

Thank you all,

Evelyn

Margins after REGHDFE with log dependent variable

I am using the reghdfe command with a log dependent variable. Then, I am using the margins command for postestimation. However, I would like to "contextualize" the result by putting the margins answers back into the magnitudes of the original variables. There is a lot going on here so I was hoping to get some validation: is this the correct approach and interpretation? Here is an MWE.

Code:
sysuse auto, clear
drop if rep78==.
gen lprice = log(price)
reghdfe lprice mpg i.foreign, absorb(FE=rep78) resid
margins foreign, expression(exp(predict(xb)+FE))

Predictive margins                              Number of obs   =         69
Model VCE    : OLS

Expression   : exp(predict(xb)+FE)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     foreign |
    Domestic |   5492.019   287.0945    19.13   0.000     4929.324    6054.713
     Foreign |   6576.807   604.7936    10.87   0.000     5391.434    7762.181
------------------------------------------------------------------------------



Code:
margins r.foreign, expression(exp(predict(xb)+FE))

Contrasts of predictive margins
Model VCE    : OLS

Expression   : exp(predict(xb)+FE)

------------------------------------------------
             |         df        chi2     P>chi2
-------------+----------------------------------
     foreign |          1        2.26     0.1328
------------------------------------------------

------------------------------------------------------------------------
                      |            Delta-method
                      |   Contrast   Std. Err.     [95% Conf. Interval]
----------------------+-------------------------------------------------
              foreign |
(Foreign vs Domestic) |   1084.789   721.7383     -329.7924      2499.37
------------------------------------------------------------------------


It is a bit silly to do causal interpretation in this scenario, but setting that concern aside: "we assume that for the average car in the dataset, we could charge $5492 if the car is labeled as domestic, whereas it would have a price of $6577 if it was labeled as foreign. Although changing a car's classification to foreign is estimated to increase its value by about $1085, this difference is not statistically significant."
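One caveat worth flagging on the back-transformation (a generic point, not specific to reghdfe): when the errors on the log scale are roughly normal, exponentiating a predicted log-price recovers a median-type quantity and understates the conditional mean unless a correction such as exp(sigma^2/2) (or Duan's smearing) is applied. A minimal Python simulation, with all numbers hypothetical, illustrates the size of the gap:

```python
import math
import random

random.seed(1)
# y is log-normal: log(y) ~ N(mu, sigma^2)
mu, sigma = 8.0, 0.5
logs = [random.gauss(mu, sigma) for _ in range(100_000)]
ys = [math.exp(v) for v in logs]

naive = math.exp(sum(logs) / len(logs))      # exp of the mean log -- too small
actual = sum(ys) / len(ys)                   # the mean of y itself
smearing = naive * math.exp(sigma**2 / 2)    # log-normal correction factor

print(round(naive), round(actual), round(smearing))
```

With sigma = 0.5 the naive back-transform is off by about 13%; whether this matters for the margins expression above depends on the residual variance in the actual model.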

Questions:
  • Is this the right way to convert my estimates back into $? (As I understand it reghdfe does not allow xbd prediction option: https://github.com/sergiocorreia/reghdfe/issues/138. Also, I am using the "expression" option because my understanding is that the typical recommended solution -- raw dependent variable with poisson or GLM before margins -- will be harder to implement in the high-dimensional fixed effects scenario).
  • Is this the right interpretation of the results? e.g. are the estimates and confidence intervals coming out of these commands OK or are there issues with my transformation that I should be aware of?
  • Has anyone experienced any problems with this type of postestimation using reghdfe? As Sergio notes on github, not all examples have been checked: https://github.com/sergiocorreia/reghdfe/issues/32
Thanks in advance!

Formula for Stata's F test of equality of coefficients

Hello, thank you in advance

for example:

Code:
. reg price weight mpg

       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   1.746559   .6413538     2.72   0.008      .467736    3.025382
         mpg |  -49.51222   86.15604    -0.57   0.567    -221.3025     122.278
       _cons |   1946.069    3597.05     0.54   0.590    -5226.245    9118.382

. test weight=mpg

 ( 1)  weight - mpg = 0

       F(  1,    71) =    0.36
            Prob > F =    0.5514


I wonder whether someone knows the formula of this F test?
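In general, for a linear restriction Rb = r after regress, Stata's test computes the Wald statistic W = (Rb - r)' (R V R')^-1 (Rb - r), where V is the estimated VCE, and reports F = W/q with (q, n - k) degrees of freedom, q being the number of restrictions. For the single restriction weight - mpg = 0 this collapses to a scalar. A Python sketch of the scalar case, with hypothetical coefficient and VCE entries (not the actual auto-data values):

```python
def f_equal(b1, b2, v11, v22, v12):
    """F statistic for the single restriction b1 - b2 = 0:
    (b1 - b2)^2 / Var(b1 - b2), with Var(b1 - b2) = V11 + V22 - 2*V12."""
    return (b1 - b2) ** 2 / (v11 + v22 - 2 * v12)

# Hypothetical estimates and variance-covariance entries:
F = f_equal(1.75, -49.51, 0.41, 7422.9, -54.0)
print(F)
```

The p-value then comes from the F(1, n - k) distribution, with n - k the residual degrees of freedom (71 in the output above).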

Thank you so much!

eprobit and margins

Hi,

I am using eprobit to regress a binary variable (y) on another, endogenous binary variable (x). The related instrument is called z. The Stata command I use is as follows:

eprobit y x, endogenous(x= z, probit)

I am interested in marginal effects. However, when I use the margins command, it gives me the error "e(sample) does not identify the estimation sample".
I was wondering if anybody knows what the problem might be.


Thanks.

Calculating day-level statistics with data that may span across several days

Hi,

Data at hand: patient visit start and end dates/times, with each line representing a unique visit. A sample is below. The visits span from several hours to several days.

Question we're looking to answer: for each day, what proportion of patients who were there in the morning (0800) had left by the end of the day (2359).

The only approach I can sketch in my head (though I am not yet sure how to code it) would be to create two flag variables for each day: one identifying whether each observation was present at 0800, and one identifying whether it had left by 2359. Then I would somehow calculate proportions from these variables. Is there an easier way?

Thanks!

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id obsbegin obsend)
 1 1.8619213e+12  1.861961e+12
 2  1.861922e+12 1.8620786e+12
 3  1.861923e+12 1.8619218e+12
 4 1.8619246e+12  1.861973e+12
 5  1.861929e+12  1.861966e+12
 6 1.8619295e+12 1.8619564e+12
 7 1.8619327e+12 1.8620413e+12
 8 1.8619335e+12 1.8619677e+12
 9  1.861934e+12 1.8619636e+12
10 1.8619395e+12 1.8619865e+12
11 1.8619415e+12 1.8619577e+12
12 1.8619433e+12 1.8619715e+12
13  1.861949e+12  1.861985e+12
14  1.861961e+12 1.8620724e+12
15  1.861965e+12 1.8619995e+12
16 1.8619742e+12 1.8619744e+12
17  1.861979e+12 1.8620514e+12
18  1.861987e+12  1.862078e+12
19 1.8619886e+12 1.8620075e+12
20  1.861995e+12  1.862071e+12
21 1.8619962e+12  1.862055e+12
22 1.8620012e+12  1.862016e+12
23 1.8620025e+12 1.8620584e+12
24  1.862021e+12 1.8620544e+12
25 1.8620255e+12 1.8620362e+12
end
format %tc obsbegin
format %tc obsend
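Not a Stata answer, but the two-flag logic described above can be sketched generically in Python (all names hypothetical) to make the target computation concrete: for each calendar day, count visits in progress at 08:00 and, among those, the share that had ended by 23:59 of the same day.

```python
from datetime import datetime, timedelta

def share_left_by_midnight(visits, day):
    """visits: list of (begin, end) datetimes; day: a date.
    Returns (patients present at 0800, proportion of them gone by 2359)."""
    at_0800 = datetime.combine(day, datetime.min.time()) + timedelta(hours=8)
    eod = datetime.combine(day, datetime.min.time()) + timedelta(hours=23, minutes=59)
    present = [(b, e) for b, e in visits if b <= at_0800 <= e]  # there at 0800
    if not present:
        return 0, None
    left = sum(1 for b, e in present if e <= eod)               # gone by 2359
    return len(present), left / len(present)

visits = [
    (datetime(2019, 1, 1, 6, 0),  datetime(2019, 1, 1, 14, 0)),   # left same day
    (datetime(2019, 1, 1, 7, 30), datetime(2019, 1, 2, 10, 0)),   # stayed overnight
    (datetime(2019, 1, 1, 9, 0),  datetime(2019, 1, 1, 11, 0)),   # arrived after 0800
]
print(share_left_by_midnight(visits, datetime(2019, 1, 1).date()))
```

The same two conditions translate directly into the per-day flag variables sketched in the post, looping over the days spanned by the data.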

percent of correct predictions

Hello, I am trying to estimate the number of correct predictions from a linear probability model with robust standard errors and from one estimated with WLS. My instructor showed the following:

Code:
regress diab owgt obese exer cig alc inc coll marr male age, robust
predict probl, xb
generate cdiab=(probl >=0.5 & diab==1)
generate cnodiab=(probl <0.5 & diab==0)
generate correct =(cdiab==1 | cnodiab==1)
sum diab cdiab cnodiab correct

but is this specific to either model? Also, it only prints out this:

Code:
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        diab |      5,051    .1282914    .3344471          0          1
       cdiab |      5,051           0           0          0          0
     cnodiab |      5,051    .8717086    .3344471          0          1
     correct |      5,051    .8717086    .3344471          0          1

It does not include a table that shows the predictions as the instructor said it would. Can anyone tell me if the information I need is hidden in here or if I did something wrong?
Thank you
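As an aside, the flag logic in those generate lines can be checked on a toy example (Python sketch; names mirror the Stata variables): correct equals 1 exactly when the 0.5-thresholded prediction matches the outcome. Note that a mean of 0 for cdiab, as in the output above, simply means no fitted value reached 0.5.

```python
def prediction_summary(probs, outcomes):
    """Mimic the Stata flag logic: cdiab, cnodiab, and the share correct."""
    cdiab   = [int(p >= 0.5 and y == 1) for p, y in zip(probs, outcomes)]
    cnodiab = [int(p <  0.5 and y == 0) for p, y in zip(probs, outcomes)]
    correct = [int(a or b) for a, b in zip(cdiab, cnodiab)]
    return sum(cdiab), sum(cnodiab), sum(correct) / len(correct)

probs    = [0.7, 0.2, 0.4, 0.6, 0.1]   # hypothetical fitted values
outcomes = [1,   0,   1,   0,   0]     # hypothetical 0/1 outcomes
print(prediction_summary(probs, outcomes))
```

The counts of correct 1s, correct 0s, and the overall hit rate are exactly what the four flags encode, regardless of whether the fitted values came from the robust or the WLS model.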

Plotting a fitted Line for log scale axis

Hello,

I have to plot a simple fitted line for exports relative to GDP (the variable is called "norm_exports") on the y axis and bilateral distance (variable "km") on the x axis. However, I am having problems, since I have to depict the y axis in log scale. I know that I can therefore not use the standard twoway lfit command:

twoway (scatter norm_exports km)(lfit norm_exports km)

Therefore I tried to use predict for a regression of the log of my export variable on distance:

quietly reg log_norm_exports km

predict gr

label var gr "Linear prediction"

twoway (scatter norm_exports km)(line gr km, sort), yscale(log)

However, the code I used is not working. I would be really thankful if someone could tell me where my mistake is.

Thanks a lot.

need to drop duplicates depending on how many duplicates there are per observation

I am trying to reshape my data from wide to long; however, I need to drop duplicates first. Here is an example of my data of employment by U.S. county:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int year byte qtr str1 disclosure_code str41 area_title str43 agglvl_title long(qtrly_estabs_count month1 month2 month3) float countyid str5 time float dup
2009 1 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 31 1868 1721 1641 1 "20091" 0
2009 2 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 31 1531 1530 1496 1 "20092" 0
2009 3 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 32 1495 1480 1433 1 "20093" 0
2009 4 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 33 1445 1444 1447 1 "20094" 0
2010 1 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 31 1428 1457 1516 1 "20101" 0
2010 2 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 31 1540 1570 1572 1 "20102" 0
2010 3 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1552 1578 1590 1 "20103" 0
2010 4 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1601 1594 1584 1 "20104" 0
2011 1 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1525 1567 1576 1 "20111" 0
2011 2 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 31 1566 1581 1580 1 "20112" 0
2011 3 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 31 1587 1579 1639 1 "20113" 0
2011 4 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1658 1668 1636 1 "20114" 0
2012 1 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1649 1703 1722 1 "20121" 0
2012 2 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1719 1733 1712 1 "20122" 0
2012 3 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1723 1727 1734 1 "20123" 0
2012 4 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1701 1592 1547 1 "20124" 0
2013 1 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 29 1538 1561 1563 1 "20131" 0
2013 2 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 29 1572 1577 1584 1 "20132" 0
2013 3 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 29 1582 1592 1602 1 "20133" 0
2013 4 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1657 1655 1637 1 "20134" 0
2014 1 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 31 1740 1741 1760 1 "20141" 0
2014 2 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 32 1782 1798 1812 1 "20142" 0
2014 3 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 32 1806 1813 1848 1 "20143" 0
2014 4 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 32 1789 1789 1783 1 "20144" 0
end
This is what I used to generate the dup variable:

Code:
quietly by countyid time: gen dup = cond(_N==1,0,_n)

However, the issue is that there are duplicates for the same year, quarter, and county, with one dupe listing zeros for employment and the other listing the actual employment. I sorted the data with sort countyid time month1 month2 month3 so that the dupe with zeros would appear before the dupe with employment, giving it a dup value of 1. My intention was to drop if dup == 1; however, I discovered that some of the duplicates come in triplets, meaning there are two lines of zeros, with dup values of 1 and 2, and the actual employment has a dup value of 3. I thought I could use gsort to sort the duplicates in descending order instead, so I could drop if dup > 1, but I can't seem to get it to work.
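For reference, the keep-the-real-record step described here, sketched generically in Python (keys and values hypothetical): within each county-quarter key, keep the row with the largest employment, which handles pairs and triplets alike.

```python
def keep_max_per_key(rows):
    """rows: list of (key, employment). Keep one row per key:
    the one with the largest employment (zero rows lose to real records)."""
    best = {}
    for key, emp in rows:
        if key not in best or emp > best[key]:
            best[key] = emp
    return best

rows = [
    (("20091", 1), 0),      # zero-employment dupe
    (("20091", 1), 1868),   # real record
    (("20092", 1), 0),
    (("20092", 1), 0),      # triplet: two zero rows
    (("20092", 1), 1531),
]
print(keep_max_per_key(rows))
```

Keeping the maximum per group sidesteps the sort-order fragility entirely, since it does not matter whether the zeros come first, last, or in between.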

I would appreciate any help!

Getting Stata to recall the last opened directory

Hi,

I have been using Stata 13 on a Windows system, and each time I started Stata, Ctrl+O took me to the last directory opened in either the main window or the Do-file Editor.

After upgrading to Stata 15 this still works in the main window, but Ctrl+O in the Do-file Editor takes me to c:\data, and nothing I do changes this.

Is there any way to restore the Stata 13 behaviour?

Many thanks
Suhail

Random seed needed

I am looking for a number or string that will be different in different runs of the same program. So far I can only think of using c(current_time), but that only changes once a minute. Is there a way to get the process ID? That is what Stata uses for its temporary files, but I don't see how to get it myself. The documentation for Stata's random-number functions suggests saving the RNG state, but that requires coordination across runs that I can't depend on. Any other suggestions?

Can we determine the significance of changes over time using cross-sectional data?

Dear Statalisters,

I am using data from the 2011 and 2014 Demographic and Health Surveys (these are repeated cross-sections) to look at aggregate trends in intimate partner violence by type of residence (rural and urban). Does cross-sectional data allow us to estimate the significance of the change? E.g., if the prevalence of violence in urban areas was 13% in 2011 and 10% in 2014, is there any way we could carry out a statistical test in Stata to determine whether this change is significant?
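In principle, yes: with two independent cross-sections, a change in prevalence is a difference of two proportions, which has a standard z test. A back-of-the-envelope Python sketch using the 13% vs 10% figures, with the sample sizes entirely hypothetical and the survey design (weights, clustering) ignored:

```python
import math

def two_prop_z(p1, n1, p2, n2):
    """z statistic for H0: p1 == p2, using the pooled proportion."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# 13% of 2,000 urban respondents in 2011 vs 10% of 2,000 in 2014 (sizes made up):
z = two_prop_z(0.13, 2000, 0.10, 2000)
print(round(z, 2))
```

In Stata itself, something like prtesti implements this test from the summary numbers; with DHS data you would normally want a design-adjusted version that accounts for the sampling weights and clustering.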

Thanks,

Monzur

Convert string variable to a date variable

Dear Stata Users

I have a string variable yearmonth: for Jan 2004 it's stored as "200401". I need to convert it to a readable year/month format. I have used the following code:

Code:
generate month = date( yearmonth ,"YM")
format %tm month
The code above produces a strange date variable.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 yearmonth
"200401"
"200402"
"200403"
"200404"
"200405"
"200406"
"200407"
"200408"
"200409"
"200410"
"200411"
"200412"
end
How should I alter the code to get a meaningful year/month variable?
Thank you.

Converting continuous variable into categorical

Hi, I am new to Stata and would really appreciate it if someone could advise me on the following:
I would like to generate a new categorical variable that corresponds to histogram bins.
I have 200,759 observations and 7,064 bins (width 0.005); the minimum value of the variable is -19.80374 and the maximum is 15.48759.

I've tried gen with autocode but am getting missing values.
Thanks.
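For reference, the bin index itself is just arithmetic: floor((x - min)/width). A quick Python check using the minimum and width quoted in the post (the helper name is made up):

```python
import math

def bin_index(x, lo=-19.80374, width=0.005):
    """0-based histogram bin index for x, given the minimum and bin width."""
    return math.floor((x - lo) / width)

print(bin_index(-19.80374))   # the minimum falls in bin 0
print(bin_index(15.48759))    # the maximum falls in the last bin
```

With these endpoints the indices run from 0 to 7058, so the exact bin count depends on how the upper edge is handled; any missing values from the Stata side would point to the rule list rather than the arithmetic.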

Interpretation of logit coefficients and rescaling of variables

Dear all,

I am trying to interpret the coefficients from logistic regression results and would very much appreciate it if you could double-check my thinking on the steps below.

Dependent variable is pass (dummy: passed/not passed).
Motivation is a continuous variable between 0-100, tenure is also continuous (# years) and gender (1: male) and education (1: good university) are dummies.

Code:
logit pass motivation tenure i.gender_male i.edu
HTML Code:
Logistic regression                               Number of obs   =       9777
                                                  LR chi2(4)      =      62.17
                                                  Prob > chi2     =     0.0000
Log likelihood = -5551.7018                       Pseudo R2       =     0.0056

-------------------------------------------------------------------------------
         pass |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
   motivation |  -.0021914   .0012533    -1.75   0.080    -.0046479    .0002651
       tenure |  -.0174614   .0026707    -6.54   0.000     -.022696   -.0122268
1.gender_male |   .1047808   .0992882     1.06   0.291    -.0898205    .2993821
        1.edu |    .233494   .0528765     4.42   0.000      .129858    .3371301
        _cons |  -1.041527   .1336965    -7.79   0.000    -1.303568    -.779487
-------------------------------------------------------------------------------
From the regression output, I would like to interpret the coefficient of motivation and express the result in the following ways:

a) A one unit increase in motivation score, decreases the odds of passing by X %.
First, turn coefficient into an odds ratio: e(-0.002) = 0.998
My understanding of the interpretation is that a one unit increase in motivation score would decrease odds of pass by 0.002% "One unit" in my understanding refers to an increase from 0 to 1 on a scale of 0-100.

b) A 10% increase in motivation score, decreases the odds of passing by X %.
e(-0.002 * 1.1) = 0.997
Interpretation: A 10% increase would decrease odds of pass by 0.003%.

Does this make sense so far?

c) An increase in motivation score by one standard deviation, decreases the odds of passing by X %.
I am not sure how to approach this one - do you have any guidance?


One further question:
The coefficient for motivation is very small, as shown in the regression output above. My understanding is that this is due to the scaling of the variable between 0-100. When rescaling the variable to 0-1, the coefficient and effect size increase by a factor of 100:

Code:
logit pass motiv_100 tenure i.gender_male i.edu
HTML Code:
Logistic regression                               Number of obs   =       9777
                                                  LR chi2(4)      =      62.17
                                                  Prob > chi2     =     0.0000
Log likelihood = -5551.7018                       Pseudo R2       =     0.0056

-------------------------------------------------------------------------------
         pass |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    motiv_100 |  -.2191368   .1253341    -1.75   0.080    -.4647872    .0265136
       tenure |  -.0174614   .0026707    -6.54   0.000     -.022696   -.0122268
1.gender_male |   .1047808   .0992882     1.06   0.291    -.0898205    .2993821
        1.edu |    .233494   .0528765     4.42   0.000      .129858    .3371301
        _cons |  -1.041527   .1336965    -7.79   0.000    -1.303568    -.779487
-------------------------------------------------------------------------------
Repeating step a), this would imply:
e(-0.21) = 0.81
Interpretation: A one unit increase in motivation score would decrease the odds of pass by 19%.
My understanding is that in this case, "one unit" reflects an increase from the minimum score (0) to the maximum score (1) and that's why the effect size is 100x that above. Is that correct?
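On (a)-(c) generally: for a logit coefficient beta, a change of delta units in the regressor multiplies the odds by exp(beta * delta), so rescaling the variable by 100 scales beta by 100 but leaves the odds ratio for the same real-world change unchanged. A quick Python check using the motivation coefficients from the two outputs above:

```python
import math

def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds for a delta-unit change in x."""
    return math.exp(beta * delta)

b_0_100 = -0.0021914          # motivation scored 0-100
b_0_1   = -0.2191368          # same variable rescaled to 0-1

# The same real-world change (10 points on the 0-100 scale) either way:
print(round(odds_ratio(b_0_100, 10), 4))
print(round(odds_ratio(b_0_1, 0.10), 4))
```

For (c), one common approach is the same formula with delta set to the sample standard deviation of motivation, giving the per-SD odds ratio.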

Thank you very much in advance again for your help!

Time series ARIMA, how to forecast future?

Hi everyone,

I am new to Stata time series, and here is a question on how to perform a forecast after ARIMA.

I have ARIMA orders (for example, 1,0,1); what should I do next in order to perform an out-of-sample forecast up to the year 2030?
Is there any tutorial on this? In the Stata PDF manual I could not find a relevant tutorial on post-ARIMA forecasting. The only material I found regarding forecasting is on regressions involving multiple variables. However, for my problem I have only one variable (e.g. ship arrivals to a port over time).

I would really appreciate it if someone could explain how to transform my ARIMA(1,0,1) into a regression, if that's the case, or what I should do next to produce a dynamic forecast. Thank you!!

mlogit + margins + test

Using Stata 15.1, I was wondering how to test for equality of the coefficients for the marginal effects.

Example below.


Code:
use http://www.stata-press.com/data/r13/sysdsn1
mlogit insure age i.male i.nonwhite i.site
test [Indemnity]age = [Prepaid]age

margins, dydx(*) predict(outcome(Indemnity))
margins, dydx(*) predict(outcome(Prepaid))
How do I perform the equivalent test test [Indemnity]age = [Prepaid]age?

