Since the proportional hazards assumption is violated, I use the tvc option to estimate my model, which surprisingly gives me more significant coefficients. However, when I attempt to plot the interaction effect, I am warned that stcurve is not available with the tvc option. Can anyone help by pointing out a user-written command that could get this done? Thanks a lot!
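[Not a user-written command, but a heavily hedged sketch of a commonly suggested workaround: build the time interaction by hand after episode splitting, so the model no longer needs tvc() and stcurve runs. Names x (the covariate violating PH) and z (the moderator) are placeholders, the data are assumed already stset, and fixing the constructed term at a representative time in at() is only a rough device:]
Code:
* split each record at chosen analysis times, then interact x with _t
stsplit tband, at(0(1)10)
generate x_t = x * _t              // hand-made time-varying effect of x
stcox x z c.x#c.z x_t              // same model as with tvc(x), no tvc()
stcurve, survival at1(x=0 z=0 x_t=0) at2(x=1 z=1 x_t=5)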
↧
Plot interaction effect after stcox with tvc?
↧
MICE for single imputation
Hello,
I am wondering if you can use multiple imputation by chained equations (MICE) to produce just a single imputed dataset in Stata. I think the command I am using for analysis is not compatible with multiple imputation (the command I am using is gllamm, a user-written command for multilevel models). So, is it reasonable to run just one imputation in MICE (see code below) and then run my analysis as I normally would (i.e., not using the mi estimate command)? I chose MICE for imputation because I have multiple variable types that need to be imputed (binary, continuous, and ordinal/categorical), and it seemed to be the correct imputation option given my variety of variable types.
I am open to other single imputation suggestions if anyone has them.
Code:
mi set mlong
xi: mi register imputed Lr2_number_adults_ LSize_Cat_ Lr_Do_you_own_
xi: mi register regular Garden_Active_ i.Year Garden_ID LSite_Visit_Curr_or_Prior_ LSold_GID_ LPickups_ LUR_Curr_Yr_or_Prior_ LSOD_Curr_or_Prior_ LKGD_Curr_Or_Prior_ LCommunity_Garden_ LMarket_Garden_ LYr_Act_Prior_ LSoil_Test_Curr_or_Prior_ i.r_L_classes i.r_L_volunteer_3_max i.r_L_social_2_max
xi: mi impute chained (logit) Lr_Do_you_own_ (regress) Lr2_number_adults_ (ologit) LSize_Cat_ = Garden_Active_ i.Year Garden_ID LCommunity_Garden_ LMarket_Garden_ LSite_Visit_Curr_or_Prior_ LSold_GID_ LPickups_ LUR_Curr_Yr_or_Prior_ LSOD_Curr_or_Prior_ LKGD_Curr_Or_Prior_ LYr_Act_Prior_ LSoil_Test_Curr_or_Prior_ i.r_L_classes i.r_L_volunteer_3_max i.r_L_social_2_max, add(1)
Many thanks,
Alyssa
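[A minimal sketch of one way this is often handled, assuming the setup above; mi extract is official Stata, and the gllamm call is only indicative:]
Code:
* after the -mi impute chained ..., add(1)- call shown above:
mi extract 1, clear        // replace data in memory with imputation m = 1
* the analysis can then be run the usual way, without -mi estimate-, e.g.:
* gllamm outcome covariates, i(Garden_ID) family(binomial) link(logit)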
↧
↧
Numbers instead of variable names
I downloaded a .dta file. All of the variable names are numbers. Is there anything I can do to fix this? Thank you.
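[A hedged sketch of one common fix, assuming the intended names survive as variable labels (often the case when a file was translated from another package); strtoname() turns a label into a legal Stata name:]
Code:
* promote variable labels to variable names wherever a label exists
foreach v of varlist _all {
    local lbl : variable label `v'
    if `"`lbl'"' != "" rename `v' `=strtoname(`"`lbl'"')'
}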
↧
How forum software deals with Unicode whitespace
The following dataex output contains several uchar(160):
Do these Unicode whitespace characters survive?
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str16 A
"1 1 "
"1 2 "
"1 3 "
"1 4 "
"1 5 "
"1 6 "
"1 7 "
"1 8 "
"1 9 "
end
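[A small hedged check of exactly that question, run on the dataset above; ustrpos() is official Stata:]
Code:
* flag observations whose A still contains a no-break space (U+00A0)
generate byte has_nbsp = ustrpos(A, uchar(160)) > 0
list A has_nbsp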
↧
Generating cross-sectional information from employment spells
Hi all,
I am currently working with a Polish data set and would like to generate the professional experience of a person from their reported employment spells. I can only conduct cross-sectional analyses, so I need to generate their cumulated experience up until the last reported episode.
My main problem is that the spells overlap, as many people were simultaneously working in multiple jobs. I'll give an example below for the person with the ID "89".
I am very new to working with spell data, so I would really appreciate all tips. What I would ultimately like to do is to generate the cumulated experience without the overlaps, i.e. in this example from 1964 to 2015, and simultaneously ensure that there aren't any gaps.
HTML Code:
ID  start_y  end_y
89     1964   1967
89     1967   1968
89     1972   1974
89     1975   1986
89     1985   2013
89     1973   1990
89     1990   2011
89     1999   2001
89     2001   2002
89     1972   1974
89     2013   2015
89     2014   2015
89     2014   2015
Thank you all,
Evelyn
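[A minimal sketch of one way to count each calendar year at most once, assuming one row per spell with ID, start_y, and end_y as above; whether the end year itself counts as worked is a convention to decide:]
Code:
duplicates drop ID start_y end_y, force       // e.g. the repeated 1972-1974 spell
expand end_y - start_y + 1                    // one row per calendar year in the spell
bysort ID start_y end_y: generate year = start_y + _n - 1
duplicates drop ID year, force                // overlapping years counted once
bysort ID (year): generate byte gap = year - year[_n-1] > 1 if _n > 1
bysort ID: egen experience = count(year)      // cumulated years, overlaps removed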
↧
↧
Margins after REGHDFE with log dependent variable
I am using the reghdfe command with a log dependent variable. Then, I am using the margins command for postestimation. However, I would like to "contextualize" the result by putting the margins answers back into the magnitudes of the original variables. There is a lot going on here so I was hoping to get some validation: is this the correct approach and interpretation? Here is an MWE.
sysuse auto, clear
drop if rep78==.
gen lprice = log(price)
reghdfe lprice mpg i.foreign, absorb(FE=rep78) resid
margins foreign, expression(exp(predict(xb)+FE))
Predictive margins Number of obs = 69
Model VCE : OLS
Expression : exp(predict(xb)+FE)
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
foreign |
Domestic | 5492.019 287.0945 19.13 0.000 4929.324 6054.713
Foreign | 6576.807 604.7936 10.87 0.000 5391.434 7762.181
------------------------------------------------------------------------------
margins r.foreign, expression(exp(predict(xb)+FE))
Contrasts of predictive margins
Model VCE : OLS
Expression : exp(predict(xb)+FE)
------------------------------------------------
| df chi2 P>chi2
-------------+----------------------------------
foreign | 1 2.26 0.1328
------------------------------------------------
------------------------------------------------------------------------
| Delta-method
| Contrast Std. Err. [95% Conf. Interval]
-----------------------+------------------------------------------------
foreign |
(Foreign vs Domestic) | 1084.789 721.7383 -329.7924 2499.37
------------------------------------------------------------------------
It is a bit silly to do causal interpretation in this scenario, but setting that concern aside: "we assume that for the average car in the dataset, we could charge $5492 if the car is labeled as domestic, whereas it would have a price of $6577 if it was labeled as foreign. Although changing a car's classification to foreign is estimated to increase its value by about $1085, this difference is not statistically significant."
Questions:
- Is this the right way to convert my estimates back into $? (As I understand it, reghdfe does not allow the xbd prediction option: https://github.com/sergiocorreia/reghdfe/issues/138. Also, I am using the "expression" option because my understanding is that the typical recommended solution -- raw dependent variable with poisson or GLM before margins -- is harder to implement in the high-dimensional fixed effects scenario.)
- Is this the right interpretation of the results? e.g. are the estimates and confidence intervals coming out of these commands OK or are there issues with my transformation that I should be aware of?
- Has anyone experienced any problems with this type of postestimation using reghdfe? As Sergio notes on github, not all examples have been checked: https://github.com/sergiocorreia/reghdfe/issues/32
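[On the Poisson alternative mentioned in the first question, a hedged sketch: the community-contributed ppmlhdfe (by the same reghdfe authors) fits Poisson with high-dimensional fixed effects on the raw dependent variable, which keeps margins on the dollar scale without the exp() expression; whether margins treats the absorbed effects the way you need should still be verified:]
Code:
ssc install ppmlhdfe
sysuse auto, clear
drop if rep78 == .
ppmlhdfe price mpg i.foreign, absorb(rep78)   // Poisson on raw price
margins foreign
margins r.foreign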
↧
Formula behind Stata's F test for equality of coefficients
Hello, thank you in advance
for example:
Code:
reg price weight mpg

      price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------+----------------------------------------------------------------
     weight |   1.746559   .6413538     2.72   0.008      .467736    3.025382
        mpg |  -49.51222   86.15604    -0.57   0.567    -221.3025     122.278
      _cons |   1946.069    3597.05     0.54   0.590    -5226.245    9118.382

test weight=mpg
 ( 1)  weight - mpg = 0
       F(  1,    71) =    0.36
            Prob > F =    0.5514
I wonder whether someone knows the formula of this F test?
Thank you so much!
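[For reference, the statistic behind test after regress is the standard Wald F. With q restrictions written as R\beta = r, n - k residual degrees of freedom, and \widehat V = e(V) the estimated covariance of the coefficients:]

F(q,\; n-k) \;=\; \frac{(R\hat\beta - r)'\,\bigl(R\,\widehat V\,R'\bigr)^{-1}(R\hat\beta - r)}{q}

[For the single restriction weight = mpg, take R = (1, -1, 0) over (weight, mpg, _cons) and r = 0, so q = 1 and the statistic reduces to]

F(1,\,71) \;=\; \frac{(\hat\beta_{weight} - \hat\beta_{mpg})^2}{\widehat{Var}(\hat\beta_{weight}) + \widehat{Var}(\hat\beta_{mpg}) - 2\,\widehat{Cov}(\hat\beta_{weight},\hat\beta_{mpg})}

[whose ingredients can be checked by hand from e(V) (e.g. via estat vce) after the regression.]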
↧
eprobit and margins
Hi,
I am using eprobit to regress a binary variable (y) on another endogenous binary variable (x). The related instrument is called z. The Stata command that I use is as follows:
eprobit y x, endogenous(x = z, probit)
I am interested in marginal effects. However, when I use the margins command, it gives me the error "e(sample) does not identify the estimation sample".
I was wondering if anybody knows what might be the problem.
Thanks.
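[A hedged guess at the cause: that error typically appears when margins cannot match the stored estimates to the data in memory, e.g. after estimates use or after the data changed between estimation and margins. Re-fitting in the same session on unchanged data, or re-declaring the sample, usually clears it; estimates esample is official Stata, and mymodel is a placeholder name:]
Code:
* refit and run margins in the same session, on unchanged data
eprobit y x, endogenous(x = z, probit)
margins, dydx(x)
* if the estimates were loaded from disk, restore the sample first:
* estimates use mymodel
* estimates esample: y x z        // re-declare which observations were used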
↧
Calculating day-level statistics with data that may span across several days
Hi,
Data at hand: patient visit start and end dates/times, with each line representing a unique visit. A sample is below. The visits span from several hours to several days.
Question we're looking to answer: for each day, what proportion of patients who were there in the morning (0800) had left by the end of the day (2359).
The only set of steps I can sketch in my brain (though I am not yet sure how to code it) would be to create two flag variables for each day: one identifying whether each patient was present at 0800, and another for whether they had left by 2359, and then somehow calculate proportions from these. Is there an easier way?
Thanks!
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id obsbegin obsend)
 1 1.8619213e+12  1.861961e+12
 2  1.861922e+12 1.8620786e+12
 3  1.861923e+12 1.8619218e+12
 4 1.8619246e+12  1.861973e+12
 5  1.861929e+12  1.861966e+12
 6 1.8619295e+12 1.8619564e+12
 7 1.8619327e+12 1.8620413e+12
 8 1.8619335e+12 1.8619677e+12
 9  1.861934e+12 1.8619636e+12
10 1.8619395e+12 1.8619865e+12
11 1.8619415e+12 1.8619577e+12
12 1.8619433e+12 1.8619715e+12
13  1.861949e+12  1.861985e+12
14  1.861961e+12 1.8620724e+12
15  1.861965e+12 1.8619995e+12
16 1.8619742e+12 1.8619744e+12
17  1.861979e+12 1.8620514e+12
18  1.861987e+12  1.862078e+12
19 1.8619886e+12 1.8620075e+12
20  1.861995e+12  1.862071e+12
21 1.8619962e+12  1.862055e+12
22 1.8620012e+12  1.862016e+12
23 1.8620025e+12 1.8620584e+12
24  1.862021e+12 1.8620544e+12
25 1.8620255e+12 1.8620362e+12
end
format %tc obsbegin
format %tc obsend
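[A hedged sketch of one way, assuming id is unique per visit and obsbegin/obsend are %tc datetimes as above: expand each visit to one row per calendar day it touches, build the two flags, then average within day:]
Code:
* one row per calendar day each visit touches
generate day0 = dofc(obsbegin)
generate day1 = dofc(obsend)
expand day1 - day0 + 1
bysort id: generate day = day0 + _n - 1
format %td day
* present at 0800 and gone by 2359 on that day
generate double t0800 = cofd(day) + msofhours(8)
generate double t2359 = cofd(day) + msofhours(23) + msofminutes(59)
generate byte present0800 = obsbegin <= t0800 & obsend > t0800
generate byte leftby2359  = obsend <= t2359
* among those present at 0800, the share who left that same day
collapse (mean) p_left = leftby2359 if present0800, by(day)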
↧
↧
percent of correct predictions
Hello, I am trying to estimate the number of correct predictions in a linear model using robust standard errors and in a linear model with WLS. My instructor showed doing this:
regress diab owgt obese exer cig alc inc coll marr male age, robust
predict probl, xb
generate cdiab=(probl >=0.5 & diab==1)
generate cnodiab=(probl <0.5 & diab==0)
generate correct =(cdiab==1 | cnodiab==1)
sum diab cdiab cnodiab correct
but is this specific for either model? And it only prints out this:
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
diab | 5,051 .1282914 .3344471 0 1
cdiab | 5,051 0 0 0 0
cnodiab | 5,051 .8717086 .3344471 0 1
correct | 5,051 .8717086 .3344471 0 1
It does not include a table that shows the predictions as the instructor said it would. Can anyone tell me if the information I need is hidden in here or if I did something wrong?
Thank you
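[A hedged note and sketch: the cdiab mean of 0 in the summary means no fitted value reached 0.5, which is common for a linear probability model when the outcome is as rare as 12.8%, so nothing is hidden — the code simply never classified anyone as diabetic. The table the instructor likely meant can be produced with tabulate from the variables already generated; pred_diab is a name introduced here:]
Code:
* 2x2 table of actual vs predicted diabetes status
generate byte pred_diab = probl >= 0.5 if !missing(probl)
tabulate diab pred_diab, row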
↧
Plotting a fitted Line for log scale axis
Hello,
I have to plot a simple fitted line for exports relative to GDP (the variable is called "norm_exports") on the y axis and bilateral distance (variable "km") on the x axis. However, I am having problems, since I have to depict the y axis in log scale. I know that I can therefore not use the standard twoway lfit command:
twoway (scatter norm_exports km)(lfit norm_exports km)
Therefore I tried to use predict for a regression of the log of my export variable on distance:
quietly reg log_norm_exports km
predict gr
label var gr "Linear prediction"
twoway (scatter norm_exports km)(line gr km, sort), yscale(log)
However, the code I used is not working. I would be really thankful if someone could tell me where my mistake is.
Thanks a lot.
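[A hedged sketch of the likely fix: predict after the log regression returns fitted values in log units, so they sit far below the level data when both are drawn on a log-scaled axis. Exponentiating the prediction back to levels before plotting should line things up; variable names are as in the post, with gr_log introduced here:]
Code:
quietly regress log_norm_exports km
predict gr_log, xb
generate gr = exp(gr_log)            // back to the level scale
label variable gr "Linear prediction"
twoway (scatter norm_exports km) (line gr km, sort), yscale(log)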
↧
need to drop duplicates depending on how many duplicates there are per observation
I am trying to reshape my data from wide to long, however I need to drop duplicates first. Here is an example of my data of employment by U.S. County:
This is what I used to generate the dup variable: quietly by countyid time: gen dup = cond(_N==1,0,_n)
However, the issue is that there are duplicates for the same year, quarter, and county, with one dupe listing zeros for employment and the other listing the actual employment. I sorted the data with sort countyid time month1 month2 month3 so that the dupe with zeros would appear before the dupe with employment, giving it a dup value of 1. My intention was to drop if dup == 1; however, I discovered that some of the duplicates come in triplets, meaning there are two lines of zeros, with dup values of 1 and 2, and the actual employment has a dup value of 3. I thought I could use gsort to instead sort the duplicates in descending order so I can drop if dup > 1, but I can't seem to get it to work.
I would appreciate any help!
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int year byte qtr str1 disclosure_code str41 area_title str43 agglvl_title long(qtrly_estabs_count month1 month2 month3) float countyid str5 time float dup
2009 1 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 31 1868 1721 1641 1 "20091" 0
2009 2 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 31 1531 1530 1496 1 "20092" 0
2009 3 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 32 1495 1480 1433 1 "20093" 0
2009 4 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 33 1445 1444 1447 1 "20094" 0
2010 1 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 31 1428 1457 1516 1 "20101" 0
2010 2 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 31 1540 1570 1572 1 "20102" 0
2010 3 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1552 1578 1590 1 "20103" 0
2010 4 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1601 1594 1584 1 "20104" 0
2011 1 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1525 1567 1576 1 "20111" 0
2011 2 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 31 1566 1581 1580 1 "20112" 0
2011 3 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 31 1587 1579 1639 1 "20113" 0
2011 4 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1658 1668 1636 1 "20114" 0
2012 1 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1649 1703 1722 1 "20121" 0
2012 2 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1719 1733 1712 1 "20122" 0
2012 3 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1723 1727 1734 1 "20123" 0
2012 4 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1701 1592 1547 1 "20124" 0
2013 1 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 29 1538 1561 1563 1 "20131" 0
2013 2 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 29 1572 1577 1584 1 "20132" 0
2013 3 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 29 1582 1592 1602 1 "20133" 0
2013 4 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 30 1657 1655 1637 1 "20134" 0
2014 1 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 31 1740 1741 1760 1 "20141" 0
2014 2 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 32 1782 1798 1812 1 "20142" 0
2014 3 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 32 1806 1813 1848 1 "20143" 0
2014 4 "" "Abbeville County, South Carolina" "County, NAICS Sector -- by ownership sector" 32 1789 1789 1783 1 "20144" 0
end
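[A hedged sketch of the gsort approach: sort employment in descending order within each county-quarter and keep only the first row, which handles pairs and triplets of zero rows alike, assuming the non-zero record always carries the largest month values:]
Code:
* keep the record with the highest employment per county-quarter
gsort countyid time -month1 -month2 -month3
by countyid time: keep if _n == 1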
↧
Getting Stata to recall the last opened directory
Hi,
I have been using Stata 13 on a Windows system, and each time I started Stata, CTRL+O took me to the last directory opened in either the main window or the Do-file Editor.
After upgrading to Stata 15, it works in the main window, but CTRL+O in the Do-file Editor takes me to c:\data and nothing I do changes this.
Is there any way to restore the Stata 13 behaviour?
Many thanks
Suhail
↧
↧
Random seed needed
I am looking for a number or string that will be different in different runs of the same program. So far I can only think of using c(current_time) but that only changes once a minute. Is there a way to get the processid? That is what Stata uses for its temporary files but I don't see how to get it myself. The documentation for Stata random functions suggests saving the rng state, but that requires coordination across runs that I can't depend on. Any other suggestions?
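[A hedged sketch of two options. First, Stata embeds its process id in tempfile names, so the path returned by tempfile can be parsed for a PID-like token (an undocumented, OS-dependent assumption). Second, c(current_time) does carry seconds, so combining it with c(current_date) gives a per-run value at seconds resolution:]
Code:
* option 1: inspect a tempfile path, which contains the process id
tempfile t
display "`t'"
* option 2: seconds-resolution timestamp, reduced to a legal seed
local stamp = clock("`c(current_date)' `c(current_time)'", "DMYhms")
set seed `=mod(`stamp', 2147483647)'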
↧
Can we determine the significance of changes over time using cross-sectional data?
Dear Statalisters,
I am using data from the 2011 and 2014 Demographic Health Surveys (these are repeated cross sections) to see the aggregate trends in intimate partner violence across type of residence (rural and urban). Does cross-sectional data allow us to estimate the significance of the change? E.g., if the prevalence of violence in urban areas was 13% in 2011 and 10% in 2014, is there any way we could carry out a statistical test in Stata to determine if this change is significant?
Thanks,
Monzur
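[Yes: with repeated cross sections one can pool the rounds and test the year contrast. A hedged sketch, assuming variables violence (0/1), year, urban, and DHS design variables psu, strata, and wgt — all placeholder names; svyset should match each survey's documentation:]
Code:
* pool both rounds, declare the design, then compare 2011 vs 2014
svyset psu [pweight = wgt], strata(strata)
svy, subpop(if urban == 1): proportion violence, over(year)
* or as a regression, which directly tests the difference:
svy, subpop(if urban == 1): logit violence i.year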
↧
Convert string variable to a date variable
Dear Stata Users
I have a string variable yearmonth: for Jan 2004 it is stored as "200401". I need to convert it to a readable year/month format. I have used the following code:
Code:
generate month = date(yearmonth, "YM")
format %tm month
The code above produces a strange date variable. How do I alter the code to get a meaningful year/month variable?
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 yearmonth
"200401"
"200402"
"200403"
"200404"
"200405"
"200406"
"200407"
"200408"
"200409"
"200410"
"200411"
"200412"
end
Thank you.
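[A hedged sketch of the likely fix: date() returns a daily date, which the %tm format then misreads; monthly() produces the monthly date the format expects:]
Code:
generate month = monthly(yearmonth, "YM")
format %tm month
list yearmonth month in 1/3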
↧
Converting continuous variable into categorical
Hi, I am new to Stata and would really appreciate it if someone could advise me on the following:
I would like to generate a new categorical variable that would correspond to histogram bins.
I have 200759 observations, 7064 bins (0.005 widths); min value of the variable is -19.80374 and maximum 15.48759.
I've tried gen with autocode(), but I am getting missing values.
Thanks.
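[A hedged sketch of an alternative that sidesteps autocode(): dividing by the bin width and flooring assigns every non-missing value a bin index, with the bin's lower edge recoverable as a value; x stands in for the variable name:]
Code:
* bin index and lower edge for 0.005-wide bins
generate long bin = floor(x / 0.005)
generate double bin_lo = bin * 0.005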
↧
↧
Interpretation of logit coefficients and rescaling of variables
Dear all,
I am trying to interpret the coefficients of my logistic regression results and would very much appreciate it if you could double-check my thinking on the steps below.
Dependent variable is pass (dummy: passed/not passed).
Motivation is a continuous variable between 0-100, tenure is also continuous (# years) and gender (1: male) and education (1: good university) are dummies.
Code:
logit pass motivation tenure i.gender_male i.edu
HTML Code:
Logistic regression                             Number of obs   =       9777
                                                LR chi2(4)      =      62.17
                                                Prob > chi2     =     0.0000
Log likelihood = -5551.7018                     Pseudo R2       =     0.0056
-------------------------------------------------------------------------------
         pass |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
   motivation |  -.0021914   .0012533    -1.75   0.080    -.0046479    .0002651
       tenure |  -.0174614   .0026707    -6.54   0.000     -.022696   -.0122268
1.gender_male |   .1047808   .0992882     1.06   0.291    -.0898205    .2993821
        1.edu |    .233494   .0528765     4.42   0.000      .129858    .3371301
        _cons |  -1.041527   .1336965    -7.79   0.000    -1.303568     -.779487
-------------------------------------------------------------------------------
From the regression output, I would like to interpret the coefficient of motivation and express the result in the following ways:
a) A one unit increase in motivation score, decreases the odds of passing by X %.
First, turn coefficient into an odds ratio: e(-0.002) = 0.998
My understanding of the interpretation is that a one unit increase in motivation score would decrease the odds of passing by 0.002%. "One unit", in my understanding, refers to an increase from 0 to 1 on a scale of 0-100.
b) A 10% increase in motivation score, decreases the odds of passing by X %.
e(-0.002 * 1.1) = 0.997
Interpretation: A 10% increase would decrease odds of pass by 0.003%.
Does this make sense so far?
c) An increase in motivation score by one standard deviation, decreases the odds of passing by X %.
I am not sure how to approach this one - do you have any guidance?
One further question:
Coefficients for motivation are very small as shown in the regression output above. My understanding is that this is due to the scaling of the variable between 0-100. When rescaling the variable to 0-1, coefficients and effect sizes increase by x100:
Code:
logit pass motiv_100 tenure i.gender_male i.edu
HTML Code:
Logistic regression                             Number of obs   =       9777
                                                LR chi2(4)      =      62.17
                                                Prob > chi2     =     0.0000
Log likelihood = -5551.7018                     Pseudo R2       =     0.0056
-------------------------------------------------------------------------------
         pass |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    motiv_100 |  -.2191368   .1253341    -1.75   0.080    -.4647872    .0265136
       tenure |  -.0174614   .0026707    -6.54   0.000     -.022696   -.0122268
1.gender_male |   .1047808   .0992882     1.06   0.291    -.0898205    .2993821
        1.edu |    .233494   .0528765     4.42   0.000      .129858    .3371301
        _cons |  -1.041527   .1336965    -7.79   0.000    -1.303568     -.779487
-------------------------------------------------------------------------------
Repeating step a), this would imply:
e(-0.21) = 0.81
Interpretation: A one unit increase in motivation score would decrease the odds of pass by 19%.
My understanding is that in this case, "one unit" reflects an increase from the minimum score (0) to the maximum score (1) and that's why the effect size is 100x that above. Is that correct?
Thank you very much in advance again for your help!
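[A hedged arithmetic check on (a) and (c), using only the coefficient shown and the sample SD of motivation; note the 0.998 odds ratio corresponds to a 0.22% fall in the odds per point, not 0.002%:]
Code:
display exp(-.0021914)              // .99781: odds fall about 0.22% per point
* (c): scale the coefficient by the standard deviation of motivation
quietly summarize motivation
display exp(-.0021914 * r(sd))      // odds ratio for a one-SD increase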
↧
Time series ARIMA, how to forecast future?
Hi everyone,
I am new to Stata time series, and here is a question I have on how to perform a forecast after ARIMA.
I have ARIMA values (for example, 1,0,1). I am curious what I should do next in order to perform an out-of-sample forecast until the year 2030.
Is there any tutorial on this? Inside the Stata PDF manual, I could not find a relevant tutorial on post-ARIMA forecasting. The only material I found regarding forecasting is on regression involving multiple variables. However, for my problem, I have only one variable (e.g. ship arrivals to a port over time).
I would really appreciate it if someone could explain how to transform my ARIMA (1,0,1) into a regression, if that is the case, or what I should do next to produce a dynamic forecast. Thank you!!
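[A hedged sketch of the usual recipe with official commands, assuming a yearly series y observed through 2018 and tsset on year; the cut-off year and the variable names are placeholders:]
Code:
tsset year
arima y, arima(1,0,1)
tsappend, add(12)                   // extend the time axis out to 2030
predict yhat, y dynamic(2019)       // dynamic out-of-sample forecasts
tsline y yhat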
↧
mlogit + margins + test
Using Stata 15.1, I was wondering how to test for equality of the coefficients for the marginal effects.
Example below.
use http://www.stata-press.com/data/r13/sysdsn1
mlogit insure age i.male i.nonwhite i.site
test [Indemnity]age = [Prepaid]age
margins, dydx(*) predict(outcome(Indemnity))
margins, dydx(*) predict(outcome(Prepaid))
How do I perform the equivalent test test [Indemnity]age = [Prepaid]age?
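[A hedged sketch of one way: margins accepts several predict() options at once and can post its results, after which test applies; the posted coefficient names vary, so coeflegend is used to read them off first, and the _b[] names below are indicative, not guaranteed:]
Code:
mlogit insure age i.male i.nonwhite i.site
margins, dydx(age) predict(outcome(Indemnity)) predict(outcome(Prepaid)) post coeflegend
* test equality of the two average marginal effects of age, using the
* names coeflegend reports, e.g.:
test _b[age:1._predict] = _b[age:2._predict]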
↧