Channel: Statalist

Hybrid model

Hello, I have chosen to use a hybrid model to estimate my equation, which includes a time-invariant variable.
Code:
 xthybrid HDRR lnPD IndexofDeprivation FDRDum2 FWDum2 Type2Dum, clusterid(Auth2)
However, two of my variables that are time-variant produce the message:
Code:
The variable 'lnPD' does not vary sufficiently within clusters
and will not be used to create additional regressors.
[~0% of the total variance in 'lnPD' is within clusters]
And so in my results table they appear only as:
Code:
 R__lnPD   0.0096
To estimate my original regression I was using the fixed-effects model. However, with the results from the hybrid model showing an R__ prefix (for random effects) on these variables, does this mean I have chosen the wrong model? The results from the Breusch-Pagan and Sargan-Hansen tests are significant, indicating I should be using fixed effects.

Could anyone offer any explanation as to why this has happened?
I have consulted papers using the same dataset as mine, and they estimate between and within effects for the two variables that, with the hybrid model, show only random-effects results for me.
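For reference, the within-cluster variation that xthybrid complains about can be inspected directly (a sketch, assuming the data can be declared as a panel on Auth2):

```stata
* declare the panel and decompose each variable's variance
xtset Auth2
xtsum lnPD IndexofDeprivation
```

If the "within" standard deviation reported by xtsum is essentially zero, the variable is effectively time-invariant within clusters.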

Thanks

Calculating uniform and non-uniform DIF for polytomous data

Dear Stata-list,

I am hoping to learn how to perform a uniform and non-uniform DIF analysis using polytomous (1-5 scale) questionnaire data.

I am currently using the dropdown menu Statistics >> IRT >> DIF option, but Stata seems unable to calculate it because my item responses range from 1 to 5:

Code:
 difmh q15 q16 q28 q29 q30 q31, group(_age75)
variable q15 has invalid values;
 requires item variables be coded 0, 1, or missing
r(198);
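For reference, difmh implements the Mantel-Haenszel test for dichotomous items, so the 1-5 responses would first need to be dichotomized (a sketch; the cutpoint of 4 is purely illustrative):

```stata
* recode each item to 0/1 at an illustrative cutpoint
foreach v of varlist q15 q16 q28 q29 q30 q31 {
    gen byte `v'_bin = (`v' >= 4) if !missing(`v')
}
difmh q15_bin q16_bin q28_bin q29_bin q30_bin q31_bin, group(_age75)
```

Dichotomizing loses information, so a genuinely polytomous DIF approach (e.g. ordinal logistic models fit outside difmh) may be preferable.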
Any advice would be greatly appreciated.

William

Announcement: wtd_hotdeck -- a community-contributed program to hotdeck with sample weights

Apologies in advance if I'm doing this wrong. I didn't see anything in the FAQ about announcing things, but from searching the forum it appears people sometimes do and no one seems to mind, so...

I wrote a simple hotdeck program. Probably the only interesting thing about it is that it will select donor rows in proportion to their sample weights. I don't know how often this matters for most people, but I often work with datasets where weights can range from 1 to 2,000, and in those cases it makes a big difference whether or not your hotdeck handles weights. As best I can tell, there are three existing community-contributed commands (hotdeck, whotdeck, and hotdeckvar) that perform hotdecks, but none of them handles sample weights.

A poorly formatted copy of the help file is below, and I have uploaded the ado, help, and an example to github: https://github.com/johne13/wtd_hotdeck

Let me note that this is the first public version, and you should of course be cautious in using it for any real production work. I've used it in production work a couple of times, as have some colleagues, and it seems to work like it's supposed to. Even so, I'm sure there are plenty of bugs and edge cases yet to be discovered, so just be aware of that.

Comments, advice, and wisecracks are much appreciated!


Title
wtd_hotdeck -- Hotdeck (or statistical match) imputation that selects donor rows in proportion to their survey or sample weights

Syntax
wtd_hotdeck varlist(min=1) [, options]

options Description
--------------------------------------------------------------------------------------------------------------------------
Main
cells(varlist) (optional) Categorical-style variables that define the cells
weight(varname) (optional) Survey- or sample-type weights
seed(#) (optional, default=0) A positive integer will be used to set the seed, zero means no seed is set
verbose(#) (optional, default=0) A non-zero value will cause intermediate variables to be retained
--------------------------------------------------------------------------------------------------------------------------

Description

This is a fairly standard hotdeck program with the possibly interesting feature of allowing the use of frequency- or
survey-style weights. If provided, the donor rows are sampled in proportion to the weights, which may be either integers
or floats. If multiple variables are imputed to a row, then all values will be selected from the same donor row.

Note that donors and recipients are defined internally based on missing values in varlist. Rows with no missing values in
varlist are defined as donors, and rows with any missing values are defined as recipients. Also note that missing values
are replaced or over-written by the hotdeck, so it may be helpful to explicitly store the original values for later
comparisons.

This program is offered for free and "as is", with no guarantees except "your money back for any reason". It has mainly
been tested with Stata 12 (macOS) and Stata 15 (Windows 10). Since it is essentially just a specialized sorting
program, it will likely work with any semi-recent version of Stata (or your money back, of course).

Options

cells(varlist) These variables define the cells of the hotdeck. The user is responsible for checking that each cell
contains a sufficient number of donors and no checking is done by this program. The variables in "cells" are used
internally for sorting and will generally be of the categorical type, but any variable type is allowed (e.g. if you
have a float variable that only has five unique values, that should be fine).

weight(varname) These may be of frequency- or survey-type and can be integers or floats.

seed(#) Set to a positive integer in order to ensure reproducible results. The positive integer becomes the input for an
internal "set seed" command. If the seed is set to zero (the default value) or is not specified, then no seed is set
internally and Stata will use the system value of seed, whatever that happens to be.

verbose(#) If verbose is set to 1, a number of intermediate variables (beginning with "_") are retained at program
termination. This is mainly for debugging or curiosity.

Brief example

Start with the NMIHS data, then randomly set 20% of childsex & birthwgt to missing

. webuse nmihs
. keep finwgt marital age childsex birthwgt
. replace birthwgt = . if uniform() < 0.20
. replace childsex = . if birthwgt == .
. gen over25 = age > 25
. preserve

Impute childsex & birthwgt using cells based on age & marital status

. wtd_hotdeck childsex birthwgt, cells(marital over25) weight(finwgt)

Continuing the example...

Note that wtd_hotdeck does not check that all of your cells have enough donor observations, so you should always check
this manually. One simple way is to just tab the donor cells.

. table marital over25 if ~missing(childsex,birthwgt)

It can be interesting to check how much the weights matter. If you try the short example below, you are likely to find
that the weights matter substantially, although there will be some random variation with each run (if no seed is set).

. restore, preserve
. sum child birthwgt [w=finwgt] // before hotdeck

. qui: wtd_hotdeck childsex birthwgt, cells(marital over25)
. sum child birthwgt [w=finwgt] // after un-weighted hotdeck

. restore, preserve
. qui: wtd_hotdeck childsex birthwgt, cells(marital over25) weight(finwgt)
. sum child birthwgt [w=finwgt] // after weighted hotdeck

Author

John R Eiler
U.S. Dept of the Treasury
first.last at treasury.gov

Acknowledgements

Rachel Costello, Portia DeFillippes

Also see

hotdeck, whotdeck, hotdeckvar -- These are community-contributed commands that can be used for a hotdeck imputation. All
three can be installed with "ssc install" and include excellent help files. None of them allow sample weights as far as I
can tell.

Stata's mi -- Stata's mi command is very powerful and offers many alternative imputation approaches, but no option to do a
simple hotdeck, weighted or unweighted, to the best of my knowledge.

SAS's proc surveyimpute -- It appears that SAS offers a weighted hotdeck via the command "proc surveyimpute
method=hotdeck(selection=weighted);". I have not used this command and hence have not compared results to wtd_hotdeck.


Collapse for double data type

Each row in my dataset corresponds to a municipality. I would like to compute the difference between two variables at the national level. So I have the option to either (1) compute the difference for each municipality and then sum these differences using collapse, or (2) use collapse to sum each variable to the national level and then compute the difference of the sums. I did both below, but when I create a variable (teste in the code) to check whether the outputs are identical, it seems they are not (although they look identical when I display the values of the variables).

Code:
. gen double cvuti_add = cvuti_covid - cvuti_conc

. 
. collapse (sum) cvuti_covid cvuti_conc cvuti_add

. 
. gen double cvuti_add2 = cvuti_covid - cvuti_conc 

. gen teste = (cvuti_add == cvuti_add2)

. tab teste

      teste |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          1      100.00      100.00
------------+-----------------------------------
      Total |          1      100.00

. gen dif = cvuti_add - cvuti_add2

. tab dif

        dif |      Freq.     Percent        Cum.
------------+-----------------------------------
  -2.38e-06 |          1      100.00      100.00
------------+-----------------------------------
      Total |          1      100.00

. format * %10.0f

. list in 1

     +----------------------------------------------------------------+
     | cvuti_co~d   cvuti_c~c    cvuti_add   cvuti_add2   teste   dif |
     |----------------------------------------------------------------|
  1. | 4199563157   681621782   3517941375   3517941375       0    -0 |
     +----------------------------------------------------------------+

. 
end of do-file

.
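For reference, a difference of about 1e-06 on values in the billions is ordinary double-precision rounding: summing the same numbers in a different order can change the last few bits of the result. A tolerance-based comparison sidesteps this (a sketch):

```stata
* compare with a tolerance instead of exact equality
gen byte teste_tol = abs(cvuti_add - cvuti_add2) < 1e-3
```

Any tolerance that is tiny relative to the magnitude of the sums (here, billions) but larger than the rounding error will do.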

Question on Graph Combine Holes

I am generating a series of graphs and combining them to allow for quick/easy comparisons.

Most sets have 4 columns and 4 rows, so I am combining them into 4x4 summaries, and they are great.

I have one set that has only 3 columns, but the same 4 rows. I would like to have the 4th column blank - so that the size/aspect of the graphs are the same in all of the sets.

I tried the following -


graph combine (List of 12 graphs) , rows(4) cols(4) hole(4 8 12) iscale(.3)

- this produces three columns and 4 rows, with the three columns spread across the page.

I can put the holes at 9, 10, and 11 and I get the 4x4 grid with the holes as assigned, but if all of the holes are in the fourth column Stata eliminates that column entirely, ignoring the cols(4) option.
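One possible workaround is to fill the fourth column with an explicitly empty graph rather than a hole, so the column cannot be dropped (a sketch; g1-g12 stand in for the actual graph names):

```stata
* an invisible placeholder graph for the empty column
twoway scatteri 1 1, msymbol(i) yscale(off) xscale(off) name(blank, replace)

graph combine g1 g2 g3 blank g4 g5 g6 blank g7 g8 g9 blank g10 g11 g12 blank, ///
    rows(4) cols(4) iscale(.3)
```

Since the blank graph occupies a real cell, the remaining panels keep the same size and aspect as the 4x4 sets.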

Any help would be appreciated!

Thanks

Test of overidentifying restrictions. Fixed vs random effects

Dear all,

I need to run the test of overidentifying restrictions and get the Sargan-Hansen statistic.
these are the commands I use:
xtreg dependent var independent variables i.Year, fe vce(cluster ID)
xtreg dependent var independent variables i.Year, re vce(cluster ID)
xtoverid

However, I get the error message: 1990b: operator invalid
Why might this be the case?
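For reference, this error can arise because xtoverid predates factor-variable notation and cannot parse terms like 1990b created by i.Year; one commonly suggested workaround is to expand the dummies explicitly with xi (a sketch, with depvar/indepvars as placeholders):

```stata
* xi expands i.Year into plain dummy variables that xtoverid can handle
xi: xtreg depvar indepvars i.Year, re vce(cluster ID)
xtoverid
```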

Thank you in advance!

Quarterly invariant data points in hybrid model

Hi,

I am calculating a hybrid model using xthybrid.
I have a quarterly dataset; however, for two of my variables the same value is repeated for all four quarters of each year.

Therefore in my results after running xthybrid I am told these variables are time invariant.

Is there a way for the hybrid model to calculate between and within effects for these two variables?

Experience squared

Hello,
I was recently looking at Mincer's earnings function and saw that he had squared experience. I understand the intuition behind squaring age but not experience. I tried this on my own model, where experience is age minus the age at which continuous education was completed, and received a negative value. Any insight into why squaring experience is beneficial to a model would be appreciated.
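For reference, the squared term lets returns to experience rise and then flatten with a concave (inverted-U) profile, so a negative coefficient on the square is the expected result, not a problem. A Mincer-style sketch (variable names are illustrative):

```stata
* experience as potential experience; the square captures diminishing returns
gen exper  = age - age_left_educ     // age_left_educ: age completed education (illustrative)
gen exper2 = exper^2
regress lnwage yrs_educ exper exper2
```

With a positive coefficient on exper and a negative one on exper2, predicted log wages peak at exper = -b[exper]/(2*b[exper2]).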

Interpretation of odds ratios of interaction terms. Testing for effect modification

Hi
I am a new member, an older learner/student, and busy with my Master's degree in Public Health. I am busy with my dissertation, "Willingness to initiate insulin in people living with type 2 diabetes: Investigating the role of diabetes-related distress".

I, however, need advice in interpreting odds ratios of interaction terms, the ratio of odds ratios. I have read several answers from different posts on the subject, including Clyde Schechter's at https://www.statalist.org/forums/for...e-interactions
I struggle to keep to a standard way of reporting each logistic regression model and comparing across the different models. I have done marginal effects and understand them.

For the purpose of this post, I chose the following a priori variables and 3 different types of models:

Willingness2 = Binary variable - Outcome
Age = continuous - Predictor of interest 1
Sex = binary variable - Predictor of interest 2
Diabetes distress = DDSmeanitem (continuous) OR DDSmeanitem_cat (categorical) - possible effect modifier

1.Logistic regression with Interaction terms: Variable AGE (continuous) and DDS mean item as a CONTINUOUS variable

. logistic willingness2 c.age##c.DDSmeanitem,nolog

Logistic regression                             Number of obs     =        117
                                                LR chi2(3)        =       4.81
                                                Prob > chi2       =     0.1860
Log likelihood = -77.968229                     Pseudo R2         =     0.0299

-------------------------------------------------------------------------------------
       willingness2 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
                age |   .9091612   .0460478    -1.88   0.060     .8232442    1.004045
        DDSmeanitem |    .030789   .0570083    -1.88   0.060     .0008172    1.160014
                    |
c.age#c.DDSmeanitem |   1.066365     .03491     1.96   0.050     1.000092     1.13703
                    |
              _cons |   146.9265   432.4104     1.70   0.090     .4591778    47013.15
-------------------------------------------------------------------------------------
Note: _cons estimates baseline odds.


Using the odds ratios of the interaction term and diabetes distress, the additional effect that diabetes distress has on the relationship with willingness is 1.066 + 0.030 = 1.096 > 1.
Therefore, the interaction effect of diabetes distress is "small positive". (The p-value could probably be interpreted as statistically significant, p = 0.05.)
Therefore, diabetes distress as a continuous variable has a "small positive" effect of modifying age in relation to willingness.
1. Conclusion: Effect modification is present with the variable diabetes distress.

2.Logistic regression with variables SEX (categorical) and c.DDSmeanitem (continuous)

logistic willingness2 i.sex##c.DDSmeanitem,nolog

Logistic regression                             Number of obs     =        117
                                                LR chi2(3)        =       1.81
                                                Prob > chi2       =     0.6131
Log likelihood = -79.470387                     Pseudo R2         =     0.0112

-----------------------------------------------------------------------------------
     willingness2 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
              sex |
             Male |   .3031514   .2988301    -1.21   0.226     .0439128      2.0928
      DDSmeanitem |   .7914705   .3192276    -0.58   0.562     .3590175    1.744833
                  |
sex#c.DDSmeanitem |
             Male |   1.961714   1.090289     1.21   0.225     .6600115    5.830688
                  |
            _cons |   1.178385   .7846311     0.25   0.805     .3195301    4.345726
-----------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

The researcher investigated whether diabetes distress (continuous) acts as a possible effect modifier with the predictor of interest, sex (categorical), in relation to willingness to initiate insulin.
Sex was therefore used as a binary variable and the following results are presented.
Using the odds ratios of the interaction term and diabetes distress, the additional effect that diabetes distress has on the relationship with willingness is 1.96 + 0.79 = 2.75 > 1.
Therefore, the interaction effect that diabetes distress has on the predictor of interest, sex, is extremely large positive????
2. Conclusion: Diabetes distress is an effect modifier?

3.Logistic regression with variable SEX (categorical) and c.DDSmeanitem_cat (categorical)

. logistic willingness2 i.sex##c.DDSmeanitem_cat ,nolog

Logistic regression                             Number of obs     =        117
                                                LR chi2(3)        =       0.87
                                                Prob > chi2       =     0.8322
Log likelihood = -79.938455                     Pseudo R2         =     0.0054

---------------------------------------------------------------------------------------
         willingness2 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
                  sex |
                 Male |   .4235411   .4035914    -0.90   0.367     .0654316    2.741597
      DDSmeanitem_cat |   .7818847   .3587579    -0.54   0.592     .3181117    1.921789
                      |
sex#c.DDSmeanitem_cat |
                 Male |   1.854332   1.264619     0.91   0.365     .4871743    7.058147
                      |
                _cons |   1.107827   .6712464     0.17   0.866     .3378457    3.632668
---------------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

The study determined the stratified odds ratios for the interaction term of diabetes distress as a categorical variable in association with sex (a categorical variable).
The adjusted odds ratio for diabetes distress in association with sex (males);
Category 2 (moderate distress) vs 1 (no distress) is 0.96
Category 3 (high distress) vs 1 (no distress) is 7.23
Category for 3 (high distress) vs 2 (moderate distress) is 7.23/0.96 = 7.53
Therefore, diabetes distress has a small (0.96) to very large (7.53) positive effect of modifying sex (males) in relation to willingness (but p-values non-significant?)
3.Conclusion: Diabetes distress acts as an effect modifier
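Since I have done marginal effects already, perhaps the cleanest way to compare the three models is on the probability scale; for model 1, something like this shows how the age effect shifts across distress levels (a sketch, with illustrative values for DDSmeanitem):

```stata
* marginal effect of age at several levels of diabetes distress
logistic willingness2 c.age##c.DDSmeanitem
margins, dydx(age) at(DDSmeanitem = (1 2 3 4 5))
marginsplot
```

A dydx(age) that changes meaningfully across the at() values is direct evidence of effect modification, without needing to add odds ratios together.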

Am I interpreting the odds ratios in these 3 models the correct way?

Thank you
Elana

Keeping top 25 observations

Dear Stata users,


I have a problem that seems trivial even to me, but I would like to ask for some hints to implement the command properly.

I have a very simple dataset for two single years of bilateral trade (between France and ROW).

The variables I have are:

1) origin (FRA) and destination country

2) NACE classification

3) export value

4) year

What I would like to do is flag (creating a variable so that I can later drop the remaining observations) the top 25 destination countries for French exports, within each single NACE code.

Thank you very much for your time




Code:
* Example generated by -dataex-. To install: ssc install dataex

clear

input int year str3(origin destination) int NACE double export_val
1998 "FRA" "VCT" 11     0
1998 "FRA" "ABW" 11     0
2003 "FRA" "DMA" 11     0
1998 "FRA" "GRD" 11     0
2003 "FRA" "KNA" 11     0
1998 "FRA" "LBR" 11     0
1998 "FRA" "AFG" 11     0
2003 "FRA" "TON" 11     0
2003 "FRA" "ATG" 11     0
1998 "FRA" "MSR" 11     0
1998 "FRA" "LAO" 11     0
1998 "FRA" "KHM" 11     0
2003 "FRA" "MMR" 11     0
2003 "FRA" "BHS" 11     0
2003 "FRA" "GRD" 11     0
1998 "FRA" "CAF" 11     0
1998 "FRA" "SOM" 11     0
1998 "FRA" "FJI" 11     0
2003 "FRA" "BVT" 11     0
2003 "FRA" "PSE" 11     0
2003 "FRA" "COK" 11     0
2003 "FRA" "TJK" 11     0
2003 "FRA" "BLZ" 11     0
2003 "FRA" "KHM" 11     0
1998 "FRA" "DMA" 11     0
1998 "FRA" "MDV" 11   567
2003 "FRA" "NPL" 11   730
1998 "FRA" "BMU" 11   771
1998 "FRA" "BHS" 11  1324
2003 "FRA" "RWA" 11  1341
1998 "FRA" "JAM" 11  1511
2003 "FRA" "SUR" 11  2286
2003 "FRA" "PNG" 11  2360
1998 "FRA" "MMR" 11  2387
2003 "FRA" "GNQ" 11  2603
2003 "FRA" "GNB" 11  2807
2003 "FRA" "SWZ" 11  2807
1998 "FRA" "SUR" 11  2931
2003 "FRA" "SLB" 11  2999
2003 "FRA" "VCT" 11  2999
1998 "FRA" "BRB" 11  3136
2003 "FRA" "ANT" 11  3712
2003 "FRA" "PRK" 11  3759
2003 "FRA" "TKL" 11  3766
2003 "FRA" "SOM" 11  4472
1998 "FRA" "SLE" 11  5472
2003 "FRA" "MAC" 11  6210
2003 "FRA" "SYC" 11  6440
1998 "FRA" "AND" 11  6773
1998 "FRA" "GNB" 11  7143
2003 "FRA" "TCD" 11  7356
1998 "FRA" "TKM" 11  7913
2003 "FRA" "MDV" 11  8008
2003 "FRA" "BRN" 11  9453
1998 "FRA" "ANT" 11  9964
2003 "FRA" "CAF" 11  9976
1998 "FRA" "ARM" 11 11007
1998 "FRA" "STP" 11 12855
1998 "FRA" "BRN" 11 13109
2003 "FRA" "VUT" 11 13419
1998 "FRA" "NPL" 11 15179
1998 "FRA" "SYC" 11 15435
2003 "FRA" "LBR" 11 19859
1998 "FRA" "COG" 11 20602
2003 "FRA" "MDG" 11 20856
2003 "FRA" "ZWE" 11 21868
1998 "FRA" "GMB" 11 24856
2003 "FRA" "GRL" 11 25174
2003 "FRA" "STP" 11 26143
2003 "FRA" "MYT" 11 26380
1998 "FRA" "BEN" 11 28170
1998 "FRA" "CPV" 11 32258
1998 "FRA" "TGO" 11 34124
2003 "FRA" "PRY" 11 35003
2003 "FRA" "MLI" 11 35275
1998 "FRA" "COD" 11 35522
1998 "FRA" "LCA" 11 35775
2003 "FRA" "NAM" 11 36215
2003 "FRA" "CPV" 11 36666
2003 "FRA" "GMB" 11 39036
2003 "FRA" "DJI" 11 41679
2003 "FRA" "KGZ" 11 41705
1998 "FRA" "HTI" 11 41840
1998 "FRA" "TCD" 11 42340
2003 "FRA" "UZB" 11 42503
1998 "FRA" "GAB" 11 42572
1998 "FRA" "BHR" 11 44577
2003 "FRA" "COG" 11 44693
2003 "FRA" "BOL" 11 46520
1998 "FRA" "PNG" 11 47248
1998 "FRA" "TTO" 11 52918
1998 "FRA" "FRO" 11 56553
1998 "FRA" "KGZ" 11 58656
1998 "FRA" "PAN" 11 59381
1998 "FRA" "MAC" 11 61065
1998 "FRA" "SLV" 11 61086
1998 "FRA" "VNM" 11 61508
1998 "FRA" "TJK" 11 61804
1998 "FRA" "MOZ" 11 63995
2003 "FRA" "AFG" 11 65197
end
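One way to flag the top 25 destinations within each year and NACE code, as a sketch (ties at the cutoff are broken arbitrarily):

```stata
* rank destinations by export value within year and NACE, largest = rank 1
bysort year NACE (export_val): gen rank = _N - _n + 1
gen byte top25 = rank <= 25
keep if top25
```

If the top 25 should instead be defined over both years pooled, drop year from the bysort varlist.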

Manual Breusch Godfrey test after ARIMA

Hi all,

I was wondering if somebody could talk me through how to perform a manual Breusch-Godfrey test after estimating an ARIMA model that contains MA and seasonal terms.
I understand how to estimate the model residuals; I am just unsure of the auxiliary regression test equation and the arima/sarima commands.

For example,

I run the following model commands:

arima y if tin(..), arima(0,1,1) sarima(1,1,0,12)   (I have monthly data)

Because I have monthly data, I assume I want to check for serial correlation up to order 12, but how do I tell Stata this?

If anybody could shed some light it would be greatly appreciated! I can't find any guidance anywhere online.
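For reference, one manual version runs the residuals on 12 of their own lags and forms the LM statistic as N*R2 (a sketch, assuming the data are tsset; a full BG test would also include the model's regressors in the auxiliary regression, which a pure ARIMA lacks):

```stata
arima y, arima(0,1,1) sarima(1,1,0,12)
predict res, residuals

* auxiliary regression: residuals on 12 of their own lags
regress res L(1/12).res
di "BG LM stat = " e(N)*e(r2) ",  p = " chi2tail(12, e(N)*e(r2))
```

The chi-squared degrees of freedom equal the number of residual lags tested (here 12, matching the monthly seasonality).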


Thanks

Plotting mean values for 4 variables across 3 years

Hi all,

I have a panel dataset and I am trying to create a line graph of the mean values for 4 variables across 3 years.
  • The variables are: csh_sh; cc_sh; dc_sh; chk_sh
  • They are continuous variables taking values between 0 and 1
  • I want to show the mean value for each variable for each year
I am struggling to find the correct code for this. Any help would be much appreciated - thanks
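One way, as a sketch (assuming a numeric year variable; preserve/restore keeps the full panel intact):

```stata
preserve
* collapse to one row per year holding the four means
collapse (mean) csh_sh cc_sh dc_sh chk_sh, by(year)
twoway (line csh_sh year) (line cc_sh year) ///
       (line dc_sh year)  (line chk_sh year), ///
    ytitle("Mean share") ///
    legend(order(1 "csh_sh" 2 "cc_sh" 3 "dc_sh" 4 "chk_sh"))
restore
```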

labelling rows and columns of summary statistics with esttab

I am producing tables of summary statistics with -esttab-, from -ssc-, in Stata 14. Each statistic is in its own row, like in the following:

Code:
sysuse auto, clear
estimates clear
eststo: estpost su mpg
esttab, cells(mean(fmt(2)) sd(fmt(2)) count(label(N) fmt(0)))
The output table looks like this:

Code:
-------------------------
                      (1)
                        
                mean/sd/N
-------------------------
mpg                 21.30
                     5.79
                       74
-------------------------
N                      74
-------------------------
Question 1: How do I swap the variable name, mpg, and the names of the statistics, so that furthermore each statistic name is in its own row?
Question 2: How do I make the column number appear below all other model or group titles or column labels? I found this analogous question on statalist, but seemingly no answer.

My ideal table would look something like this:

Code:
-------------------------
                      mpg
                      (1)
-------------------------
mean                21.30
SD                   5.79
N                      74
-------------------------
N                      74
-------------------------

Thanks for all your help.

Summary score as an average of PC scores weighted by proportion of variance explained by each component

Hi,
I would like to use PCA to generate a composite indicator equal to the weighted average of the scores predicted for each PC, where the weights are the proportion of variance explained by each component in the PCA. I would normalize the PC scores before calculating the weighted average.

Two questions:

1) What command would be most appropriate for this purpose? Are there any options for 'predict' which would allow me to do it straightforwardly?
2) I need to calculate this score from a number of PCAs run on a number of datasets, so I do not know a priori how many PCs will be retained under the criterion I adopt, e.g. mineigen(1). Therefore I am not able to define a priori, in 'predict', the names and labels of the new variables generated by the PC score extraction.
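What I have in mind, as a sketch (assuming a correlation-based pca, where total variance equals the number of variables, and that e(f) holds the number of retained components and e(Ev) the eigenvalues; v1-v10 are placeholder names):

```stata
pca v1-v10, mineigen(1)
local k = e(f)                       // number of retained components (assumption)
predict pc1-pc`k', score
matrix ev = e(Ev)
gen double index = 0
forvalues i = 1/`k' {
    * weight = eigenvalue / total variance (10 variables here)
    replace index = index + (ev[1,`i']/10) * pc`i'
}
```

Each pc could be standardized (e.g. with egen's std() function) before the loop, per the normalization step.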

Hope it is clear

Thank you very much for your help

DiD coefficient is insignificant for country FE but adding country and time FE changes coefficient sign and becomes significant?

I'm looking at whether an environmental policy in 2011 had an effect on exports for the UK using difference-in-differences (Australia is my control). When I added country fixed effects to the equation, I got a positive coefficient for my treatment variable, but it was insignificant. When I added time fixed effects as well, it gave me a negative coefficient that was significant at 1%. I am struggling to come up with a correct and detailed interpretation of these initial results - any help would be appreciated!

Exports = β0 + β1y2011 + β2country + β3y2011*Country + ε
(Basic diff-in-diff model)

Problem with xtheckman - Random-effects regression with sample selection

Hello,
after running

xtheckman L.sr c.aage12#c.aage12#c.ajsbgy#c.ajsbgy, select(employed = c.aage12#c.aage12)

on my dataset, I was met with the message:

21948 observations incorrectly specified as noncensored in select()

After displaying this message, Stata proceeded to appear working on the command for a few minutes, before showing me this error message:

initial values not feasible

Is there a way for me to fix this, or should I accept that my particular dataset is not set up in a way for this command to work?
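For reference, that message suggests the selection indicator and the outcome disagree: rows flagged as selected (noncensored) should have a nonmissing outcome, and vice versa. A quick consistency check, as a sketch (assuming employed is 0/1 and the panel is xtset so L.sr is defined):

```stata
* selected but outcome missing (these trigger the message)
count if employed == 1 & missing(L.sr)
* not selected but outcome present
count if employed == 0 & !missing(L.sr)
```

Recoding employed (or setting the outcome to missing where employed == 0) so the two line up may also help the "initial values not feasible" error, though that can have other causes.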

Should I use regression analysis after chi-square test?

Hello!

The research question is whether employer-provided health insurance reduces the likelihood of employee turnover (intentions to leave the job among employees). My response variable is binary (yes/no). The variable of interest is categorical with 4 categories: 0 = no health insurance, 1 = employer-provided health insurance, 2 = own health insurance, 3 = other sponsor of health insurance.

I ran a chi-square test, which shows that the association is not statistically significant (p>0.05).

Does it mean that I should stop and not continue with regression analysis to find out the relationship between the variables (and the magnitude of that relationship)? Can a regression model with a list of control variables give me the opposite, significant, result?
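For example, the adjusted model I am considering, as a sketch (leave is the binary turnover-intention outcome; the control names are illustrative):

```stata
* odds ratios for each insurance category vs. no insurance, with controls
logit leave ib0.insurance c.age i.sex c.tenure, or
```

A bivariate chi-square and an adjusted regression can disagree, since confounders can mask or create an unadjusted association.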

Would be glad for answers!

Artem.

generating graph showing cohort for students enrolled in year 2005 in class I, class 2 b 2006 till 2015

Sir/ Madam,
I am working on school data for the 10-year period from 2005 to 2015. I have totals for boys and girls enrolled separately, and also the total enrolled.
I want to create a cohort of students enrolled in the year 2005 and check how many of those students complete each grade by 2015.
I have year-wise data for the total enrolled.
I want to create the cohort and generate a graph showing up to which grade students who enrolled in class I continue.
I exported the total-enrolled data and prepared an Excel file.

Regards,
Sulochana

Using Dif in Dif model and getting large coefficients

Hello, I am using a diff-in-diff model and getting large coefficients. Would that be a problem, and are the coefficients in percentage points or in numbers of students enrolled?

I am looking at the policy effects on non-eu students (treat group) and my control group is EU students.
                                           (1)        (2)        (3)        (4)
                                        Model1     Model2     Model3     Model4
VARIABLES                                 PGFT       PGFT       PGFT       PGFT
--------------------------------------------------------------------------------
1.non-EU PGPT                        -203.4***  -203.4***  -203.4***  -203.4***
                                       (30.42)    (30.44)    (31.32)    (31.32)
1.2012-14                               -10.55      34.09      34.09      34.09
                                       (21.48)    (34.43)    (35.42)    (35.42)
1.Treatment effect of Tier 4            24.03*     24.03*     24.03*     24.03*
  restrictions                         (13.35)    (13.36)    (13.74)    (13.74)
Controls for Year                           NO        YES        YES        YES
Controls for INSTID                         NO         NO        YES        YES
Controls for the RUSSELL GROUP              NO         NO         NO        YES
Constant                              1,048***   1,018***   1,718***   1,718***
                                       (82.15)    (79.72)    (18.94)    (18.94)
--------------------------------------------------------------------------------
Observations                             2,916      2,916      2,916      2,916
R-squared                                0.004      0.004      0.508      0.508
--------------------------------------------------------------------------------
Robust standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1

2sls Regression with Categorical Endogenous Variable with Interaction Terms

Hello everybody,

I am trying to run an instrumental variables regression where my endogenous variable is a categorical variable (for which I created two dummy variables) and where I need interaction terms.

The set up is as follows:
- Primary dependent variable: Y (continuous)
- Exogenous independent variables: X
- Endogenous Variable: D (categorical with 3 possible values, but created 2 dummy variables D1 and D2)
- Instrument: Z
- Exogenous control variable: C
- Interaction terms: D*X

I have tried running it with the following code:

ivreg2 Y C X (D1 D2 D1#C.X D2#C.X = Z Z#C.X), robust

However, I am just not sure whether this is the right way to go given that I have two binary endogenous variables with interaction.


I've also tried running it the following way, but it keeps giving me the error message: "D1_hat: factor variables may not contain noninteger values"

probit D1 X C Z, vce(robust)
predict D1_hat

probit D2 X C Z, vce(robust)
predict D2_hat

ivreg2 Y C X (D1 D2 D1#C.X D2#C.X = D1_hat D2_hat D1_hat#C.X D2_hat#C.X)


I have read other similar postings such as https://www.statalist.org/forums/for...enous-variable but wasn't able to figure it out.

How should I approach this question using Stata?
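On the factor-variable error specifically: a bare name in an interaction like D1_hat#c.X is treated as a factor variable, and predicted probabilities are non-integer; prefixing the continuous pieces with c. may avoid it (a sketch, assuming an ivreg2 version that accepts factor-variable notation):

```stata
ivreg2 Y C X (D1 D2 c.D1#c.X c.D2#c.X = D1_hat D2_hat c.D1_hat#c.X c.D2_hat#c.X), robust
```

Note this is the "generated instruments" approach (using the probit fitted values as instruments), which avoids the forbidden-regression problem of plugging fitted values directly into the second stage.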

Thank you so much in advance for any advice!
