Test for mediation using panel data (with some missing data).

I've tried this a few different ways now and each comes up with an error message. Is there a command that can test for mediation when some years of the panel data are missing?

More specifically: I have six years' worth of data. My independent variable has data points for each of the six years. My mediator has data for years 1-4. My dependent variable has data for years 5 and 6 only. I also have control variables with some missing data. Is it possible to test for mediation, or would I need the missing data points (which I can't get)? I've been trying different versions of the following command (e.g., with and without lags; fixed effects and random) to no avail:

Code:
xtivreg DepVar (l1.MedVar=l2.IndepVar) l1.ControlVarOne l1.ControlVarTwo, fe

The two most common errors I get are: 1) sample specifies cross-sectional data and 2) no observations. Unfortunately, I've been asked NOT to run the analysis as cross-sectional, so that's not an option.
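To check whether any observations survive the lag structure, I thought of counting complete cases after declaring the panel (only a hedged diagnostic sketch; PanelID and Year stand in for my actual identifier and time variables):

Code:
* hedged diagnostic sketch -- PanelID and Year are hypothetical names
xtset PanelID Year
count if !missing(DepVar, l1.MedVar, l2.IndepVar, l1.ControlVarOne, l1.ControlVarTwo)

If that count is zero, I suppose the "no observations" message would follow.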

Any help would be greatly appreciated, as I'm fairly new to Stata!

Thanks in advance,
Steve

Research Setup: Count Model (ZINB) or totally different approach?

Hey there,

I would be delighted to get your opinion on the following question:

I am a little stuck at the moment (and I did not find a similar case in the forum) and am not sure whether I am on the right track with the regression I chose.

- The dependent variable is USD funding of companies. Hence, it is either 0 or can be very high (up to 20 million USD). Also, the zeros can be differentiated: either the company did not even apply for funding, or it applied for funding but did not receive any.

After doing my research, I thought the right setup must be in the count-regression family.
-> Also, the dependent variable shows overdispersion (an indicator that a negative binomial fits better).
-> Furthermore, the data contain "excess zeros", which zero-inflated models are intended to account for.

--> Hence, my initial conclusion was to use the ZINB model.

However, the dependent variable (USD) is not a classical count variable, correct? It is not as though we had 1, 2, 3, 4, or 5 trials or separate draws. So I am wondering whether it is "okay" to treat the USD funding as a count variable.
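For reference, the setup I have in mind would look roughly like this (only a hedged sketch; funding_usd, x1, x2, and applied are placeholder names, and a count model would expect the funding expressed in integer units):

Code:
* hedged ZINB sketch with placeholder variable names
zinb funding_usd x1 x2, inflate(applied x1)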

Any guidance, thoughts and comments are highly welcome.

Thanks a lot in advance.
Kind regards
Dan

Linking 2 variables

I am new to Stata. I am looking for help on linking two variables in the same dataset. The dataset is similar to census data (specifically, it's the UK's Understanding Society dataset).
I have the variables below:
a = person identifier
b = person's education level
c = person's wellbeing
p2 = partner number

I am trying to run a regression relating a person's wellbeing to their partner's education level.
How do I link these two variables?

There is no direct value in the table for partner's education level.

So I need to link a -> p2 -> b.
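One approach I am considering, assuming p2 holds the partner's value of the person identifier a, is to save a copy of the education variable keyed by the identifier and merge it back on the partner number (a hedged sketch; partner_educ is a name I made up):

Code:
* hedged sketch -- assumes p2 contains the partner's value of a
preserve
keep a b
rename a p2
rename b partner_educ
tempfile partners
save `partners'
restore
merge m:1 p2 using `partners', keep(master match) nogenerate

regress c partner_educ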

Splitting data in regression analysis

Hi,

I am new to Statalist, so I apologise for any errors in my post.
I am currently running a regression of crime (dependent) on unemployment (independent), using a panel-data approach covering 50 U.S. states between 2000 and 2015 with a multitude of control variables. I have used the xtreg command and it is a fixed-effects (FE) model.

I have run my model for the full period, i.e. 2000-2015, but now want to split my data into pre-recession (2000-2007) and post-recession (2008-2013) periods. I am unsure whether this is econometrically sound. I have read that time-specific dummy variables may be useful, but I wanted to understand whether 'splitting the data' is an acceptable basis for drawing conclusions.
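For concreteness, the two approaches I am weighing would look roughly like this (a hedged sketch with hypothetical variable names crime, unemp, and year; "controls" stands in for my control variables):

Code:
* (a) split samples -- hypothetical variable names
xtreg crime unemp controls if inrange(year, 2000, 2007), fe
xtreg crime unemp controls if inrange(year, 2008, 2013), fe

* (b) a single model with a post-recession indicator interacted with unemployment
gen post = year >= 2008
xtreg crime c.unemp##i.post controls i.year, fe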

Any help on this would be greatly appreciated.
Thanks

Matching based on firmsize (problems with calculating the ratio)

Hello,

I posted this question earlier (http://www.statalist.org/forums/foru...g-sample/page3) but haven't received a reply yet. I was hoping that more people would see my post if I posted it again (since the other thread is rather old).

For my master's thesis, I am using a matching technique based on SIC code, fiscal year, and size (with a 30% margin).
This is the first code I tried (based on posts on Statalist):
Code:
set seed 16
joinby sic fiscalyear using `holding', unmatched(master)
gen ratio = control_lagsize/lagsize
drop if !inrange(ratio, 1/1.3, 1.3)
set seed 17
gen double shuffle1 = runiform()
gen double shuffle2 = runiform()
by control_gvkey (ratio shuffle1 shuffle2), sort : keep if _n==1
by case_gvkey (ratio shuffle1 shuffle2), sort : keep if _n==1
drop ratio shuffle1 shuffle2
This led to a rather "strange" sample:
X Y
0 | 464 15 | 479
1 | 464 15 | 479

That's why I assumed something went wrong and tried the following:

Code:
set seed 16
joinby sic fiscalyear using `holding', unmatched(master)
gen ratio = lagsize/control_lagsize
drop if !inrange(ratio, 1/1.3, 1.3)
set seed 17
gen double shuffle1 = runiform()
gen double shuffle2 = runiform()
by control_gvkey (ratio shuffle1 shuffle2), sort : keep if _n==1
by case_gvkey (ratio shuffle1 shuffle2), sort : keep if _n==1
drop ratio shuffle1 shuffle2
This led to a better-looking sample, without changing the 1/1.3 and 1.3 bounds (changing those led to no matches). I probably did something wrong here.
I did, however, manually check some of the observations for both the first and the second code, and it does look like the size difference is within 30% in both cases.
More specifically, these are the first three observations:

- first code
lagsize control_lagsize
7.203108 5.622001
10.28004 8.313141
10.58946 8.162308

-second code
lagsize control_lagsize
7.475793 8.335548
8.672144 10.863
7.304893 9.221478

So, although I changed the ratio formula, which gave me a totally different sample, it looks like the sizes are still within a 30% range of each other.
Am I totally missing something here? Could you please tell me which code is more correct (and why)?
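(One thing I did notice while checking: with the bounds 1/1.3 and 1.3, the acceptance interval is symmetric under inverting the ratio, so inrange() accepts the same pairs whichever way the ratio is computed; a tiny illustration:)

Code:
* the bounds 1/1.3 and 1.3 are symmetric under inverting the ratio
display inrange(1.25, 1/1.3, 1.3)      // 1
display inrange(1/1.25, 1/1.3, 1.3)    // 1
display inrange(1.40, 1/1.3, 1.3)      // 0
display inrange(1/1.40, 1/1.3, 1.3)    // 0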

Thanks in advance!

Clustering using reghdfe - Full Rank Error

Hello. I am running a county-level fixed-effects panel regression in which I include state-by-year fixed effects. Ideally, I'd like to cluster at the state level to account for some spatial correlation, but I understand that doing so would give me fewer clusters than coefficients, which is problematic. Alternatively, I could cluster at the county level. I have about 2,500 counties and about 100 variables, so I should be OK, right?
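For reference, the absorb/cluster structure I have in mind looks roughly like this (a hedged sketch with hypothetical variable names; the instrumented part of my actual model is omitted here):

Code:
* hedged sketch of the fixed-effects and clustering structure only
reghdfe y x1 x2, absorb(county state#year) vce(cluster county)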

Nonetheless, I am getting the warning: "estimated covariance matrix of moment conditions not of full rank. overidentification statistic not reported, and standard errors and model tests should be interpreted with caution."

My first-stage F-statistic is drastically smaller than it was with no clustering, and I'm wondering whether I can trust this result. I'd appreciate any help you can give.

Thanks

Combining Cochrane-Orcutt with SURE

First, I would like to thank you all for this forum, which has been really helpful. Now I would like to ask my own question, because I cannot find an answer...

For my master's thesis, I have to regress some bank profitability measures (4 different measures, so 4 dependent variables) on 7 explanatory variables such as the level of interest rates, volatility, and so on. My data run from 1988 until 2009 for 6 different countries (thus 6 x 4 regressions).

My problem is that I have to apply the Cochrane-Orcutt correction and at the same time use the SURE method. How can I do that in Stata? I'm sorry, I'm not very familiar with the Stata language.
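To make the question concrete: each piece exists separately in Stata, and it is combining them that I cannot work out. A hedged sketch with hypothetical names (roa and roe as two of the profitability measures, rate and vol as regressors), for one country's time series:

Code:
* single-equation Cochrane-Orcutt (AR(1)) correction -- hypothetical names
tsset year
prais roa rate vol, corc

* seemingly unrelated regressions across two of the profitability measures
sureg (roa rate vol) (roe rate vol)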

I hope you will be able to help me,
thank you in advance, and my apologies if I did something wrong (i.e., misused this forum).
Bruno

Panel data first differences - only left-hand-side variable

Hi,
I have panel data and am using first differences with up to four lags in the following way:

(1)
Code:
 areg  l(0/4).d.(y x), absorb(year) cluster(country)
This gives me 4 and 5 coefficient values for y and x, respectively.



What I would like to do is compare this with a model with the same number of lags, but now including only y, the dependent variable.
I've tried this syntax:

(2)
Code:
 areg d.y l(0/4).d.y, absorb(year) cluster(country)
but it gives very weird output where d.y is regressed on itself, so its coefficient is obviously one and the coefficients on the other lagged values (one to four) are zero.
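My tentative understanding of what goes wrong (and I may be missing something) is that l(0/4) includes the lag-0 term, so d.y appears on both sides; starting the lag list at 1 would avoid that:

Code:
* hedged sketch: start the lag range at 1 so d.y is not regressed on itself
areg d.y l(1/4).d.y, absorb(year) cluster(country)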



The question: is there a way to regress d.y on its first through fourth lags with panel data? I would like to compare its results (with a simple F test) with the output of the regression specified in syntax (1).

Error message: no imputations to compute between-imputation variance

Hi. I'm attempting to run a mixed regression analysis using pweights with the Add Health data set. I used the recommended scaling techniques pwigls and mpml (user-written commands), but when I run it I get the error message "no imputations to compute between-imputation variance".


SCALING SYNTAX.


*ALL Latinos (Only)
mi passive: gen latinowgtwvs = .00000001
mi xeq: replace latinowgtwvs = gswgt4 if latino==1

mi passive: gen latinowgttemp = .00000001
mi xeq: replace latinowgttemp = w4_wc if latino==1

mpml_wt, psu_id(psuscid) fsu_id(aid) psu_wt(schwt1) fsu_wt(latinowgttemp) mpml_wta(alllatwgt)

pwigls, psu_id(psuscid) fsu_id(aid) psu_wt(schwt1) fsu_wt(latinowgttemp) psu_m1wt(m1alllat) fsu_m1wt(pw1r_alllat) psu_m2wt(m2alllat) fsu_m2wt(alllatwgt2)
mi passive: generate mlalllatwt2=m2alllat
mi passive: generate mlalllatwt1=alllatwgt2




I've tried the mi svyset approach, as well as embedding the weights in the mixed syntax, without any luck. Below are the full syntax and output.


ATTEMPT #1
mi estimate, errorok: mixed depress wvage16 wvage16sq wvage16cub || psuscid:, pweight(mlalllatwt2) || aid:, pweight(mlalllatwt1) cov(un) variance mle
no imputations to compute between-imputation variance
r(2000);

ATTEMPT #2
mi estimate, errorok: mixed depress wvage16 wvage16sq wvage16cub || psuscid: || aid: cov(un) variance mle
no imputations to compute between-imputation variance
r(2000);

ATTEMPT #3
mi svyset psuscid [pweight=alllatwgt], strata(region)
pweight: alllatwgt
VCE: linearized
Single unit: missing
Strata 1: region
SU 1: psuscid
FPC 1: <zero>

mi estimate, errorok: svy: mixed depress wvage16 wvage16sq wvage16cub
no imputations to compute between-imputation variance
r(2000);
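In case it is relevant, my understanding is that this error means mi estimate finds no completed imputations to work with; a minimal check I plan to run (nothing beyond standard mi commands):

Code:
* minimal diagnostic sketch: confirm the data are mi set and that M > 0
mi query
mi describe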


Any thoughts on what I'm doing wrong?

Multivalued treatment effects

Dear all,

I have a multivalued treatment variable whose levels are ordered.
So I believe I cannot use the command teffects ipwra, as it estimates the treatment model by multinomial logit.
I need to estimate the treatment model by ordered logit or ordered probit.

Can anyone help out?

What I did instead is as follows (a concrete sketch is below the steps):

1. run oprobit

2. predict probabilities
predict double pr0 pr1 pr2 pr3, pr
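A hedged sketch of these two steps with hypothetical names (treat taking values 0-3, regressors x1 and x2), plus one common way of forming the inverse-probability weight from the predicted probability of the level actually received:

Code:
* hedged sketch -- treat (0-3), x1, and x2 are hypothetical names
oprobit treat x1 x2
predict double pr0 pr1 pr2 pr3, pr

* one common construction: weight each observation by 1 over the predicted
* probability of the treatment level it actually received
gen double ipw = .
forvalues k = 0/3 {
    replace ipw = 1/pr`k' if treat == `k'
}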

Then how should I assign the inverse probabilities to the treated and control observations? Is a construction like the one sketched above correct?

And how do I conduct a balance test?


Thank you in advance
Diane

Problem with local macro or if-conditions?

Dear Statalist,

I wish to generate variable "a" on the basis of variable "a" itself (built up within the script), variable "b" (continuous data), and scalars "c" and "d".

time_id a b
1 . 1
2 . 0
3 . 2.8
4 . 0.325
5 . 0
6 . 0
7 . 0.73
8 . 0
9 . 1.6
10 . 0

scalar c 0.01
scalar d 1


generate a = 0
if time_id>1 local i = a[_n-1] + b - c
if `i' < d {
replace a = `i'
}
else if `i' > d {
replace a = d
}
else if `i' < 0 {
replace a = 0
}

There appears to be some problem with the program, which is perhaps related to how I conditioned the local macro.
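For what it is worth, what I am trying to achieve is a running value that accumulates b - c each period but is bounded below by 0 and above by d. My (possibly wrong) understanding is that this could be written as a single sequential replace, roughly:

Code:
* hedged sketch of the intended logic: replace works through the observations in
* order, so a[_n-1] already holds the updated previous value
scalar c = 0.01
scalar d = 1

sort time_id
generate double a = 0
replace a = min(max(a[_n-1] + b - c, 0), d) if time_id > 1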

Any suggestions?

Weighted Averages by Category

Hey!
So I am a newbie at Stata programming and have a problem trying to compute weighted averages by category.
I have a panel data set (region and month) divided into US regions (Midwest, East South Central, Middle Atlantic, etc.). One of the variables is heating degree days (HDD), recorded by region and month. I want to generate, for each region i, a variable (Wit) that is the population-weighted average of HDD over the other regions (-i). Then I want to sum those averages across lags 2 to 12 (t-2 to t-12).
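A hedged sketch of one way this might be done, assuming hypothetical variable names region (numeric), month (a Stata monthly date), hdd, and pop (the population weight):

Code:
* leave-one-out population-weighted average of HDD over the other regions, by month
bysort month: egen double tot_whdd = total(hdd*pop)
bysort month: egen double tot_pop  = total(pop)
gen double W = (tot_whdd - hdd*pop) / (tot_pop - pop)

* sum the lagged values from t-2 to t-12 within each region
xtset region month
gen double W_lagsum = 0
forvalues k = 2/12 {
    replace W_lagsum = W_lagsum + L`k'.W
}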
thanks for helping out!

Log Likelihood

Dear all,

Could somebody tell me how I can include the log likelihood in the output of my regressions? I have not found the right option for displaying it.

I've used "estout m*, cells(b(star fmt(%9.3f)) se(par)) stats(r2_a N chi2, fmt(%9.3f %9.0g) labels(R-squared)) legend label collabels(none) varlabels(_cons Constant)".

Thank you in advance.

Issues with AND/OR function with string variables

Hi all,

I'm having a bit of difficulty getting the AND/OR operators to work.

The problem I have is that the coding is shared across all variables (it's a set of product codes), but I want to tie it to the original variable specified in that line of code. I've added part of the code below, and all of the codes (code1/2/3/4) are being linked with both product1 and product2, even though code1/code2 are only mentioned alongside product1 (and the same problem arises with product2). How can I prevent this?

Thanks,

Calum

gen newvar=0
replace newvar=1 if product=="product1" & productcode=="code1" | lsoanum=="code2"
replace newvar=1 if product=="product2" & productcode=="code3" | lsoanum=="code4"
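My tentative understanding is that & binds more tightly than |, so as written the | condition stands on its own rather than being tied to the product test; wrapping the OR in parentheses would change that (a hedged sketch):

Code:
* hedged sketch: parenthesise the OR so it is evaluated before the AND
gen newvar = 0
replace newvar = 1 if product=="product1" & (productcode=="code1" | lsoanum=="code2")
replace newvar = 1 if product=="product2" & (productcode=="code3" | lsoanum=="code4")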

Customize the table being exported

Suppose I would like to export a summary table, just as follows:

Code:
sysuse auto, clear
estpost sum price mpg rep78
esttab, cells("count mean sd min max") noobs

Below is what I got:


count mean sd min max
-----------------------------------------------------------------------------
price 74 6165.257 2949.496 3291 15906
mpg 74 21.2973 5.785503 12 41
rep78 69 3.405797 .9899323 1 5
-----------------------------------------------------------------------------

Is it possible to do the following things?
1. Rename the column headers: i.e., I want No.Obs. instead of count and Mean instead of mean (a tentative attempt is sketched after this list).
2. Add rows to the table: for example, I would like to add one row between price and mpg, with the first cell being "New Dimensions", just as below:

count mean sd min max
-----------------------------------------------------------------------------
price 74 6165.257 2949.496 3291 15906
New Dimensions
mpg 74 21.2973 5.785503 12 41
rep78 69 3.405797 .9899323 1 5
-----------------------------------------------------------------------------

3. How about adding vertical and horizontal lines (possibly bold ones) to the table?
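On point 1, my reading of the estout help is that each element inside cells() accepts a label() suboption that replaces the element's name in the heading; a hedged sketch (I have not found an approach for points 2 and 3 yet):

Code:
* hedged sketch for renaming the column headers via label() suboptions
esttab, cells("count(label(No.Obs.)) mean(label(Mean)) sd(label(SD)) min(label(Min)) max(label(Max))") noobs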


Thanks!

Pooled OLS vs FE and Two way FE

So I ran a couple of different models to see how position influences a player's overall rating, and the results I got are not what I expected, so I am curious whether I did something wrong. In the attached screenshot, you can see that the coefficients on forward and center are positive and significant, but in the FE model they become insignificant and negative. So I am really curious what happened there.
Another interesting thing in my results is the dramatic bias that penalties has without controlling for time. And when I control for time and entity differences in the two-way FE model, the coefficient on center becomes positive. So I am just curious to know what's going on here.

What's wrong with the data?

Dear All,

I'm doing an event study and would like to calculate the CAR, but somehow I get the same CAR value in every row... How can I make it right?
Could anyone help me, please?

Here is part of the data:
date rstock rmkt day_cnt target_day max_target_day evday evt_window count_evt_obs est_window count_est_obs rmse pred_rtn ab_ret cum_ab_rtn ar_sd
8082002 -2.22874 -0.37938 30 45 -15 0 17 0 6 1.028256 -30.40595 5.728559
9082002 -2.84943 -2.37722 31 45 -14 0 17 1 6 1.028256 -30.40595 5.728559
10082002 -1.51281 0.32429 32 45 -13 0 17 1 6 1.028256 -30.40595 5.728559
11082002 3.0721 2.685 33 45 -12 0 17 1 6 1.028256 -30.40595 5.728559
12082002 -2.70681 -2.71181 34 45 -11 0 17 1 6 1.028256 -30.40595 5.728559
13082002 0.750234 0.703029 35 45 -10 0 17 1 6 1.028256 -30.40595 5.728559
14082002 2.63729 1.61001 36 45 -9 0 17 1 6 1.028256 -30.40595 5.728559
15082002 2.5393 0.309207 37 45 -8 1 17 0 6 1.028256 0.206563 2.332737 -30.40595 5.728559
16082002 -1.59198 -0.89953 38 45 -7 1 17 0 6 1.028256 -1.17124 -0.42074 -30.40595 5.728559
17082002 -0.38946 -0.8509 39 45 -6 1 17 0 6 1.028256 -1.11581 0.726355 -30.40595 5.728559
18082002 -0.45113 0.080808 40 45 -5 1 17 0 6 1.028256 -0.05378 -0.39735 -30.40595 5.728559
19082002 0.81571 0.556564 41 45 -4 1 17 0 6 1.028256 0.488517 0.327193 -30.40595 5.728559
20082002 0.389571 0.383584 42 45 -3 1 17 0 6 1.028256 0.291343 0.098228 -30.40595 5.728559
21082002 -0.0597 0.023175 43 45 -2 1 17 0 6 1.028256 -0.11948 0.059775 -30.40595 5.728559
22082002 -3.6141 -3.06386 44 45 -1 1 17 0 6 1.028256 -3.63829 0.024191 -30.40595 5.728559

Many thanks,
Best,
Mico

How to change shortcut keys for mac

I assume most Stata users don't like Macs, but I am stuck with one. I need to change the shortcut key for Execute (do) from [Command+Shift+D] to [Command+D] to make it more similar to Windows. Any advice would make my day!

Dealing with attrition in longitudinal panel data

Dear all,

I am working with longitudinal panel data on individuals (years 2007 and 2009), and there is a very high attrition rate between the two waves (64%). So far, I had just been working with the individuals who remained in the sample. However, I think I need to address this attrition issue.

I have never dealt with attrition before and do not know any methods for handling it. I know for a fact that the data are MNAR (missing not at random): the people who left the sample are most likely those who were affected by the 2008 economic crisis.

From what I've read online (and understood) so far, there are a few ways of dealing with the attrition issue, such as reweighting the sample with IPW (inverse probability weighting) and using refreshment samples (which I have with the 2009 wave).
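To make the IPW idea concrete, my tentative understanding is that one models the probability of staying in the sample and weights the stayers by its inverse; a hedged sketch, where stayer, x1, and x2 are hypothetical names (stayer marking individuals observed again in 2009, x1 and x2 their baseline 2007 characteristics):

Code:
* hedged IPW-for-attrition sketch -- stayer, x1, and x2 are hypothetical names
probit stayer x1 x2
predict double p_stay, pr
gen double ipw_attr = 1/p_stay if stayer == 1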

I am wondering whether any of you have had to deal with attrition in a longitudinal panel before, and what the best and "easiest" technique was to use - I am a bit constrained by time. I was thinking of perhaps matching some individuals who left the sample with some who "come into" the sample in the 2009 refresher data (with propensity score matching?) and working with that.

Thanks a lot for your help!

Regression discontinuity error: no observations

Hi,

I ran the following command using the data below. This is the first time that I have used dataex, so please let me know if I didn't post it correctly.

Code:
rd grade treatment assignment

and I received the following error message: no observations.

I would appreciate any suggestions to fix this problem. I think it has something to do with my outcome being ordinal and not continuous.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id grade) long treatment float assignment
  1 3.7 0   10
  2   4 0   80
  3 2.7 1  -20
  4   2 0   20
  5   4 0  110
  6   4 0   30
  7 2.3 0  100
  8   4 0    .
  9 2.7 1  -30
 10   3 0   60
 11   4 0   60
 12   4 1  -20
 13   1 0  -90
 14   4 1    .
 15   4 0    .
 16   4 1  -20
 17   3 1  -50
 18   1 1 -150
 19   4 0  140
 20 1.7 1 -170
 21 3.3 0  170
 22   3 1  -20
 23   1 1  -60
 24   . 0   70
 25 3.3 0   90
 26 1.3 1  -60
 27   4 0  140
 28 3.3 1  -70
 29   4 1  -10
 30   2 1 -230
 31 3.7 0   10
 32 3.7 1 -140
 33 3.7 1  -10
 34 3.7 0   10
 35 2.7 0   20
 36   4 0   40
 37   4 0  160
 38   4 0   80
 39 3.3 0  120
 40   4 0   90
 41   3 1  -90
 42 1.3 1  -30
 43 1.3 1  -70
 44 3.7 1 -120
 45 2.3 0   40
 46   3 0  -50
 47   4 0   90
 48 3.7 0  -10
 49 1.7 0   80
 50 2.7 0  160
 51 2.7 1  -30
 52 3.7 0   20
 53   4 0  130
 54   4 0   10
 55   3 0   30
 56   4 0   30
 57 3.7 0   70
 58 3.3 0  -10
 59 3.7 0    0
 60 2.7 1  -10
 61 3.3 1  -20
 62 2.7 0    .
 63   3 1  -50
 64   4 0   20
 65   2 0  -20
 66 2.3 0   10
 67 1.7 1  -60
 68   4 0    .
 69   4 0  180
 70   4 0   10
 71 2.3 1 -110
 72   3 1  -80
 73 3.3 1  -30
 74 3.3 0  -40
 75   4 1  -80
 76   4 0    .
 77   3 1 -130
 78 1.3 1 -140
 79 1.7 0   20
 80   2 1  -60
 81 1.7 0   90
 82 1.7 1 -100
 83 3.7 0    .
 84   2 1 -100
 85   4 0   10
 86   2 0    0
 87   4 0    .
 88 2.7 0   70
 89 2.3 0   90
 90 3.7 1  -60
 91 3.7 1  -10
 92 3.3 1  -80
 93   2 1 -120
 94   . 0    0
 95   4 0  140
 96   4 0   70
 97 3.3 0    0
 98 3.7 0   90
 99 3.7 1  -90
100 3.3 1 -100
end
label values treatment treatment
label def treatment 0 "untreated", modify
label def treatment 1 "treated", modify