
Generate dummy if last observation for an individual in panel occurs before time x

Hello,

I have a panel of banks from 1905 to 1910. Some banks have observations for the whole duration of the panel, some fail before the end of the panel (e.g. last observation in 1907), and some are founded after the beginning of the panel (e.g. first observation in 1906).

Each bank has a unique Id. My time measure is years.

I want to create a dummy variable called "disappear" - if a bank disappears before the end of the panel.

The code I have so far is:

gen disap = 0
sort Id_Bank Year
bys Id_Bank (Year): replace disap = 1 if Year[_N] < 1910

But I am not sure this does what I want - is this the correct way of creating the dummy?

A separate question: what if I wanted to *not* include banks that are founded after 1907, even if they fail before 1910? What code should I use in that case?
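
A minimal sketch of one way to do both steps (assuming Year holds calendar years; the variable name disap and the 1907 cutoff follow the post):

Code:
sort Id_Bank Year
* disap = 1 if the bank's last observed year is before 1910
by Id_Bank: gen byte disap = Year[_N] < 1910
* drop banks founded after 1907 (first observed year later than 1907)
by Id_Bank: drop if Year[1] > 1907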

Thank you very much for your help.


Beatrice

midas: Hessian has become unstable or asymmetric r(504);

I have one more query about midas, please.
While running the analysis, the following error came up:
Hessian has become unstable or asymmetric
r(504);

When I clicked on r(504), the following message came up.

Search of official help files, FAQs, Examples, and Stata Journals

[P] error . . . . . . . . . . . . . . . . . . . . . . . . Return code 504
matrix has missing values
This return code is now infrequently used because, beginning
with version 8, Stata now permits missing values in matrices.

(end of search)
In my database, there are no missing values.
Please help
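
Before digging further, it may be worth confirming that the estimation sample really has no missing values. A quick hedged check (tp fp fn tn are the usual midas inputs; substitute your own variable names):

Code:
* flag any missing values among the variables passed to midas
misstable summarize tp fp fn tn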

A stochastic frontier model with correction for sample selection in Stata

Dear Statalist,

I am trying to estimate an SFM that can address selection bias. Theoretically, I am aware that Greene (2010) developed an approach that addresses selection bias in SFA.

But I couldn't find any way to conduct that analysis in Stata.

Thus, I kindly ask your help in the following two questions.

1. Could anyone provide a Stata command to estimate Greene's (2010) model?

2. Though Greene's approach works for two choices, it cannot handle more than two choice sets. Hence, I would like to know whether someone has extended this approach.


Thank you!



Tracking changes in dataset

Hi,

I tracked executives across firms and retained those who were employed for at least three years in each of at least two different firms. In addition, I break down executives by their title (e.g. CEO, CFO, OTHER); however, I want to include two variables that provide the current title (i.e. the last title listed for the executive in the dataset) and the prior title. For instance (see the data below), execid 28 moved from company 9563 with title OTHER (= prior title) to company 28349 with title CEO (= current title). I am searching for code that tracks these executive moves between companies and provides the current and prior title.
gvkey execid fyear CEO CFO OTHER
1246 19 1992 1 0 0
1246 19 1993 1 0 0
1246 19 1994 1 0 0
64117 19 1996 1 0 0
64117 19 1997 1 0 0
64117 19 1998 1 0 0
9563 28 1992 0 0 1
9563 28 1993 0 0 1
9563 28 1994 0 0 1
28349 28 1992 1 0 0
28349 28 1993 1 0 0
28349 28 1994 1 0 0
13099 523 1992 0 0 1
13099 523 1993 0 0 1
13099 523 1994 0 0 1
113419 523 2003 0 1 0
113419 523 2004 0 1 0
113419 523 2005 0 1 0
113419 523 2006 0 1 0
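
A rough sketch of one approach (hedged: the title dummies are from the data above; note that execid 28's two spells overlap in fyear, so sorting by year alone cannot always order the firms unambiguously):

Code:
* collapse the three dummies into one string title
gen str5 title = cond(CEO == 1, "CEO", cond(CFO == 1, "CFO", "OTHER"))
* current title = title on the executive's last record
bysort execid (fyear): gen str5 current_title = title[_N]
* flag rows where the executive switches firms, then carry the
* pre-switch title forward as the prior title
bysort execid (fyear): gen byte newfirm = gvkey != gvkey[_n-1] & _n > 1
bysort execid (fyear): gen str5 prior_title = title[_n-1] if newfirm
bysort execid (fyear): replace prior_title = prior_title[_n-1] ///
    if missing(prior_title) & _n > 1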

Matching based on market value with range.

Hi,

I am trying to match the companies in my dataset that belong to the test sample 1:1 to a company outside that sample. To explain: the test sample was involved in tax disputes between 2003 and 2009; the control group was not. For each firm-year, the dummy tax_disputes indicates whether the company was involved. Matching needs to be based on (1) industry classification, (2) firm size, and (3) time (YEAR). In particular, we identify non-tax-avoidant firms in the same four-digit North American Industry Classification Scheme industry as a given tax-avoidant firm in the year of the tax dispute. Matched firms need to have a market value (SIZE) within ±40% of each other. When multiple matches are possible, the one with the closest market value is chosen.

Can anybody help me with how to do this?
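
A rough sketch of one way to build the candidate pairs and keep the closest one (hedged: firm_id and naics4 are hypothetical names; SIZE, YEAR, and tax_disputes are from the post; this matches with replacement):

Code:
* set aside the control pool
preserve
keep if tax_disputes == 0
keep firm_id naics4 YEAR SIZE
rename (firm_id SIZE) (ctrl_id ctrl_SIZE)
tempfile controls
save `controls'
restore

* pair every treated firm-year with all same-industry controls that year
keep if tax_disputes == 1
joinby naics4 YEAR using `controls'
* keep controls within +-40% of the treated firm's market value
keep if inrange(ctrl_SIZE/SIZE, 0.6, 1.4)
* among the candidates, keep the one closest in market value
gen double sizediff = abs(ctrl_SIZE - SIZE)
bysort firm_id YEAR (sizediff): keep if _n == 1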

Generating ID variables in one data set based on some observations in another data set

Hi all,

Please consider the following data set: data set_1

clear

. input str8 respid str5 commid

respid commid
1. XY050037 XY001
2. XY050038 XY001
3. XY050050 XY002
4. XY050014 XY003
5. end

. list

+-------------------+
| respid commid |
|-------------------|
1. | XY050037 XY001 |
2. | XY050038 XY001 |
3. | XY050050 XY002 |
4. | XY050014 XY003 |
+-------------------+

. clear

Now I have another data set on the same observations, but without the commid. For example, data set_2:

input str5 respid

respid
1. 50037
2. 50038
3. 50050
4. 50014
5. end

. list

+--------+
| respid |
|--------|
1. | 50037 |
2. | 50038 |
3. | 50050 |
4. | 50014 |
+--------+
I'm trying to generate commid for the observations in dataset_2, just like in dataset_1. For instance, respid 50037 in dataset_2 is XY050037 in dataset_1, and I would like to attach the corresponding commid XY001 to it in dataset_2.

I have been looking at posts about using loops for generating ID variables but am unable to find a solution to my problem. Would greatly appreciate your help.
Thank you.
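
A hedged sketch of one way to do this without loops, assuming the last five digits of respid in data set 1 always correspond to respid in data set 2 (file names are hypothetical):

Code:
* build a 5-digit key in data set 1 and save it as a lookup file
use dataset_1, clear
gen str5 respid5 = substr(respid, -5, 5)
tempfile lookup
save `lookup'

* attach commid to data set 2 by merging on the key
use dataset_2, clear
rename respid respid5
merge 1:1 respid5 using `lookup', keepusing(commid)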

Power calculation for time to event study

Hi,

I am powering an RCT where the primary endpoint is time-to-response from initiation of treatment (treatment vs. control).

The mean time-to-response in the control arm is 5 days, and we estimate that a clinically important prolongation would be 7 days (i.e. a mean of 12 days in the treatment arm). The SD is 10 days and the expected event rate is ~100%. For the power calculation, I am currently using power twomeans, but I am unsure whether this fully captures the time-to-event nature of the study:

Code:
power twomeans 5 12, sd(10)
Would power cox be more appropriate? If so, how would I apply this command?
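
power cox may indeed fit better. A hedged sketch: under an exponential survival model, mean times of 12 vs. 5 days imply a hazard ratio of (1/12)/(1/5) ≈ 0.42, and for a balanced two-arm trial the 0/1 treatment indicator has SD 0.5:

Code:
* hedged sketch: HR ~ 0.42 from exponential means of 12 vs 5 days;
* sd(0.5) is the SD of a balanced 0/1 treatment indicator; with ~100%
* events, no censoring adjustment via failprob() seems needed
power cox, hratio(0.42) sd(0.5)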

Thank you for any help,
Megan

Statistical power

I want to run a linear regression. I have 94 observations and 11 independent variables, and I have theoretical justification for including all of the variables in the model. What code should I use to run the model with high statistical power?

Referring to factor variables postestimation

Dear All,

I want to use lincom to estimate total effect after running a regression of the form:

Code:

. reghdfe scriptnumber i.post##treated2##(agedum1 agedum2 agedum3 agedum4 agedum5), absor
> b(i.wk2 i.prescriberid) vce(cluster prescriberid)
(converged in 3 iterations)
note: 1.post omitted because of collinearity
note: 1.treated2 omitted because of collinearity

HDFE Linear regression                            Number of obs   = 29,343,300
Absorbing 2 HDFE groups                           F(  21,  73320) =     333.08
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.3944
                                                  Adj R-squared   =     0.3929
                                                  Within R-sq.    =     0.0522
Number of clusters (prescriberid) =     73,321    Root MSE        =     4.3413

                               (Std. Err. adjusted for 73,321 clusters in prescriberid)
---------------------------------------------------------------------------------------
                      |               Robust
         scriptnumber |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
               1.post |          0  (empty)
           1.treated2 |          0  (empty)
                      |
        post#treated2 |
                 1 1  |  -.2841009   .0354557    -8.01   0.000    -.3535939   -.2146078
                      |
            1.agedum1 |  -3.081656   .0585112   -52.67   0.000    -3.196338   -2.966974
            1.agedum2 |  -2.524654   .0460961   -54.77   0.000    -2.615003   -2.434306
            1.agedum3 |  -2.302338    .040131   -57.37   0.000    -2.380995   -2.223682
            1.agedum4 |  -2.011272   .0342483   -58.73   0.000    -2.078399   -1.944146
            1.agedum5 |  -2.152859   .0391209   -55.03   0.000    -2.229536   -2.076183
                      |
         post#agedum1 |
                 1 1  |   .0238845    .017022     1.40   0.161    -.0094785    .0572474
                      |
         post#agedum2 |
                 1 1  |   .0075982   .0134217     0.57   0.571    -.0187083    .0339047
                      |
         post#agedum3 |
                 1 1  |   .0227166    .011867     1.91   0.056    -.0005428    .0459759
                      |
         post#agedum4 |
                 1 1  |   .0236837   .0109215     2.17   0.030     .0022777    .0450898
                      |
         post#agedum5 |
                 1 1  |   .0698713   .0130745     5.34   0.000     .0442454    .0954973
                      |
     treated2#agedum1 |
                 1 1  |   -.396839   .1059009    -3.75   0.000    -.6044043   -.1892737
                      |
     treated2#agedum2 |
                 1 1  |   -.309352   .0846507    -3.65   0.000    -.4752671   -.1434369
                      |
     treated2#agedum3 |
                 1 1  |  -.1992315   .0706572    -2.82   0.005    -.3377193   -.0607438
                      |
     treated2#agedum4 |
                 1 1  |  -.1407674   .0579312    -2.43   0.015    -.2543122   -.0272225
                      |
     treated2#agedum5 |
                 1 1  |  -.1115767   .0639635    -1.74   0.081    -.2369451    .0137916
                      |
post#treated2#agedum1 |
               1 1 1  |   .2544823   .0340203     7.48   0.000     .1878026     .321162
                      |
post#treated2#agedum2 |
               1 1 1  |   .1834304   .0281337     6.52   0.000     .1282884    .2385724
                      |
post#treated2#agedum3 |
               1 1 1  |   .1683772   .0233473     7.21   0.000     .1226166    .2141379
                      |
post#treated2#agedum4 |
               1 1 1  |   .1832277   .0204976     8.94   0.000     .1430525    .2234028
                      |
post#treated2#agedum5 |
               1 1 1  |   .2174294   .0244478     8.89   0.000     .1695119    .2653469
---------------------------------------------------------------------------------------

Absorbed degrees of freedom:
------------------------------------------------------------------------+
          Absorbed FE |  Num. Coefs.  =   Categories  -   Redundant     |
----------------------+-------------------------------------------------|
                  wk2 |          100             100              0     |
         prescriberid |            0           73321          73321 *   |
------------------------------------------------------------------------+
* = fixed effect nested within cluster; treated as redundant for DoF computation

.
end of do-file

.  lincom post#treated2+post#treated2#agedum1
post#treated2 invalid name
r(198);
How do I refer to post#treated2 and post#treated2#agedum1 in lincom? Thank you for your help.
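
A hedged note: lincom wants the full coefficient names, including the factor levels shown in the output (here 1 1 and 1 1 1):

Code:
lincom 1.post#1.treated2 + 1.post#1.treated2#1.agedum1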

Sincerely,
Sumedha.

Find out what variables in .dta file are used in .do file

I am reviewing an old project by someone who is no longer reachable.

I have her .dta file with input data and the .do file with her analysis. However, the .dta file contains a large number of variables/columns and is simply a mess overall. Many of these variables were poorly named so that it's hard to distinguish what variables are used in the .do file and what variables are never used. For instance, there are several variations of variables containing GDP data (e.g. GDP_11, GDP_12, GDP_101, etc.) when in the end only one of these variations is used in the analysis contained in the .do file. This takes place with other variables as well.

Is there a way for me to find out which variables in the .dta file are actually used in the .do file? I would like to remove from the .dta file all variables that are not used in the code contained in the .do file.
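
There is no built-in cross-reference tool, but a rough sketch along these lines may help (hedged: analysis.do and mydata.dta are hypothetical file names; this is a plain substring search, so variables referenced via wildcards, loops, or macros in the do-file can be missed; treat the output as candidates to inspect, not to drop blindly):

Code:
use mydata.dta, clear
unab unused : _all                      // start with every variable
tempname fh
file open `fh' using "analysis.do", read text
file read `fh' line
while r(eof) == 0 {
    // drop from the "unused" list any variable named on this line
    local remaining ""
    foreach v of local unused {
        if strpos(`"`macval(line)'"', "`v'") == 0 {
            local remaining `remaining' `v'
        }
    }
    local unused `remaining'
    file read `fh' line
}
file close `fh'
display as text "never mentioned in the do-file: `unused'"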

Thank you in advance.

Creating panel data from cross sectional data

Hi everyone,

I am trying to create a panel dataset starting from cross-sectional data.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long account str11(service supplystartdate) str10 supplyenddate
1 "A" "30/11/2018" ""          
1 "B" "25/11/2017" "03/01/2019"
2 "A" "19/11/2017" "23/01/2019"
3 "A" "08/10/2017" "22/11/2018"
4 "A" "09/08/2017" "25/10/2017"
5 "A" "26/10/2016" "20/07/2017"
6 "A" "19/04/2014" ""          
7 "A" "26/07/2015" "09/11/2019"
7 "B" "23/07/2015" "04/11/2019"
8 "A" "11/09/2013" ""          
9 "A" "09/07/2015" "18/09/2019"
end

The goal is to know, for each account, for how many days it has used a service (A and/or B).
However, for future analyses, rather than a single variable holding the number of days, I need to systematically generate (and associate with each account) several dummy variables (starting from e.g. 11/09/2013 up to today, 14/01/2020), equal to 1 if the service was used on that day and 0 otherwise.
Indeed, in the end I will have to aggregate, for each day, the number of accounts using a service in a particular zipcode.

How can I do this?
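
A possible sketch: convert the strings to daily dates, close the ongoing spells at 14/01/2020, and expand each spell to one row per day (the final collapse line assumes a zipcode variable exists in the full data):

Code:
gen long start = daily(supplystartdate, "DMY")
gen long end   = daily(supplyenddate, "DMY")
replace end = td(14jan2020) if missing(end)   // ongoing spells
format start end %td

gen long ndays = end - start + 1
expand ndays
bysort account service start: gen long day = start + _n - 1
format day %td
gen byte used = 1

* daily counts of accounts using a service, by zipcode:
* collapse (sum) used, by(zipcode day)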

Binary Variable Graph

Hi all,

I need help with binary variables. I reviewed the topics but couldn't find a solution.
I have five binary variables and I want to draw a bar plot that contains only the 1 values of the binaries, with all variables side by side in the same bar graph. Each bar must show, e.g., how many 1 values var1 contains.

I'm sorry if there is a similar topic already on the forum; I am short on time. Thank you all!
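
A minimal sketch (var1-var5 are hypothetical names): since the sum of a 0/1 variable is its count of 1s, graph bar with the (sum) statistic puts one bar per variable, side by side:

Code:
graph bar (sum) var1 var2 var3 var4 var5, ///
    ytitle("Number of 1s") legend(rows(1))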


Multiple color graph bar

Dear Stata users,

I'm trying to produce a twoway graph, plotting effect sizes as bars and adding confidence intervals. Everything runs smoothly, except when I try to change the bars' colors to distinguish the different treatments. See the code example below:

Code:
 graph twoway bar T1_C trt, text(.06  .05 "Control" , place(c))                 ///
                               text(.36 1.05 "Civ. Ed.", place(c))                 ///
                               text(.22 2.05 "Sec."    , place(c))                 ///
                               text(.45 3.05 "Combined", place(c))                 ///
                               bargap(1) barwidth(0.40) legend(off) color(grey%30) lcolor(%30) lcolor(black) ///
                               title("Civic Participation", box fcolor(white)) xlabel(,nolabels)     ///
                               xtitle("Treatment") ytitle("Index Civic Participation")            ///
                               subtitle(" ") graphregion(color(white)) plotregion(color(white)) ///
                               || ///
                rcap T1_C1 T1_C2 trt, vertical
where T1_C is the coefficient size, trt an indicator for the different treatments, and T1_C1 and T1_C2 the lower and upper bounds of the CI. I tried to add the line of code below:

Code:
 bar(1,color(green)) bar(2,color(red))bar(3,color(blue))bar(4,color(orange))
but it doesn't work. Based on previous posts on Statalist it seems that I might be confusing graph bar with twoway bar, but I can't find what's wrong.
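
For what it's worth, bar(#, color()) is graph bar syntax. With twoway bar, one hedged workaround is to draw one bar series per treatment value so each series takes its own color (this assumes trt is coded 0, 1, 2, 3; the text() and title options from the original code can be added back unchanged):

Code:
twoway (bar T1_C trt if trt == 0, color(green%30)  barwidth(0.4))  ///
       (bar T1_C trt if trt == 1, color(red%30)    barwidth(0.4))  ///
       (bar T1_C trt if trt == 2, color(blue%30)   barwidth(0.4))  ///
       (bar T1_C trt if trt == 3, color(orange%30) barwidth(0.4))  ///
       (rcap T1_C1 T1_C2 trt, lcolor(black)),                      ///
       legend(off) xtitle("Treatment") ytitle("Index Civic Participation")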

Many thanks in advance,

Calculate the percentage of other firms that are larger than the firm in the same industry

Dear Stata users,

I am working on a replication study for an assignment, but am stuck.

One of the tables reports coefficients from estimating the first-stage regression of the 2SLS model. The description is as follows:


The determinants of CEO decision horizon. This table reports coefficients from estimating the first-stage regression of the 2SLS model. The first step involves a
regression model wherein decision horizon (DH) is estimated. The predicted value of DH is used in the second-stage models in other tables. DH = decision
horizon, SIZE = log of total assets, LEV = long-term debt scaled by total assets. EBIT = earnings before interest and taxes, CAPX = capital expenditure, R&D =
research and development expenditure, ADV=advertising expenditure, SALES=Gross sales, TOBINQ=Tobin's q, ISIZE=the percentage of other firms that are
larger than the firm in the same industry, ICOMP = the percentage of other CEOs who are paid more than the CEO in the same industry, ECOMP = the ratio of
equity compensation to total compensation. *, **, and *** denote significance at the 10%, 5%, and 1% levels.

I have trouble generating ICOMP and ISIZE, and with running the right regression once I have these variables.

If there is someone able to help, it is really appreciated.

kind regards,
Stef

Additional info:
To reduce the potential problem of endogeneity in the multi-factor regression models, we use a two-stage least squares (2SLS)
model that involves the estimation of two regression models, one for CEO decision horizon and one for firm performance. The 2SLS
procedure requires that the first-stage equation contain at least one instrumental variable that is unrelated to the error term in the
second-stage model. Here, we use industry characteristics as instruments. Our full 2SLS model is structured as follows.
DH = f(firm performance, firm characteristics, CEO compensation package, and instrumentals),   (3)
where firm performance (TOBINQ) is computed as [market value of common equity+preferred stock liquidating value+longterm
debt−(short-term assets−short-term liabilities)] / (total assets), following Chung and Pruitt (1994). Firm characteristics
are: the log of total assets (SIZE), the ratio of long-term debt to total assets (LEV), profitability (EBIT/SALES), capital expenditure
(CAPX/SALES), and intangibles' intensity that is measured as the sum of R&D expenditure (R&D) and advertising expenditure
(ADV) scaled by SALES. The first-stage model also accounts for the fact that CEOs approaching retirement are likely to receive different
pay packages than those recently put into office (see Anderson et al., 2006). For example, to mitigate the horizon problem
associated with CEOs nearing retirement, firms are likely to increase incentive-based compensation. To account for that, we
include a CEO's equity compensation ratio (ECOMP), which is computed as the value of unexercised stock options (OPTION)
divided by the value of total compensation (TCOMP). Gibbons and Murphy (1992) argue that since career concerns are weaker
when executives near retirement, incentive contracts should be the strongest for these workers. We therefore expect to find a
negative relationship between DH and ECOMP. In addition, we include industry characteristics because the job market
environment in the industry in which a CEO competes can be related to the CEO's decision horizon. Based on this expectation, we
control for two variables: the percentage of other firms that are larger than the firm in the same industry (ISIZE) and the
percentage of other CEOs who are paid more than the CEO in the same industry (ICOMP). Both ISIZE and ICOMP serve as proxies for
the existence of better employment opportunities in the industry, which are expected to have an impact on CEO decision horizon.
Specifically, we anticipate that in the presence of good employment opportunities in the industry, a CEO will be more inclined to
adopt a long-term horizon in order to improve external employment opportunities in the future. Finally, several indicator
variables that account for year, industry and exchange listing effects are also included. The industry indicators are based on the
Fama-French 12 industry classification.
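
A rough sketch for the two industry variables (hedged: industry, fyear, SIZE, and totalcomp are assumed names; egen's rank(), unique breaks ties arbitrarily, so exact ties are split rather than shared):

Code:
* ISIZE: share of *other* firms in the same industry-year that are larger
bysort industry fyear: egen double size_rank = rank(SIZE), unique
by industry fyear: gen double ISIZE = (_N - size_rank) / (_N - 1) if _N > 1

* ICOMP: share of other CEOs in the same industry-year paid more
bysort industry fyear: egen double comp_rank = rank(totalcomp), unique
by industry fyear: gen double ICOMP = (_N - comp_rank) / (_N - 1) if _N > 1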

missing values after matrix multiplication

Can someone tell me why the vector s1 below contains missing values?


Code:
version 14.2

webuse sysdsn1, clear
keep if insure !=.
mlogit insure age male nonwhite i.site
matrix define b = e(b)
tempname noomit
_ms_omit_info b
local cols = colsof(b)
matrix `noomit' =  J(1,`cols',1) - r(omit)
tab insure, g(insure)
tab site, g(site)

mata

st_view(y1=.,., "insure1")
st_view(y2=.,., "insure2")
st_view(y3=.,., "insure3")

N=rows(y1)
cons=J(N,1,1)
st_view(X=.,.,"age male nonwhite site2 site3")
X=X, cons
b=select(st_matrix("b"),(st_matrix(st_local("noomit"))))


xb2 = X*b[1..6]'
xb3 = X*b[7..12]'
p1 = 1 :/ (1 :+ exp(xb2) :+ exp(xb3))
p2 = exp(xb2) :/ (1 :+ exp(xb2) :+ exp(xb3))
p3 = exp(xb3) :/ (1 :+ exp(xb2) :+ exp(xb3))

s1 = X'*(y2 - p2)

end
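
One possible cause worth checking: keep if insure != . does not drop observations with missing covariates, which mlogit excludes from estimation but st_view() still picks up, so the missings propagate into xb and the probabilities. A hedged fix is to restrict every view to e(sample):

Code:
gen byte touse = e(sample)   // run right after mlogit

mata:
st_view(y2 = ., ., "insure2", "touse")
st_view(X  = ., ., "age male nonwhite site2 site3", "touse")
end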
Thanks in advance!

Jessica

ttest panel data

Hello everyone

I have an issue with ttest. I have panel data with 10 countries for the years 2013-2018 and 10 variables (x1, x2, ..., x10). For each variable I want to test whether the average observed in country "a" differs from the average observed in country "b" over the period under analysis. Does Stata not recognize that it is a panel? It states: "more than 2 groups found, only 2 allowed".
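
A hedged sketch of a workaround: ttest, by() accepts exactly two groups, so restrict the sample to one pair of countries at a time (country is assumed to be a string identifier; note this ignores any serial dependence within countries):

Code:
foreach v of varlist x1-x10 {
    display as text _n "=== `v' ==="
    ttest `v' if inlist(country, "a", "b"), by(country)
}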

Thank you all in advance

Panel regression problem

Hi all,

I have a problem regarding the fixed-effects model (panel regression). I am trying to find an effect of an independent variable X on a dependent variable Y during the years 2002-2017 within industry classes. The dataset contains 600 companies of the STOXX600.

When I try to set up the panel based on industry ID and year (xtset ID year), Stata returns an error: repeated time values within panel.

I don't know how to fix this problem, because every company in the STOXX600 has an industry sector (e.g. finance, manufacturing, mining). Each sector has been given an ID (e.g. 1 = finance, 2 = manufacturing, 3 = mining), and multiple companies share the same industry sector and therefore the same ID. The same industry code therefore repeats within each year, which causes the panel error on the entire dataset (e.g. my dataset contains 167 manufacturing-sector firms).

Example of the data:
1 Finance 2002 Firm1
1 Finance 2003 Firm1
1 Finance 2004 Firm1
1 Finance 2005 Firm 1

1 Finance 2002 Firm2
1 Finance 2003 Firm2
1 Finance 2004 Firm2
1 Finance 2005 Firm2

1. Is there any way to rearrange this with a by or if statement?
2. Making egen x1 = group(ID) won't work, and IMO creating a separate identifier value for every company will not let me figure out whether there are differences between industries (or is the use of by(industry) somehow possible)? See the sketch after this list.
3. Lastly, will a different command like areg Y X i.year, absorb(ID) give the same result? And is that still called a panel regression?
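
On question 2, a possible sketch: xtset needs an identifier that is unique within each year, i.e. the firm, not the industry (firm is a hypothetical name for the company variable); industry contrasts can then enter through interactions:

Code:
egen long firm_id = group(firm)
xtset firm_id year
xtreg Y X i.year, fe vce(cluster firm_id)

* industry differences in the effect of X (ID = industry sector):
* xtreg Y c.X##i.ID i.year, fe vce(cluster firm_id)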

Hopefully you guys can help me,

Thanks in advance

Sebas Kalkman



SFIToolkit in v16.0

This question is about the display and displayln methods of the SFIToolkit added in Stata version 16.

According to the documentation, they are defined as:
display(s[, asis]) Output a string to the Stata Results window.
displayln(s[, asis]) Output a string to the Stata Results window and automatically add a line terminator at the end.
Yet when I use them I find that display actually does what displayln is declared to do, and displayln adds yet another empty line.

For example, the following code:

Code:
  SFIToolkit.display('A') 
  SFIToolkit.display('B')
  SFIToolkit.display('C')
  SFIToolkit.display('D')
  SFIToolkit.displayln('.')
  SFIToolkit.display('X')
  SFIToolkit.displayln('Y')
  SFIToolkit.display('Z')
  SFIToolkit.display('T')
  SFIToolkit.display('.')
Produces this output:
Code:
A
B
C
D
.

X
Y

Z
T
.
While my expectation is this:
Code:
ABCD.
XY
ZT.
How do I tell SFIToolkit to continue the output on the same line?

Thank you, Sergiy

H-Statistics for Market Power

Hello forum members!
I am working on non-performing assets in the Indian banking sector. I have bank-specific data for 45 banks for the period 2005-2019, which makes my dataset a balanced panel. In order to examine the market power prevailing in the Indian banking sector, I need to calculate the H-statistic. Can anyone please help me with this? I do have the required data; however, I need help with running the commands.
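
A rough sketch of the usual Panzar-Rosse revenue regression behind the H-statistic (all variable names are hypothetical; H is the sum of the input-price elasticities, typically funding, labor, and capital costs):

Code:
xtset bank_id year
xtreg ln_revenue ln_w_funds ln_w_labor ln_w_capital ln_assets, fe
* H-statistic = sum of the three input-price elasticities
lincom _b[ln_w_funds] + _b[ln_w_labor] + _b[ln_w_capital]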

ttest using a null hypothesis other than zero

I would like to run a one-sided ttest (either paired or independent samples) with the null hypothesis set to values other than zero. One possible application would be to set the null hypothesis to a superiority or inferiority margin to test for a clinically important difference; for that type of application, however, I would typically calculate the confidence interval and see whether it crosses the margin. Unfortunately, I need the actual p-values. My intent is to create a graph of p-values by variable difference by setting the null hypothesis to a range of values, similar to a p-value function graph. Does ttest (or another command) allow me to set the null hypothesis to a specific value or variable list? Thanks.
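
For the paired case, a hedged sketch: form the difference and use the one-sample syntax, which accepts any null value (a and b are hypothetical paired variables; r(p) is the two-sided p-value, with r(p_l) and r(p_u) the one-sided ones):

Code:
gen d = a - b
foreach h0 of numlist -2(0.5)2 {
    quietly ttest d == `h0'
    display as text "H0: diff = `h0'   two-sided p = " %6.4f r(p)
}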