Channel: Statalist

Competing risk survival analysis

Hello,

I am approaching a competing-risks survival analysis for the first time. I am using Stata/SE 12.
I have two cohorts of patients with cancer and I am estimating their risk of thrombosis; however, death acts as a competing risk.

Here is an example of my dataset:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int patient_id float cohort_num byte(age genderm thrombosis) float follow_up byte(thrombosis_competing_analysis death)
  3 1 58 0 0 .33 2 1
  4 1 73 1 1 .38 1 0
  5 1 60 1 0 .19 2 1
  6 1 38 0 0   1 0 0
  7 1 58 1 0   1 0 0
  8 1 44 0 0   1 0 0
  9 1 64 0 0 .49 2 1
 10 1 47 0 0 .48 2 1
 11 1 70 1 0 .01 2 1
 12 1 37 1 0 .05 2 1
 13 1 26 0 0  .1 0 0
 14 1 54 1 0  .2 0 0
 15 1 37 0 0  .6 0 0
 16 1 65 1 1  .5 1 0
 17 1 76 0 0   1 0 0
 18 1 34 1 0   1 0 0
270 2 74 0 0 .78 2 1
271 2 73 1 0   1 0 0
272 2 70 0 0   1 0 0
273 2 72 0 0   1 0 0
274 2 61 1 0   1 0 0
275 2 80 1 0 .08 2 1
276 2 74 0 0   1 0 0
277 2 73 0 0   1 0 0
278 2 73 0 0   1 0 0
279 2 81 1 1   1 1 0
280 2 35 0 0  .2 0 0
281 2 46 1 0  .3 0 0
282 2 56 0 0  .8 0 0
283 2 75 1 1  .7 1 0
284 2 76 0 0   1 0 0
285 2 24 1 0   1 0 0
end


I have first tried a standard Kaplan-Meier survival analysis, as follows:

Code:
stset follow_up, id(patient_id) failure(thrombosis == 1)
by cohort_num, sort: stptime
sts graph, failure by(cohort_num)
sts test cohort_num, logrank
stcox cohort_num age genderm

Then I tried a competing risk survival analysis, as follows:

Code:
stset follow_up, id(patient_id) failure(thrombosis_competing_analysis == 1)
stcrreg cohort_num, compete(thrombosis_competing_analysis == 2)
stcurve, cif at1(cohort_num = 1) at2(cohort_num = 2)
stcrreg cohort_num  age genderm, compete(thrombosis_competing_analysis == 2)

My questions are:

1) In the competing-risks analysis, is there a way to obtain the failure rate (or the cumulative incidence) for each of my cohorts? Something similar to the stptime command in the standard survival analysis?


2) I have read that to compare two cumulative incidence curves I should use Gray's test (the competing-risks counterpart of the log-rank test). Is there a way to perform Gray's test in Stata?
Lacking Gray's test, the only alternative that came to my mind was to report the p-value of the corresponding SHR from stcrreg (e.g., if cohort_num is not statistically significant, can I say that there is no difference between the two curves?).
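One possible route, sketched from memory and relying on two community-contributed commands (stcompet for cumulative incidence estimates by group, stpepemori for a Pepe-Mori test comparing CIF curves; neither is official Stata, and the option names below should be checked against their help files):

```stata
* install the community-contributed commands first:
* ssc install stcompet
* ssc install stpepemori
stset follow_up, id(patient_id) failure(thrombosis_competing_analysis == 1)
* cumulative incidence of thrombosis, treating death (code 2) as competing
stcompet cif = ci, compet1(2) by(cohort_num)
* Pepe-Mori test comparing the two cumulative incidence curves
stpepemori cohort_num, compet(2)
```

The Pepe-Mori test is not Gray's test, but it addresses the same question of comparing cumulative incidence functions across groups.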


Many thanks in advance.

Nicoletta

Unbalanced Panel Data

Hi everyone,

I have the following unbalanced panel data, which spans the years 2000 to 2019:
ID Year X1 aweight
1001 2000 0 0.6
1002 2000 1 1.05
1003 2000 0 0.114
1004 2000 0 1.03
1001 2001 1 1.156
1002 2001 0 0.59
1003 2001 1 0.89
1001 2002 1 0.123
1002 2002 0 1.17
... ... ... ...
1001 2019 0 0.93
1002 2019 1 1.24
1003 2019 0 1.25
1004 2019 1 1.3
I would like to compute, for each year, the difference P0,t - P1,t, where P0,t is the proportion of "X1=0" in year t, P1,t is the proportion of "X1=1" in year t, and the column "aweight" denotes the analytic weight. Specifically, I would like to have something like this:
Year P0,t - P1,t
2000 ..................
2001 ..................
2002 ..................
....
2019 ..................
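One possible sketch, using the variable names from the posted table (collapse with analytic weights computes weighted means, and the mean of an indicator is a proportion):

```stata
* indicators for each value of X1
gen byte x0 = X1 == 0
gen byte x1 = X1 == 1
* weighted proportions by year, then their difference
collapse (mean) p0 = x0 p1 = x1 [aweight = aweight], by(Year)
gen diff = p0 - p1
list Year diff, clean noobs
```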
Thank you in advance for your comments and suggestions.

How to sum two variables from different observations of the same category

Hello everyone,

I have a dataset of companies that conduct deals. These deals have the variables entry year, exit year, Entryy_count, and Exity_count, and each deal has its own ID.

Here a short outline of my data set
Company Deal Entry year Exit year Entryy_count Exity_count Active_deals
1 1 2004 2006 3 2 4
1 2 2004 2006 3 2 4
1 3 2004 2007 3 3 4
1 4 2001 2004 2 1 2
1 5 2001 2007 2 3 2
1 7 2000 2007 1 3 1
2 8 ... ... ... ...
2 9 ... ... ... ...
The variable that I want to calculate in Stata is "Active_deals". This variable is the sum of the number of deal entries (Entryy_count) in a given investment year plus the number of deal exits (Exity_count) made in that year.

For Deals #1-#3, the "Active_deals" variable should be 4, because these deals were made in 2004 and, in addition, one firm was sold (Deal #4) in 2004 by Company #1. Consequently, 4 deals are active in the investment year 2004 for Company #1.

I tried several commands (e.g. egen and by) but I did not find the right way to calculate the active_deals variable.
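A possible sketch (assuming the table's column names are stored as Company, Deal, Entry_year, Exit_year, Entryy_count): count each company's exits per year, then match those counts to each deal's entry year.

```stata
* count exits per company and year in a temporary file
preserve
keep Company Deal Exit_year
collapse (count) exits = Deal, by(Company Exit_year)
rename Exit_year Entry_year
tempfile exitcounts
save `exitcounts'
restore
* bring the exit counts back, matched on each deal's entry year
merge m:1 Company Entry_year using `exitcounts', keep(master match) nogenerate
replace exits = 0 if missing(exits)
* entries in the entry year are already counted in Entryy_count
gen Active_deals = Entryy_count + exits
```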

I am happy for suggestions.

Regards,
Sebastian

Generate New Variable Only For Certain Data

Hi all,

I tried searching for this and I just can't find the exact answer. It seems like it should be easy, but I can't get it to work.

I want to create a new variable based on existing data, but only when a different variable takes a certain value. For example, I have a "CountryID" variable that assigns a number depending on which country the data came from.

So if I want to create a new variable that only has data from USA (and that is coded "1" in CountryID) how would I do this? Is there a type of "when" command that could help? I couldn't find one.

I tried:

generate USAOnly = CountryID=1
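The attempt above fails because Stata uses == for equality tests (a single = is assignment). Two hedged sketches, depending on what is wanted (the variable "score" below is hypothetical):

```stata
* 1) an indicator variable: 1 for USA rows, 0 otherwise
generate byte USAOnly = CountryID == 1

* 2) a copy of another variable (here a hypothetical 'score')
*    that is non-missing only for USA rows
generate USAScore = score if CountryID == 1
```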

Thank you!

String timevar to quarterly panel data

Hello,

I have a dataset that I want to declare as panel.
However, the quarters (my timevar) are categorised as a string variable.
How do I convert 'Period' (my timevar) into a variable Stata will recognise as quarters?

Thanks

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str15 Period
"Jan 16 - Mar 16"
"Apr 16 - Jun 16"
"Jul 16 - Sep 16"
"Oct 16 - Dec 16"
"Jan 16 - Mar 16"
"Apr 16 - Jun 16"
"Jul 16 - Sep 16"
"Oct 16 - Dec 16"

Regress by choosing time period condition

Hi everyone, this is my first post.

I have the following unbalanced panel data, which spans 200308 to 201812; there are about a thousand ids.
id   date     y          x1         x2
61   200308   -0.00906   -0.08343    54.98228
61   200310    0.016922   0.012575  -62.3779
61   200402    0.009859  -0.00543    -7.41705
61   200405    0.014048   0.088575   57.44393
61   200406   -0.00561   -0.00443   -24.738
61   200507   -0.00814   -0.06143   -35.5317
61   200601   -0.00127    0.002575   54.0285
61   200602    0.021544   0.062575  -29.5378
...  ...      ...        ...        ...
** the dates are unbalanced, so they are not the same for each id
Now I need to run a regression by date and by id.
Example:
At date 201001, for id 100, run a regression using that id's previous 24 months of data, to estimate the coefficient on x2 for id 100 at date 201001.
1) If at that date the id does not have a full 24 months of prior data, then 12-23 months of data is acceptable for the regression.
2) If at that date the id does not have even 12 months of data, then do not regress.

How could I set up a regression with these conditions?
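A hedged sketch using the community-contributed rangestat command (ssc install rangestat), which can run a rolling regression over an observation window defined in months; variable names follow the posted table, and the output variable names (b_*, reg_nobs) are those rangestat produces for its (reg) statistic:

```stata
* convert yyyymm integers to a Stata monthly date so windows count months
gen mdate = ym(floor(date/100), mod(date, 100))
format mdate %tm
* rolling regression of y on x1 x2 over the previous 24 months, within id
rangestat (reg) y x1 x2, interval(mdate -24 -1) by(id)
* discard estimates based on fewer than 12 months of data (condition 2)
foreach v of varlist b_x1 b_x2 b_cons {
    replace `v' = . if reg_nobs < 12
}
```

Condition 1 is handled automatically: rangestat uses whatever observations fall in the window, so ids with only 12-23 prior months still get an estimate.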

Thank you in advance for your comments and suggestions.

Merging CRSP monthly with Compustat annual

Hello everyone. I'm trying to merge CRSP, from which I downloaded monthly firm-level data, with my general dataset from Compustat, which is annual.
More precisely, I'm following this procedure, found in a paper, for the construction of one variable:
"Share turnover is the annual average of total monthly trading volume (vol) divided by the number of shares outstanding (shrout)."
If I download the data from CRSP, I have the variable "vol" on a monthly basis.
My question is: how can I merge it with my Compustat database, taking the average of the 12 monthly values of "vol" for each year?
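A hedged sketch of one common approach: compute monthly turnover, collapse to firm-year averages, then merge with the annual file. The names mdate and compustat_annual are hypothetical, and matching CRSP permno to Compustat gvkey in practice requires a linking table (e.g. the CRSP/Compustat Merged link file), which is not shown here.

```stata
* CRSP monthly data in memory: permno, a monthly date 'mdate', vol, shrout
gen year = year(dofm(mdate))
gen turnover = vol / shrout
* annual average of monthly turnover per firm
collapse (mean) share_turnover = turnover, by(permno year)
* merge with the annual Compustat file (assumed keyed on permno and year)
merge 1:1 permno year using compustat_annual, keep(match) nogenerate
```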


Splitting data based on a given condition.

I am new to Stata. I want to split my dataset into 4 parts based on a condition on another column, like:
1) High school dropout if var <= 11
2) High school graduate if var = 12
3) College participant if var between 13 and 15
4) 4-year college graduate if var is 16 or more
Find frequencies for each group.

How do I go about doing this? Any help would be appreciated.
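One possible sketch (keeping the poster's placeholder name "var"): build the four-category variable step by step, label it, and tabulate for the frequencies.

```stata
gen byte educ4 = 1 if var <= 11
replace educ4 = 2 if var == 12
replace educ4 = 3 if inrange(var, 13, 15)
replace educ4 = 4 if var >= 16 & !missing(var)
label define educ4lbl 1 "High school dropout" 2 "High school graduate" ///
                      3 "College participant" 4 "4-year college graduate"
label values educ4 educ4lbl
* frequencies for each group
tabulate educ4
```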

I can't find "pairwise compare using t tests with pooled SD"


Hi everyone, I'm writing because I'm having trouble with an example that one of my teachers gave me. I can't find the path he used to get the "pairwise compare using t tests with pooled SD", and when I copy the same command, I get an error message that says "pairwise.t.test is not a valid command name". Can you help me, please? Thank you in advance.
This is the teacher's result: [image attachment]
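Worth noting: pairwise.t.test is an R function, not a Stata command, which explains the error. A hedged sketch of a Stata counterpart (assuming an outcome y and a grouping variable g; pwmean was introduced in Stata 12 and performs pairwise comparisons of means using a pooled error variance):

```stata
* pairwise comparisons of group means with pooled SD, no p-value adjustment
pwmean y, over(g) mcompare(noadjust) effects
```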

Overlay two marginsplot after dydx option

Hi all,
I am running the following to obtain the marginal contribution of unemployment cash transfers and being an unemployment receiver:
Code:
bootstrap : rifsureg log_income unemp_std i.unemp_receiver i.age_cat ib3.pt022 i.pe003 i.pd004 i.Household_type if period==0 & welfare_regime==1 & pt022!=4, qs(5(10)95) 
  
 margins, dydx(unemp_std) nose 
 margins, dydx(unemp_receiver) nose 

marginsplot, xdim(_equation) // to plot the marginal contribution by percentiles of income
However, I would like to obtain a single marginsplot showing the two marginal effects, i.e. the monetary one and being a receiver or not. I tried combomarginsplot, but it failed, returning:
_marg_save has a problem. Margins not uniquely identified.
Any suggestion and/or idea on how can I overlay the two marginsplot?
Thanks all


Error in Establishing Lagged AR Variable using xtset

Good morning,
I'm working with a balanced panel time-series dataset; ultimately I'll be running a linear regression with panel-corrected standard errors using xtpcse. The data are monthly, and I'm trying to designate the lag for autocorrelation at 12 months. Here is the code I'm using:

xtset parknum monthnum, monthly delta(12)

'monthnum' is a sequential month number running from 1 to 480. I keep getting the following error:

time values with period less than delta() found

I'd like to correct for seasonal (same month, previous year) autocorrelation in the model, and just have those 12 months of dependent variable data thrown out. What am I doing wrong here to keep getting these errors?
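For context: delta(12) tells xtset that consecutive time values are 12 units apart, which conflicts with a monthnum that increments by 1, hence the error. A hedged sketch of one alternative keeps the default delta and takes the seasonal lag explicitly (the dependent variable name 'dv' is hypothetical):

```stata
xtset parknum monthnum
* 12-month (same month, previous year) lag of the dependent variable;
* the first 12 months are missing and drop out of the estimation
gen dv_lag12 = L12.dv
```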

Thanks in advance for any help you can provide.

Averaging certain variables across different variable combinations

Hey all. I have some beneficiary data that I need help with. I need to know whether something I have in mind is possible at all and, if it is, which commands can help me.

I need to know the average of three variables for various kinds of customers.

These variables are: annual income, family size, age

I need the average of these variables for different customer profiles. The profiles are defined only by categorical variables: gender (male, female), location (rural, urban, semi-urban), and employment type (blue-collar, white-collar, unemployed). I believe this gives 18 different customer profiles.
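A hedged sketch (variable names assumed from the description): build one categorical variable per profile combination, then tabulate the means.

```stata
* one group per gender x location x employmenttype cell (2 x 3 x 3 = 18)
egen profile = group(gender location employmenttype), label
* mean of the three variables for each profile
tabstat annual_income family_size age, by(profile) statistics(mean)
```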

Thanks in anticipation

Panel Data Collinearity and Omitted Variables

Good afternoon,

I am currently doing research on the economic effects of certain proscription measures. Economic effects are operationalized as GNI per capita (Atlas method), GDP growth, inflation rates, and the net barter terms of trade index (2000 = 100). I am using unbalanced panel data (country-year) with a large n and small t.

"xtreg ..., re/fe vce(robust)" has done well for me until my most recent regressions. Hypothesis 1 uses random effects for the whole dataset, which considers almost all countries. Hypothesis 2 uses fixed effects for a subset of those countries, selected with "if y==1". I recently added specific controls for each economic operationalization. Only after this change, some of my most important variables were omitted due to collinearity. I cannot seem to fix the error. Did I add the wrong economic controls, or do I need to do something else? The omitted variables are sometimes my most important independent variables, so they are necessary for this project.

The new control variables for each economic operationalization:
GNI, Atlas: Population growth, gross capital formation, labor force participation rate
GDP Growth: Official exchange rate, gross capital formation, labor force participation rate
Inflation: Broad money to total reserves ratio, real interest rate, official exchange rate
Net Barter: Official exchange rate, tariff rate mean of all products

All suggestions would be greatly appreciated!

ARDL Modelling

I am running some analysis looking at the effect of a depreciation of a currency on commodity trade. My variables are non-stationary, so I applied first differences, and one variable was still non-stationary. I applied varsoc to the variables in original log form and to the first-differenced variables, and the optimal number of lags was 4. I then applied vecrank, which showed that the variables are cointegrated with rank 2. I applied ARDL with 4 lags. An error then occurs, saying:

ardl lnrex lnx lnm lnukgdp lnchinagdp, lags(4 4 4 4 4) ec
note: L.lnchinagdp omitted because of collinearity
note: L3.lnchinagdp omitted because of collinearity
note: L4.lnchinagdp omitted because of collinearity
Collinear variables detected.

What does this mean? I am able to generate ARDL results using 3 lags, which is confusing.

How to draw a "Plot of residuals with a Lowess line" in Stata

Could you please help me draw a graph like the picture below?
Thanks so much for your all help.
[image attachment]
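Without seeing the picture, a hedged sketch of the usual way to produce such a plot (the model and variable names y, x1, x2 are hypothetical):

```stata
* fit a model, save its residuals, then plot them with a lowess smooth
regress y x1 x2
predict r, residuals
lowess r x1, yline(0) title("Residuals with lowess line")
```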

stcox and handling of events with zero durations

Hi all,

I'm trying to understand why, when -stset-ting my event data, Stata ignores events with duration times of exactly zero. It appears that Stata uses the logic that event times must be strictly positive. Is there a way around this, other than artificially setting times for those with _t=0 to something positive and very small (like 0.001)?

Here are toy example data and selected output.

Code:
input byte(id t fail)
1 0 1
2 0 1
3 0 0
4 0 0
5 0 0
6 2 1
7 3 1
8 4 1
9 10 1
10 12 1
11 14 1
end
Code:
. stset t, id(id) failure(fail==1)

                id:  id
     failure event:  fail == 1
obs. time interval:  (t[_n-1], t]
 exit on or before:  failure

------------------------------------------------------------------------------
         11  total observations
          5  observations end on or before enter()
------------------------------------------------------------------------------
          6  observations remaining, representing
          6  subjects
          6  failures in single-failure-per-subject data
         45  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =        14

. sts list


           Beg.          Net            Survivor      Std.
  Time    Total   Fail   Lost           Function     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
     2        6      1      0             0.8333    0.1521     0.2731    0.9747
     3        5      1      0             0.6667    0.1925     0.1946    0.9044
     4        4      1      0             0.5000    0.2041     0.1109    0.8037
    10        3      1      0             0.3333    0.1925     0.0461    0.6756
    12        2      1      0             0.1667    0.1521     0.0077    0.5168
    14        1      1      0             0.0000         .          .         .
-------------------------------------------------------------------------------

* all subjects now accounted for, but person-time is artificially inflated.
. gen double t_mod = t
. replace t_mod = 0.001 if t==0

. stset t_mod, id(id) failure(fail==1)

                id:  id
     failure event:  fail == 1
obs. time interval:  (t_mod[_n-1], t_mod]
 exit on or before:  failure

------------------------------------------------------------------------------
         11  total observations
          0  exclusions
------------------------------------------------------------------------------
         11  observations remaining, representing
         11  subjects
          8  failures in single-failure-per-subject data
     45.005  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =        14

. sts list

           Beg.          Net            Survivor      Std.
  Time    Total   Fail   Lost           Function     Error     [95% Conf. Int.]
-------------------------------------------------------------------------------
  .001       11      2      3             0.8182    0.1163     0.4474    0.9512
     2        6      1      0             0.6818    0.1578     0.2861    0.8894
     3        5      1      0             0.5455    0.1755     0.1798    0.8072
     4        4      1      0             0.4091    0.1768     0.0996    0.7072
    10        3      1      0             0.2727    0.1622     0.0413    0.5887
    12        2      1      0             0.1364    0.1260     0.0071    0.4480
    14        1      1      0             0.0000         .          .         .
-------------------------------------------------------------------------------

GLMM for >2 levels available in Stata 11?

Hello,
I'm trying to run a GLMM with a negative-binomial distribution for a dataset that has 4 levels. Can Stata 11 accommodate more than 2 levels? If so, could you point me in the direction of the correct command or menu for this?
Thanks for your help,
Alicia

Estimating the impact of changes in capital requirements on bank lending!

Hi!

I am writing my master's thesis on the impact of changes in capital requirements on bank lending to households and corporations across countries. In doing this, I have selected several control variables, both bank-specific and macroeconomic. I am using country-level data (aggregated bank data) and have been running pooled OLS, fixed-effects, and random-effects models; after performing several diagnostic checks (the Breusch-Pagan LM test and the F-test), I have found that pooled OLS is the most appropriate model for my dataset.

However, as I am looking to estimate the impact on the banking sector across countries, I am wondering if there is a way to check this effect for each country separately, without having to run the regression on each country. I would like to test whether the impact of changes in capital requirements is stronger in economies like Italy and Greece than in more settled economies like Sweden and Germany.

Below is a picture of the output I get when I run the Pooled OLS regression. As this output gives me the significance across countries, and not for each country individually, I am just wondering if there is a way to extract the effect on each

[attachment: pooled OLS regression output]

Hopefully this was clear, but if you have any issues understanding my request, do not hesitate to ask for a clarification!
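A hedged sketch of one way to get country-specific effects from a single pooled regression (variable names lending, capreq, x1, x2 are hypothetical stand-ins for the poster's variables): interact the capital-requirement variable with country dummies, then use margins.

```stata
* slope of capreq allowed to differ by country
regress lending c.capreq##i.country x1 x2, vce(cluster country)
* country-specific marginal effects of capreq
margins country, dydx(capreq)
```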

Thank you in advance!

MANOVA - how to include control variables

Hi everyone,

For our thesis we use Stata 16. The aim of our thesis is to find out to what extent chain affiliation may affect quality. We want to test whether chain affiliation (coded 0 if non-chain, 1 if chain) has an impact on several quality measures: on one side quality of life, which consists of 11 measures scored 1-5, and on the other side quality of care, which consists of 9 measures expressed as percentages. So we want to run two regressions: one to test chain affiliation (independent variable) on quality of life (dependent variables), and one to test chain affiliation (independent variable) on quality of care (dependent variables).

We already ran a MANOVA test to determine whether the differences between groups (in our case chain vs. non-chain) are significant (and they were). But now we want to include control variables alongside the dependent variables. Does anyone know how to do this?
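A hedged sketch: Stata's manova accepts continuous covariates on the right-hand side with the c. prefix, turning the MANOVA into a MANCOVA (the control names below are hypothetical; the quality measures stand in for the poster's 11 quality-of-life variables).

```stata
* chain is categorical by default; c. marks continuous controls
manova qol1 qol2 qol3 = chain c.facility_size c.occupancy_rate
```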

Thank you in advance!