Channel: Statalist

Confidence Interval Estimate for Cohen's h

Cohen's h is a standardized effect size for the difference between two proportions, based on an arcsine transformation of each proportion. Does anyone know of a way to calculate a 95% CI estimate for this statistic? Thanks.
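For what it's worth, since 2*asin(sqrt(p)) is approximately normal with variance 1/n, a common large-sample interval is h ± 1.96*sqrt(1/n1 + 1/n2). A sketch in Stata, with made-up proportions and group sizes:

Code:
* illustrative inputs; replace with your own proportions and sample sizes
local p1 = 0.60
local p2 = 0.45
local n1 = 120
local n2 = 130
local h  = 2*asin(sqrt(`p1')) - 2*asin(sqrt(`p2'))
local se = sqrt(1/`n1' + 1/`n2')
display "h = " `h'
display "95% CI: [" `h' - 1.96*`se' ", " `h' + 1.96*`se' "]"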

reg3 and fixed effects


I'm running a 3SLS (reg3) model including time fixed effects. I also wanted to include firm fixed effects, but that exceeds the maximum number of variables Stata allows. Is there some "absorb"-style option to deal with this? Thanks in advance.
Antonio
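One workaround sometimes used for linear systems (sketched here with illustrative variable names) is to absorb the firm effects manually: demean every variable within firm before running reg3, since the within transformation removes time-invariant firm effects.

Code:
* within-transform each variable by firm before reg3 (names are illustrative)
foreach v of varlist y1 y2 x1 x2 z1 z2 {
    egen double m_`v' = mean(`v'), by(firm_id)
    gen double w_`v' = `v' - m_`v'
}
reg3 (w_y1 = w_y2 w_x1 w_z1) (w_y2 = w_y1 w_x2 w_z2)

Note that the degrees of freedom are not adjusted for the absorbed effects, so the reported standard errors need care.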

Building matrix knowing values positions

Hello,

I am having trouble building a matrix from a dataset that looks like this (the values are numeric; the row and column numbers are not always consecutive, and are not ordered):
row_nb column_nb value variables_name series_name
1 2 val1 budget revenue
2 2 val2 budget expenditure
6 2 val3 budget receipts
.. .. ... ..
2 3 val4 account1 expenditure
1 3 val5 account1 revenue
6 3 val6 account1 receipts
.. ... ... ...
6 4 val7 account2 receipts
....

What I want to do is the following; let's say the matrix is M:
            budget       account1     account2
revenue     M(1,2)=val1  M(1,3)=val5  M(1,4)=.
expenditure M(2,2)=val2  M(2,3)=val4  M(2,4)=.
(rows 3-5 empty)
receipts    M(6,2)=val3  M(6,3)=val6  M(6,4)=val7
And after that I will delete all empty lines.
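One possible sketch, assuming the variables are named exactly as in the listing above: size the matrix from the largest indices, then fill it element by element.

Code:
quietly summarize row_nb
local R = r(max)
quietly summarize column_nb
local C = r(max)
matrix M = J(`R', `C', .)          // start with all entries missing
forvalues i = 1/`=_N' {
    local r = row_nb[`i']
    local c = column_nb[`i']
    matrix M[`r', `c'] = value[`i']
}
matrix list M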


Can you please help me with this?

Thank you

Creating monthly data from a string date variable

Hello,

Could anyone help me figure out why this is not working?

Code:
. des dtnasc

              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------------------------------------------------------
dtnasc          str8    %8s                   

. list dtnasc in 1/10

     +----------+
     |   dtnasc |
     |----------|
  1. | 01012004 |
  2. | 01012004 |
  3. | 19012004 |
  4. | 22012004 |
  5. | 13012004 |
     |----------|
  6. | 13012004 |
  7. | 18012004 |
  8. | 31012004 |
  9. | 02012004 |
 10. | 02012004 |
     +----------+

. gen anomes = monthly(dtnasc,"MY")
(2801727 missing values generated)

. tab anomes, mis

     anomes |      Freq.     Percent        Cum.
------------+-----------------------------------
          . |  2,801,727      100.00      100.00
------------+-----------------------------------
      Total |  2,801,727      100.00
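For what it's worth, dtnasc looks like day-month-year ("01012004" = 1 January 2004), so monthly() with a "MY" mask cannot parse it, which is why every value comes back missing. One sketch is to read it as a daily date first and then convert to a monthly date:

Code:
gen anomes = mofd(daily(dtnasc, "DMY"))
format anomes %tm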

Parametric survival analysis: testing proportionality.

I have a parametric survival model with the baseline hazard specified according to a Weibull distribution. I'm looking to test the assumption that hazard ratios are proportional over time by including a covariate*time interaction variable, but am wondering whether there's any particular reason to choose one function of time over another when deriving such a variable.

I've seen some individuals opt to use linear time when using an exponential or Gompertz baseline function, and log time when using a Weibull function: https://lra.le.ac.uk/bitstream/2381/...HER_MJ_PhD.pdf

Elsewhere I've seen it mentioned that 'for proportional hazards models such as the Weibull, there is no method for the detection for non-proportional hazards': http://pan.oxfordjournals.org/content/18/2/189.abstract

In case it makes any difference, my covariate is a continuous variable scaled to log base 2.
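One way to sketch the interaction test in a parametric model (variable names are illustrative; the choice of ln(_t) versus _t is exactly the question raised above) is to split the data at the failure times and interact the covariate with a function of analysis time:

Code:
stsplit, at(failures)
gen x_lnt = x * ln(_t)            // or x * _t for linear time
streg x x_lnt other_covariates, distribution(weibull)
test x_lnt                        // evidence of non-proportionality if significant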

Thoughts?

Difference in Abadie-Imbens Standard Errors between <psmatch2> and <teffects psmatch>

Hello,

I am using propensity score matching for a sample with 462 observations. I have been using <teffects psmatch> for nearest-neighbor matching. However, when limiting matches to be within a certain caliper I am using <psmatch2>, because it completes the operation even when not all units have matches within the caliper. I have realized that (prior to implementing caliper matching) the standard errors reported by the two commands are not equal, even after specifying the <ai(#)> and <ties> options for <psmatch2>. The standard errors given by <teffects psmatch> are similar to those given by <psmatch2> when I use the option <vce(robust, nn(3))>. I realize this differs from the default, because the default AI standard errors with <teffects psmatch> use 2 matches. However, the vce(robust, nn(#)) option is only allowed when matching with k>2 neighbors. My Stata code is below:

Code:
// teffects vce(robust, nn(3)), k=3
teffects psmatch (my_outcome_var) (treatment covariates), nneighbor(3) vce(robust, nn(3)) atet  /* SE = 9546 */
// psmatch2 AI, k=3
psmatch2 treatment (covariates), neighbor(3) outcome(my_outcome_var) logit ai(3) ties  /* SE = 9552 */

// teffects default, k=1
teffects psmatch (my_outcome_var) (treatment covariates), nneighbor(1) atet  /* SE = 1302 */
// psmatch2 AI, k=1
psmatch2 anggota_apkj (my_outcome_var) logit ai(1) ties  /* SE = 10,142 */


When k=3 the coefficient is the same for both commands. However, when k=1, the coefficients differ. Why am I getting different results? Are there any suggestions for changes I can make to get the same output from each, so that I am comfortable using <psmatch2> for caliper matching?

Many thanks,
Corinna


tabulate function: how to group and create ranges with continuous variables?

Hello Stata users,
I have a dataset with two continuous variables: household income and the monetary bonus a family obtains if it fulfills certain parameters. Something like this:
household_income bonus
100 100
200 0
300 0
400 100
500 0
600 100
700 0
800 0
900 0
1000 100
I would like to create a summary table showing how many families obtain the bonus, grouped by income ranges.
Can someone help me?
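A sketch using egen's cut() function, assuming a positive bonus means the family received it (the 250-wide income bands are arbitrary):

Code:
gen byte got_bonus = bonus > 0
egen income_band = cut(household_income), at(0(250)1250)
tabulate income_band got_bonus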
Thanks,
Andrea

Error: too many variables specified

I am estimating a VAR model. The number of variables in my model is 1828, and I am getting the following error:

too many variables specified
r(103);


Please provide any solution to the above problem.


comparing observations through matrix manipulation

Dear all

Suppose I have the following variables:

var1 var2 var3 id var5 var6

var1 - var3 have say 100 observations each, while id, var5 and var6 have 7.

I would like to eliminate ids that have an increased value of var5 and a reduced value of var6 compared with at least one other id, while at the same time retaining the corresponding rows in the other variables (var1 var2 var3). If the condition is not met, no change should be made.

I think the way forward would be through matrix manipulation, but so far I have been unable to implement this.

Does this require mata? Any help will be appreciated.

Cynthia

Creating variables with loops

Dear all

I am having some trouble generating variables with loops. The problem I am facing looks as follows:

For all observations I have two variables: v_1 and v_2.

- For the first observation, the first variable, v_1, takes the value 10 and the second variable, v_2, takes the value 1.
- For the second observation, v_1 takes the value 5 while the second variable v_2 takes the value 2.

What I want to do is generate new variables that all take the value stored in v_2. The difficulty is that the number of variables to generate is stored in v_1.

Thus, in total, I want to generate 10 new variables, all taking the value 1 for the first observation. For the second observation, the first five new variables should take the value 2 and the latter 5 should remain empty.
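For the two-observation example above, one sketch is to loop up to the maximum of v_1 and fill each new variable only where its index does not exceed v_1 (the new variable names are illustrative):

Code:
quietly summarize v_1
local K = r(max)                          // 10 in the example
forvalues j = 1/`K' {
    gen new_`j' = cond(`j' <= v_1, v_2, .)
}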

The actual data I have is much more complex which is why I am looking for a solid solution and probably will have to work with loops?

I tried the foreach and forvalues commands and also read through the following document: http://www.stata-journal.com/sjpdf.h...iclenum=pr0005

However, I still cannot find a solution.

All the best and thank you in advance,
Max

Predicting Residuals by groups

Hi,
I am new to Stata and I need some help on the following problem.
I have run regressions by group and would now like to predict residuals for each group. Let's say I have estimated the regression separately by companies' industry codes, and for each industry code I now need the residuals. A simple predict, residuals does not help, nor does e(sample), since predict uses only the most recent estimation. I need residuals computed group by group.
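One common sketch, assuming a numeric industry_code and illustrative regression variables, loops over the groups and stores each group's residuals back into a single variable:

Code:
gen double resid_all = .
levelsof industry_code, local(codes)
foreach g of local codes {
    quietly regress y x1 x2 if industry_code == `g'
    predict double tmp if e(sample), residuals
    quietly replace resid_all = tmp if industry_code == `g'
    drop tmp
}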
Any help would be much appreciated.

Restructuring data

Hi,

I am currently learning to use Stata in order to assess the inter-rater reliability (IRR) of my data. My data consist of four different coders who coded streets on different features (number of bars, number of CCTV cameras, etc.). A subset of this data has been coded independently by two observers; however, which two coders these are differs per street.

One of the ways I want to assess IRR is through kappa. However, my inexperience does not allow me to structure the data as needed.
Currently it looks something like this (simplified, of course):
Streets Bars Observer
1 5 A
1 4 B
2 5 C
2 3 D
3 3 A
3 3 D
I am thinking I should use the command kap observerA-observerD, but I don't know how to rearrange the data to look like this:
Street observerA observerB observerC observerD
1 5 4 . .
2 . . 5 3
3 3 . . 3
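Assuming the variables are named Streets, Bars, and Observer (with Observer a string taking values A-D), a reshape sketch would be:

Code:
reshape wide Bars, i(Streets) j(Observer) string
rename Bars* observer*          // gives observerA ... observerD

The kap command can then be applied to the resulting observer columns.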
I am hoping someone can help me!

Kind regards,

Dymph

Problems with Unicode Stata 14

Currently, I'm using Stata 14. I bought a computer here in South Korea, and I changed the display language and the system locale for non-Unicode programs. I have a do-file from 2014 in which all the code was written in Spanish, and instead of a clean do-file I get the file shown in the image below.

I need the do-file without the black boxes, because when I run it I get errors: Stata doesn't recognize the variable names.
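Stata 14 ships tools for exactly this. A sketch, assuming the do-file is in the current working folder and was saved in a Latin-1 (Spanish) encoding; both the filename and the encoding here are assumptions, and unicode analyze will report what is actually needed:

Code:
unicode analyze mydofile.do            // reports whether translation is needed
unicode encoding set latin1            // assumed source encoding for Spanish text
unicode translate mydofile.do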

[attached screenshot of the do-file with black boxes]

PSM with Multiple Treatment - Iterations

I am trying to use PSM with panel data and multiple treatments. My treatment variable comprises four treatments. I have reshaped the data, but when I run the mlogit model it keeps iterating, and at the end the warning "convergence not achieved" appears. A number of variables are omitted and some strange results appear. Initially, when I ran mlogit without reshaping the data, the results were fine, but I guess for PSM I have to reshape the data. If somebody knows about this problem, please help me.

Ticks alignment in a graph with two y-axes

Dear all,

I am writing because I need some help fine-tuning a graph's aesthetics.
My graph uses two y-axes with different scales: on the LHS a 0(.5)2.5 scale for a bar chart, and on the RHS a 0(.2)1 scale for a scatterplot superimposed on the bar chart.

The two y-axes have the same number of ticks. However, they are not perfectly aligned: when I use grid lines for the LHS y-axis, they do not quite (though visibly) line up with the RHS y-axis ticks.

Here's the code I am using:

Code:
twoway bar avg_GH n_country, barw(.7) color(emidblue) xlabel(, ang(45) labsize(vsmall)) xtitle("") ///
    ytitle("titleLHS", size(small) axis(1)) yscale(titlegap(3) axis(1)) yla(0(.5)2.5, axis(1)) yaxis(1) ///
    || scatter avg_PH avg_MH n_country, msymbol(O D) msize(small small) yscale(titlegap(3) axis(2)) ///
    yla(0(.2)1, axis(2)) yaxis(2) ytitle("titleRHS", size(small) axis(2)) legend(rows(1) order(1 2 3))

Any ideas on how to fix this would be greatly appreciated. Thanks in advance!



Balancing panel, fixed effects difference in difference with xtnbreg

Dear All,
I have data on dispensed opioid drugs for 2 states, spanning January 1, 2011 to December 15, 2013 for both states. On September 1, 2012, State 1 implemented a policy with stricter opioid-prescribing rules. I am trying to run a difference-in-differences study comparing states 1 and 2 to estimate the treatment effect of the government policy mandating stricter prescribing rules in state 1. My data are dispensing records from pharmacies and look something like this:

patient ID   doc ID   date script written   state   post policy   treat (= state*post policy)
1 2 1/5/2011 1 0 0
2 2 5/8/2011 1 0 0
2 1 10/1/2012 1 1 1
3 4 2/9/2011 1 0 0
3 4 5/7/2012 1 0 0
3 4 12/12/2012 1 1 1
3 4 6/8/2013 1 1 1
4 1 1/2/2013 1 1 1
5 3 1/1/2011 0 0 0
6 3 5/8/2011 0 0 0
6 5 11/3/2012 0 1 0
7 5 8/4/2012 0 0 0
7 5 11/4/2012 0 1 0
7 5 12/4/2012 0 1 0
8 6 5/6/2013 0 1 0
8 6 9/9/2013 0 1 0

For each state (1- treated, 2 not treated) we have the following:
  1. The data contain 3 types of doctors and 3 types of patients: those who show up only before the policy change, those who show up both before and after, and those who show up only after the policy change
  2. Unbalanced panels of doctors and patients – do I need to balance the panel? How – daily level, prescriber level, patient level?
  3. I don’t observe patients who for any reason, policy or otherwise, did not get any opioids dispensed. How do I account for these folks? If they didn’t get opioids post policy due to the policy effect I will underestimate the impact of the policy.
  4. Daily level data from 1/1/2011-12/12/2013

Research questions I want to answer:
  1. Did policy result in fewer doctors prescribing opioids? I think this should be simple count data difference in difference estimate.
  2. Did policy result in each doctor writing fewer opioid scripts? I think i need to have a count data, (doctor) fixed effects difference in difference model. Not sure how to estimate this with the xtnbreg command.
  3. Did policy result in fewer patients getting opioids? Again a count data fixed effects regression with DiD.
  4. Did each patient get fewer scripts? Count data, (patient) fixed effect DiD estimation?
Sorry for such a long post. But I am really struggling with dispensed data which needs to be set up as a panel - patient, prescriber - and then count outcomes to add to the complexity. Any help would be much appreciated.
Sumedha Gupta.

Fama MacBeth approach

Hi everyone,

I have run 132 cross-sectional regressions of one dependent variable on 9 independent variables, so I already have 132 coefficients and 132 t-statistics for each of the 9 variables. My question is how to apply the Fama-MacBeth approach to obtain time-series average coefficients and t-statistics. Is it enough to average the coefficients and t-statistics, or is that approach wrong? If it is wrong, how can I calculate the time-series average coefficients and t-statistics in Stata? Thank you in advance.
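Averaging the coefficients is the standard Fama-MacBeth second stage, but the t-statistic should come from the time-series variation of the 132 estimates themselves, not from averaging the per-period t-statistics. A sketch, assuming one observation per cross-section with the estimated slope for one regressor stored in a variable b_x:

Code:
quietly summarize b_x
display "FM coefficient:  " r(mean)
display "FM t-statistic:  " r(mean) / (r(sd) / sqrt(r(N)))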

too big for joinby - operations on pairwise combinations within groups -

I have 490,000 observations in groups (var1), each group with between 1 and 8,100 observations (var2). I need to do an operation on var3 for every pairwise combination within each group (including i=i matches, so Nk*(Nk+1)/2 pairs for a group with Nk observations); I then need to sum across the group (or keep a running sum as the calculations are done). I have dropped all unnecessary variables, but I don't have the RAM to do this with joinby. Could this possibly be done with loops?

Commands for lagged dependent regression when using three indexes

I hope some of you can help me with which command to use when analyzing panel data with a lagged dependent variable. I couldn't find any threads on which specific commands to use.

I'm analyzing X: leadership styles (three separate indexes from 0-100) --> Y: sickness absence (in number of days), which is the variable I need to lag.
I have a time variable (0: before treatment, 1: after treatment) and some control variables.

I've set the dataset as time series.

Now, what would the command look like? Should I have all three indexes in one command, or run three separate ones?
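A minimal sketch, assuming illustrative variable names and that the data are declared as a panel: once the time dimension is set, the lag operator L. supplies the lagged dependent variable, and the three indexes would usually enter one regression jointly:

Code:
xtset person_id wave
regress sickabsence l.sickabsence index1 index2 index3 i.treatmentperiod controls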

Thanks!

