Channel: Statalist
Viewing all 65689 articles

total with if condition

Hello!

I have a dataset with a number of startups per row. I want to generate the sum of startups that are part of a treatment group.
My command is: egen obsTR = total(n_startups_kreis) if treatment != .
However, Stata returns the number of cases (13 rows) rather than giving the sum of startups (27). What am I doing wrong?
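A minimal sketch of how `egen total()` behaves with an `if` qualifier, using four made-up rows (not the original data). With `if`, the sum is taken over the qualifying rows only and the new variable is missing elsewhere; wrapping the condition in `cond()` writes the same sum to every row instead. If the result still equals the number of rows, it may be worth checking with `summarize n_startups_kreis if treatment != .` whether the variable really holds values above 1 in those rows.

```stata
* Hypothetical toy data: 4 districts, one with missing treatment status
clear
input n_startups_kreis treatment
5 1
3 0
7 .
2 1
end

* Sum over rows with non-missing treatment; other rows get missing
egen obsTR = total(n_startups_kreis) if treatment != .

* Same sum, but stored in every row (total() ignores the missing values)
egen obsTR_all = total(cond(treatment != ., n_startups_kreis, .))

list
```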


testing equations

\[ P\left(y_1 = 1| y_2, z\right) = \Phi \left[\frac{z_i\delta +\rho \left(y_{2i}- x_i\pi\right)/\sigma}{\left(1-\rho^2\right)^{1/2}} \right]\]

renvars command

Hi,

I am trying to run the renvars command but it keeps on giving me a particular error.
Command: renvars PERSONID yrssch Age inhh, prefix(FATHER)

Error: command renvars is unrecognized
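`renvars` is a community-contributed command rather than part of official Stata, so it has to be installed before it can be used (it can be located with `search renvars`). As a sketch of an alternative that needs no installation, a loop over the official `rename` command adds the prefix. The toy values below are made up for illustration; only the variable names come from the post.

```stata
* Hypothetical toy data with the variable names from the post
clear
set obs 3
gen PERSONID = _n
gen yrssch = 10 + _n
gen Age = 40 + _n
gen inhh = 1

* Add the prefix FATHER to each listed variable
foreach v of varlist PERSONID yrssch Age inhh {
    rename `v' FATHER`v'
}
describe
```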

Simulate correlated binary variables

Hi,

I would like to simulate correlated binary variables.

Theoretically, I should have been able to do the simulation from the following paper

https://journals.sagepub.com/doi/pdf...867X1501500118

and use the command proposed in the aforementioned paper

rbinary

Unfortunately, this command does not work. I get an error message

file "rbinary_simudata.dta" not found
r(601);




Do you have any suggestions? How can I overcome the problem of this command? Or, do you have any recommendation of an alternative way to simulate correlated binary variables?

Many Thanks,
Michalis
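One possible alternative that uses only official commands is to draw correlated latent normal variables with `drawnorm` and cut them at thresholds. This is a sketch under assumptions: the latent correlation (0.5) and the marginal probabilities (0.3 and 0.6) are arbitrary illustration values, and the correlation between the resulting binary variables will generally be smaller than the latent correlation.

```stata
clear
set seed 12345
set obs 10000

* Latent bivariate normal with correlation 0.5 (assumed value)
matrix C = (1, 0.5 \ 0.5, 1)
drawnorm z1 z2, corr(C)

* Threshold the latent variables so that P(y1=1)=0.3 and P(y2=1)=0.6
gen byte y1 = z1 < invnormal(0.3)
gen byte y2 = z2 < invnormal(0.6)

tabulate y1 y2
correlate y1 y2
```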


Problem with import delimited - characters "¿" "1/2" "1/4" etc

Hi everyone,

I want to import a public data set of standardized tests in Colombia. This information is available as a text file. I tried importing it with the import delimited command, but I run into problems because some values contain characters such as "¿" and "1/2". This makes Stata split those values across two variables, creating a nonexistent variable and shifting the contents of all following variables.

My code is:

forvalues x=2009/2019 {

forvalues y=1/2 {

clear
import delimited "${raw_dir}/SB11_`x'`y'/SB11_`x'`y'.txt", ///
delimiter("¬Â") varnames(1)
dropmiss, force // This is to try to solve the problem but doesn't quite work out
cap rename ïestu_exam_nombreexamen estu_exam_nombreexamen
save "${use_dir}/original/S11_`x'`y'.dta", replace

}
}

Thank you very much for your help!

Tobit model with two steps


Dear Statalisters, I am trying to estimate a generalized tobit model in two steps. The first step is a probit equation that determines whether a firm undertakes R&D or not.
Code:
 y1*=x1*b1 +u1
where
y1* is a latent variable and the firm undertakes research if y1*>0.
The second step is a tobit equation that determines the intensity of research if the firm actually does invest in R&D.
Code:
 y2*=x2*b2 +u2
where
Code:
 y2*=y2  (y2=actual observed research intensity) if y1*>0.
The errors u1 and u2 are correlated (jointly normally distributed). My question is: is there a program or command that allows me to estimate the above equations and the covariance matrix in a simple way? I am using Stata 15. Any suggestion or help is extremely welcome. Thank you very much.
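The setup described appears to match a sample-selection (type II tobit) model, which official Stata's `heckman` command estimates either with the `twostep` option or by maximum likelihood. A sketch on simulated data follows; the variable names, coefficients, and the error correlation of 0.5 are invented for illustration, and `x1` is assumed to enter only the selection equation.

```stata
clear
set seed 2021
set obs 2000

* Correlated errors u1 (selection) and u2 (intensity)
matrix C = (1, 0.5 \ 0.5, 1)
drawnorm u1 u2, corr(C)
gen x1 = rnormal()
gen x2 = rnormal()

* Step 1: firm does R&D if the latent y1* > 0
gen byte does_rd = (0.5 + x1 + u1 > 0)
* Step 2: intensity is observed only for firms that do R&D
gen intensity = 1 + 2*x2 + u2 if does_rd

* Two-step estimator
heckman intensity x2, select(does_rd = x1 x2) twostep
* Maximum likelihood version, which also reports rho and sigma
heckman intensity x2, select(does_rd = x1 x2)
```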

Temporary frame

Dear All,

If a data frame is created with a temporary name, it is disposed of at the end of the program, OK.
If it happens to be the current frame when it is disposed of, which frame will become active afterwards?
  • first one alphabetically?
  • first one historically?
  • last active still existing?
  • active at entry to program?
  • random?
  • ...something else?
Thank you, Sergiy

if condition "name global variable == " then enter the subroutine

Hi,

I am trying to come up with an "if condition" that would allow me to enter parts of the code if the global variable == a certain name, otherwise ignore part of the code. I have tried the following but the syntax is not correct:

Code:
global universe "LNASCOMP"

if $universe == "LNASCOMP" { 
 display "Hello, world" }
I am getting the following output:
Code:
. if $universe == "LNASCOMP" {
LNASCOMP not found
Please help. Thanks.
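The global is expanded to bare text before Stata evaluates the condition, so `if $universe == "LNASCOMP"` becomes `if LNASCOMP == "LNASCOMP"` and Stata looks for a variable named LNASCOMP. A sketch of the usual fix is to wrap the macro reference in double quotes; the closing brace also needs to sit on its own line.

```stata
global universe "LNASCOMP"

* Quote the macro so that a string is compared with a string
if "$universe" == "LNASCOMP" {
    display "Hello, world"
}
```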



Merging datasets for specific observations in a variable

Hi,

I'm trying to merge two datasets (Disease_reshaped and DevelopmentStage) for only specific observations of a "Disease" variable in the Disease_reshaped dataset. The goal is to create a dataset that merges data for just these observation values. I first created a local of these specific observation values and then created a loop to merge based on just these observation values. However, when running the loop as below, I get the following error "Asthma not found" where the observation value mentioned in the local is reported as not found. This happens for all the values mentioned in the local (i.e. even when I change the order in the local). The values in the local are mentioned as they are found in the dataset. Any guidance on how to resolve this will be much appreciated. Thank you.


Code:
#delimit ;

local classList `" "Asthma", "Pain, nociceptive, general", "Cancer, unspecified", 
"Hypertension, unspecified", "Inflammatory disease, unspecified", "Cancer, breast",
"Infection, HIV/AIDS", "Diabetes, Type 2", "Arthritis, rheumatoid", 
"Cancer, solid, unspecified" "';

#delimit cr

foreach class of local classList {
    use "${datadir}\Pharmaprojects_Disease_reshaped", replace
    keep if Disease == `class'
    merge 1:m DrugNameX using "${datadir}\Pharmaprojects_DevelopmentStage_step1" 
    }
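A likely cause is the quoting: the loop body compares Disease with the unquoted macro, so Stata tries to read the value as a variable name, and the commas between items end up inside the list. A sketch with a shortened, two-item version of the list: separate the items by spaces only, keep the compound quotes, and quote the macro when it is used.

```stata
* Items separated by spaces; each item keeps its own double quotes
local classList `" "Asthma" "Pain, nociceptive, general" "'

foreach class of local classList {
    * Quote the macro wherever a string value is expected,
    * e.g. keep if Disease == "`class'"
    display "`class'"
}
```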

Generating unique id across variables

Hello

I have data in the following format where each firm is audited by either 1 or two auditors.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str7 firm str1(aud1 aud2)
"Firm 1"  "A" "" 
"Firm 2"  "B" "A"
"Firm 3"  "B" "D"
"Firm 4"  "A" "E"
"Firm 5"  "C" "" 
"Firm 6"  "C" "B"
"Firm 7"  "D" "F"
"Firm 8"  "E" "" 
"Firm 9"  "E" "A"
"Firm 10" "F" "G"
end
I want to generate an id variable for each auditor variable in such a way that the same auditor appearing in aud1 or aud2 gets the same id. In short, I would like my final output to look like this:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str7 firm str1 aud1 float unique_id_1 str1 aud2 float unique_id_2
"Firm 1"  "A" 1 ""  .
"Firm 2"  "B" 2 "A" 1
"Firm 3"  "B" 2 "D" 4
"Firm 4"  "A" 1 "E" 5
"Firm 5"  "C" 3 ""  .
"Firm 6"  "C" 3 "B" 2
"Firm 7"  "D" 4 "F" 6
"Firm 8"  "E" 5 ""  .
"Firm 9"  "E" 5 "A" 1
"Firm 10" "F" 6 "G" 7
end
Is there a way to get it done? Please help me out

Warm regards

Amish
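One possible approach, sketched on the example data above: reshape to long form so that all auditor names sit in one variable, number them with `egen group()`, and reshape back. The names `slot` and `unique_id_` are illustrative choices. Empty auditor names are treated as missing by `group()`, so they get a missing id.

```stata
clear
input str7 firm str1(aud1 aud2)
"Firm 1"  "A" ""
"Firm 2"  "B" "A"
"Firm 3"  "B" "D"
"Firm 4"  "A" "E"
"Firm 5"  "C" ""
"Firm 6"  "C" "B"
"Firm 7"  "D" "F"
"Firm 8"  "E" ""
"Firm 9"  "E" "A"
"Firm 10" "F" "G"
end

* One row per firm and auditor slot
reshape long aud, i(firm) j(slot)
* Same auditor name gets the same id, whichever slot it came from
egen unique_id_ = group(aud)
* Back to one row per firm: aud1 unique_id_1 aud2 unique_id_2
reshape wide aud unique_id_, i(firm) j(slot)
list
```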

Identify variables by their order in the dataset

Hi,

I am working with a new feed of data every week (in real time). As time passes, my time series dataset increases in size and the variable names change because new dates are added (the first date variable is currently AE, but next week it will become AD, etc.). I would like to identify variables by their order in the dataset. I would like to be able to rename by two different methods (it is a large project, I need to do this several times, and I need both methods):
1) "the second variable in the dataset as currently ordered" and also
2) "the variable immediately to the right of a particular variable", in the example below "DATATYPE"

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str8 DATATYPE double(AE AF AG AH AI AJ)
"SP500_w" 6247.66 6375.54 6529.42 6389.67 6209.38 6075.76
"VIX_w"     33.84   33.47   27.57   25.66   27.62   27.99
end
This is how I currently rename the variables. I would like to replace "AE" by "the variable next to DATATYPE" or "the second variable in the dataset"
Code:
* Rename variables as t1, t2, t3 etc.
qui ds
    loc lastvar: word `c(k)' of `r(varlist)'
local j 0
foreach var of varlist AE-`lastvar' {
    local j `=`j'+1'
    rename `var' t`j'
}
Any idea how to solve this problem? Thanks.
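A sketch of both lookups using macro list functions on the example data above. The macro names `allvars`, `second`, `pos` and `next` are illustrative. `unab` expands `_all` into the variable names in their current order, `word # of` picks a name by position, and `list posof` finds the position of a given name.

```stata
clear
input str8 DATATYPE double(AE AF AG AH AI AJ)
"SP500_w" 6247.66 6375.54 6529.42 6389.67 6209.38 6075.76
"VIX_w"     33.84   33.47   27.57   25.66   27.62   27.99
end

* All variable names in dataset order
unab allvars : _all

* Method 1: the second variable in the dataset
local second : word 2 of `allvars'

* Method 2: the variable immediately to the right of DATATYPE
local pos : list posof "DATATYPE" in allvars
local next : word `=`pos' + 1' of `allvars'

* Rename from that variable to the last one as t1, t2, ...
local lastvar : word `c(k)' of `allvars'
local j 0
foreach var of varlist `next'-`lastvar' {
    local ++j
    rename `var' t`j'
}
describe
```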


Import excel dataset to Stata with MM/DD/YYYY dates as 1st row / variable

Hi,

I currently have a dataset that looks like that in Excel:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str8 C str10(D E F G H I J)
"DATATYPE" "12/30/2020" "12/23/2020" "12/16/2020" " 12/9/2020" " 12/2/2020" "11/25/2020" "11/18/2020"
"RI"       "NA"         "NA"         "NA"         "NA"         "NA"         "NA"         "NA"        
end
I am having problems importing the dates into Stata as variable names (names starting with a number are not allowed in Stata). I would like to import the dataset in a way that lets me keep track of the dates and merge my data with other datasets. This is important because the dates are dynamic and change every week, since I am working with live data. Can someone help? Thanks!
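One possible approach, sketched on the example data above (that is, after importing the sheet without treating the first row as names): read the date from the first row of each column, keep it as the variable label, and rename the column to a legal name built from the date. The `d` prefix and the YYYYMMDD naming are illustrative choices, and `strtrim()` assumes Stata 14 or later.

```stata
clear
input str8 C str10(D E F G H I J)
"DATATYPE" "12/30/2020" "12/23/2020" "12/16/2020" " 12/9/2020" " 12/2/2020" "11/25/2020" "11/18/2020"
"RI"       "NA"         "NA"         "NA"         "NA"         "NA"         "NA"         "NA"
end

foreach v of varlist D-J {
    * Convert the header cell (row 1) to a Stata daily date
    local d = date(strtrim(`v'[1]), "MDY")
    * Format the date as YYYYMMDD for use in a variable name
    local ymd : display %tdCCYYNNDD `d'
    * Keep the original date text as the variable label
    label variable `v' "`=strtrim(`v'[1])'"
    rename `v' d`ymd'
}
rename C DATATYPE
* Drop the header row now that its content lives in names and labels
drop in 1
describe
```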



PSM (ATET and ATE)

Hi all,

I would really appreciate your help on this. I ran my analysis using the PSM method in Stata. I calculated both the Average Treatment Effect (ATE) and the Average Treatment Effect on the Treated (ATET) and found opposing results.
ATET: shows a positive and significant effect of childhood obesity on adult depression.
ATE: shows a negative and insignificant effect of childhood obesity on adult depression. Could someone please help with interpreting this, as the results are different?
Many thanks.
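ATE and ATET answer different questions (the effect averaged over everyone versus over the treated only), so they can differ when the effect varies across units and the treated differ systematically from the untreated. A simulated illustration with `teffects psmatch` follows; the data-generating process, in which the effect grows with the covariate `x`, is an assumption made up for this sketch and says nothing about the obesity data.

```stata
clear
set seed 42
set obs 2000

gen x = rnormal()
* Units with higher x are more likely to be treated
gen byte treat = runiform() < invlogit(x)
* The treatment effect (1 + x) is larger for units with higher x
gen y = x + treat*(1 + x) + rnormal()

* Effect averaged over the whole sample
teffects psmatch (y) (treat x)
* Effect averaged over the treated only
teffects psmatch (y) (treat x), atet
```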

Metaprop Forestplot Editing

Hello, I am trying to perform a proportion meta-analysis. The dataset is https://imgur.com/F0Eikiz

As suggested in this paper, I launched this command: metaprop num denom, random by(tgroup) cimethod(exact). That gave me everything I needed and the forest plot (as in this picture).

However, how can I label each number with author and year?


Merging datasets for specific observations

Hi. The local below comprises observation values of a variable "Disease". I want to merge two datasets only for the observation values listed in the local. However, the code in the loop deletes each disease case (observation value) before running the next one. So, instead of getting a new dataset with merged values for all the observation values in "classList", I only get a dataset merged for the last observation value, "Cancer, solid, unspecified". What am I missing here? Any help would be much appreciated. Thanks.

Code:
clear 

#delimit ;
local classList
    Asthma
    "Pain, nociceptive, general"
    "Cancer, unspecified"
    "Hypertension, unspecified"
    "Inflammatory disease, unspecified"
    "Cancer, breast"
    "Infection, HIV/AIDS"
    "Diabetes, Type 2"
    "Arthritis, rheumatoid"
    "Cancer, solid, unspecified"
;

#delimit cr


foreach class of local classList {
    use "${datadir}\Pharmaprojects_Disease_reshaped", clear
    keep if Disease == "`class'"
    merge 1:m DrugNameX using "${datadir}\Pharmaprojects_DevelopmentStage_step1", keep(match)
    save "${dodir}\Disease+Stage", replace
    }
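Each pass through the loop reloads the master file and overwrites the saved result, so only the last disease survives. One possible alternative is to flag all wanted diseases first and merge once. In the sketch below, the two toy datasets, the tempfiles, the `Stage` variable and the `wanted` flag are illustrative assumptions standing in for the Pharmaprojects files.

```stata
* Toy stand-in for the Disease_reshaped file
clear
input str30 Disease str5 DrugNameX
"Asthma" "d1"
"Cancer, breast" "d2"
"Gout" "d3"
end
tempfile disease
save `disease'

* Toy stand-in for the DevelopmentStage file
clear
input str5 DrugNameX str10 Stage
"d1" "Phase I"
"d1" "Phase II"
"d2" "Launched"
"d3" "Phase I"
end
tempfile stage
save `stage'

local classList `" "Asthma" "Cancer, breast" "'

use `disease', clear
* Flag every disease in the list, then subset once
gen byte wanted = 0
foreach class of local classList {
    replace wanted = 1 if Disease == "`class'"
}
keep if wanted
drop wanted

* A single merge covers all selected diseases
merge 1:m DrugNameX using `stage', keep(match) nogenerate
list
```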

How to drop the first twelve month of fund returns

Hello all,

I am new to Stata. Can anyone provide any assistance? I need to drop the first twelve months of fund returns in long format, where months that contain no returns data do not count. There are lots of funds in my data set. Below is the example of fund 2 from June 2004 to May 2006. Following this example, I would like to remove the returns from May 2005 to April 2006 (12 monthly returns). Thank you for any help you can provide.


ic_fund  year  returns  Date
2        174   .        01jun2004
2        175   .        01jul2004
2        176   .        01aug2004
2        177   .        01sep2004
2        178   .        01oct2004
2        179   .        01nov2004
2        180   .        01dec2004
2        181   .        01jan2005
2        182   .        01feb2005
2        183   .        01mar2005
2        184   .        01apr2005
2        185   -.6      01may2005
2        186   4.5      01jun2005
2        187   -.2      01jul2005
2        188   3.3      01aug2005
2        189   -.1      01sep2005
2        190   -2.7     01oct2005
2        191   -2.1     01nov2005
2        192   2.3      01dec2005
2        193   1.4      01jan2006
2        194   .6       01feb2006
2        195   4.3      01mar2006
2        196   2.9      01apr2006
2        197   -.4      01may2006
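One possible approach is a running count of non-missing returns within each fund, then dropping the rows where that count is between 1 and 12. The sketch below simulates a single fund shaped like the example (11 empty months followed by 13 months of returns); the random return values and the helper variable `n_ret` are illustrative. Rows with missing returns are kept here, which is an assumption about what is wanted.

```stata
clear
set seed 7
set obs 24
gen ic_fund = 2
gen year = 173 + _n
* First 11 months have no returns, as in the example
gen returns = rnormal() if _n > 11

* Running count of months with returns data, per fund
bysort ic_fund (year): gen n_ret = sum(!missing(returns))

* Drop the first twelve months that actually contain returns
drop if inrange(n_ret, 1, 12) & !missing(returns)
drop n_ret
list
```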


stata bar graph binary variable

Dear All,

I am trying to make a bar graph similar to the one attached below (see 'capture'). I have a dummy variable called "religion_dummy" coded 1=Jews, 2=Arabs, and a categorical variable for life satisfaction coded 1=very dissatisfied, 2=not so satisfied, 3=satisfied, 4=very satisfied.

I have found a pretty useful code for this:
graph bar, over(religion_dummy) over( life_satisfaction) asyvars blabel(bar, format(%9.1f))

By running this code I get the graph shown in 'capture 2'.

However, I am trying to show the percentage of each category by itself (as shown in the first capture). So for the Jews (Arabs) I would like to see a separate percentage out of all the Jews (Arabs) respondents with a total of 100% for each religion.

I hope I am clear. Let me know if you have any more questions.
Kind regards,
Shir.
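One possible way to get percentages that sum to 100 within each religion is to build one 0/100 indicator per satisfaction category and plot its group means, since the mean of such an indicator within a group is the within-group percentage. The simulated data and the names `pct1` to `pct4` below are illustrative assumptions.

```stata
* Simulated stand-in for the survey data
clear
set seed 1
set obs 400
gen byte religion_dummy = 1 + (runiform() < 0.4)
gen byte life_satisfaction = ceil(4*runiform())

* One indicator per category, scaled so that group means are percentages
forvalues k = 1/4 {
    gen pct`k' = 100 * (life_satisfaction == `k')
}

* Bars show, for each religion, the percentage in each category
graph bar (mean) pct1 pct2 pct3 pct4, over(religion_dummy) ///
    blabel(bar, format(%9.1f)) ytitle("Percent within group")
```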


Identify the time dimension in unbalanced dataset

Hi there,
I have an unbalanced dataset that looks like the following:
id year values
1 2000 123
1 2001 234
1 2002 .....
1 2003 ....
2 2001
2 2002
2 2003
2 2004
2 2005
3 2000
3 2004
3 2005
3 2008
3 2009

I would like to balance the unbalanced dataset. Difficulties I face here:
1. Each id may not have data across all periods.
2. For each id, the years may not be continuous. E.g. for id 3, from 2000 to 2009, the value of x is missing between 2001 and 2003. This difficulty stops me from trying xtbalance, range(), since I'm not sure about the range, that is, which years cover most of my observations.

What I would like to end up with, from the table above:
id year values
2 2004
2 2005
3 2004
3 2005

That is to say, keep the years which cover most of my observations.

Any suggestions will be appreciated!

Reghdfe F test on fixed effects function disabled

Hello!
A friend of mine and I are working on a seminar paper in which we need to replicate the econometric analysis of a published article. We have chosen an article by Bertrand & Schoar (2003), in which they investigate the effect of executives on company policies. They run regressions with multiple FE (time, firm, and various manager FE).

Since there are multiple FE, the reghdfe package suits this analysis best. But we need to report the F statistic for the fixed effects only, and the e(F_absorb) result is currently disabled.

I've searched the forum and found that if you write 'old' after the reghdfe regression code, the historic version of the package will run, and then e(F_absorb) can be used. But this did not work for me.

So my question is: is there a way to calculate the F statistic for our FE using the reghdfe package? We have tried searching the Internet for an answer, but couldn't find one.

Thank you all in advance.

Generate dummies based on market share and portfolio shares

Dears

I would like to generate new variables MS & PS as shown below:


[Image not preserved: definitions of MS and PS]


I used audit fees as X.


Following previous studies, I generate a dummy variable that measures whether an auditor is a specialist, coded 1 if MS is greater than 30% and 0 otherwise. I used the following code; kindly advise me if the code is incorrect:
Code:
bysort sic2 year : egen audit1= sum(audit_fees)
bysort sic2 year audit_ID: egen audit2= sum(audit_fees)
gen MS = audit2 /audit1
gen MS_DUMMY = 0 
replace MS_DUMMY = 1 if MS>.30
I have the following questions:

1- How can I construct a dummy variable for whether an auditor is a specialist, equal to 1 "when an auditor has a market share that is the highest in a given industry and also more than 10 percent higher than the next-largest competitor during the year, and is 0 otherwise"?

2- How can I compute PS as shown in the equation above?

3- Some studies use "the ratio of audit fees that an audit office generates in a two-digit SIC industry to the total audit fees generated by an audit office in a Metropolitan Statistical Area (MSA) for a given year". I am stuck at this point and have difficulty computing it.






ID: Firm ID.
audit_ID: Auditor ID or key.
sic2: Two-digit Standard Industrial Classification code (industry code).
msa_code: Metropolitan statistical area code (city code).


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long ID double year int audit_ID double audit_fees float sic2 double msa_code
1608 2005     4 1265000 37 16980
1608 2006     4 1335000 37 16980
1608 2007     4 1325000 37 16980
1608 2008     4 1440000 37 16980
1608 2009     4 1490000 37 16980
1608 2010     4 1275000 37 16980
1608 2011     4 1745640 37 16980
1608 2012     4 1689980 37 16980
1608 2013     4 1794370 37 16980
1711 2005   748   53000 37 31100
1711 2006   748   54000 37 31100
1711 2007   748   56000 37 31100
1711 2008  5644   41500 37 31100
1711 2009 11556   10750 37 31100
1781 2005     3  173000 15 12060
1781 2006     3  175000 15 12060
1781 2007     3  185000 15 12060
1781 2008     3  203500 15 12060
1781 2009     3  210000 15 12060
1781 2010     3  220000 15 12060
1819 2005   598   12200 73 14460
1819 2006   598   12100 73 14460
1819 2007   598   10000 73 14460
1819 2008   598   17000 73 14460
1819 2009   598   23000 73 14460
1819 2010   598   26500 73 14460
1819 2011   598   26500 73 14460
1819 2012   598   26500 73 14460
1819 2013   598   26500 73 14460
1892 2005     4 1294000 51 35620
1892 2006     7  854000 51 35620
1892 2007     7  912000 51 35620
1892 2008     7  982000 51 35620
1892 2009     7  962000 51 35620
1892 2010 11761  886000 51 35620
1892 2011 11761  959000 51 35620
1892 2012 11761  943000 51 35620
1892 2013 11761  981000 51 35620
1956 2005     2  183000 34 14860
1956 2006     2  220000 34 14860
1956 2007     2  231000 34 14860
1956 2008     2   43000 34 14860
1956 2009  1687  192000 34 14860
1956 2010  8256  173000 34 14860
1956 2011  8256  174000 34 14860
1956 2012  8256  180000 34 14860
1956 2013  8256  188000 34 14860
1993 2005     1 2773000 73 19100
1993 2006     1 3741000 73 19100
1993 2007     1 5142000 73 19100
1993 2008     1 4484000 73 19100
1993 2009     1 3029000 73 19100
2036 2005     3  338100 51 26420
2036 2006     3  425534 51 26420
2036 2007     3  544747 51 26420
2036 2008     3  634253 51 26420
2036 2009     3  846528 51 26420
2036 2010     3  741268 51 26420
2036 2011     3  610922 51 26420
2036 2012     3  827934 51 26420
2036 2013     3  889800 51 26420
2044 2005     7  150500 36 37340
2044 2006     7  141000 36 37340
2044 2007     7  181500 36 37340
2044 2008     7  170000 36 37340
2044 2009     7  183750 36 37340
2044 2010 11761  125000 36 37340
2044 2011 11761  181700 36 37340
2044 2012 11761  181700 36 37340
2044 2013 11761  181700 36 37340
2346 2005     2 9300000 36 41940
2346 2006     2 4500000 36 41940
2346 2007     2 5100000 36 41940
2346 2008     2 4400000 36 41940
2346 2009     2 4000000 36 41940
2346 2010     2 3610790 36 41940
2346 2011     2 3325280 36 41940
2346 2012     2 3485230 36 41940
2346 2013     2 3868890 36 41940
2349 2005     3 3345470 73 29820
2349 2006     3 6502200 73 29820
2349 2007     3 4864620 73 29820
2349 2008     3 5127300 73 29820
2349 2009     3 2824200 73 29820
2349 2010     3 2029550 73 29820
2349 2011     3 1804370 73 29820
2349 2012     3 1900470 73 29820
2349 2013     3 2050300 73 29820
2459 2005     4 3977000 36 35620
2459 2006     4 2872000 36 35620
2459 2008     4 2300000 36 35620
2459 2009     4 1900000 36 35620
2459 2010     4 2200000 36 35620
2827 2005     4 6000000 28 10900
2827 2006     4 5200000 28 10900
2827 2007     4 5500000 28 10900
2827 2008     4 5700000 28 10900
2827 2009     4 6000000 28 10900
2827 2010     4 5700000 28 10900
2827 2011     4 6100000 28 10900
end
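For question 1, one possible sketch is to compute market shares at the auditor-industry-year level, sort each industry-year by descending share, and compare the leader with the runner-up. The five rows below are the 2005 observations for sic2 36 and 51 taken from the example above. Reading "more than 10 percent higher" as a gap of more than 10 percentage points is an assumption, and the names `industry_fees`, `negMS` and `specialist` are illustrative. Ties in market share are not handled.

```stata
clear
input long ID double year int audit_ID double audit_fees float sic2
2044 2005 7  150500 36
2346 2005 2 9300000 36
2459 2005 4 3977000 36
1892 2005 4 1294000 51
2036 2005 3  338100 51
end

* Total fees per auditor within each industry-year
collapse (sum) audit_fees, by(sic2 year audit_ID)
bysort sic2 year: egen industry_fees = total(audit_fees)
gen MS = audit_fees / industry_fees

* Sort each industry-year from largest to smallest share
gen negMS = -MS
* Leader is a specialist if alone or ahead of the runner-up by > 0.10
bysort sic2 year (negMS): gen byte specialist = ///
    (_n == 1) & (_N == 1 | MS[1] - MS[2] > 0.10)
drop negMS
list
```

The result is one row per auditor, industry and year, which could then be merged back onto the firm-level data by sic2, year and audit_ID.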