Channel: Statalist

non-stationary dependent variable

Dear all,

Hope you are well.
I am using panel data and have found that my dependent variable is non-stationary.
Is it possible to use instrumental variable estimation with first-differenced variables?



Kind Regards,
Katerina
Stata/SE 16.0
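
A minimal sketch of what a first-differenced IV estimation could look like, with placeholder names (panelvar and timevar for the panel identifiers; y, x and z for the dependent variable, the endogenous regressor and its instrument):

Code:
* Hedged sketch with placeholder variable names
xtset panelvar timevar
ivregress 2sls D.y (D.x = D.z), vce(cluster panelvar)
* alternatively, xtivreg's built-in first-differenced estimator:
* xtivreg y (x = z), fd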

Measures of accuracy of predicted probabilities (QPS), (LPS), (KS)


Hello all,

I am trying to predict future recessions using a probit model, and I now want to evaluate the out-of-sample forecasting performance using several measures, such as the quadratic probability score (QPS, i.e. the Brier score), the log probability score (LPS), and the Kuipers score (KS).
I was wondering how to compute these measures in Stata. I am fairly new to the program and was not able to find any code or package that calculates these scores automatically.

Thanks in advance and best regards,
David
Stata/SE 15.0
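
A minimal sketch of how these could be computed from the predicted probabilities, using placeholder names (rec for the 0/1 recession indicator, phat for the predicted probability from -predict-) and the usual definitions: QPS is the mean of 2*(p - y)^2, LPS is minus the mean of the log-likelihood contributions, and KS is the hit rate minus the false-alarm rate, here at a 0.5 cutoff.

Code:
* Hedged sketch with placeholder names: rec = 0/1 recession indicator,
* phat = predicted probability from -predict-; the KS uses a 0.5 cutoff
gen double sqerr = 2*(phat - rec)^2
gen double logsc = -(rec*ln(phat) + (1 - rec)*ln(1 - phat))
quietly summarize sqerr
display "QPS = " r(mean)
quietly summarize logsc
display "LPS = " r(mean)
quietly count if phat >= 0.5 & !missing(phat) & rec == 1
local hits = r(N)
quietly count if rec == 1
local hitrate = `hits'/r(N)
quietly count if phat >= 0.5 & !missing(phat) & rec == 0
local falsealarms = r(N)
quietly count if rec == 0
local farate = `falsealarms'/r(N)
display "KS  = " `hitrate' - `farate'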

Gini coefficient by group and year

Greetings,

Hope everyone is keeping safe.

I would like to compute the Gini coefficient for a large number of regions by year.

I would like to generate a new variable and have tried the following loop:

Code:
foreach yr of numlist 1970/1972 {
    gen gini = . 
    qui ineqdeco income if year==`yr', by(group)  
    replace gini = $S_gini if year==`yr' 
}
However, I get the following error:

Code:
variable gini already defined
Any help will be highly appreciated. Thank you and keep safe!

Best,

Chiara

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte group int(year income)
1 1970 1200
1 1970  720
1 1970 2160
2 1970 2400
2 1970 1440
2 1970 4800
3 1970 3480
3 1970 2616
3 1970    0
1 1971  360
1 1971 1200
1 1971  960
2 1971 2760
2 1971 3600
2 1971    0
3 1971 6000
3 1971  960
3 1971 3600
1 1972  600
1 1972 2472
1 1972  600
2 1972 1200
2 1972 4944
2 1972 7944
3 1972 6000
3 1972 2400
3 1972    0
end
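
The error arises because gen gini = . is executed again on every pass through the loop. A minimal sketch of one alternative, assuming -ineqdeco- (SSC) leaves the Gini of the estimation sample in r(gini): create the variable once outside the loop and call ineqdeco separately for each group-year cell.

Code:
* Hedged sketch: -gen- once outside the loop, then loop over group-year cells;
* assumes -ineqdeco- (SSC) returns the Gini in r(gini)
gen gini = .
levelsof group, local(groups)
forvalues yr = 1970/1972 {
    foreach g of local groups {
        quietly ineqdeco income if year == `yr' & group == `g'
        quietly replace gini = r(gini) if year == `yr' & group == `g'
    }
}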

Calculating weighted totals (sums) of a variable by groups (levels) of another variable

Hello everybody,
I'm new to this forum. I studied Stata at my university for about two years, and I am now preparing a paper for my graduation, for which I need Stata to handle a 25,000-observation dataset.
Here's a little of it:

id region salaries weight
1 1 1200 23.15
2 1 500 7.65
3 2 100 10.12
4 2 1700 2.95
5 3 2050 14.00
6 3 1435 3.60

My goal is to calculate weighted totals (sums) of salaries by region.
I've tried most of the common commands: bysort with egen, bysort with asgen (bysort doesn't allow weights), and collapse, but I didn't understand how to use collapse with all its options for my kind of data and goal.

I would be glad to hear your suggestions about a Stata command or algorithm that could give me the total of salaries, by region, taking the weights into account.

Thanks very much!
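
A minimal sketch, assuming the weighted total is the sum of salaries × weight within each region (variable names as shown above):

Code:
* Weighted total of salaries by region: sum of salaries*weight within region
gen double wsal = salaries * weight
bysort region: egen double total_wsal = total(wsal)
* or, to get one row per region instead of a new column:
* collapse (sum) total_wsal = wsal, by(region)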

recode with different conditions

Hi,

I need to generate a categorical variable from a continuous variable (sleep time), but I want participants to fall into different categories depending on their age. I have attempted to do this with recode but received an error message:

Code:
. recode sleep min/539 = 1 540/660 = 2 661/max = 3 if age<14
& min/479 = 1 480/600 = 2  601/max = 3 if age>=14, generate(sleep_duration)
min not found
r(111);
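
-recode- cannot take two rule sets joined with & and a second -if- in one command. A minimal sketch of one alternative spells out the age-specific cutoffs with generate/replace (cutoffs taken from the post; observations with missing sleep or age are left missing):

Code:
* Hedged sketch: age-specific sleep-duration categories via generate/replace
gen byte sleep_duration = .
replace sleep_duration = 1 if age < 14 & sleep <= 539
replace sleep_duration = 2 if age < 14 & inrange(sleep, 540, 660)
replace sleep_duration = 3 if age < 14 & sleep >= 661 & !missing(sleep)
replace sleep_duration = 1 if age >= 14 & !missing(age) & sleep <= 479
replace sleep_duration = 2 if age >= 14 & !missing(age) & inrange(sleep, 480, 600)
replace sleep_duration = 3 if age >= 14 & !missing(age) & sleep >= 601 & !missing(sleep)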

Tag question

I created a tag for my ID variable. When I try to look at my IDs with tag equal to 1, I receive the error message "not allowed". How do I look at casefile IDs with tag == 1 in browse mode?

Thanks!

Lisa
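
A minimal sketch with placeholder variable names (casefile_id for the ID variable, tag for the tag); note the double equals sign in the condition:

Code:
* Hedged sketch with placeholder variable names
browse casefile_id if tag == 1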

Logit regression discrete dependent variable panel data

Hi

I am having difficulties with my dataset because my dependent variable (pctagreechina) is a proportion. The variable gives the share of votes a country cast in accordance with China in the UNGA in a given year; thus, e.g., 1 = 100% and .83 = 83%. I previously tried to use xtreg for my statistical analysis, but the results did not make much sense. After searching the internet, I found that this is because xtreg treats my dependent variable as an unbounded continuous variable.

How can I run my regression (I wish to use fixed effects)? And is there a command I should use to tell Stata that the variable pctagreechina is a proportion?

For clarification, I wish to explore the effect of Chinese foreign aid on the recipient country's voting behaviour in the UNGA from 2000 to 2014. I expect to find that countries that received more aid over the period changed their voting behaviour to a higher degree, towards more agreement with China.

Thank you in advance!


Code:
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     country |        645    22.16279     12.6588          1         44
        year |        645        2007    4.323847       2000       2014
pctagreechina |        633    .8329394    .0751723   .3333333          1
      amount |        645    9.51e+09    7.46e+10          0   1.34e+12
  Population |        645    1.81e+07    2.69e+07      81131   1.76e+08
-------------+---------------------------------------------------------
         GDP |        645    2.36e+10    6.42e+10   3.50e+08   5.68e+11
NaturalRes~s |        645    3.20e+09    8.87e+09          0   7.87e+10
    PolityIV |        612    2.542484    4.990645         -7         10
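
A minimal sketch of one way to handle a proportion outcome, assuming a fractional (0 to 1) response model is acceptable: fracreg with year indicators and country-clustered standard errors. Note that fracreg is not a fixed-effects panel estimator, so country indicators or another estimator would be needed for a within-country analysis. Variable names follow the summary table above; the truncated NaturalRes~s variable is omitted here and should be added back under its full name.

Code:
* Hedged sketch: fractional logit for a 0-1 outcome, year dummies,
* standard errors clustered by country (add the remaining controls)
fracreg logit pctagreechina c.amount c.Population c.GDP c.PolityIV i.year, vce(cluster country)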

Creating a dummy variable for one year by using information from the other year [data in long format]

Hi everyone,

I have two years of data (2007 and 2009) with information on household food-borrowing networks. In both 2007 and 2009, households were asked to report up to 3 household ids from which they borrow food. My dataset is in long format, which is why, for each household in a branch and spotno, I have 3 observations in the _j variable, representing the 3 food networks. The variable food_borrow_netid contains the household id of that food network. Not everyone mentioned 3 food-network ids, so there are missing values in food_borrow_netid.

I want to create a dummy variable (food_link) ONLY for the year 2009 which takes a value of 1 if the food network hh id was mentioned in both 2007 and 2009, 0 otherwise.

Example from the dataset below: in 2007, hh 46 in branch 1 and spot 1 reports borrowing food from hh 45 (mentioned as its first network).

In 2009, the same household mentions borrowing food from hh 45 (mentioned as a third network) and two other hhs (205 and 44). So I want a value of 1 in the food_link variable for hh 45 and 0 for hh 205 and hh 44.


Any help in this regard will be much appreciated.

Some notes:

1) branchid, spotno, hhno and year together uniquely identify a household. So a household's food-network hh ids should be matched across years within the same branchid and spotno.

2) Food-network ids may not be mentioned in the same order: in 2007 someone may report borrowing food from hh 45 and list hh 45 as their first food network, but in 2009 the same hh may list hh 45 as its 1st, 2nd or 3rd food network, or not mention hh 45 at all.

3) It's possible that a hh appears in 2009 in a given branch and spot but is not in the 2007 dataset; for these networks the food_link variable can take the value -99. For everything else, the value will be missing.

Danish

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte branchid int(spotno year hhno) byte _j int food_borrow_netid
1 1 2007 46 1  45
1 1 2007 46 2   .
1 1 2007 46 3   .
1 1 2009 46 1 205
1 1 2009 46 2  44
1 1 2009 46 3  45
1 1 2007 57 1  56
1 1 2007 57 2   .
1 1 2007 57 3   .
1 1 2009 57 1 202
1 1 2009 57 2  93
1 1 2009 57 3  47
1 1 2007 85 1  77
1 1 2007 85 2  76
1 1 2007 85 3  78
1 1 2009 85 1  86
1 1 2009 85 2  21
1 1 2009 85 3  90
1 1 2007 86 1  77
1 1 2007 86 2   .
1 1 2007 86 3   .
1 1 2009 86 1  16
1 1 2009 86 2  77
1 1 2009 86 3  87
1 1 2007 87 1  77
1 1 2007 87 2   .
1 1 2007 87 3   .
1 2 2007  6 1   4
1 2 2007  6 2   .
1 2 2007  6 3   .
1 2 2009  6 1   1
1 2 2009  6 2   5
1 2 2009  6 3   4
1 2 2007 26 1  23
1 2 2007 26 2  24
1 2 2007 26 3  25
1 2 2009 26 1  23
1 2 2009 26 2  24
1 2 2009 26 3  25
1 2 2007 30 1  31
1 2 2007 30 2   .
1 2 2007 30 3   .
1 2 2009 30 1  30
1 2 2009 30 2  31
1 2 2009 30 3  36
1 2 2007 36 1  39
1 2 2007 36 2   .
1 2 2007 36 3   .
1 2 2009 36 1  36
1 2 2009 36 2  37
1 2 2009 36 3  39
end
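
A minimal sketch of one approach, using the variable names in the dataex above: build a 2007 lookup of (branchid, spotno, hhno, food_borrow_netid) mentions and a 2007 household list, merge both onto the full data, and fill food_link only for the 2009 rows.

Code:
* Hedged sketch; assumes variable names as in the dataex above
* 1) 2007 network mentions, one row per household x mentioned network id
preserve
keep if year == 2007 & !missing(food_borrow_netid)
keep branchid spotno hhno food_borrow_netid
duplicates drop
gen byte net_in2007 = 1
tempfile nets2007
save `nets2007'
restore
* 2) households present in 2007 (to distinguish the -99 case)
preserve
keep if year == 2007
keep branchid spotno hhno
duplicates drop
gen byte hh_in2007 = 1
tempfile hh2007
save `hh2007'
restore
* 3) merge both lookups onto the full data and build food_link for 2009
merge m:1 branchid spotno hhno food_borrow_netid using `nets2007', keep(master match) nogenerate
merge m:1 branchid spotno hhno using `hh2007', keep(master match) nogenerate
gen food_link = .
replace food_link = (net_in2007 == 1) if year == 2009 & hh_in2007 == 1 & !missing(food_borrow_netid)
replace food_link = -99 if year == 2009 & missing(hh_in2007)
drop net_in2007 hh_in2007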

Create a new variable that is the absolute difference between the midpoints of two other variables

Hello,

I want to create a variable that is the absolute difference between the midpoints of two continuous variables, var1 and var2. I then need the new variable to be categorised as 1 if it is <1 and 2 if it is ≥1.

Can anyone help?

Many thanks.
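
A minimal sketch, taking the midpoints to be the values of var1 and var2 themselves (adjust if the midpoints must be computed first):

Code:
* Hedged sketch: absolute difference, then a two-level category (1 if < 1, 2 if >= 1)
gen double absdiff = abs(var1 - var2)
gen byte absdiff_cat = cond(absdiff < 1, 1, 2) if !missing(absdiff)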

Calculating cumulative return

Hi everyone,

I want to calculate the cumulative return, where t = 0 (1/30/2020) is the first day, as follows:
cum_re (at t=0) = return (at t=0)
cum_re (at t=1) = [(1 + return (at t=0)) * (1 + return (at t=1))]^(1/2) - 1
cum_re (at t=2) = [(1 + return (at t=0)) * (1 + return (at t=1)) * (1 + return (at t=2))]^(1/3) - 1
...
cum_re (at t=10)
However, I don't know how to code it in Stata.

I would really appreciate all the help I can get.

Best regards

P.S.: I attach a small sample as an example.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int company_id float(date return t cum_re)
1 21937   .004804236 -5 .
1 21938 -.0028854576 -4 .
1 21941  -.029846454 -3 .
1 21942   .027896367 -2 .
1 21943    .02071675 -1 .
1 21944 -.0014503612  0 .
1 21945   -.04535168  1 .
1 21948  -.002750865  2 .
1 21949   .032480955  3 .
1 21950   .008121479  4 .
1 21951   .011628815  5 .
1 21952  -.013652723  6 .
1 21955   .004738188  7 .
1 21956    -.0060511  8 .
1 21957    .02346898  9 .
1 21958  -.007145829 10 .
2 21937   .006136745 -5 .
2 21938  -.010127876 -4 .
2 21941  -.016864497 -3 .
2 21942    .01940629 -2 .
2 21943   .015472523 -1 .
2 21944    .02781715  0 .
2 21945  -.014868793  1 .
2 21948    .02408653  2 .
2 21949   .032386478  3 .
2 21950 -.0012224098  4 .
2 21951    .02052206  5 .
2 21952  .0014147813  6 .
2 21955   .025820585  7 .
2 21956   -.02283419  8 .
2 21957  .0014628787  9 .
2 21958  -.005428527 10 .
3 21937 -.0015270904 -5 .
3 21938  -.012246813 -4 .
3 21941  -.018049529 -3 .
3 21942    .01353241 -2 .
3 21943   .002559859 -1 .
3 21944   .006800847  0 .
3 21945    .07119624  1 .
3 21948 -.0022526423  2 .
3 21949    .02243314  3 .
3 21950  -.004792484  4 .
3 21951   .005066657  5 .
3 21952   .014068715  6 .
3 21955   .025935084  7 .
3 21956    .00788418  8 .
3 21957   .004267564  9 .
3 21958  -.004700152 10 .
end
format %tdnn/dd/CCYY date
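
A minimal sketch, reading the formulas above as the geometric average of gross returns from t = 0 up to t, computed by company using the variable names in the dataex:

Code:
* Hedged sketch: geometric-average cumulative return from t = 0 onward, by company
sort company_id t
gen double lr = ln(1 + return) if t >= 0
by company_id: gen double cumlr = sum(lr)
by company_id: replace cum_re = exp(cumlr/(t + 1)) - 1 if t >= 0
drop lr cumlr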

Splitting time series data into training and testing set

Hello,

I am working on a recession-probability model using a LASSO probit framework. My goal is to predict recessions out of sample. I want to split my data into a training sample before 2000 and a testing sample from 2000 until October 2019, currently the end of my data set. I am having trouble figuring out how to make this split. Could anyone give me some guidance? Currently my code runs while splitting the sample randomly, but that is not what I want. My data are all monthly. I have posted my code below for consideration.

Code:
*format my date variable and set it as my time series
format %tmMonth_CCYY date2
tsset date2, monthly

*splitsample
vl set, categorical(4) uncertain(0)
vl list vlcategorical
splitsample, generate(sample) nsplit(2) rseed(1234)
tabulate sample

*Here, I use LASSO to select a model based off 8 variables
lasso probit nber_rec spread6monthlag ff6monthlag cape snp6monthlag pmi awhman6monthlag smb6monthslag hml6monthlag if sample == 1

cvplot
estimates store cv

*display a table of information about each of the models that were fit
lassoknots, display(nonzero bic)

*Select the model with the chosen lambda
lassoselect id = 45
cvplot
estimates store firstmodel

*View a table of the variables selected
lassocoef cv firstmodel, sort(coef, standardized)

*Assess the goodness of fit
lassogof firstmodel, over(sample) postselection

*create recession probabilities based off LASSO model
predict rec_prob
summarize rec_prob

twoway (tsline rec_prob nber_rec)
I believe my problem is in the split-sample stage. Any help would be greatly appreciated.
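
A minimal sketch of a deterministic split by date, in place of the -splitsample- call above, assuming date2 is the monthly date variable already tsset:

Code:
* Hedged sketch: training sample before 2000m1, testing sample from 2000m1 onward
generate byte sample = 1 if date2 < tm(2000m1)
replace sample = 2 if date2 >= tm(2000m1) & !missing(date2)
tabulate sample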

Diverging PNG and PDF graphs on Linux machines

SOLVED!
So far I've tried Ubuntu 20.04 and Manjaro (updated on April 3rd). The issue seems to affect only text, such as legends and axis labels.

Code:
sysuse uslifeexp, clear
line le_male le_female year
graph export /tmp/test.pdf
graph export /tmp/test.png
The above code produces the results shown in the attached screenshots (PDF on the left, PNG on the right).


As you can see, the legend in the PDF version is weird. It is not a matter of size: a vsmall legend reduces its overall size, but it is still out of bounds.

Anyone facing a similar issue?
I believe it might be a problem with a missing font, but I don't know which font Stata uses for plotting or how to change it.

Possibly (un)related and still unsolved: https://www.statalist.org/forums/for...aro-arch-based


Search for a model in Stata for a master's thesis

Hi everyone,

For our thesis we use Stata 16. The aim of our thesis is to find out to what extent chain affiliation may affect quality. We therefore need regressions to find out whether chain affiliation (a categorical variable) is associated with the multiple numeric variables we use to measure quality. Chain/non-chain is our independent variable, while the quality variables are dependent. Our problem is that we don't know how to measure this effect in Stata. We thought we might be able to use OLS or ANOVA, but we are not sure whether this is possible and, if so, how we should interpret the results.
Does anyone know how we could solve this? Any suggestions are welcome.

Thank you in advance
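
A minimal sketch with placeholder names (quality_measure for one of the numeric quality variables, chain for the chain-affiliation category): regressing a numeric outcome on a categorical regressor with OLS is, for a single factor, equivalent to a one-way ANOVA, and the coefficients on the chain indicators give the mean differences between categories.

Code:
* Hedged sketch with placeholder variable names
regress quality_measure i.chain
* equivalently, as a one-way ANOVA
anova quality_measure chain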

Keeping/Dropping variables based on large list

Hi - I am working with very large datasets (~15-20 GB) and I would like to reduce the number of observations in each dataset.

For example, say I have a medical procedures dataset (PROCEDURES) with variables ID (patient's ID), proc (procedure code for the patient's procedure), and gender (gender of the patient).
ID proc gender
1 23 M
1 20 M
2 19 F
2 18 F
3 23 F
4 13454 F

Let's say I only care about patients who underwent proc 23, BUT if a patient did undergo proc 23, then I also want to keep all of their other observations (i.e., ID == 1 underwent proc 23, so I want to keep the first two rows).

My approach so far has been to read in the PROCEDURES dataset and write 'drop if proc != 23'; this way I obtain the unique values of ID that I want to keep in my datasets (but these are large datasets, so this list ends up being ~100,000 IDs). Since I am working with many datasets that use ID as the unique identifier (one with procedure codes like the above, one with patient address information, one with insurance information, etc.), I would like to preserve this list so that I can go into each separate dataset and keep only the observations whose ID is in the unique list.

Since the list is so long, I don't think it would be useful to write the IDs out in a keep if command. Does anybody have suggestions on how to do this efficiently? I'm hoping there is something simpler than what I have tried below, which seems to work but is extremely slow compared to running drop commands on these datasets.

I saved the unique list as a separate dataset, calling it `Unique ID', and then performed a merge as follows:

Code:
merge 1:m ID using PROCEDURES
keep if _merge == 3
drop _merge

Maybe there is some way to store the list in a macro? I've also been referring to https://www.stata.com/support/faqs/d...-observations/

Cheers,
Peter
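
A minimal sketch of an alternative, using the variable names above (the filename qualifying_ids is hypothetical): flag qualifying IDs within PROCEDURES without writing them out, then save the ID list once so each of the other datasets can be filtered with a merge.

Code:
* Hedged sketch. Step 1: keep every record of any ID that has at least one proc 23
bysort ID: egen byte has23 = max(proc == 23)
keep if has23 == 1
drop has23
* Step 2: save the qualifying IDs once, then filter the other datasets by merging
preserve
keep ID
duplicates drop
save qualifying_ids, replace    // hypothetical filename
restore
* in each of the other datasets:
* merge m:1 ID using qualifying_ids, keep(match) nogenerate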


How to run regressions with 28 years worth of data

Hello All,

This is my first post here on Statalist.org. I appreciate having the community here for help.

I am currently doing a study on the economic effect of additional electric power development on emerging economies in Africa. I have a data set of 21 observations (countries) spanning the years 1990-2017. I also have multiple variables that I will be using for this study.
  • Independent Variable:
    • KWH Per Capita
  • Dependent Variables:
    • Education index
    • Life expectancy
    • Mortality rate under 5
    • The adjusted net national income per capita in US dollars
What I am having a problem with is running a regression of any of the dependent variables on my independent variable. I am a bit confused as to how to combine all the years to see whether there is a statistically significant relationship between the variables. Should I be setting this up as a panel data set, since I have 28 years for each country?


I am no expert in Stata; I have run simple regressions with cross-sectional data before, but never with panel data.
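
A minimal sketch of a panel setup with placeholder variable names (country as a numeric country identifier, kwh_pc for electricity use per capita, life_exp for one of the dependent variables listed above):

Code:
* Hedged sketch with placeholder variable names; assumes a numeric country id
xtset country year
xtreg life_exp kwh_pc, fe vce(cluster country)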

Problem with -esttab- after -gsem- to run a negative binomial model

I am having a problem with -esttab- after using -gsem- to fit a negative binomial model (gsem ..., family(nbinomial mean) link(log)). I am using Stata 15.1 for Windows.

I ran a series of negative binomial models using -gsem-. The models run correctly, but when I use -esttab- to export the results, instead of the neat table typically produced by this command, the coefficients are all scrambled up and repeated. The problem occurs ONLY with the negative binomial model (please see below); -esttab- works fine with other types of models fitted with -gsem-. I would really appreciate any help.

Below is a toy example.

Thanks in advance,

Arkangel


Code:
/* Negative binomial */
webuse fish
quietly gsem (count <- child, family(nbinomial mean) link(log))
estimates store model1
quietly gsem (count <- child livebait, family(nbinomial mean) link(log))
estimates store model2
quietly gsem (count <- child livebait camper, family(nbinomial mean) link(log))
estimates store model3
quietly gsem (count <- child livebait camper persons, family(nbinomial mean) link(log))
estimates store model4
esttab model1 model2 model3 model4

[attached screenshot: scrambled esttab output for the negative binomial models]

Code:
/* OLS */
webuse fish, clear
quietly gsem (count <- child, family(gaussian) link(identity))
estimates store model1
quietly gsem (count <- child livebait, family(gaussian) link(identity))
estimates store model2
quietly gsem (count <- child livebait camper, family(gaussian) link(identity))
estimates store model3
quietly gsem (count <- child livebait camper persons, family(gaussian) link(identity))
estimates store model4
esttab model1 model2 model3 model4

[attached screenshot: esttab output for the Gaussian models]

Code:
/* Poisson */
webuse fish, clear
quietly gsem (count <- child, family(poisson) link(log))
estimates store model1
quietly gsem (count <- child livebait, family(poisson) link(log))
estimates store model2
quietly gsem (count <- child livebait camper, family(poisson) link(log))
estimates store model3
quietly gsem (count <- child livebait camper persons, family(poisson) link(log))
estimates store model4
esttab model1 model2 model3 model4

[attached screenshot: esttab output for the Poisson models]

Code:
/* Logit */
webuse fish, clear
quietly gsem (count <- child, family(bernoulli) link(logit))
estimates store model1
quietly gsem (count <- child livebait, family(bernoulli) link(logit))
estimates store model2
quietly gsem (count <- child livebait camper, family(bernoulli) link(logit))
estimates store model3
quietly gsem (count <- child livebait camper persons, family(bernoulli) link(logit))
estimates store model4
esttab model1 model2 model3 model4

[attached screenshot: esttab output for the Bernoulli models]

Stratified Meta Analysis

I am trying to replicate the code below, as I am trying to perform a stratified meta-analysis. I am new to macros, so I am struggling to understand the process as a whole. I'd really appreciate it if someone could explain how it works.


Also, why is the local macro set equal to 1 when it is created?


Code:
* l is a row counter: results for each stratum are written into successive
* observations of the dataset, starting at observation 1 (hence local l = 1)
local l = 1
foreach num of numlist 1/11 {
    foreach x in design period usstates type_of_cs_analysed parity previous_cs type_of_risk quips_risk {

        * skip stratifier/level combinations that do not occur in the data
        count if `x' == `num'
        if r(N) == 0 continue

        * random-effects meta-analysis within this stratum, then store the
        * returned pooled effect, its SE, tau-squared and number of studies
        * in observation `l' of the holding variables
        metan logefsize selogefsize if `x' == `num', randomi lcols(study) nograph
        replace glor = r(ES) in `l'
        replace gselor = r(seES) in `l'
        replace strat = "`x'" in `l'
        replace stratgrp = `num' in `l'
        replace tau2 = r(tau2) in `l'
        replace nstd = r(df) + 1 in `l'

        local l = `l' + 1
    }
}


Thank you




Modelling approaches for handling a varying number of measurements per individual in mixed-effects models

Hello,
What are the modelling approaches for handling a varying number of measurements per individual in mixed-effects models?
Thank you very much
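
For reference, a minimal sketch with placeholder names (outcome, time, id): mixed-effects estimators such as -mixed- simply use however many measurements each individual contributes, so unbalanced data need no special handling beyond the usual missing-at-random assumption.

Code:
* Hedged sketch with placeholder variable names
mixed outcome time || id: time, covariance(unstructured)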

Counting Occurrences in Observations

Hi everyone,
I have a problem that I hope to find help with. I have a dataset of 3-vs-3 soccer matches, where each match pits a team of 3 players against another team of 3 players, all drawn from a common pool of players. In each match, each team is assigned to be either the HOME team or the AWAY team. I would like to calculate, for each player in each match,

A) How many times the player has played with the other players in his team as team-mates before
B) How many times the player has played with the other players in his team as opponents before
C) How many times the player has played against the other players in his opponent team as team-mates before
D) How many times the player has played against the other players in his opponent team as opponents before

The sample dataset is reproduced here, where match is the ID for each match, player is the ID for each player, and home indicates whether the team the player is on for that match is the home team.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double match long player float home
1  2 0
1  1 0
1  3 0
1  4 1
1  5 1
1  6 1
2 11 0
2  1 0
2  5 0
2  3 1
2  2 1
2 12 1
3  5 0
3  1 0
3  6 0
3  7 1
3  9 1
3 10 1
4  1 0
4  2 0
4  8 0
4  4 1
4  5 1
4  7 1
5  9 0
5 12 0
5 11 0
5  3 1
5  1 1
5  4 1
end

Previously I attempted to brute-force this by calculating the number of total player-player dyads. But because I have more than 150k players and over 1 million matches, this turned out to be computationally impossible. I am looking for a more feasible method. Any help with this calculation would be most welcome and appreciated. Thank you.


Kenneth Zeng
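
A minimal sketch of one approach, assuming the match id increases chronologically and using the variable names in the dataex above: build every ordered player pair within each match with joinby, count each pair's earlier meetings as teammates or opponents with running sums, and then sum those counts over the current teammates (A, B) and current opponents (C, D). With a million matches this still creates roughly 30 pair-rows per match, but it avoids building the full player-by-player dyad matrix.

Code:
* Hedged sketch; variable names as in the dataex above
preserve
rename (player home) (other_player other_home)
tempfile pairs
save `pairs'
restore
joinby match using `pairs'
drop if player == other_player

gen byte teammates = (home == other_home)
gen byte opponents = !teammates
* prior meetings of each ordered pair, assuming match is in time order
bysort player other_player (match): gen prior_tm = sum(teammates) - teammates
bysort player other_player (match): gen prior_op = sum(opponents) - opponents

gen n_A = prior_tm if teammates    // with current teammates, as teammates before
gen n_B = prior_op if teammates    // with current teammates, as opponents before
gen n_C = prior_tm if opponents    // against current opponents, as teammates before
gen n_D = prior_op if opponents    // against current opponents, as opponents before
collapse (sum) n_A n_B n_C n_D, by(match player home)

The collapse leaves one row per player per match with the four counts, which can then be merged back onto the original data on match and player.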

Problem when estimating the modified Jones model

Dear all,
Firstly, nice to meet you. This is my first time in this forum; forgive me in advance for any errors in my post.
I'm building the modified Jones model to estimate non-discretionary accruals, and this is my code:
Code:
gen TA= (oancf -ibc)/L.at
gen x1= 1/L.at
gen x2= (D.sale-D.rect)/L.at
gen x3= ppegt/L.at
gen modJones=.
forval y=1985(1)2018{
forval i=1(1)48{
display `i'
display `y'
reg TA x1 x2 x3 if `i'== ffind & `y'== fyear, noconstant
predict r if `i'== ffind& `y'== fyear, resid
replace modJones=r if `i'== ffind& `y'== fyear
drop r
}
}
The problem is that I get
Code:
no observations
r(2000);
Just to let you know, I'm working with the COMPUSTAT database from 1985 to 2018. I have read other posts where this problem was related to too few observations for a given year and industry (I'm using the Fama-French 48-industry classification).
Could you help me? I'm new to Stata; sorry in advance for very elementary mistakes.
Thank you for your time.
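
A minimal sketch of one way to keep the loop running past industry-year cells that cannot be estimated, reusing the variable names above and the modJones variable already generated; this assumes the r(2000) error comes from cells with no observations:

Code:
* Hedged sketch: skip industry-year cells with no usable observations
forvalues y = 1985/2018 {
    forvalues i = 1/48 {
        capture regress TA x1 x2 x3 if ffind == `i' & fyear == `y', noconstant
        if _rc continue
        predict double r if e(sample), resid
        replace modJones = r if e(sample)
        drop r
    }
}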