Channel: Statalist
Viewing all 65561 articles

Reduced form regression

Dear all,

I have the following problem. I am able to estimate the following two-stage regression equation:

$$Y_{i,T+1} - Y_{i,T} = \alpha + \lambda (Y^*_{i,T+1} - Y_{i,T}) + \varepsilon_{i,T+1}, \quad \text{where } Y^*_{i,T+1} = \hat{\beta} X_{i,T},$$

by obtaining $Y^*_{i,T+1}$ first and then estimating the first regression equation after inserting $Y^*_{i,T+1}$ into it.

Some researchers, however, use a so-called reduced-form model of the following form:

$$Y_{i,T+1} = \alpha + \lambda \beta X_{i,T} + (1 - \lambda) Y_{i,T} + \varepsilon_{i,T+1}$$

Apart from the two-stage model, I would also like to use the one-stage model described above to obtain λ. Could anyone help me with how to do this in Stata?

Thank you in advance,
Klemen.
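
A minimal sketch of one way to get λ from the one-stage model, assuming the data are in wide form with the hypothetical variables y_next (Y at T+1), y_now (Y at T), and x (X at T); none of these names come from the post. In the reduced form, the coefficient on y_now estimates 1 - λ and the coefficient on x estimates λβ, so nlcom can recover both parameters with delta-method standard errors:

```stata
* Hypothetical variable names: y_next = Y(i,T+1), y_now = Y(i,T), x = X(i,T)
regress y_next x y_now

* lambda = 1 - coefficient on y_now; beta = (coefficient on x) / lambda
nlcom (lambda: 1 - _b[y_now]) (beta: _b[x] / (1 - _b[y_now]))
```

The estimate of β becomes unstable when λ is close to zero, since it is a ratio of two coefficients.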

Multilevel regressions and comparing the groups in these regressions

Dear reader,

Currently I am trying to determine from the data below which industry (sibcode is the industry code) gives the most “success” to an individual. This success is based on the individual's ranking in a list of the “richest people in a country”. So, essentially, what I want is a ranking of industries from most successful to least successful.

Example of data:
sibcode rank wealth year id
2 1 8500 1998 1
2 3 8500 1999 1
3 2 7400 1998 2
3 2 8800 1999 2
6 1 15000 1999 3
6 3 7000 1998 3
11 4 6000 1998 4
11 4 6600 1999 4
10 5 5000 1998 5
10 5 6300 1999 5
6 8 3400 1999 6
6 6 3800 1998 6
9 7 3400 1998 7
9 6 3600 1999 7
10 15 1400 1999 8
10 8 3000 1998 8
6 9 3000 1998 9
6 7 3500 1999 9
10 10 1900 1998 10
10 10 2000 1999 10
3 12 1800 1999 11
3 11 1800 1998 11
6 12 1800 1998 12
6 9 2200 1999 12
6 14 1600 1999 13

This data is a panel data set for:
  • Sibcode: industry code
  • Years: 1998 – 1999
  • Rank: 1- 500
  • Wealth: total amount of wealth
  • ID: individual person
Dependent variable: rank
Independent variable: wealth
Group: industry (=sibcode)

Because this data contains groups in the form of industries, I decided to use multi-level regressions to analyse it.

After many steps this gave me: the constant, slope of x1(wealth), u2 and u1 for every industry group. I listed these results using the list command.
This gives me, for every industry, rank=_constant + B1 * Wealth.

To answer my research question, is it sufficient to compare: rank=_constant + B1 * Wealth

OR

Can Stata compare these estimates and rank them automatically?

Kind Regards

Generating variable that assigns a number for each duplicating sets of variables, within each group of study participants

Dear all,

I'm trying to generate a variable which will group a subset of observations within a group.
For example, my dataset looks something like this:
ID var1 var2
1 a 1
1 a 2
1 b 1
1 b 2
2 a 3
2 a 4
3 b 5
3 b 6
3 b 7
3 c 5
3 c 6
3 c 7
So var2 was duplicated for each new value/set of var1.
How do I generate a new variable that will assign a number (starting from 1) for each unique var1 within the same ID? So my final data will look like this:
ID var1 var2 var3
1 a 1 1
1 a 2 1
1 b 1 2
1 b 2 2
2 a 3 1
2 a 4 1
3 b 5 1
3 b 6 1
3 b 7 1
3 c 5 2
3 c 6 2
3 c 7 2
I have tried coding it like this, because by is not allowed with egen's group() function:

foreach x of varlist id {
egen var3=group(`x' var1)
}

However, with that code, the var3 generated does not restart from 1 for each new ID, but will instead continue in sequence for every new value of var1 like this:
ID var1 var2 var3
1 a 1 1
1 a 2 1
1 b 1 2
1 b 2 2
2 a 3 1
2 a 4 1
3 b 5 2
3 b 6 2
3 b 7 2
3 c 5 3
3 c 6 3
3 c 7 3

Any help on how to achieve my objective will be greatly appreciated. Many thanks.

Ray
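
One possible approach, sketched with the variable names from the example (ID and var1): sort by var1 within ID and keep a running count of how often var1 changes. The by prefix works here because generate accepts it, and both _n and the running sum() restart within each ID.

```stata
* Within each ID, sorted by var1, add 1 whenever var1 differs from the previous row
bysort ID (var1): generate var3 = sum(_n == 1 | var1 != var1[_n-1])
```

Note that this sorts the data by ID and var1, and that the numbering follows the sort order of var1 within each ID.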


Computing median odds ratio after meglm (or any other mixed command)

Hello,
Could anyone provide some advice about how to compute a median odds ratio after running meglm or any other mixed command? I found a package called xtmrho that is designed to do this, but it produces no output, which to me suggests that it was designed for an earlier version of Stata.
Thanks
Chris
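
For reference, a sketch of the usual definition, which applies to a two-level random-intercept logistic model. The median odds ratio (as described by Larsen and Merlo) depends only on the estimated random-intercept variance, written here as σ²_u (notation assumed, not from the post):

```latex
\mathrm{MOR} = \exp\!\left( \sqrt{2\,\sigma_u^2}\;\Phi^{-1}(0.75) \right)
             \approx \exp\!\left( 0.954\,\sigma_u \right)
```

Here Φ⁻¹(0.75) ≈ 0.6745 is the 75th percentile of the standard normal distribution, available in Stata as invnormal(0.75), so the value can be computed by hand from the variance reported by the mixed model.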

Count models versus fractional response models

A student asked me the following Q that I only know part of the answer for:

For my project, I am running a model that looks at crime in a given tract. So far I have been running these models with tract population as the exposure option. But how does the exposure option differ from just controlling for tract population size in a model? Or from just making my dependent variable a per capita variable (i.e., murders per capita)?
With a count model, the exposure option has the effect of entering the log of the exposure variable into the model and fixing its coefficient at 1. But what about analyzing per capita rates instead, e.g. # homicides / population size? That sounds like the sort of thing you would use a fractional response model for.

I am inclined to use count models, partly because there seem to be a lot of options (poisson, nbreg, zero-inflated, etc.). But are there other advantages or disadvantages to the two approaches?
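
The difference between using the exposure option and controlling for population can be written out. A sketch, with y_i the count, N_i the tract population, and x_i the other covariates (notation assumed here, not taken from the post):

```latex
\begin{align*}
\text{exposure}(N_i):\quad
  & \log E[y_i \mid x_i] = x_i'\beta + 1 \cdot \log N_i
  \;\Longleftrightarrow\;
  E[y_i / N_i \mid x_i] = \exp(x_i'\beta) \\
\text{control for } \log N_i:\quad
  & \log E[y_i \mid x_i] = x_i'\beta + \gamma \log N_i,
  \qquad \gamma \text{ estimated freely}
\end{align*}
```

So a count model with exposure is already a model for the expected per capita rate, and entering log population as an ordinary covariate nests it, with γ = 1 as a testable restriction.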

Creating a unique value after aggregating data

Hello Stata users,

My goal is to run xtreg on monthly data.

1. The original data set is at the hourly level. In this set, there are four columns for time: year, month, day, and hour. I have also created a date_time variable that combines those four.

2. I have aggregated the data to the monthly level using bysort year month: egen var_2 = total(var). However, at this point I do not have a unique identifier, so I cannot set the panel to month.

Question: Is there a way to export the single aggregated value, which will give me the option to run an xtreg?

All the best,
Icebetty
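
A minimal sketch of one way to get a single row per month, assuming the variable names year, month, and var from the post. collapse replaces the hourly data in memory with the monthly totals, and ym() builds a monthly date that can serve as the time variable. If the data also contain a panel identifier (hypothetically called unit here), it would need to be added to by() and declared with xtset, since xtreg requires a panel variable.

```stata
* Save the hourly data first: collapse replaces the data in memory
collapse (sum) var_2 = var, by(year month)

* Build a Stata monthly date from year and month and declare it as the time variable
generate mdate = ym(year, month)
format mdate %tm
tsset mdate
```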

2spls CHECK IDENTIFICATION (i.e. Exclusion criteria)

"Dear Statalist Users
I am estimating a set of simultaneous equations as below:
Y=a0 + a1E+ a2X + e
E= b0 + b1Y+ b2X+v
where E is a dichotomous variable, Y is a continuous variable, and X is a set of exogenous variables.
I want to estimate this system of equations using the two-stage probit least squares (2SPLS) method. The method described in the literature is to first estimate the Y and E equations using the exogenous variables and then use their predicted values as instruments in the second-stage regression. However, when I run cdsimeq in Stata, I get an error message that advises me to check the exclusion criteria. At this point I am not sure whether, besides the exogenous variables (X), I need to use instruments for Y and E in the first stage. I went through several papers that use 2SPLS but did not find a clear explanation of whether they used separate instruments.
Your response will be highly appreciated."

-Dadhi Adhikari

How to make Graph for Seasonality Data

Dear Statalist,

I am trying to make a figure for seasonality data. I have household cross-section data over 5 quarters in two years, i.e. two quarters of 2012 and three quarters of 2013 (each quarter has 2800 households). I have 5 variables (one per product, 5 products in total) that record the quantity consumed by households, and my purpose is to show the seasonality of household consumption across seasons.

I have followed this Stata Journal article:
http://www.stata-journal.com/sjpdf.html?articlenum=gr0025
However, I still did not get the point of how to do it.

I am sorry for the inconvenience, but I would highly appreciate it if someone could give me an idea of how to show the seasonality of these food quantities consumed in Stata.

More or less, I would like to do something similar to these figures:

[Two example figures; images not preserved]

Thank you in advance



Replacing dates manually (

Hello,

Wondering if anyone could help me. I've got an int variable "datesxonset" in format %td. The dates look like "09jan2015". However, some observations are missing that information so "datesxonset" is missing. However, a second data source has the data so I can replace the information. I've tried this:

replace datesxonset = 18dec2014 if id=="COUNTRY-YR-####"
replace datesxonset = 12/18/2014 if id=="COUNTRY-YR-####"
I tried both of those with and without quotes around the proposed date. Neither works. The error message is "invalid syntax".

I've tried this as well: gen datesxonset2 = date(datesxonset, "MDY") and that didn't work.

How can I manually replace the variable "datesxonset" with the dates that I have from the second source?

Thanks!
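
A sketch of two equivalent ways to type a date constant, assuming id is a string variable as in the post. A %td variable stores the number of days since 01jan1960, so the right-hand side has to be a number: td() converts a date literal, and mdy() builds the date from month, day, and year. The date() function expects a string argument, which is probably why applying it to the already numeric datesxonset did not work.

```stata
* "COUNTRY-YR-####" is the placeholder id from the post
replace datesxonset = td(18dec2014) if id == "COUNTRY-YR-####"

* Equivalent, using month, day, year
replace datesxonset = mdy(12, 18, 2014) if id == "COUNTRY-YR-####"
```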


Quantile Regression for Panel Data

Hi, I am trying to estimate a quantile regression with panel data following Ivan A. Canay (2011), A simple approach to quantile regression for panel data, Econometrics Journal, volume 14, pp. 368–386. Unlike Koenker (2004), this paper uses a two stage GMM estimation.

I have previously estimated this fixed-effects quantile regression in R using Koenker's methodology, but it has a lot of limitations. This is why I am trying Canay's methodology this time.

Unfortunately, I cannot find any Stata code for this, although a few papers report having done it in Stata. It would be a great help if anybody has any clue where I can find Stata code for fixed-effects quantile regression following Canay (2011).

Ujjwal Kumar Das
Leeds University Business School, UK

Rolling and xtreg return only missing values for beta estimate

Dear all, I am new to Stata so apologies in advance if my questions are basic. I have been working on this problem for hours and can't find a solution so any help would be welcome.

I have a panel data set with the monthly returns of the Fama French 48 industries (ff_return) from 1982 to 2015. I would like to regress them on the monthly return of an index as the independent variable (crsp_return) so that I get a beta coefficient for every industry in every month.
The code I am using (below) runs and sets up the right table in betas.dta, however instead of the betas I get only "." in every cell. Same for the standard errors.

. xtset FF_Code date, monthly
panel variable: FF_Code (unbalanced)
time variable: date, 1982m1 to 2015m9, but with gaps
delta: 1 month

rolling _b _se, window(60) saving(betas, replace): xtreg ff_return crsp_return

When I run the code, I already get the x's below in the message window, which I think means that Stata cannot do the calculations for some reason:

. rolling _b _se, window(60) saving(betas, replace): xtreg ff_return crsp_return
(running xtreg on estimation sample)

-> FF_Code = 1

Rolling replications (345)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 50
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 100
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx--Break--
r(1);

end of do-file

My input data looks like the below for all 48 industry codes.

FF_Code crsp_return ff_return date
1 -,7609566 0 1982m1
1 -5,53736 3,432735 1982m2
1 -,8042108 2,505121 1982m3
1 5,975481 -,5749969 1982m4
1 -1,767752 -7,013632 1982m5
1 -2,266344 -10,49334 1982m6
1 -2,39196 2,459984 1982m7
1 10,63346 -8,555081 1982m8
1 2,140129 1,729964 1982m9
1 15,09451 14,96911 1982m10
1 9,641202 4,372953 1982m11
1 2,824253 20,1565 1982m12
1 4,990622 -,0727161 1983m1
1 5,347841 9,523816 1983m2
1 4,14886 -3,479138 1983m3
1 6,266451 1,0778 1983m4
1 5,637278 2,93434 1983m5
1 4,196735 6,772766 1983m6

Any help on this would be highly appreciated!
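
A possible explanation, offered tentatively: on xtset data, rolling runs the command separately within each panel (hence the "-> FF_Code = 1" line), so every 60-month window contains a single industry, and xtreg may fail on such a sample. Each x in the output marks a window in which the command returned an error. Since the goal is one beta per industry per window, plain regress may be sufficient. A sketch using the variable names from the post:

```stata
xtset FF_Code date, monthly

* One time-series regression per industry and 60-month window
rolling _b _se, window(60) saving(betas, replace): regress ff_return crsp_return
```

Running regress once by hand on a single industry and a 60-month range is a quick way to check whether the command itself works before wrapping it in rolling.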

Is there an option to use all defined constraints in a regression?

I am trying to run a regression using sureg, and I have defined my constraints such that the program is flexible to changes in the variables I wish to use and there are no constraints that I do not want to use in the regression command. Along the lines of making the program flexible, I would like to be able to call all constraints in the sureg line, so that I do not need to update this line if the number of constraints I have changes. To clarify, my code currently looks like this:

Code:
 sureg wbrand1 wbrand2 wbrand3 wbrand4  = lnxp p* , constraints(1-45)
but I would like to be able to do something like this:

Code:
 sureg wbrand1 wbrand2 wbrand3 wbrand4  = lnxp p* , constraints(all)

Obviously, this second version does not actually work; I only include it as an example of what I am hoping to do. Thanks for the help!

label values from a "dictionary"

Hi Statalist

I use Stata 14 on Mac (and more frequently Stata in batch mode from shell)

I have two datasets (examples below)


THE FIRST ONE:
ID CODE1 CODE2
1 10 11
2 12 12
3 13 14
and the SECOND ONE:
CODE EXAM
10 x-ray
11 ct-scan
12 mri
13 surgery
Is there any way to replace CODE* in the first dataset using the "dictionary" provided by the second one?
I tried merge with the update option, but it does not work (in hindsight, that was obvious).

Of course, the real number of codes (10,000), the number of observations (30,000), and the number of repeated code variables (60) preclude doing that by hand.
Thank you for your welcome help!
Best
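
One possible approach is to reshape the codes to long form, merge the dictionary once, and reshape back. A sketch, assuming the two datasets are saved as first.dta and second.dta (hypothetical file names) and that CODE is unique in the second one:

```stata
use first, clear

* One row per ID and code slot: CODE1, CODE2, ... become CODE with slot = 1, 2, ...
reshape long CODE, i(ID) j(slot)

* Look up each code in the dictionary; codes without an entry keep a missing EXAM
merge m:1 CODE using second, keep(master match) nogenerate

* Back to one row per ID, now with EXAM1, EXAM2, ... next to CODE1, CODE2, ...
reshape wide CODE EXAM, i(ID) j(slot)
```

In the example data, code 14 has no dictionary entry, so its EXAM value would stay empty.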

perfect prediction in -heckman, select()-

Do heckman / heckprob attempt to identify perfect prediction in the selection equation? From the output, it does not seem like they do; logit or probit output would have said something like "blah predicts success perfectly; it is dropped and this many observations not used". However heckman does not say that.

I think I am running into the issue with that, as Heckman model fails to converge (without the -difficult- option) or produces coefficients like 5 with a standard error of zero for a dummy variable in the selection equation. As normal(-5) is about the same as c(epsfloat), I suspect maximization just sends that parameter to a large enough value for the likelihood not to change... rather than attempting to remove it the way logit or probit do.

Gini decomposition grouped data

Dear Statalists,

I'm new to the forum. I'm using Stata 13 to estimate Gini coefficients for grouped data. My data are organized as follows: I have the number of farms (varname "farms") and the number of hectares (varname "total_land") for 37 districts (varname "prov_n") for 5 years (1955, 1965, 1975, 1997, and 2007) in Chile. In each district I have the distribution of these 2 variables across 10 farm size categories (i.e., 1-5 ha, 5-10 ha, 10-20 ha, 20-50 ha, 50-100 ha, 100-200 ha, 200-500 ha, 500-1000 ha, 1000-2000 ha, and >2000 ha). The number of farms (population) is not equally distributed across size categories. I'm using the command ginidesc (a user-written command by Aliaga and Montoya) to get one coefficient per district for each year (e.g., ginidesc total_land if year==1955, by(prov_n)). I would really appreciate any feedback on the use of this command for my case; maybe you have some experience with this type of data.

Thank you very much to all !!

Best

Consuelo Moraga

Problems with parmest using loop

Hi.
I am trying to run models using loops, and I want to store estimates and p-values in Stata so that I can select which models to keep or discard. I did some research on Statalist and found that the command "parmest" does the trick. However, parmest only stores the most recent result. Since I use loops, how can I use parmest to store the results of every model run in the loop?
Here is the code I was trying to run.
Code:
local patents "pat cit wpc pat5 wpc5" // choose 1
local n_size "n_at n_emp l1yn_at l1yn_sale l2yn_at l1yn_emp l2yn_emp l2yn_sale" // choose 1
foreach var1 of varlist `patents' {
    foreach var2 of varlist `n_size' {
        wls0 n_car h2 h3 h4 h5 `var1' `var2' , wvar (event_date) type(e2)
        parmest, format(estimate %8.2f p %8.1f) saving(model1.dta)
    }
}
It seems that parmest will create a new dataset each time it stores. It seems that I need to "append" each dataset if I want to combine all results into a single dataset. Unless, you have other suggestions.

All suggestions would be really appreciated. Thank you.
Best,
Pawinee
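
A sketch of the append approach, assuming that parmest's idstr() option and the replace suboption of saving() behave as described in the package help (worth confirming with help parmest). Each model is saved to its own file, tagged with the variables used, and the files are appended after the loop. The file names model_#.dta and all_models.dta are hypothetical.

```stata
local patents "pat cit wpc pat5 wpc5"
local n_size "n_at n_emp l1yn_at l1yn_sale l2yn_at l1yn_emp l2yn_emp l2yn_sale"
local i = 0
foreach var1 of varlist `patents' {
    foreach var2 of varlist `n_size' {
        local ++i
        wls0 n_car h2 h3 h4 h5 `var1' `var2', wvar(event_date) type(e2)
        * Tag each result set with the variables used and save it to its own file
        parmest, format(estimate %8.2f p %8.1f) idstr("`var1' `var2'") saving(model_`i'.dta, replace)
    }
}

* Combine the per-model files into one dataset, leaving the data in memory untouched
preserve
use model_1.dta, clear
forvalues j = 2/`i' {
    append using model_`j'.dta
}
save all_models.dta, replace
restore
```

Without replace, saving(model1.dta) would also stop with an error on the second pass through the loop, because the file already exists.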

"no observations" in unpaired t-test from two datasets

Hello,

I have a very urgent problem with my data analysis. I am doing a comparative analysis for my studies.
I want to compare one question from the Eurobarometer over time (2007 and 2015) for the participants in Germany and Greece, meaning I have TWO separate datasets which I have appended. Therefore I created two variables for each country, one only showing the observations from Germany (resp. Greece) in 2007 and one showing the observations from Germany (resp. Greece) in 2015. What I want to do is a simple mean-comparison, for which I'd need a ttest (unpaired I guess). But it says "no observations" when I try to do the t-test.

My syntax is ttest DeutschEuropa07, by(DeutschEuropa15)

Am I doing anything wrong?

Thanks for your help!
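
A possible explanation: by() expects a grouping variable with two values, and after appending, DeutschEuropa15 is missing in every row where DeutschEuropa07 is observed, so no usable observations remain. A sketch of the two-variable form of ttest, which treats the two variables as independent samples:

```stata
* Unpaired comparison of the 2007 and 2015 means for Germany
ttest DeutschEuropa07 == DeutschEuropa15, unpaired
```

An alternative is to keep a single answer variable plus a year variable in the appended data and use by(year) with an if condition for the country.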

reg3 - r(2001) Insufficient observations error

Dear Statalists,

I am trying to estimate a system of two simultaneous equations on Monte Carlo generated data using the 3SLS method. The problem is that, although I have set the number of observations to 50000000, the simulation stops with the error message "insufficient observations - an error occurred when simulate executed my3sls".

I would appreciate it if someone could help me with this. The code is:


#delimit ;
clear all ;
set seed 10101 ;

* Set the values of the parameters

global numobs = 50000000 ; /*numobs = sample size*/
global numsims = 2000 ;/*numsims = replication number*/

scalar beta12 = 1.5 ;
scalar beta21 = 1.8 ;
scalar gamma11 = 1.5 ;
scalar gamma12 = 0.5 ;
scalar gamma21 = 1 ;
scalar gamma32 = 2 ;

capture program drop my3sls ;
program my3sls, rclass ;
version 13 ;
drop _all ;
set obs $numobs ;
tempvar x1 x2 x3 eps1 eps2 u1 u2 y1 y2 ;

generate `y1' =. ;/* initiate y -- all missing values */
generate `y2' =. ;/* initiate y -- all missing values */

generate `x1' = runiform() ;
generate `x2' = runiform() ;
generate `x3' = runiform() ;

generate `eps1' = rnormal() ;
generate `eps2' = rnormal() ;

generate `u1' = 1.708*`eps1' + 1.404*`eps2' ;
generate `u2' = 1.732*`eps2' ;

replace `y1' = (beta21)*(`y2') + (gamma11)*(`x1') + (gamma21)*(`x2') + `u1' ;
replace `y2' = (beta12)*(`y1') + (gamma12)*(`x1') + (gamma32)*(`x3') + `u2' ;

reg3 (`y1' = `y2' `x1' `x2') (`y2' = `y1' `x1' `x3') ;

return scalar b1 = _b[`x1'] ;
return scalar b2 = _b[`x2'] ;
return scalar b3 = _b[`x3'] ;

end ;

simulate b1=r(b1) b2=r(b2) b3=r(b3), reps($numsims) saving(results, replace) nolegend nodots: my3sls ;

use results, clear ;
summarize ;




Thanks in advance,
Homa
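
A possible explanation: y1 is computed from y2 while y2 is still missing, and y2 is then computed from that missing y1, so both variables are missing in every row and reg3 has no usable observations regardless of the sample size. One way around this is to solve the two structural equations for y1 first. A sketch outside the simulate wrapper, written with the default line delimiter and ordinary variable names, with a much smaller sample as an assumption for a quick check:

```stata
clear
set seed 10101
set obs 10000   // assumption: small sample for a quick check, not the 50,000,000 in the post

scalar beta12  = 1.5
scalar beta21  = 1.8
scalar gamma11 = 1.5
scalar gamma12 = 0.5
scalar gamma21 = 1
scalar gamma32 = 2

generate x1 = runiform()
generate x2 = runiform()
generate x3 = runiform()
generate eps1 = rnormal()
generate eps2 = rnormal()
generate u1 = 1.708*eps1 + 1.404*eps2
generate u2 = 1.732*eps2

* Substitute the y2 equation into the y1 equation and solve for y1,
* then recover y2 from its own structural equation
scalar delta = 1 - beta12*beta21
generate y1 = (beta21*(gamma12*x1 + gamma32*x3 + u2) + gamma11*x1 + gamma21*x2 + u1) / delta
generate y2 = beta12*y1 + gamma12*x1 + gamma32*x3 + u2

reg3 (y1 = y2 x1 x2) (y2 = y1 x1 x3)
```

Separately, under #delimit ; a comment that starts with * runs until the next semicolon, so the line "* Set the values of the parameters" may need its own terminating ; to avoid absorbing the global numobs line that follows it.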

Save matrix with Matsave: syntax error

Hi,

I know this is very simple, but I'm having difficulty saving matrices with "matsave" (ssc install matsave). Where am I going wrong? Thanks!

Code:
sysuse auto.dta, clear

mkmat price mpg rep78 headroom weight

matsave price mpg rep78 headroom weight, replace p("Save_matrix") saving

metan - a replacement for dp(#) - how to set the number of decimal places in output?

Hello all,
I am using metan in Stata/SE 14.1.

Previously, there was an option dp(#) that allowed the user to specify the number of decimal places displayed for the effect size in the forest plot. This does not appear to work now, nor can I find an alternative in output_options or forest_plot_options.

For example:
metan percentage lcl ucl, by(group) xlab(0,20,40,60,80,100) dp(1)

returns:
r(198); "option dp() not allowed"


I see that as recently as June 2015 users of Stata 12.1 were still able to use dp(#) successfully.
http://www.statalist.org/forums/forum/general-stata-discussion/general/1299384-metan-command-risk-differences-with-whole-numbers

Am I doing something wrong? Have I missed something? Is there a work-around?

Thank you.