Mata syntax: single line if statements

May 14, 2017, 8:53 am

What is the syntactic rule that underlies the following behaviour (a compile error in the first and fourth if statements, but not in the second or third)?

. mata: if (1==1) x=1;

unexpected end of line
<istmt> incomplete
r(3000);

. mata: if (1==1) x=1;;

. mata: if (1==1) x=1; x;
1

. mata: if (1==1) x

unexpected end of line
<istmt> incomplete
r(3000);

↧

Two-way fixed effects model

May 14, 2017, 8:58 am

Hi Statalisters,

I have a panel dataset that spans 2008 through 2015 and covers 181 Italian listed family firms. My main interest is the relation between founding-family ownership and firm performance. The analysis also incorporates variables that identify CEOs as firm founders, descendants of the firm's founder, or outsiders. I would like to use a two-way fixed effects model for my regression analysis.

The paper I have read that does something similar describes the fixed effects as dummy variables for each year of the sample and dummy variables for each two-digit SIC code (I would like to use the ATECO 2007 code instead, since my firms are Italian). The regression they employ is the following:

Firm Performance = δ0 + δ1 (Family Firm) + δ2 (Control Variables) + δ3-54 (Two-Digit ATECO Code) + δ'93-'99 (Year Dummy Variables) + ε

where
Firm Performance = ROA based on EBITDA and net income, and Tobin's q;
Family Firm = binary variable that equals one when the founding family is present in the firm, and zero otherwise;
Control Variables = officer and director holdings less family holdings, fraction of independent directors serving on the board, research and development expenses divided by total sales, long-term debt divided by total assets, stock return volatility, natural log of total assets, and the natural log of firm age;
Two-Digit ATECO Code = 1.0 for each two-digit SIC code in our sample;
Year Dummy Variables = 1.0 for each year of our sample period.

How should I build this model in Stata?

Thanks a lot!
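
A regression of the kind described above (industry and year dummies rather than firm-level fixed effects) could be sketched with factor-variable notation. This is only an illustration on simulated data: the variable names roa, family, lnassets, ateco2, firm and year are hypothetical, a single control stands in for the full list, and clustering the standard errors by firm is an assumption, not something the post specifies.

```stata
* Simulated panel: 40 firms x 8 years (2008-2015); all names are hypothetical
clear
set seed 1
set obs 40
gen firm   = _n
gen ateco2 = 10 + mod(_n, 4)               // four made-up two-digit industry codes
gen family = mod(floor((_n - 1)/4), 2)     // family-firm dummy, varies within industry
expand 8
bysort firm: gen year = 2007 + _n
gen lnassets = rnormal(10, 1)
gen roa = 0.05 + 0.01*family + 0.002*lnassets + rnormal(0, 0.02)

* i.ateco2 and i.year create the industry and year dummies automatically
regress roa i.family lnassets i.ateco2 i.year, vce(cluster firm)
```

Because family status is (nearly) time-invariant within a firm, firm fixed effects would absorb it, which is presumably why the quoted specification uses industry dummies instead.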

↧

Mata: confusing error message

May 14, 2017, 10:13 am

When you type:
. mata: rmdir ("vendor")

you get:
could not create directory vendor
rmdir(): 693 could not remove directory
<istmt>: - function returned error

The cause seems to be an extra errprintf in rmdir.mata, which says "create" where "remove" is meant:

*! version 1.0.0 15dec2004
version 9.0

mata:

void rmdir(string scalar dirpath)
{
        if (_rmdir(dirpath)) {
                errprintf("could not create directory %s\n", dirpath)
                _error(693, "could not remove directory")
                /*NOTREACHED*/
        }
}

end

↧

Panel data

May 14, 2017, 11:37 am

Dear Sir,

I have an unbalanced panel covering two time periods, identified by a unique person ID. To make it a balanced panel, I tried the following commands:

by IDPER : gen copies=[_N]

keep if copies==2

However, when I browse the data before and after running the commands above, only the observations for the second period are kept.

I have cross-checked this with the duplicates command in Stata, which lists duplicates with respect to person ID. I used the following command:

duplicates list IDPERSON

It reports that the data set has 4456 groups, meaning we have information for both time periods for 4456 persons. My panel should therefore have 8912 observations; however, after using the copies command I am left with only 4456 persons, i.e. I have only the second-period observations.

How can I create a balanced panel?
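
For reference, here is a minimal sketch on made-up data of how _N behaves under a by prefix. The variable period is hypothetical (the post does not name the time variable), and the sketch does not diagnose why the original commands dropped the first-period rows.

```stata
* Toy data: persons 1 and 3 are observed in both periods, person 2 only once
clear
input IDPERSON period
1 1
1 2
2 1
3 1
3 2
end

* Under -by-, _N is the number of observations in the current group
bysort IDPERSON: gen copies = _N

* Keep only persons observed in both periods
keep if copies == 2
list, sepby(IDPERSON)
```

On this toy data both rows of persons 1 and 3 survive, so four observations remain.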

↧

Generating new variable from residuals

May 14, 2017, 2:28 pm

Hi All,

Could I kindly ask for your help with creating a new variable from the residuals of a regression? I know it is easy in a simple scenario, but I need to calculate the residuals for each industry in each year. So I wrote one command to run the regression separately for each industry in each year, and another to extract the residuals from each regression.

I just want to check whether I have written the correct set of commands.

quietly: bysort Year Industry: regress Y X₁ X₂ X₃ ... X_n
predict R, residuals

Thank you all in advance, and I look forward to hearing from you.

Regards,
Mohamed
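
One point worth checking with the two commands above: after a by-prefixed regress, only the last group's estimates remain in memory, so a single predict would apply that last regression to every observation. A possible alternative, sketched here on simulated data with a single regressor, is to loop over the Year-Industry cells. The names cell, res_tmp and R, and the data themselves, are made up for illustration.

```stata
* Simulated data: 2 years x 2 industries, 20 observations per cell
clear
set seed 12345
set obs 80
gen Year     = 2000 + mod(_n, 2)
gen Industry = 1 + mod(floor(_n/2), 2)
gen X1 = rnormal()
gen Y  = 1 + 2*X1 + rnormal()

* One identifier per Year-Industry cell, plus an empty residual variable
egen cell = group(Year Industry)
gen double R = .

* Fit the regression cell by cell and store that cell's own residuals
quietly levelsof cell, local(cells)
foreach c of local cells {
    quietly regress Y X1 if cell == `c'
    quietly predict double res_tmp if e(sample), residuals
    quietly replace R = res_tmp if e(sample)
    drop res_tmp
}
```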

↧

CALIPMATCH: caliper matching without replacement. Available in SSC.

May 14, 2017, 4:23 pm

calipmatch matches case observations to control observations using "caliper" matching and (optionally) exact matching. Control observations matched to a case observation will have values within +/- the caliper width for every caliper matching variable. Matched observations will also have identical values for every specified exact matching variable, if any are specified. calipmatch supports 1:1 or 1:m matching of cases to controls, without replacement.

calipmatch was written in collaboration with Allan Garland, of the University of Manitoba Faculty of Medicine.

It is now available from SSC, with thanks to Kit Baum. It can also be viewed on GitHub.

Details

This program allows you to perform fuzzy case-control matching, matching cases to controls that have close-but-not-identical values for caliper matching variables. You specify a "caliper width" for each caliper matching variable, and all controls matched to a case will have values within +/- that width for the corresponding variable.

Controls are randomly matched to cases without replacement. For each case, calipmatch searches for matching controls until it either finds the pre-specified maximum number of matches or runs out of controls. The search is performed greedily: it is possible that some cases end up unmatched because all possible matching controls have already been matched with another case.

calipmatch is optimized to run extremely efficiently. Exact matching is performed before caliper matching, using a sort. Caliper matching is implemented in Mata, and searches only within exact match groups. This program was created because our original caliper matching Stata code ran on our problem for 10 days without completion. The version of calipmatch now available to you on SSC completed our matching problem in under 5 minutes.

↧

Estimating weighted logit in multilevel models

May 14, 2017, 5:12 pm

My model looks like
gllamm y x, i(year cohort) link(logit) fam(binom)

where y is a dummy variable.

Now I want to add an analytical weight to the logit model (weighted logistic regression). However, I realize that there is no option in gllamm to add analytical weights. How could I do this in Stata?

Thanks!

↧

Multilevel Analysis Using Complex Survey Data -- Analysis of Subset of Data and gllamm

May 14, 2017, 6:45 pm

Hello,

I am new to multilevel analysis, and am learning about weighting data and the Stata program gllamm. My question here is about subsetting my data.

I am planning a multilevel analysis to understand whether variation in state-level measures of structural stigma against persons with mental illness (operationalized as state availability of services, state mental health expenditures, etc.) predicts health outcomes in persons with mental illness. The individual-level data I am using come from BRFSS, a complex survey that collects state-specific health data. Within the BRFSS, the mental health data I am using come from an optional module, so I am only analyzing data from the subset of states that selected this module (25 states). Further, I am interested in the subset of individuals with mental illness.

My understanding is that I will need to use the Stata program gllamm in order to appropriately weight my data, but that gllamm does not support the subsetting of data. I would appreciate any advice or references to materials that inform me about best practices in this area.

Thank you.

↧

Model convergence

May 17, 2017, 9:07 am

Dear fellow Stata users,

I just want to report something that I find a bit strange, and I am not sure whether I should worry about my model. My model looks like:

nbreg depvar x1 x2 x3...x8 i.year i.country, vce(cluster country)

My model does not converge even if I use the "difficult" option or other techniques, e.g. bfgs.
However, if I move x3 in front of x1, i.e.

nbreg depvar x3 x1 x2 ...x8 i.year i.country, vce(cluster country)

then the model converges!

Any idea why this happens? I assume it has to do with different initial values, but I still thought that the order of x1...x8 should not matter.
Ioannis

↧

Histogram axis ??unrelated to data

May 17, 2017, 9:18 am

I wonder if someone could explain where Stata is getting my y axis from when I plot this histogram. I want it to read N in whole numbers, or % of total. I know how to change the axis titles, but not how to get Stata to plot N or % of total.
This is to plot the spread of ages in a study of 607 patients.
This is the beginning of the table:
. tab Age_num

        Age |      Freq.     Percent        Cum.
------------+-----------------------------------
          9 |          1        0.17        0.17
         19 |          1        0.17        0.34
         21 |          1        0.17        0.50
         23 |          1        0.17        0.67
         25 |          1        0.17        0.84
         27 |          1        0.17        1.01
         28 |          1        0.17        1.17
         29 |          3        0.50        1.68
         33 |          2        0.34        2.01
Then with histogram Age_num I get this graph. The red box is my problem: I don't understand why Stata is showing me this, since neither the frequency nor the percentage corresponds to a peak of just over 0.04. Many thanks for any explanations.
[Attached graph not preserved]
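
For context, histogram scales the y axis as a density by default, so the bar areas (height times bin width) sum to 1; that may be where a peak of about 0.04 comes from. The sketch below uses the auto dataset shipped with Stata, with mpg standing in for Age_num, to show the options that switch the scale.

```stata
* mpg from the shipped auto dataset stands in for Age_num
sysuse auto, clear

* Default: y axis is a density (bar areas sum to 1)
histogram mpg

* y axis as number of observations per bin
histogram mpg, frequency

* y axis as percent of total
histogram mpg, percent

* For integer-valued data such as age in years, one bar per distinct value
histogram mpg, discrete frequency
```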

↧

can't re-fit a model that I previously fit: advice on setting initial values

May 17, 2017, 9:30 am

Hi:

I have a problem I'm hoping someone can help with...

I previously fit a multilevel binomial model using 'meqrlogit'. I stored the estimates after fitting the model, and as I haven't ended the Stata session (i.e. I haven't turned my computer off for a few days!), I can restore the estimates and make predictions etc.

To fit the model, I went through a number of iterations, storing the matrix e(b) and using this as initial values when fitting the next model (i.e. after fitting a model I used 'matrix b = e(b)', and I suffixed the subsequent model with 'from (b, skip)').

The problem I have is that when writing the do-file, I jumped back-and-forth a few times as I thought of additional model configurations I should test. This was foolish, as the order of models in the do-file isn't the order in which I initially fit the models. And now, I'm unable to re-fit my final model as I can't work out the sequence of e(b) matrices I used to fit it: I keep getting the message 'initial values not feasible'.

This means if I turn off my computer, I won't be able to refit the model.

I realise you can save the estimates as an 'est' file using 'estimates save', but if I load these into a new Stata session I can't make predictions with the model (which is my main purpose).

I can see the matrix e(b) from my final model, but I don't know if it's possible to manually replicate this matrix to use it as starting values for re-fitting the final model. If I could manually replicate e(b), presumably I could then easily re-fit the model.

Does anyone know if there's a way to either:

- save the model estimates in such a way that I can load them into a new Stata session and make predictions? (bearing in mind this is a multilevel model)

- manually create a matrix that could then be used as initial values to refit the model in a new Stata session

Any help would be greatly appreciated!

Cheers

↧

problem merging variables

May 17, 2017, 9:33 am

I have 5 different variables with different data for each observation. These 5 variables represent the years 2011-2015.

How can I combine them into one variable named "year" with 5 different labels? Is this even possible?
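
If the goal is one row per observation and year, a wide-to-long reshape is one possible reading of the request. The sketch below is on made-up data and assumes the five variables share a common stub (v2011 ... v2015) and that an identifier id exists; both names are hypothetical.

```stata
* Toy wide data: one row per id, one column per year
clear
input id v2011 v2012 v2013 v2014 v2015
1 10 11 12 13 14
2 20 21 22 23 24
end

* Stack the five columns: year takes the suffixes 2011..2015, v holds the values
reshape long v, i(id) j(year)
list, sepby(id)
```

If the variables do not share a stub, they would first need renaming (for example with rename) so that the year appears as a numeric suffix.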

↧

descriptive statistic table

May 17, 2017, 9:34 am

Hi all,

I would like to create a table, exportable to a Word file, that shows the mean, SD and percentiles of my variables. These values should appear on the same row but in different columns.

With the following commands I get only the mean and SD, with the SD below the mean:

summarize TotalAssets Revenue CMarkCap Equity ROA_0
estpost summarize TotalAssets Revenue CMarkCap Equity ROA_0
eststo summstats
esttab summstats using table4.rtf, replace main(mean %6.2f) aux(sd)

Thank you!
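
One way to get the statistics side by side is the cells() option with all statistics inside a single pair of quotes, after running estpost summarize with the detail option so that percentiles are stored. This is a sketch only: it uses the shipped auto dataset instead of the poster's variables, assumes the user-written estout package is installed, and writes to a hypothetical file name.

```stata
* Requires the user-written estout package: ssc install estout
sysuse auto, clear

* detail makes percentiles (p25, p50, p75, ...) available to esttab
estpost summarize price mpg weight, detail

* Statistics listed inside one quoted string are placed in columns of the same row
esttab using table4_wide.rtf, replace ///
    cells("mean(fmt(2)) sd(fmt(2)) p25(fmt(2)) p50(fmt(2)) p75(fmt(2))") ///
    nomtitle nonumber noobs
```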

↧

correlation table

May 17, 2017, 9:47 am

Hi all,

I would like to create a table, exportable to a Word file, that shows the correlations between all the variables.
More precisely, I would like to replicate the result I get when I run the following command:
pwcorr y x1 x2 x3 x4, sig star(.05)

Thank you!

↧

Error r(123) when running a marginscontplot

May 17, 2017, 10:13 am

Hey Statalist,

I am working on an assignment in which I am examining the effect of Public Service Motivation (PSM) on the likelihood that individuals would want to work in the public sector.
PSM is a continuous variable, and public sector is a dummy variable denoted off (0 = "does not want to work in the public sector", 1 = "wants to work in the public sector").
I am using the following logit command: logit off PSM i.v71_1 i.sektor_* i.køn v25 i.v59_*
And then a margins command: margins, dydx(*)

But a problem occurs when I want to make a graphical representation with the following command: marginscontplot PSM, ci
I get the following error messages, and Stata says it has to do with an r(123) error, "invalid numlist has too many elements".

Is there a way around this problem?
[Attached screenshot not preserved]

↧

Meta-Frontier Analysis in Stata: Estimation

May 17, 2017, 10:27 am

Does anyone know how to estimate a Meta-Frontier Production Function in Stata? Any syntax ideas would help. The only work I have seen so far about Meta-frontier analysis has been in Shazam Software. Thanks.

↧

Markers on scatter plot overlapping the labels

May 17, 2017, 11:05 am

Hi, I'm trying to produce a scatter plot, but unfortunately the markers in the diagram overlap some of the labels of other markers. Changing the position of the label relative to the marker will not help, because there are markers at every angle around the labels.
Therefore I want the labels to be drawn on top of all the markers (and not only on top of some of them, as can be seen in the example picture below). How can this be done?

This is my command:

graph twoway (scatter Etn psychometric, mlabel(labels) mlabv(pos) mlabcolor(grey)) (lfit Etn psychometric), graphregion(color(gs16)) legend(size(small) label(1 "Majors") label(2 "Linear fit")) ///
    ytitle("Share of Etn Applications" " ", size(small)) xtitle(" " "Mean Psychometric of Applicants" " ", size(small)) title("Share of Etn Applications and Mean Psychometric" "For Each Major") ///
    subtitle("First choice, age <= 30")

[Attached graph not preserved]

Thanks, Ami

↧

Need to Calculate yearly "t" value for monthly rolling beta's of multiple firms

May 17, 2017, 11:22 am

Dear All,

I am converting the monthly rolling betas of multiple firms into a yearly average beta across all firms, and that is going well with this code:

Code:

gen year = year(dofm(month_year))
format year %ty
collapse beta_mkt beta_w, by (id_firm year)
collapse beta_mkt beta_w, by (year)

But now the problem is that I also want to report a t-statistic for each yearly beta using the formula {beta/sd(beta)sqrt(N)}, and I am unable to code this. Please help in this regard.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year beta_mkt beta_w)
2000 .8671392 -.010338334
2001 .7339575 -.019243315
2002 .7249059    .0655215
end
format %ty year

I just want the t value reported alongside each beta. Please help.
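
One reading of the formula above is t = mean(beta) / (sd(beta) / sqrt(N)), computed across firms within each year. Under that assumption, a single collapse can return the mean, SD and count together, as in this sketch on made-up firm-year betas (the numbers and the new variable names are invented for illustration).

```stata
* Made-up firm-year average betas: three firms, two years
clear
input float(id_firm year beta_mkt)
1 2000 .8
2 2000 .9
3 2000 1.0
1 2001 .6
2 2001 .7
3 2001 .8
end

* Cross-firm mean, SD and number of firms for each year
collapse (mean) mean_beta = beta_mkt (sd) sd_beta = beta_mkt ///
    (count) n = beta_mkt, by(year)

* t-statistic of the yearly mean beta against zero
gen t_beta = mean_beta / (sd_beta / sqrt(n))
list
```

The same pattern extends to beta_w by adding it to each statistic in the collapse.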

↧

Cluster regression or mixed-effect model

May 17, 2017, 11:22 am

I would ordinarily have chosen a mixed-effects/multilevel model in this example:

...dataset contains data on 400 schools that come from 37 school districts. It is very possible that the scores within each school district may not be independent, and this could lead to residuals that are not independent within districts. We can use the cluster option to indicate that the observations are clustered into districts...(https://stats.idre.ucla.edu/stata/we...-4-beyond-ols/)

But the tutorial suggests cluster regression. What is the difference between cluster regression and mixed/multilevel models? And when should I use each?
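
To make the two options concrete, here is a sketch on simulated data; the variables district, x and y and the sample sizes are invented. The first command keeps the OLS coefficients and only adjusts the standard errors for within-district correlation; the second models that correlation explicitly through a district-level random intercept.

```stata
* Simulated data: 37 districts, 10 schools each, with a shared district effect u
clear
set seed 2017
set obs 37
gen district = _n
gen u = rnormal()
expand 10
gen x = rnormal()
gen y = 1 + 0.5*x + u + rnormal()

* OLS point estimates with cluster-robust standard errors
regress y x, vce(cluster district)

* Random-intercept (mixed/multilevel) model for the same data
mixed y x || district:
```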

↧

Warning: variance matrix is nonsymmetric or highly singular

May 17, 2017, 11:58 am

Hi All,

I have a panel dataset with retailer, year, host country, home country, and store format type fixed effects. I am using the xtscc command with the fe option to control for serial correlation and heteroskedasticity. I run the following model:

xtscc lnbannersales_store sfd lncheckouts countries_regions sqcountries_regions cubecountries_regions foreignmarketgrowth intspeed globecultdist lnpoldist lnhomeregionsales poldistcountriesregions sqcountriesreg_poldist cubecountriesreg_poldist globecultdist_countriesregions sqcountriesreg_globecultdist cubecountriesreg_globecultdist, fe

But I get the following error message: Warning: variance matrix is nonsymmetric or highly singular.

When I drop the fe option, I don't have this problem. Any ideas what the problem could be?

Thank you!

↧