Channel: Statalist
Viewing all 65136 articles

Stata and LIMDEP Yield very different results

I estimate a trivariate probit model in Stata using the user-written mvprobit command. The results (both coefficients and standard errors) are very different from those obtained with LIMDEP. The difference is indeed striking. Does anyone know why this might be the case?

Variance/R-squared decomposition at annual intervals in time series data

Hi,

I have a regression of TFP growth on five independent variables, all of them time series. I want to determine the extent to which the independent variables influence TFP, the dependent variable, at particular points in time. I have attempted a basic R-squared decomposition (http://www.uni-leipzig.de/~rego/), though this only yields a decomposition at a single point in time. I have heard that a variance decomposition may also be suitable and have tried to implement a basic variance model, but without success so far.

Any help would be greatly appreciated.

Thanks,

Ralph

The mysterious J function and the magic number 3

Hi
I'm trying to develop a sort of iterator, generate_numbers.
When it is given a #.# value, the integer part is repeated the number of times given by the decimal digit.
It can handle vectors as well.

So generate_numbers((1, 2.2, 3.3)) should return 1, 2, 2, 3, 3, 3.

But something strange happens when I want a number repeated 3 times: I only get it repeated 2 times!
I've tried a lot of other things without luck, e.g. assigning rep[r] before its use.

Code:
:         mata clear

:         mata set matalnum on

:         //mata set strict on
: 
:         function generate_numbers(real rowvector base)
>         {
>                 real rowvector values, rep
>                 real scalar r
>                 
>                 values = floor(base)
>                 rep = (d=10 * (base - values)) :+ (d :== 0)
>                 out = J(1,0,0)
>                 for(r=1; r<=cols(values); r++) {
>                         out = out, J(1, rep[r], values[r])
>                 }
>                 return(out)
>         }

:         
:         x = generate_numbers((3.2, 6.3, 5, 7.4))

:         // Should have 6 3 times, but only 2 occur
:         1..cols(x) \ x
       1   2   3   4   5   6   7   8   9
    +-------------------------------------+
  1 |  1   2   3   4   5   6   7   8   9  |
  2 |  3   3   6   6   5   7   7   7   7  |
    +-------------------------------------+

:         // 6 4 times is no problem
:         x = generate_numbers((3.2, 6.4, 5, 7.4))

:         1..cols(x) \ x
        1    2    3    4    5    6    7    8    9   10   11
    +--------------------------------------------------------+
  1 |   1    2    3    4    5    6    7    8    9   10   11  |
  2 |   3    3    6    6    6    6    5    7    7    7    7  |
    +--------------------------------------------------------+

:         // Should have 5 3 times, but only 2 occur
:         x = generate_numbers((3.2, 6.4, 5.3, 7.4))

:         1..cols(x) \ x
        1    2    3    4    5    6    7    8    9   10   11   12
    +-------------------------------------------------------------+
  1 |   1    2    3    4    5    6    7    8    9   10   11   12  |
  2 |   3    3    6    6    6    6    5    5    7    7    7    7  |
    +-------------------------------------------------------------+
The only thing I can find is that the J() function is the problem.


J() works fine in very similar cases, like:
Code:
: values = 1..5

: repeat = 5..1

: for(r=1; r<=cols(repeat); r++) J(1, repeat[r], values[r])
       1   2   3   4   5
    +---------------------+
  1 |  1   1   1   1   1  |
    +---------------------+
       1   2   3   4
    +-----------------+
  1 |  2   2   2   2  |
    +-----------------+
       1   2   3
    +-------------+
  1 |  3   3   3  |
    +-------------+
       1   2
    +---------+
  1 |  4   4  |
    +---------+
  5

:
Is there a workaround for this?
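For what it's worth, the behaviour looks consistent with floating-point representation rather than a defect in J(): 6.3 (and 5.3) have no exact binary representation, so 10 * (base - values) comes out just below 3, and the non-integer column count is apparently truncated. The same arithmetic, sketched in Python for illustration (rounding rep before use, e.g. rep = round(10 * (base - values)), would be a plausible workaround):

```python
import math

base = 6.3
value = math.floor(base)      # 6
rep = 10 * (base - value)     # intended to be 3

# 6.3 has no exact binary representation, so the subtraction
# comes out slightly below 0.3 and rep slightly below 3.0
print(rep)                    # slightly less than 3.0

# Truncating a near-3 value yields 2 -- the "missing" repeat
print(math.trunc(rep))        # 2

# Rounding before use recovers the intended count
print(round(rep))             # 3
```

This also explains why 6.4 and 7.4 work: those values round up in binary, so the product lands just above 4 and truncation still gives 4.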

Counting under conditions

Hi everybody,

I hope someone can help me.

I have a big data set and it looks like this:

  A  B
100  2
100  1
100  3
105  1
108  3
108  3
108  1
108  2
110  3

How can I count the values in column A for which column B contains a 1, a 2, and a 3?
So in this example the answer would be 2, because of 100 and 108.

I hope you understand my problem and can help me.

Thank you, Lea
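The logic being asked for — count the distinct values of A whose rows in B include 1, 2 and 3 — can be sketched as follows (Python purely for illustration; in Stata one would likely reach for bysort and egen max() flags):

```python
# Hypothetical data mirroring the example in the post
rows = [(100, 2), (100, 1), (100, 3), (105, 1),
        (108, 3), (108, 3), (108, 1), (108, 2), (110, 3)]

# Collect the set of B values seen for each A value
b_by_a = {}
for a, b in rows:
    b_by_a.setdefault(a, set()).add(b)

# Count the A values whose B values include 1, 2 and 3
count = sum(1 for bs in b_by_a.values() if {1, 2, 3} <= bs)
print(count)  # 2  (groups 100 and 108)
```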


Marginal effects - Probit

Hi all,

I ran a probit regression (dependent binary variable: withdrawal or not) and now want to obtain the marginal effects to better interpret the model (I am using Stata 13.1).

I used
. mfx compute
but realized that it is slightly old and instead wanted to use
. margins, dydx(*)

Since the two commands give different results, I was wondering which one is correct.
What I want is the change in the withdrawal probability given a change in an independent variable.
I also have dummy variables in my regression and am not sure whether I need to account for them when calculating the marginal effects.

Any help on the topic is appreciated!

Hausman test - number of variables

Dear all,

I am currently trying to fit a clogit model, and I would like to test the IIA with the Hausman test.
My problem is this: in a first version of the model, which was set up mainly to estimate the effects of the alternative-specific attributes and a few interaction terms with case-specific (socio-economic) attributes, the Hausman test told me to reject the null hypothesis (that there is no significant difference between the full model and a model with one alternative removed).
Then I included more interaction variables and performed the Hausman test again, and now the test results tell me that the null hypothesis cannot be rejected.

Am I doing something wrong here? How do I decide which model to use for the Hausman test?

Thanks a lot!

Kind regards,
Cordula

Working with multi-row common characteristics

Dear All,

I am trying to work with all the individuals (pid) from households with father-son pairs present, in the following data (example):
bhid  pid  hhrole  childsamp  hhheadsamp  pair  sex
   1    1       1          .           1     1    1
   1    2       2          .           .     .    2
   1    3       3          1           .     1    2
   2    4       1          .           1     1    1
   2    5       2          .           .     .    2
   2    6       3          1           .     1    1
   3    7       1          .           .     .    1
   3    8       3          1           .     1    2
   4    9       3          .           .     .    2
   5   10       1          .           .     .    1
   5   11       3          1           .     1    1
Where:

bhid = household id
pid = person id
hhrole = person role in the household
childsamp = if there is a child present in the household for both periods
hhheadsamp = if there is a parent present in the household for both periods
pair = marker for either childsamp or hhheadsamp
sex = gender of the individual.


So, for example, in this chart I would like to keep all individuals (pid) from any household that has childsamp==1 & hhheadsamp==1. (The key distinction is that I need both conditions to hold at the same time for a given household.)

The problem is that childsamp and hhheadsamp will never be 1 at the same time in the same row.

So if I manage to get the right code, I should end up with the following list of individuals (pid): 1, 2, 3, 4, 5, 6, since households (bhid) 1 and 2 each contain both a childsamp==1 and a hhheadsamp==1.

So it is a problem of multi-row conditions, which I am not sure how to explore.

Any ideas of possible functions to explore or alternative coding, would be highly appreciated.

Kind regards,
Patricio.
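The intended selection — keep everyone in a household where some row has childsamp==1 and some (possibly different) row has hhheadsamp==1 — can be sketched like this (Python for illustration; in Stata the analogous idiom would presumably be per-household max() flags via bysort and egen):

```python
# Hypothetical rows mirroring the example: (bhid, pid, childsamp, hhheadsamp)
# None stands in for Stata's missing value "."
rows = [(1, 1, None, 1), (1, 2, None, None), (1, 3, 1, None),
        (2, 4, None, 1), (2, 5, None, None), (2, 6, 1, None),
        (3, 7, None, None), (3, 8, 1, None),
        (4, 9, None, None),
        (5, 10, None, None), (5, 11, 1, None)]

# For each household, record whether ANY row has childsamp==1
# and whether ANY row has hhheadsamp==1
flags = {}
for bhid, pid, child, head in rows:
    c, h = flags.get(bhid, (False, False))
    flags[bhid] = (c or child == 1, h or head == 1)

# Keep every person from households where both flags hold
keep = [pid for bhid, pid, _, _ in rows
        if flags[bhid] == (True, True)]
print(keep)  # [1, 2, 3, 4, 5, 6]
```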




Regression with interaction variable

Hello,

Is it possible to have a regression of the form Export in Goods = technological approach + Human Capital*Years of Schooling? The idea is that an increase in years of schooling raises human capital, which enhances labour productivity and eventually leads to an increase in exports in goods. Does this make sense?

Human Capital*Year of Schooling is the interaction of two continuous variables.

Thanks in advance!

Jack

Survival Analysis: Exclude or Right-Censor 'Alternative States'

Dear Stata community,

I have a logical rather than a technical question. I am analyzing transitions from long-term unemployment (> 24 months) to full employment using survival analysis (a Cox model). In my dataset, there are also other possible states that can follow long-term unemployment (e.g. education, training, retirement). I am now wondering how I should handle those alternative states. I am not interested in a competing-risks model but only want to analyze the causes of (successful) transitions from long-term unemployment to full employment. Most people would suggest that I right-censor those alternative states. However, I tend to exclude (drop) all alternative transitions from the analysis, since I am only interested in successful labor market transitions, and I assume, for example, that a 65-year-old unemployed person - given the prospect of retirement - "behaves" differently from a 30-year-old.

What would you recommend? Or could you point me to a paper that discusses the pros and cons of right-censoring versus exclusion? Unfortunately, I have not found anything.

Thanks,
Adam

How do I add a caret over a Greek letter in graph text?

In LaTeX, \hat{\beta} displays beta with a caret above it. How can I do this in the text on a graph? I know I can use {&beta} to put the Greek letter into the title, subtitle, etc. using SMCL, but I don't see anything in the manual about formatting like this, even in -help graph_text##smcl-. Is this possible?

Calculating Cronbach's alpha with systematically missing values

Dear Statalist,

I have a rather obscure need that I suspect has a straightforward solution.

I am trying to calculate Cronbach's alpha for a psychometric scale. Suppose I have one scale comprising five items, v1-v5, with non-missing values denoted as "x" -- e.g.:

Code:
v1   v2   v3   v4   v5
 x    x    x
      x         x
 x         x         x
 x    x         x    x
           x    x    
 x    x              x
           x    x    x
 x              x
      x    x         x
The missingness is systematic in that the age of the respondent determines which items are administered.

Is there a way to calculate a single alpha for these items (e.g., a set of alphas for each combination of non-missing variables, which is then averaged)? At present the command won't run because there is no observation that is non-missing for all five items. The age range is 60 years, and the age bands are different for each item, so I would like to avoid doing this manually with a large number of commands along the lines of:

Code:
alpha v1 v2 if age==20
alpha v2 v3 if age==21
alpha v2 v4 if age==22
alpha v2 v5 if age==23
alpha v3 v5 v6 if age==24
etc.

Thank you!
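For reference, Cronbach's alpha is computed from the item variances and the variance of the total score: alpha = k/(k-1) * (1 - sum of item variances / variance of the total). One conceivable workaround for systematic missingness — an assumption, not an established alpha option — is to estimate each variance and covariance from whatever observations are available for that item (or pair of items) and plug them into the same formula. The complete-case version of the formula, sketched with made-up scores:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for k items (complete cases):
    alpha = k/(k-1) * (1 - sum(item variances) / var(total score))."""
    k = len(items)

    def var(xs):
        # Sample variance with the n-1 denominator
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(col) for col in zip(*items)]   # per-respondent total score
    item_var_sum = sum(var(it) for it in items)
    return k / (k - 1) * (1 - item_var_sum / var(totals))

# Made-up scores: 3 items, 5 respondents
scores = [[2, 4, 3, 5, 4],
          [1, 3, 3, 4, 4],
          [2, 3, 4, 5, 5]]
print(round(cronbach_alpha(scores), 3))  # 0.951
```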

Computing the median and its range, and testing whether the difference is significant

I have two variables: il7 (a continuous variable, not normally distributed despite having more than 30 observations) and nodules (a categorical yes/no variable, also with more than 30 observations but not normally distributed). I want to compute the median il7 of people with nodules and the median il7 of people without nodules.

When I use the command sum il7 if nodules==1, I get the range but not the median. How do I obtain the median of the two groups (il7 with and without nodules), plus the range of the medians, and compare the two medians to determine whether the difference is significant?
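As an aside: summarize without options reports the mean, not the median. The median logic itself is simple (sketched below with made-up il7 values); in Stata one would look at summarize il7 if nodules==1, detail (the p50 line) or tabstat il7, by(nodules) statistics(median min max), with a Wilcoxon rank-sum test (ranksum il7, by(nodules)) as the usual nonparametric comparison — details the thread would need to confirm.

```python
from statistics import median

# Hypothetical il7 values split by nodule status
il7_nodules = [4.2, 5.1, 3.8, 6.0, 4.9]
il7_no_nodules = [3.1, 2.8, 3.5, 4.0, 2.9]

print(median(il7_nodules))      # 4.9
print(median(il7_no_nodules))   # 3.1
```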

Insignificant constant in OLS regression

Hi folks,

Over the last number of weeks I've been creating a custom dataset. In short, I have geocoded sports capital grants by the Irish government. I have also merged this data with small area statistics from the 2011 census in order to have an idea of the regional characteristics of where these clubs are based.

Much has been examined regarding political bias in the distribution of these grants, with regions in which the finance minister and minister for sport are based doing particularly well. I'm adding to the literature by looking at other potential actors who can perhaps manipulate the system, namely heads of key sporting organisations within Ireland.

I use two dependent variables: first, the difference between the amount of money a club applied for and the grant it was awarded (as a percentage); second, the amount a club received.

For my political bias measure I use the distance in km between the hometown of a minister/head of a sporting organisation to a club.

All my variables are in logarithmic form. I also ran the augmented Dickey-Fuller test and found no evidence of variables having a unit root. My ministerial variables are in line with what past studies have found, leading me to believe that my data are sound. One of my issues is that in some cases, when I reduce the sample to exclude extremely large grants and focus on particular sports, my constant is insignificant.

The overall F-test is significant, though, with a p-value of 0.000 in nearly all cases. I include dummies to account for differences between sports, along with year dummies.

I'm wondering whether it is acceptable to present results that have an insignificant constant and a large standard error.

Also, would anyone recommend other robustness checks to carry out to ensure robust results?

Kind regards,

Sean

Interpretation of results to capture dummy effect

Hello,

I am currently doing my dissertation on how technological approach affects exports in goods for developing countries. I am trying to incorporate a dummy variable for the 2008 financial crisis to capture the effect of the crisis on exports in goods for developing countries.

To simplify, my regression: Export in Goods=technological approach+dummy variable for 2008 financial crisis.
My coefficient on technological approach=2.4
My coefficient on dummy variable for 2008 financial crisis= -0.2
Dummy covers a period from 2008-2010, while my data covers a period from 2002-2010.

Is it correct to interpret my regression results as follows?
Before the financial crisis (dummy=0), exports in goods increase by 2.4% when there is a 1% increase in technological approach.
During the financial crisis (dummy=1), exports in goods increase by 2.2% (2.4 - 0.2) when there is a 1% increase in technological approach.

Thanks in advance!

ROC curves for time to event data

Dear All

Does anyone know of any Stata commands to create time-dependent ROC curves following a Cox regression?

Any suggestions are appreciated.

Hema.

Quaids Aggregate Data Other Exogenous Variables.

I am trying to estimate a system of demand equations using quaids. I am using aggregate, not household, data. In that case, is it possible to add non-price variables such as the unemployment rate? If so, how would I do so? Thanks.

Randomized Control Trials - An Interesting Exercise

Hello Everyone,

In an upcoming workshop, I intend to demonstrate the following question: does randomization really ensure statistically similar samples on unobserved variables? In theory, randomization ensures statistically similar samples on both observed and unobserved characteristics.
To check this, I shall delete the gender variable from a dataset (case1) and select a number of random samples (with replacement). Since my dataset will not have a gender variable, it is considered unobserved in this setting. I will then include the gender variable in all the samples (manually) to see whether the sample gender proportions match the gender proportions in the original dataset (the one having the gender variable, i.e. case1).

Assuming my original dataset is "case1", can someone please share a list of commands I can use to collect many such samples? Only then will I be able to show that, on average, the sample proportions match the population proportions.

The list of tasks is as follows:

1) Observe gender proportions in a data-set
2) Delete variable gender
3) Select a random sample
4) Add a column of gender in the sample
5) Observe gender proportions in the sample
6) Repeat

Is there code that can help me do this in Stata, say, 10,000 times? Please note that I want to keep the gender proportions from all samples and eventually report the mean across samples.

Thanks!
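The simulation loop itself (steps 3-6, repeated 10,000 times) can be sketched as follows — Python for illustration, with a made-up population; in Stata the same job is usually done with simulate or a postfile loop around bsample:

```python
import random

random.seed(1)

# Hypothetical "population": 1 = female, 0 = male, 40% female
population = [1] * 400 + [0] * 600
true_prop = sum(population) / len(population)   # 0.4

# Draw many random samples (with replacement) and record each
# sample's gender proportion
props = []
for _ in range(10_000):
    sample = random.choices(population, k=100)
    props.append(sum(sample) / len(sample))

mean_prop = sum(props) / len(props)
print(true_prop, round(mean_prop, 3))  # the mean of the sample proportions
                                       # should sit very close to 0.4
```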

Problem with merging

Hi,
I have two datasets: (1) children_mothers and (2) mothers_info
My first dataset, children_mothers looks like this:
ID_children  ID_mother
          1          1
          2          1
          3          2
          4          3

and so on; it is a dataset of unique child-mother pairs.
My second dataset has panel information about mothers for the years 1990-2011. I want to merge this information into the first dataset based on ID_mother.
However, when I tried merge m:m, it does not seem to work: it only merges the information for the first child rather than repeating it for each child with the same mother.

Any ideas on how to overcome this?
Maybe the problem is that my first dataset is not a panel while my second one is?

Thanks for any help!
Surya
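For what it's worth, the usual advice is that merge m:m almost never does what one expects. With many children per mother on one side and many years per mother on the other, the desired result is a within-key Cartesian product, which is what Stata's joinby ID_mother produces. The intended pairing, sketched in Python with made-up rows:

```python
# children_mothers: unique child-mother pairs
children = [(1, 1), (2, 1), (3, 2), (4, 3)]   # (ID_children, ID_mother)

# mothers_info: hypothetical mother-year panel rows
mothers = [(1, 1990, "a"), (1, 1991, "b"),
           (2, 1990, "c"), (3, 1990, "d")]    # (ID_mother, year, info)

# joinby-style merge: every child row pairs with every panel row
# of the same mother (a within-key Cartesian product)
merged = [(c, m, y, info)
          for c, m in children
          for m2, y, info in mothers
          if m == m2]
print(len(merged))  # 6: children 1 and 2 each get both of mother 1's years
```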

'power oneproportion' problem

Hi,

I'm trying to use the power oneproportion command to calculate the sample size for a study of the relapse rate in a single group of subjects treated with a new medication. I expect the proportion relapsing to be 0% with the old treatment and 20% with the new treatment. However, Stata cannot calculate the sample size and gives this error: "null proportion must be between 0 and 1". Is there any way around this, or an alternative command?

Thanks,
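One aside, not official power syntax: a null proportion of exactly zero is degenerate — a single observed relapse refutes it — so the calculation collapses to finding the n at which at least one relapse would be seen with the desired probability under the alternative. A back-of-envelope sketch, assuming 80% power and a 20% relapse rate:

```python
import math

# Under a point null of 0%, any observed relapse rejects it, so
# "power" is just the chance of seeing at least one relapse when
# the true rate is p_alt:  1 - (1 - p_alt)^n
p_alt = 0.20
target_power = 0.80

# Smallest n with 1 - (1 - p_alt)^n >= target_power
n = math.ceil(math.log(1 - target_power) / math.log(1 - p_alt))
print(n)                       # 8
print(1 - (1 - p_alt) ** n)    # ~0.83, at least the target power
```

A practical alternative might be to supply a small nonzero null proportion (e.g. 0.01) so the command can run, though that changes the hypothesis being tested.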

Using ml command to maximise user-defined (log) likelihood function

Hi all,

To make a long story short, I've derived a new model, and I want to estimate it. I haven't done this before, but after reading the help file on the ml command, it seems pretty straightforward.

The problem: the log-likelihood includes a special function (specifically, Owen's T) which apparently cannot be implemented in Stata. On the other hand, although Stata would not be able to derive them, the first-order conditions are straightforward and easy for Stata to handle. It seems, however, that the ml command requires the log-likelihood function itself to be specified.

Any suggestions?

Alex
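An aside that may help: Owen's T is a one-dimensional integral, T(h, a) = (1/2pi) * integral from 0 to a of exp(-h^2(1+x^2)/2)/(1+x^2) dx, so it can in principle be evaluated numerically inside a likelihood evaluator (e.g. in Mata). A minimal Simpson's-rule sketch in Python, checked against the closed-form case T(0, a) = arctan(a)/(2pi):

```python
import math

def owens_t(h, a, n=1000):
    """Owen's T via Simpson's rule on its integral definition:
    T(h, a) = (1/2pi) * int_0^a exp(-h^2 (1+x^2)/2) / (1+x^2) dx."""
    def f(x):
        return math.exp(-0.5 * h * h * (1 + x * x)) / (1 + x * x)

    step = a / n                      # n must be even for Simpson's rule
    s = f(0) + f(a)
    for i in range(1, n):
        s += f(i * step) * (4 if i % 2 else 2)
    return s * step / 3 / (2 * math.pi)

# Sanity check against a closed-form case: T(0, a) = arctan(a) / (2 pi)
print(round(owens_t(0, 1), 6))        # 0.125
```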

