Channel: Statalist
Viewing all 65488 articles

Setting random seed not enough?

I'm writing a program which uses Mata to consider large numbers of random permutations of a matrix, to search for an optimal arrangement. In testing I'm using "set seed" and "mata: rseed()" to get consistent results. From one Stata session to another this works, but if I run the program twice in the same session (with a "clear all" between runs) the random numbers eventually (but not immediately) become inconsistent.

Should I need to do more than set the seed plus "clear all" to restore the state of the Stata session to its starting point, at least as far as the PRNGs are concerned?

(Stata 14.2, on two separate machines).

Example results (by iteration 3, at least one of the 2000 random permutations differs):
Code:
: pop = ga(dd, 2000)
  Iter 1  : Σ dist: min 977.377; mean: 1067.934; max: 1090.905
  Iter 2  : Σ dist: min 950.5679; mean: 1039.75; max: 1064.551
  Iter 3  : Σ dist: min 950.5679; mean: 1019.234; max: 1039.178
[...]
: pop = ga(dd, 2000)
  Iter 1  : Σ dist: min 977.377; mean: 1067.934; max: 1090.905
  Iter 2  : Σ dist: min 950.5679; mean: 1039.75; max: 1064.551
  Iter 3  : Σ dist: min 950.5679; mean: 1019.247; max: 1039.178
[...]
Code to run the main code twice:
Code:
do test.do
clear all
clear mata
do test.do
Main code (test.do):
Code:
// Clear all and set seeds
clear all
set seed 123456
mata: rseed(123456)

// Generate random data and make a dissimilarity matrix
set obs 50
forvalues x = 1/10 {
  gen x`x'=rnormal()
}
matrix dissim dd = x1-x10, L2squared

mata

real scalar fitness (real matrix distmat, real matrix permdat) {
  width = cols(distmat)
  perm = transposeonly(order(transposeonly(permdat),1))
  tempdm = distmat[perm,perm]

  // Sum minor diagonal (offset 0 gives main diagonal)
  offset = 1
  f = sum(diagonal(tempdm[1..(width-offset),(1+offset)..width]))

  return(f)
}

// Genetic algorithm function: look for a matrix permutation that minimises the fitness function
real matrix function ga (real matrix distmat, real scalar npop) {
  width = rows(distmat)

  npop = 2*floor(npop/2) // even
  nparents = floor(npop/2) // change from 50% to intensify selection
  nsurv = nparents
  nnew = floor(npop*0.20)
  ntoconv = ceil(npop*0.1)
  // Use top 50% as core. They procreate to replace 30%, and 20% is new random
  // Progress data is on top 50%

  // Create a random population, each row a permutation
  population = rnormal(npop,width,0,1)
  sigmadist = J(npop,1,.)

  // Main iteration
  iter = 0
  while (iter==0 |(max(sigmadist[1..ntoconv]) != min(sigmadist[1..ntoconv]))) {
    newpop = population
    iter++

    // Calculate fitness (lower is better)
    for (i=1; i<=npop; i++) {
      sigmadist[i] = fitness(distmat, newpop[i,.])
    }

    // Order the new medoid sets per fitness (best (lowest) first)
    newpop = population[order(sigmadist,1),][1..nparents,]
    sigmadist = sigmadist[order(sigmadist,1)]
    "Iter "+strofreal(iter,"%-3.0f")+": Σ dist: min "+strofreal(min(sigmadist[1..ntoconv]))+"; mean: "+strofreal(mean(sigmadist[1..ntoconv]))+"; max: "+strofreal(max(sigmadist[1..ntoconv]))
    displayflush()

    // Keep top 50%
    population[1..nsurv,] = newpop[1..nsurv,]

    // Crossover section
    // Top 50% used for crossover to create 30%, remaining 20% new
    for (i=1+nsurv; i<=npop-nnew; i++) {
      j = 1+floor((1-runiform(1,1)^0.1)*nparents)
      k = 1+floor((1-runiform(1,1)^0.1)*nparents)
      l = runiformint(1,1,1,width-1)
      population[i,1..l] = newpop[j,1..l]
      population[i,l+1..width] = newpop[k,l+1..width]
    }
    population[(npop-nnew+1)..npop,] = rnormal(nnew, width, 0, 1)

  }
  return(population)
}

end

mata
dd = st_matrix("dd")
pop = ga(dd, 2000)
end

Predictions on Panel Data Using a "forvalues" Loop

I am using a panel dataset, whereby I have data on log stock returns for a cross-section of companies. I am trying to use these stock returns to create a proxy for volatility using GARCH(1,1) methodology.

I want to run GARCH on each company individually, and then use the results to predict variance values. The code that I have come up with so far is:

Code:
forvalues companyid = 1(1)292{
arch Returns if companyid == `i', arch(1) garch(1)
predict Residual, r
predict ProxyVariance, variance
}

However, this returns a syntax error.

Could anyone advise me on why this is not working?
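For what it's worth, a hedged sketch of one possible repair (variable names are taken from the post; everything else is an assumption): the loop macro is named companyid while the body references `i', a space is needed before the opening brace, and predict needs variables that do not already exist on later passes.

```stata
* hedged sketch: collect per-company GARCH(1,1) predictions into two variables
generate double Residual = .
generate double ProxyVariance = .
forvalues i = 1/292 {
    capture arch Returns if companyid == `i', arch(1) garch(1)
    if _rc continue                       // skip companies where arch fails
    tempvar r v
    predict double `r' if companyid == `i', residuals
    predict double `v' if companyid == `i', variance
    replace Residual = `r' if companyid == `i'
    replace ProxyVariance = `v' if companyid == `i'
}
```

Using tempvars for the predictions avoids the "variable already defined" error that the original loop would hit on its second pass even after the syntax error is fixed.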

Kind regards,
Jack


Coefficient of variation

I have two data sets and I would like to investigate differences in terms of
median, standard deviation, and coefficient of variation (CV).
For the SD I have found -sdtest- (Bartlett / Levene), but how do I test for differences in the CVs?

Probit model: variable dropped and not used + outcome with single variable

Dear all

I am aware that this problem has already been discussed on the forum and that -firthlogit- has been recommended.
However, when I try this command, Stata says it is unrecognised, and when I try to download it, Stata says it already exists... (maybe because I work on Stata externally, on the University's server).

This is my command: probit Rejected1 WLB
The dependent variable Rejected1 equals 1 if the firm is rejected when applying for credit type 1, and zero otherwise (binary variable).
The independent variable WLB stands for woman-led business and equals 1 if the firm is led by a woman (binary variable).
Stata output: note: WLB != 0 predicts success perfectly
WLB dropped and 4 obs not used

I cannot leave out this variable (WLB) as this is the main topic of my master thesis

When I add control variables, which is the following command:
probit Rejected1 WLB Size FirmAge Quality Group Audit Export Foreign Manufacture COUNTRY Competitors Sales Legal
(which are all binary variables except for Size, FirmAge, COUNTRY, Competitors, Sales, Legal - they are numeric)
Stata output: outcome = FirmAge <= 21 predicts data perfectly
If I drop FirmAge, I get the same but with another variable...

Could somebody please help me out?

Best, Elise

Change from -reg- to -prais- makes R-square missing

Dear all,

I am estimating a trend rate by fitting the equation y = a + bx with OLS (-reg-), where x is the time variable and b is the trend rate. Because of first-order autocorrelation, I have switched to -prais- (Prais-Winsten estimation). However, the model becomes totally insignificant, with Prob > F = 1.000 and R-squared missing. If I instead use -prais, corc- (Cochrane-Orcutt estimation), then Prob > F = 0.8.

I would like to ask why Prais-Winsten estimation (-prais-) leads to such a poor result. The Stata manual mentions that for a small sample size (n = 20 in my case) Prais-Winsten has a "significant advantage" because it preserves the first observation.

Thank you very much!

The following is the output.

Code:
prais var1 var2

Iteration 0:  rho = 0.0000
Iteration 1:  rho = 0.4025
Iteration 2:  rho = 0.4098
Iteration 3:  rho = 0.4101
Iteration 4:  rho = 0.4101
Iteration 5:  rho = 0.4101

Prais-Winsten AR(1) regression -- iterated estimates

      Source |       SS           df       MS      Number of obs   =        20
-------------+----------------------------------   F(1, 18)        =      0.00
       Model |           0         1           0   Prob > F        =    1.0000
    Residual |  9.02919029        18  .501621683   R-squared       =         .
-------------+----------------------------------   Adj R-squared   =         .
       Total |  8.41900918        19  .443105746   Root MSE        =    .70825

------------------------------------------------------------------------------
        var1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        var2 |    .025779   .0421113     0.61   0.548    -.0626935    .1142515
       _cons |   1.729973   .5127416     3.37   0.003     .6527428    2.807203
-------------+----------------------------------------------------------------
         rho |   .4101012
------------------------------------------------------------------------------
Durbin-Watson statistic (original)    1.126068
Durbin-Watson statistic (transformed) 1.972264

Theil decomposition: decompose income inequality between and within regions using "ineqdeco"

Hi all,

I am trying to do a Theil decomposition using the Stata module "ineqdeco", but I am not sure exactly how to use the syntax for my situation. I have done some reading but could not find enough tutorials or examples of how to use ineqdeco, so could someone kindly answer my questions in detail?

In my research, there are three regions, east, central and north, and in each region, there are some states. I have the population and average income data for each state, and I want to decompose income inequality within and between the regions.

The syntax is ineqdeco varname [weights] [if exp] [in range] [, bygroup(groupvar) welfare summarize ].

My questions are: What should my command look like? Should the "varname" be the total income or the average income of each state? Does "weights" refer to the population of each state? Could someone explain the full syntax for me? I would appreciate it if you could also point to some examples or tutorial resources.

I wrote the following command (I am not sure if it is correct):

ineqdeco incomepercapita [w=population] if year==2015, by(region_n)

Thank you in advance,

Yang

Accounting for fixed effects by variable transformation

This is my first post and I hope I'm following the correct procedures.

I have a question about how to account for fixed effects in cross-sectional data (observations within and across countries) that I want to use in sem without having to include (0,1) dummies as explanatory variables.

My study uses cross-sectional data from 12 countries with about 64,000 observations in total. I could of course "solve" my problem by just adding dummy variables for each country to the variable list, but that would add a lot of unnecessary clutter to my output. Moreover, I'm not interested in the estimates of the country effects themselves, but I do need them to be taken into account when estimating the parameters of the model, as there are large differences between countries.

Algebraically, there is a simple transformation that does the trick: subtract from each observation of a given variable the mean of all observations affected by the fixed effect, and add back the grand mean of all observations of that variable, whether they are affected by the fixed effect or not.

Denoting the variable of interest by x, subscripted by i for the i-th observation and by j for the country in which the observation was made, the transformation is x(i,j) - x(.,j) + x(.,.), where x(.,j) is the mean within country j and x(.,.) is the grand mean.

What I would very much appreciate is help on how to program this transformation in Stata.
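A minimal sketch of that transformation, assuming the country identifier is called country and the variable of interest is called x (both names are hypothetical):

```stata
* within-country mean of x, i.e. x(.,j)
bysort country: egen x_cmean = mean(x)
* grand mean of x over all observations, i.e. x(.,.)
egen x_gmean = mean(x)
* the transformed variable: x(i,j) - x(.,j) + x(.,.)
generate x_dm = x - x_cmean + x_gmean
```

Repeating this for every variable in the model reproduces the within (fixed-effects) transformation without putting country dummies in the variable list.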

Any pointers will be gratefully acknowledged. Thank you.
Roberto Wessels

Comparing Effects of Two-Way Interaction Nest in Three-Way Interaction

Dear Statalist,

I am currently dealing with the analysis of a three-way interaction of all continuous variables, XZW. My regression equation looks like this:

Code:
reg y c.X##c.Z##c.W control1 control2 control3 control4 control5
Essentially, one of my hypotheses is that the effect of the interaction X*Z, that is, the slope of how X is moderated by Z, should vary across levels of W. In particular, I'm interested in testing whether the effect of the X*Z interaction differs statistically between the 10th (value of -.25) and 90th (value of .75) percentile values of W. Graphically they appear to differ, often going in opposite directions, but I need a way to formally test this difference. -margins-, of course, does not allow me to compute this using:
Code:
margins, dydx(c.X##c.Z) at(W=(-.25 .75))
I've looked at -contrast-, and it appears that it would work if the W variable were dichotomous, but since it is continuous (and dichotomizing it would lose a lot of information), I am stuck.
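For reference, one hedged reading of what the formal test reduces to under this linear specification: the X*Z slope equals _b[c.X#c.Z] + _b[c.X#c.Z#c.W]*W, so its difference between W = .75 and W = -.25 is the three-way coefficient times (.75 - (-.25)) = 1. A sketch:

```stata
* after: reg y c.X##c.Z##c.W control1 control2 control3 control4 control5
* the difference in the X*Z slope between W = .75 and W = -.25 equals
* _b[c.X#c.Z#c.W] * 1.0, so testing it is a one-line lincom
lincom c.X#c.Z#c.W
```

The reported t test on this linear combination is then the test of whether the X*Z interaction differs between the two percentile values of W.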

I've read through Aiken and West's (1991) Multiple Regression: Testing and Interpreting Interactions and received good advice from a number of individuals who I've reached out to. These have helped orient me in figuring out what exactly my question was (what I am interested in to test my hypothesis), but I'm still stuck in executing it.

If anyone has any help or suggestions I would much appreciate them. Thank you!

Clinton

Compare random effects to fixed effects?

Hi all,

I estimated a model with fixed effects, using data for Germany (the Hausman Test suggested me to use fixed instead of random effects). There is an existing paper which does exactly the same regression as I do, but which uses random effects and data for Switzerland. If possible, I'd like to compare my results to the results of that paper. Is it possible to do a quantitative comparison if one model is estimated with FE and the other one with RE? I'd say no, but I'm not quite sure if there isn't a way I haven't thought about.

Thanks!

Fixing date variable in medicare data

I need some help please!

I am using medpar files and looking at pancreatic cancer. The database is set up like this:
Code:
ID   Date_Proc1   Proc1_ICD   Date_Proc2   Proc2_ICD   Date_Proc3   Proc3_ICD
1    12/21/2003   5110        12/19/2003   87521       12/24/2003   9004
1    .            .           .            .           .            .
2    11/12/2003   8751        11/14/2003   8009        11/9/2003    5110
3    7/21/2003    5111        7/21/2003    8751        7/23/2003    9004
3    8/22/2003    9004        .            .           .            .

There are dates associated with surgical procedures and their corresponding ICD-9 codes. Each patient may have one observation or multiple observations depending on how many hospitalizations they had that year. What makes it even more complicated is that the dates of the procedures are not in chronological order.

While I can find the first date of any procedure within an observation using rowmin(), I want to order the dates chronologically, together with the ICD-9 codes that correspond to them, since the sequence (the first procedure, then the second, then the third) matters for my study.

Any help would be greatly appreciated. I have read and re-read the FAQ on dates/times/order but am unable to figure out how to do this. I am using Stata 15.
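A hedged sketch of one approach, with variable names taken from the listing above: reshape to one row per procedure, then sort within patient. The ICD variables are renamed first so that -reshape- sees a common stub.

```stata
* give each hospitalization row a unique key before reshaping
generate long row = _n
rename (Proc1_ICD Proc2_ICD Proc3_ICD) (ICD_Proc1 ICD_Proc2 ICD_Proc3)
reshape long Date_Proc ICD_Proc, i(row) j(procnum)
drop if missing(Date_Proc)
* chronological sequence of procedures within each patient
bysort ID (Date_Proc): generate seq = _n
```

In long form the first, second, and third procedures per patient are simply seq == 1, 2, 3, with the matching ICD-9 code on the same row.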

Thanks

Decomposing binary dependent variables

Dear Stata users,

I would like to decompose the difference in proportions (binary data) between two groups but I am not sure of the right syntax. Anyone that could help? I read about the Oaxaca command but it seems to work only for continuous outcomes.

Thank you in advance.

Copying data from observations

Dear all,

I am having some issues with my dataset and would be very grateful for your help. I have a large dataset of surveillance data concerning one infectious disease. There are many duplicate IDs, as several patients had several different samples tested. The samples come from different body sites, such as Site 1 or Site 2, and they are recorded in the variable SAMPLE. Unfortunately, there are many missing observations in this variable, as SAMPLE was recorded only for the first observation of each new site within an ID, as depicted below.


Code:
Observation   ID     SAMPLE
1             1234   Site 1
2             1234   .
3             1234   .
4             1234   Site 2
5             1234   Site 3
6             5678   Site 2
7             5678   .
8             5678   .


What I would like to do is copy the recorded values of SAMPLE to the other observations of the same ID: for example, copy observation 1 of SAMPLE to observations 2 and 3, copy observation 6 to observations 7 and 8, and so on. As there are about 15,000 missing values, I cannot do it one by one. Is there a command that would let me fill these in based on the condition of having the same ID?
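A hedged sketch, assuming the rows are already in the intended order (Observation) and that each recorded SAMPLE applies to the missing rows that follow it within the same ID:

```stata
* carry the last recorded SAMPLE forward within each ID
bysort ID (Observation): replace SAMPLE = SAMPLE[_n-1] if missing(SAMPLE)
```

On the example above this copies Site 1 to observations 2 and 3, leaves Site 2 on observation 4 untouched, and copies Site 2 to observations 7 and 8.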

My second question relates to the possibility of merging several different observations again based on ID. Let's say I want to merge observations with same ID in the variable SAMPLE so that I get this:


Code:
Observation   ID     SAMPLE
1             1234   Site 1, Site 2, Site 3
2             5678   Site 2


Is there any way I can do such a merge?
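For the second question, a hedged sketch that collapses the data to one row per ID with the distinct sites joined by commas (it assumes the missing SAMPLE values have already been filled in or dropped):

```stata
* keep one row per ID-site combination, then string the sites together
drop if missing(SAMPLE)
duplicates drop ID SAMPLE, force
bysort ID (SAMPLE): generate all_sites = SAMPLE if _n == 1
by ID: replace all_sites = all_sites[_n-1] + ", " + SAMPLE if _n > 1
by ID: keep if _n == _N
```

After this, all_sites holds the comma-separated list of sites for each ID, e.g. "Site 1, Site 2, Site 3".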

Thanks in advance for your help.

Andrea

Counting &quot;active by date&quot; from start and end dates

Hello StataList,

In my dataset of hospital consultations, each observation represents one consultation, and the variables include the consultation start date and end date. I am seeking to tabulate, or just count, the number of consultations active on each date.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id startdate enddate)
1 21124 21125
2 21124 21126
3 21124 21136
4 21125 21129
5 21125 21127
6 21125 21128
7 21126 21130
8 21126 21130
end
format %td startdate
format %td enddate
I had originally thought -egen-, perhaps followed by -reshape-, might be the first steps to a solution, but honestly I haven't made any progress so far.
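A hedged sketch of one way to do it with -expand-: create one row per consultation-day, then count rows per day.

```stata
* one row for every day each consultation is active (inclusive of both ends)
expand enddate - startdate + 1
bysort id: generate active_date = startdate + _n - 1
format %td active_date
* number of active consultations on each date
contract active_date, freq(n_active)
```

On the example data this yields, for instance, three active consultations on the first day (ids 1-3) and six on the second.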

Any advice or suggestions would be most appreciated.

Thank you -
Randy Absher

Numeric Computation: comparison of a float vs a double

I want to compare a numeric (double) variable x to another numeric (float) variable y.
. gen flagxy = 1 if float(x) != y
Then I look at the cases that differ:
. ed x y if flagxy == 1
My problem is that the flag is set for some cases even when I know that x == y. See the image (x = m1_q45 and y = d71_76).

[attachment not reproduced]


How can I solve this issue?
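One hedged workaround while the cause is being tracked down: compare with an explicit relative tolerance instead of exact equality, since float precision only guarantees about 7 significant decimal digits.

```stata
* flag only differences larger than what float rounding can explain
generate byte flagxy = abs(x - y) > 1e-6 * max(abs(x), abs(y)) ///
    if !missing(x, y)
```

The 1e-6 threshold is an assumption; tighten or loosen it depending on how the two variables were originally computed.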

Thank you.

Etable drop-problem

Hi! I have a problem. I am trying to use the -putdocx- command with the drop option. I get:

. putdocx table tbl20(17,.), drop
r(198);

The table has 17 rows, so I use putdocx table tbl20(16,.), drop instead, but the 3rd row is still present in the Word table. I cannot drop it, and Stata does not seem to recognise it. Why?

[attachment not reproduced]

Code:
putdocx table tbl20=etable , width(100%) memtable border(all, nil)
putdocx table tbl20(1,1)=(" "), halign(left)
putdocx table tbl20(1,2)=("Adj.HR"), halign(left)
putdocx table tbl20(2,2), halign(center)
putdocx table tbl20(2,1)=("`period2'"), halign(left)
putdocx table tbl20(.,1), border(right, nil)
putdocx table tbl20(1,.), border(bottom, nil)
putdocx table tbl20(1,.), border(top, nil)
putdocx table tbl20(1,6)=("95% adj.HR Confidence limits"), halign(right)
putdocx table tbl20(2,6), halign(right)

putdocx table tbl20(17,.), drop
putdocx table tbl20(16,.), drop
putdocx table tbl20(15,.), drop
putdocx table tbl20(14,.), drop
putdocx table tbl20(13,.), drop
putdocx table tbl20(12,.), drop
putdocx table tbl20(11,.), drop
putdocx table tbl20(10,.), drop
putdocx table tbl20(9,.), drop
putdocx table tbl20(8,.), drop
putdocx table tbl20(7,.), drop
putdocx table tbl20(6,.), drop
putdocx table tbl20(5,.), drop
putdocx table tbl20(4,.), drop
putdocx table tbl20(3,.), border(bottom, nil)
putdocx table tbl20(3,.), drop
putdocx table tbl20(.,4), drop
putdocx table tbl20(.,3), drop
putdocx table tbl20(1,.), border(bottom, nil)
putdocx table tbl20(1,.), border(top, nil)
putdocx table tbl20(.,2/5), nformat(%03.2f)
putdocx table tbl20(.,3), nformat(%05.4f) halign(center)

putdocx table tbl2T=(2,1)
putdocx table tbl2T(1,1)=table(tbl2)
putdocx table tbl2T(2,1)=table(tbl20)

putdocx save cox0docapu.docx,replace
Code:
                               Adj.HR   P>|z|
Overall adj.effect/pre vacc    1.06     0.7000
>6                             0.90     0.7000
With regards,

Jukka

Complex Value replacement

My data are below, and I have a problem. My aim is to fill the variable rainhhmx with the monthly values stored in the variables jan through dec.
The year of the event is indicated in v4, for example 1984.
I would like, for example, every day from 1jan1984 to 31jan1984 to take the value 1.072333, replacing the missing values of rainhhmx with that value through the end of the month; then every day from 1feb1984 to 29feb1984 takes the feb value for 1984, .003333333, and so on.
My final table would look something like this:
Code:
Days        rainhhmx     |  v4    jan        feb
01jan1984   1.072333     |  1984  1.072333   .0033333
02jan1984   1.072333     |  1985  2.813      .22833334
01feb1984   .0033333     |
02feb1984   .0033333     |
01jan1985   2.813        |
30jan1985   2.813        |
01feb1985   .22833334    |
29feb1985   .22833334    |
I have tried foreach loops, but none seems to work.
I know this is a bit complex, but please lend a helping hand if possible.
Thanks

Code:
clear
input byte(rainhhmx v3) int v4 float(jan feb mar apr may jun jul aug sep oct nov dec Days)
. . 1984 1.0723333 .003333333 2.672333 13.412333 2.0626667 .753 1.854 .919 1.4003333 5.208333 11.292334 3.034333 8766
. . 1985 2.813 .22833334 12.106 8.289 6.783333 2.382 2.451333 1.0093334 2.1233332 5.212333 5.444 1.5883334 8767
. . 1986 1.466 0 3.83 12.079333 7.212 5.516 2.496 1.9433334 2.631 5.212667 24.92633 5.591667 8768
. . 1987 2.8113334 .23866667 1.6863333 5.051333 8.509666 7.508333 1.3366667 1.383 1.0596666 1.3113333 9.808666 .25733334 8769
. . 1988 1.89 1.8683333 2.581667 17.491 5.382 1.8586667 1.221 1.332 .6146666 5.99 7.656 2.751 8770
. . 1989 1.4413333 .8193333 7.177667 4.1336665 2.564 1.1506667 2.627667 .8973333 .7913333 3.3106666 10.165667 2.714333 8771
. . 1990 1.2473333 3.575667 5.299333 22.054 3.6816666 .6896667 2.776333 1.386 .8026667 3.092333 5.571667 2.996667 8772
. . 1991 .6946667 .07433333 3.27 8.114 17.697666 3.623333 .8656667 1.708 .5986667 1.7556666 5.136667 3.5636666 8773
. . 1992 .3033333 0 5.026333 6.107666 6.742333 2.216 2.1326666 1.42 5.222667 2.1016667 5.632667 22.645666 8774
. . 1993 9.065 .7706667 1.514 5.204667 6.955667 3.021667 1.141 .6413333 .9396667 2.112 3.218 3.8443334 8775
. . 1994 .7153333 4.4413333 .776 7.871333 2.64 2.076 .7453333 .9516667 .978 6.500667 16.229 4.5403333 8776
. . 1995 2.065 3.154667 3.2736666 14.724 5.763333 1.458 2.573333 1.1346667 2.2543333 6.01 4.0076666 9.613 8777
. . 1996 2.0266666 .14633334 5.690667 3.0566666 2.899333 3.713667 3.418333 .9423333 2 3.474 10.172667 1.564 8778
. . 1997 .009 0 11.995667 10.67 5.765 6.803667 8.142333 .654 .52533334 11.094666 11.466333 7.27 8779
. . 1998 11.464 3.6473334 7.121 12.502666 14.387667 4.6246667 1.8333334 1.1816666 .665 1.9603333 4.541 3.420667 8780
. . 1999 .09166667 .071 5.272333 5.265333 3.8296666 .7873333 .974 1.1723334 1.3766667 1.5906667 12.194667 6.475667 8781
. . 2000 .4153333 0 2.35 5.469666 .618 1.2346667 .4863333 1.1843333 1.3303334 1.1636667 2.648 10.262 8782
. . 2001 4.996666 .8046666 6.886333 8.973333 2.9733334 2 .641 1.64 1.7616667 5.620333 6.786334 .9943333 8783
. . 2002 3.446667 .202 9.471 10.289333 32.854668 3.888 .5606667 1.2083334 .7863333 4.751 7.201667 2.6526666 8784
. . 2003 .3383333 .24266666 4.907333 8.920667 5.576667 2.3826666 .3926667 1.0916667 1.1096667 9.725333 5.489 2.7276666 8785
. . 2004 3.459 3.025667 2.076 6.327333 2.4776666 1.7623333 .3416667 .595 1.0226667 8.3 8.776334 5.023334 8786
. . 2005 .436 .372 4.7413335 8.508 4.996 1.6343334 1.679 1.9386667 .944 4.581 5.976666 1.8686666 8787
. . 2006 10.156667 3.7216666 5.795333 9.250667 7.016 .657 1.355 .971 1.019 6.308667 8.502334 10.238 8788
. . 2007 1.6063334 1.358 2.525667 9.753 8.108 .6466666 .432 3.301 .839 5.138333 12.685333 1.3093333 8789
. . 2008 8.115 .6743333 8.016 15.617333 1.6546667 .4736667 2.0376666 .7833334 2.1996667 5.438 33.662334 .5986667 8790
. . 2009 2.2443333 1.0093334 2.3756666 5.330667 7.859667 .6323333 .4493333 .8126667 1.196 7.168667 2.254 4.241667 8791
. . 2010 1.601 2.886667 7.950333 9.4 11.393666 2.955333 1.139 1.2226666 .864 6.084333 9.118667 1.1993333 8792
. . 2011 .207 3.391 1.921 14.374333 7.494 2.6756666 3.099 8.056666 1.3686666 14.279333 7.532667 5.850667 8793
. . 2012 .003333333 .17166667 .167 14.851 8.788 2.2706666 2.901 3.0006666 .8046666 7.932 11.877334 3.775 8794
. . 2013 .8726667 1.0893333 12.926 10.932 4.1376667 2.3843334 1.4693333 1.4716667 1.8683333 2.144 7.462667 1.7716666 8795
end
format %td Days
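A hedged sketch of one approach, assuming a daily dataset with a date variable (here called day) exists alongside this yearly file: reshape the monthly means to long form, then merge on year and month. All names other than v4, jan-dec, and rainhhmx are hypothetical.

```stata
* yearly file: one row per year with monthly means in jan..dec
preserve
keep v4 jan feb mar apr may jun jul aug sep oct nov dec
rename (jan feb mar apr may jun jul aug sep oct nov dec) ///
       (m1 m2 m3 m4 m5 m6 m7 m8 m9 m10 m11 m12)
reshape long m, i(v4) j(month)
rename (v4 m) (year monthmean)
tempfile monthly
save `monthly'
restore

* daily file: derive year and month from the date, then pull in the mean
use daily, clear
generate year  = year(day)
generate month = month(day)
merge m:1 year month using `monthly', keep(master match) nogenerate
replace rainhhmx = monthmean if missing(rainhhmx)
```

This avoids looping entirely: every day of January 1984 picks up the 1984 jan value, every day of February 1984 the feb value, and so on.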

Merging with formulae

Dear all,
My data are unfortunately confidential, but I am stuck with a question on merging. I'll explain with a dummy example:
  • I have one dataset containing information on sold products and the date on which they were sold (master with product_id)
  • I have another dataset containing multiple entries for each product_id where each entry specifies the date starting from which a certain discount percentage was implemented (using dataset with multiple entries by product_id).
I need to find a way to import, for each sold product, the discount policy that was in effect for that product on the date it was sold. That is, within the corresponding product_id group, I need to import the line from the using file with the most recent date before the selling date.
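A hedged sketch, with hypothetical names sale_id, sale_date, discount_date, and a using file discounts.dta: pair every sale with every discount entry for its product via -joinby-, then keep the most recent entry dated on or before the sale.

```stata
* master in memory: one row per sale (product_id, sale_id, sale_date)
joinby product_id using discounts
* keep only policies already in effect at the sale date
keep if discount_date <= sale_date
* within each sale, keep the most recent such policy
bysort sale_id (discount_date): keep if _n == _N
```

(-rangejoin- from SSC is another route for this kind of interval matching; the sketch above uses only built-in commands.)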

Any ideas?
Thank you very much!

Using -matchit- with Chinese characters

This is just to post in the forum the following question by Liu Mengdi (刘梦迪):

I am trying to use the "matchit" package in Stata to match firm names in Chinese. I have tried it and it seems to work well, but I still have some concerns about its applicability.
Do you think I can use it for Chinese characters? (If yes it will be great!!)
...and my reply:

First, out of full disclosure, I have almost zero knowledge about Chinese language and characters.

However, matchit is coded to be utf8 compatible (as much as Stata14+ and Mata are). This means that Chinese characters should be compared as such. Probably, using the option sim(ngram,1) for Chinese names is equivalent to using sim(token) for Latin names. You will handle permutations of Chinese characters but not misspellings.
Best,

Julio

Firm fixed effects or industry fixed effects

Hello,

I have a large panel dataset containing firm-level data (firm characteristics, accounting data plus some macroeconomic indicators).

I would like to estimate a linear regression where I regress an accounting variable (e.g. ROE) on a set of firm characteristics (e.g. employee turnover) and some macroeconomic indicators (e.g. unemployment rate). Due to significant differences across the firms and, in particular the industries, it seems important to include some sort of fixed effects in the analysis. What is the better choice: firm fixed effects or industry fixed effects? What are the pros and cons of the two types of fixed effects?

Thanks a lot for any help on this.

Kind regards,
Ingo

Cluster analysis: problem of memory

Hello,

I'm working on panel data with 3,306 variables and 240,000 observations (Stata/SE 13.0). When I run cluster analysis with "cluster wardslinkage", I receive the message "insufficient memory for ClusterMatrix r(950);". I dropped 1,000 variables, but I receive the same message.
clustering code :
Code:
cluster wardslinkage PTA, measure(L2)
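A hedged back-of-the-envelope suggests why dropping variables does not help: hierarchical methods such as Ward's linkage need the pairwise dissimilarity matrix across observations, whose size depends only on the number of observations N, not on the number of variables:

```latex
N^2 \times 8 \text{ bytes} = 240{,}000^2 \times 8 \approx 4.6 \times 10^{11} \text{ bytes} \approx 430 \text{ GB}
```

Reducing the number of observations (or switching to a partition method such as -cluster kmeans-, which does not build this matrix) is what shrinks the memory requirement.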
Thanks