Channel: Statalist
Viewing all 65488 articles

Setting random seed not enough?

I'm writing a program which uses Mata to consider large numbers of random permutations of a matrix, to search for an optimal arrangement. In testing I'm using "set seed" and "mata: rseed()" to get consistent results. From one Stata session to another this works, but if I run the program twice in the same session (with a "clear all" between runs) the random numbers eventually (but not immediately) become inconsistent.

Should I need to do more than set the seed plus "clear all" to restore the state of the Stata session to its starting point, at least as far as the PRNGs are concerned?

(Stata 14.2, on two separate machines).

Example results (by iteration 3, at least one of the 2000 random permutations differs):
Code:
: pop = ga(dd, 2000)
  Iter 1  : Σ dist: min 977.377; mean: 1067.934; max: 1090.905
  Iter 2  : Σ dist: min 950.5679; mean: 1039.75; max: 1064.551
  Iter 3  : Σ dist: min 950.5679; mean: 1019.234; max: 1039.178
[...]
: pop = ga(dd, 2000)
  Iter 1  : Σ dist: min 977.377; mean: 1067.934; max: 1090.905
  Iter 2  : Σ dist: min 950.5679; mean: 1039.75; max: 1064.551
  Iter 3  : Σ dist: min 950.5679; mean: 1019.247; max: 1039.178
[...]
Code to run the main code twice:
Code:
do test.do
clear all
clear mata
do test.do
Main code (test.do):
Code:
// Clear all and set seeds
clear all
set seed 123456
mata: rseed(123456)

// Generate random data and make a dissimilarity matrix
set obs 50
forvalues x = 1/10 {
  gen x`x'=rnormal()
}
matrix dissim dd = x1-x10, L2squared

mata

real scalar fitness (real matrix distmat, real matrix permdat) {
  width = cols(distmat)
  perm = transposeonly(order(transposeonly(permdat),1))
  tempdm = distmat[perm,perm]

  // Sum minor diagonal (offset 0 gives main diagonal)
  offset = 1
  f = sum(diagonal(tempdm[1..(width-offset),(1+offset)..width]))

  return(f)
}

// Genetic algorithm function: look for a matrix permutation that minimises the fitness function
real matrix function ga (real matrix distmat, real scalar npop) {
  width = rows(distmat)

  npop = 2*floor(npop/2) // even
  nparents = floor(npop/2) // change from 50% to intensify selection
  nsurv = nparents
  nnew = floor(npop*0.20)
  ntoconv = ceil(npop*0.1)
  // Use top 50% as core. They procreate to replace 30%, and 20% is new random
  // Progress data is on top 50%

  // Create a random population, each row a permutation
  population = rnormal(npop,width,0,1)
  sigmadist = J(npop,1,.)

  // Main iteration
  iter = 0
  while (iter==0 |(max(sigmadist[1..ntoconv]) != min(sigmadist[1..ntoconv]))) {
    newpop = population
    iter++

    // Calculate fitness (lower is better)
    for (i=1; i<=npop; i++) {
      sigmadist[i] = fitness(distmat, newpop[i,.])
    }

    // Order the new medoid sets per fitness (best (lowest) first)
    newpop = population[order(sigmadist,1),][1..nparents,]
    sigmadist = sigmadist[order(sigmadist,1)]
    "Iter "+strofreal(iter,"%-3.0f")+": Σ dist: min "+strofreal(min(sigmadist[1..ntoconv]))+"; mean: "+strofreal(mean(sigmadist[1..ntoconv]))+"; max: "+strofreal(max(sigmadist[1..ntoconv]))
    displayflush()

    // Keep top 50%
    population[1..nsurv,] = newpop[1..nsurv,]

    // Crossover section
    // Top 50% used for crossover to create 30%, remaining 20% new
    for (i=1+nsurv; i<=npop-nnew; i++) {
      j = 1+floor((1-runiform(1,1)^0.1)*nparents)
      k = 1+floor((1-runiform(1,1)^0.1)*nparents)
      l = runiformint(1,1,1,width-1)
      population[i,1..l] = newpop[j,1..l]
      population[i,l+1..width] = newpop[k,l+1..width]
    }
    population[(npop-nnew+1)..npop,] = rnormal(nnew, width, 0, 1)

  }
  return(population)
}

end

mata
dd = st_matrix("dd")
pop = ga(dd, 2000)
end

Predictions on Panel Data Using a "forvalues" Loop

I am using a panel dataset, whereby I have data on log stock returns for a cross-section of companies. I am trying to use these stock returns to create a proxy for volatility using GARCH(1,1) methodology.

I want to run GARCH on each company individually, and then use the results to predict variance values. The code that I have come up with so far is:

Code:
forvalues companyid = 1(1)292{
arch Returns if companyid == `i', arch(1) garch(1)
predict Residual, r
predict ProxyVariance, variance
}

However, this returns a syntax error.

Could anyone advise me on why this is not working?
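For what it's worth, a hedged sketch of one possible repair (variable names are taken from the post; everything else is an assumption): the loop macro is named companyid while the body references `i', a space is needed before the opening brace, and predict needs variables that do not already exist on later passes.

```stata
* hedged sketch: collect per-company GARCH(1,1) predictions into two variables
generate double Residual = .
generate double ProxyVariance = .
forvalues i = 1/292 {
    capture arch Returns if companyid == `i', arch(1) garch(1)
    if _rc continue                       // skip companies where arch fails
    tempvar r v
    predict double `r' if companyid == `i', residuals
    predict double `v' if companyid == `i', variance
    replace Residual = `r' if companyid == `i'
    replace ProxyVariance = `v' if companyid == `i'
}
```

Using tempvars for the predictions avoids the "variable already defined" error that the original loop would hit on its second pass even after the syntax error is fixed.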

Kind regards,
Jack


Coefficient of variation

I have two data sets and I would like to investigate differences in terms of
median, standard deviation, and coefficient of variation (CV).
For the SD I have found -sdtest- (Bartlett / Levene), but how do I test for differences in the CVs?

Probit model: variable dropped and not used + outcome with single variable

Dear all

I am aware that this problem has already been discussed on the forum and that -firthlogit- has been recommended.
However, when I try this command, Stata says it is unrecognised, and when I try to download it, Stata says it already exists... (maybe because I work on Stata externally, on the University's server).

This is my command: probit Rejected1 WLB
The dependent variable Rejected1 equals 1 if the firm is rejected when applying for credit type 1, and zero otherwise (binary variable).
The independent variable WLB stands for woman-led business and equals 1 if the firm is led by a woman (binary variable).
Stata output: note: WLB != 0 predicts success perfectly
WLB dropped and 4 obs not used

I cannot leave out this variable (WLB) as this is the main topic of my master thesis

When I add control variables, which is the following command:
probit Rejected1 WLB Size FirmAge Quality Group Audit Export Foreign Manufacture COUNTRY Competitors Sales Legal
(which are all binary variables except for Size, FirmAge, COUNTRY, Competitors, Sales, Legal - they are numeric)
Stata output: outcome = FirmAge <= 21 predicts data perfectly
If I drop FirmAge, I get the same but with another variable...

Could somebody please help me out?

Best, Elise

Change from -reg- to -prais- makes R-square missing

Dear all,

I am estimating a trend rate by fitting the equation y = a + bx with OLS (-reg-), where x is the time variable and b is the trend rate. Because of first-order autocorrelation, I have switched to -prais- (Prais-Winsten estimation). However, the model becomes totally insignificant, with Prob > F = 1.000 and R-squared missing. If I instead use -prais, corc- (Cochrane-Orcutt estimation), then Prob > F = 0.8.

I would like to ask why Prais-Winsten estimation (-prais-) leads to such a poor result. The Stata manual mentions that for a small sample size (n = 20 in my case) Prais-Winsten has a "significant advantage" because it preserves the first observation.

Thank you very much!

The following is the output.

Code:
prais var1 var2

Iteration 0:  rho = 0.0000
Iteration 1:  rho = 0.4025
Iteration 2:  rho = 0.4098
Iteration 3:  rho = 0.4101
Iteration 4:  rho = 0.4101
Iteration 5:  rho = 0.4101

Prais-Winsten AR(1) regression -- iterated estimates

      Source |       SS           df       MS      Number of obs   =        20
-------------+----------------------------------   F(1, 18)        =      0.00
       Model |           0         1           0   Prob > F        =    1.0000
    Residual |  9.02919029        18  .501621683   R-squared       =         .
-------------+----------------------------------   Adj R-squared   =         .
       Total |  8.41900918        19  .443105746   Root MSE        =    .70825

------------------------------------------------------------------------------
        var1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        var2 |    .025779   .0421113     0.61   0.548    -.0626935    .1142515
       _cons |   1.729973   .5127416     3.37   0.003     .6527428    2.807203
-------------+----------------------------------------------------------------
         rho |   .4101012
------------------------------------------------------------------------------
Durbin-Watson statistic (original)    1.126068
Durbin-Watson statistic (transformed) 1.972264

Theil decomposition: decompose income inequality between and within regions using "ineqdeco"

Hi all,

I am trying to do a Theil decomposition using the Stata module "ineqdeco", but I am not sure exactly how to use the syntax for my situation. I have done some reading but could not find enough tutorials or examples of how to use ineqdeco, so could someone kindly answer my questions in detail?

In my research, there are three regions, east, central and north, and in each region, there are some states. I have the population and average income data for each state, and I want to decompose income inequality within and between the regions.

The syntax is ineqdeco varname [weights] [if exp] [in range] [, bygroup(groupvar) welfare summarize ].

My questions are: What should my command look like? Should the "varname" be the total income or the average income of each state? Does "weights" refer to the population of each state? Could someone explain the full syntax for me? I would appreciate it if you could also point to some examples or tutorial resources.

I wrote the following command (I am not sure if it is correct):

ineqdeco incomepercapita [w=population] if year==2015, by(region_n)

Thank you in advance,

Yang

Accounting for fixed effects by variable transformation

This is my first post and I hope I'm following the correct procedures.

I have a question about how to account for fixed effects in cross-sectional data (observations within and across countries) that I want to use in sem without having to include (0,1) dummies as explanatory variables.

My study uses cross-sectional data from 12 countries with about 64,000 observations in total. I could of course "solve" my problem by just adding dummy variables for each country to the variable list, but that would add a lot of unnecessary clutter to my output. Moreover, I'm not interested in the estimates of the country effects themselves, but I do need them to be taken into account when estimating the parameters of the model, as there are large differences between countries.

Algebraically, there is a simple transformation that does the trick: subtract from each observation of a given variable the mean of all observations affected by the fixed effect, and add back the grand mean of all observations of that variable, whether they are affected by the fixed effect or not.

Denoting the variable of interest by x, subscripted by i for the i-th observation and by j for the country in which the observation was made, the transformation is x(i,j) - x(.,j) + x(.,.), where x(.,j) is the mean within country j and x(.,.) is the grand mean.

What I would very much appreciate is help on how to program this transformation in Stata.
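A minimal sketch of that transformation, assuming the country identifier is called country and the variable of interest is called x (both names are hypothetical):

```stata
* within-country mean of x, i.e. x(.,j)
bysort country: egen x_cmean = mean(x)
* grand mean of x over all observations, i.e. x(.,.)
egen x_gmean = mean(x)
* the transformed variable: x(i,j) - x(.,j) + x(.,.)
generate x_dm = x - x_cmean + x_gmean
```

Repeating this for every variable in the model reproduces the within (fixed-effects) transformation without putting country dummies in the variable list.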

Any pointers will be gratefully acknowledged. Thank you.
Roberto Wessels

Comparing Effects of Two-Way Interaction Nest in Three-Way Interaction

Dear Statalist,

I am currently dealing with the analysis of a three-way interaction of all continuous variables, XZW. My regression equation looks like this:

Code:
reg y c.X##c.Z##c.W control1 control2 control3 control4 control5
Essentially, one of my hypotheses is that the effect of the interaction X*Z, that is, the slope of how X is moderated by Z, should vary across levels of W. In particular, I'm interested in testing whether the effect of the X*Z interaction differs statistically between the 10th (value of -.25) and 90th (value of .75) percentile values of W. Graphically they appear to differ, often going in opposite directions, but I need a way to formally test this difference. -margins-, of course, does not allow me to compute this using:
Code:
margins, dydx(c.X##c.Z) at(W=(-.25 .75))
I've looked at -contrast-, and it appears that it would work if the W variable were dichotomous, but since it is continuous (and dichotomizing it would lose a lot of information), I am stuck.
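For reference, one hedged reading of what the formal test reduces to under this linear specification: the X*Z slope equals _b[c.X#c.Z] + _b[c.X#c.Z#c.W]*W, so its difference between W = .75 and W = -.25 is the three-way coefficient times (.75 - (-.25)) = 1. A sketch:

```stata
* after: reg y c.X##c.Z##c.W control1 control2 control3 control4 control5
* the difference in the X*Z slope between W = .75 and W = -.25 equals
* _b[c.X#c.Z#c.W] * 1.0, so testing it is a one-line lincom
lincom c.X#c.Z#c.W
```

The reported t test on this linear combination is then the test of whether the X*Z interaction differs between the two percentile values of W.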

I've read through Aiken and West's (1991) Multiple Regression: Testing and Interpreting Interactions and received good advice from a number of individuals who I've reached out to. These have helped orient me in figuring out what exactly my question was (what I am interested in to test my hypothesis), but I'm still stuck in executing it.

If anyone has any help or suggestions I would much appreciate them. Thank you!

Clinton

Compare random effects to fixed effects?

Hi all,

I estimated a model with fixed effects, using data for Germany (the Hausman Test suggested me to use fixed instead of random effects). There is an existing paper which does exactly the same regression as I do, but which uses random effects and data for Switzerland. If possible, I'd like to compare my results to the results of that paper. Is it possible to do a quantitative comparison if one model is estimated with FE and the other one with RE? I'd say no, but I'm not quite sure if there isn't a way I haven't thought about.

Thanks!

Fixing date variable in medicare data

I need some help please!

I am using medpar files and looking at pancreatic cancer. The database is set up like this:
Code:
ID   Date_Proc1   Proc1_ICD   Date_Proc2   Proc2_ICD   Date_Proc3   Proc3_ICD
1    12/21/2003   5110        12/19/2003   87521       12/24/2003   9004
1    .            .           .            .           .            .
2    11/12/2003   8751        11/14/2003   8009        11/9/2003    5110
3    7/21/2003    5111        7/21/2003    8751        7/23/2003    9004
3    8/22/2003    9004        .            .           .            .

There are dates associated with surgical procedures and their corresponding ICD-9 codes. Each patient may have one observation or multiple observations depending on how many hospitalizations they had that year. What makes it even more complicated is that the dates of the procedures are not in chronological order.

While I can find the first date of any procedure within an observation using rowmin(), I want to order the dates chronologically, together with the ICD-9 codes that correspond to them, since the sequence (the first procedure, then the second, then the third) matters for my study.

Any help would be greatly appreciated. I have read and re-read the FAQ on dates/times/order but am unable to figure out how to do this. I am using Stata 15.
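A hedged sketch of one approach, with variable names taken from the listing above: reshape to one row per procedure, then sort within patient. The ICD variables are renamed first so that -reshape- sees a common stub.

```stata
* give each hospitalization row a unique key before reshaping
generate long row = _n
rename (Proc1_ICD Proc2_ICD Proc3_ICD) (ICD_Proc1 ICD_Proc2 ICD_Proc3)
reshape long Date_Proc ICD_Proc, i(row) j(procnum)
drop if missing(Date_Proc)
* chronological sequence of procedures within each patient
bysort ID (Date_Proc): generate seq = _n
```

In long form the first, second, and third procedures per patient are simply seq == 1, 2, 3, with the matching ICD-9 code on the same row.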

Thanks

Decomposing binary dependent variables

Dear Stata users,

I would like to decompose the difference in proportions (binary data) between two groups but I am not sure of the right syntax. Anyone that could help? I read about the Oaxaca command but it seems to work only for continuous outcomes.

Thank you in advance.

Copying data from observations

Dear all,

I am having some issues with my dataset and would be very grateful for your help. I have a large dataset of surveillance data concerning one infectious disease. There are many duplicate IDs, as several patients had several different samples tested. The samples come from different body sites, such as Site 1 or Site 2, and they are recorded in the variable SAMPLE. Unfortunately, there are many missing observations in this variable, as SAMPLE was recorded only for the first observation of each new site within an ID, as depicted below.


Code:
Observation   ID     SAMPLE
1             1234   Site 1
2             1234   .
3             1234   .
4             1234   Site 2
5             1234   Site 3
6             5678   Site 2
7             5678   .
8             5678   .


What I would like to do is copy the recorded values of SAMPLE to the other observations of the same ID: for example, copy observation 1 of SAMPLE to observations 2 and 3, copy observation 6 to observations 7 and 8, and so on. As there are about 15,000 missing values, I cannot do it one by one. Is there a command that would let me fill these in based on the condition of having the same ID?
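A hedged sketch, assuming the rows are already in the intended order (Observation) and that each recorded SAMPLE applies to the missing rows that follow it within the same ID:

```stata
* carry the last recorded SAMPLE forward within each ID
bysort ID (Observation): replace SAMPLE = SAMPLE[_n-1] if missing(SAMPLE)
```

On the example above this copies Site 1 to observations 2 and 3, leaves Site 2 on observation 4 untouched, and copies Site 2 to observations 7 and 8.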

My second question relates to the possibility of merging several different observations again based on ID. Let's say I want to merge observations with same ID in the variable SAMPLE so that I get this:


Code:
Observation   ID     SAMPLE
1             1234   Site 1, Site 2, Site 3
2             5678   Site 2


Is there any way I can do such a merge?
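For the second question, a hedged sketch that collapses the data to one row per ID with the distinct sites joined by commas (it assumes the missing SAMPLE values have already been filled in or dropped):

```stata
* keep one row per ID-site combination, then string the sites together
drop if missing(SAMPLE)
duplicates drop ID SAMPLE, force
bysort ID (SAMPLE): generate all_sites = SAMPLE if _n == 1
by ID: replace all_sites = all_sites[_n-1] + ", " + SAMPLE if _n > 1
by ID: keep if _n == _N
```

After this, all_sites holds the comma-separated list of sites for each ID, e.g. "Site 1, Site 2, Site 3".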

Thanks in advance for your help.

Andrea

Counting &quot;active by date&quot; from start and end dates

Hello StataList,

In my dataset of hospital consultations, each observation represents one consultation, and the variables include the consultation start date and end date. I am seeking to tabulate, or just count, the number of consultations active on each date.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id startdate enddate)
1 21124 21125
2 21124 21126
3 21124 21136
4 21125 21129
5 21125 21127
6 21125 21128
7 21126 21130
8 21126 21130
end
format %td startdate
format %td enddate
I had originally thought -egen-, perhaps followed by -reshape-, might be the first steps to a solution, but honestly I haven't made any progress so far.
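A hedged sketch of one way to do it with -expand-: create one row per consultation-day, then count rows per day.

```stata
* one row for every day each consultation is active (inclusive of both ends)
expand enddate - startdate + 1
bysort id: generate active_date = startdate + _n - 1
format %td active_date
* number of active consultations on each date
contract active_date, freq(n_active)
```

On the example data this yields, for instance, three active consultations on the first day (ids 1-3) and six on the second.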

Any advice or suggestions would be most appreciated.

Thank you -
Randy Absher

Numeric Computation: comparison of a float vs a double

I want to compare a numeric (double) variable x to another numeric (float) variable y.
. gen flagxy = 1 if float(x) != y
Then I look at the cases that differ:
. ed x y if flagxy == 1
My problem is that the flag is set for some cases even when I know that x == y. See the image (x = m1_q45 and y = d71_76).

[attachment not reproduced]


How can I solve this issue?
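One hedged workaround while the cause is being tracked down: compare with an explicit relative tolerance instead of exact equality, since float precision only guarantees about 7 significant decimal digits.

```stata
* flag only differences larger than what float rounding can explain
generate byte flagxy = abs(x - y) > 1e-6 * max(abs(x), abs(y)) ///
    if !missing(x, y)
```

The 1e-6 threshold is an assumption; tighten or loosen it depending on how the two variables were originally computed.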

Thank you.

Etable drop-problem

Hi! I have a problem. I am trying to use the -putdocx- command with the drop option. I get:

. putdocx table tbl20(17,.), drop
r(198);

The table has 17 rows, so I use putdocx table tbl20(16,.), drop instead, but the 3rd row is still present in the Word table. I cannot drop it, and Stata does not seem to recognise it. Why?

[attachment not reproduced]

Code:
putdocx table tbl20=etable , width(100%) memtable border(all, nil)
putdocx table tbl20(1,1)=(" "), halign(left)
putdocx table tbl20(1,2)=("Adj.HR"), halign(left)
putdocx table tbl20(2,2), halign(center)
putdocx table tbl20(2,1)=("`period2'"), halign(left)
putdocx table tbl20(.,1), border(right, nil)
putdocx table tbl20(1,.), border(bottom, nil)
putdocx table tbl20(1,.), border(top, nil)
putdocx table tbl20(1,6)=("95% adj.HR Confidence limits"), halign(right)
putdocx table tbl20(2,6), halign(right)

putdocx table tbl20(17,.), drop
putdocx table tbl20(16,.), drop
putdocx table tbl20(15,.), drop
putdocx table tbl20(14,.), drop
putdocx table tbl20(13,.), drop
putdocx table tbl20(12,.), drop
putdocx table tbl20(11,.), drop
putdocx table tbl20(10,.), drop
putdocx table tbl20(9,.), drop
putdocx table tbl20(8,.), drop
putdocx table tbl20(7,.), drop
putdocx table tbl20(6,.), drop
putdocx table tbl20(5,.), drop
putdocx table tbl20(4,.), drop
putdocx table tbl20(3,.), border(bottom, nil)
putdocx table tbl20(3,.), drop
putdocx table tbl20(.,4), drop
putdocx table tbl20(.,3), drop
putdocx table tbl20(1,.), border(bottom, nil)
putdocx table tbl20(1,.), border(top, nil)
putdocx table tbl20(.,2/5), nformat(%03.2f)
putdocx table tbl20(.,3), nformat(%05.4f) halign(center)

putdocx table tbl2T=(2,1)
putdocx table tbl2T(1,1)=table(tbl2)
putdocx table tbl2T(2,1)=table(tbl20)

putdocx save cox0docapu.docx,replace
Code:
                               Adj.HR   P>|z|
Overall adj.effect/pre vacc    1.06     0.7000
>6                             0.90     0.7000
With regards,

Jukka

Complex Value replacement

My data are below, and I have a problem. My aim is to fill the variable rainhhmx with the monthly values stored in the variables jan through dec.
The year of the event is indicated in v4, for example 1984.
I would like, for example, every day from 1jan1984 to 31jan1984 to take the value 1.072333, replacing the missing values of rainhhmx with that value through the end of the month; then every day from 1feb1984 to 29feb1984 takes the feb value for 1984, .003333333, and so on.
My final table would look something like this:
Code:
Days        rainhhmx     |  v4    jan        feb
01jan1984   1.072333     |  1984  1.072333   .0033333
02jan1984   1.072333     |  1985  2.813      .22833334
01feb1984   .0033333     |
02feb1984   .0033333     |
01jan1985   2.813        |
30jan1985   2.813        |
01feb1985   .22833334    |
29feb1985   .22833334    |
I have tried foreach loops, but none seems to work.
I know this is a bit complex, but please lend a helping hand if possible.
Thanks

Code:
clear
input byte(rainhhmx v3) int v4 float(jan feb mar apr may jun jul aug sep oct nov dec Days)
. . 1984 1.0723333 .003333333 2.672333 13.412333 2.0626667 .753 1.854 .919 1.4003333 5.208333 11.292334 3.034333 8766
. . 1985 2.813 .22833334 12.106 8.289 6.783333 2.382 2.451333 1.0093334 2.1233332 5.212333 5.444 1.5883334 8767
. . 1986 1.466 0 3.83 12.079333 7.212 5.516 2.496 1.9433334 2.631 5.212667 24.92633 5.591667 8768
. . 1987 2.8113334 .23866667 1.6863333 5.051333 8.509666 7.508333 1.3366667 1.383 1.0596666 1.3113333 9.808666 .25733334 8769
. . 1988 1.89 1.8683333 2.581667 17.491 5.382 1.8586667 1.221 1.332 .6146666 5.99 7.656 2.751 8770
. . 1989 1.4413333 .8193333 7.177667 4.1336665 2.564 1.1506667 2.627667 .8973333 .7913333 3.3106666 10.165667 2.714333 8771
. . 1990 1.2473333 3.575667 5.299333 22.054 3.6816666 .6896667 2.776333 1.386 .8026667 3.092333 5.571667 2.996667 8772
. . 1991 .6946667 .07433333 3.27 8.114 17.697666 3.623333 .8656667 1.708 .5986667 1.7556666 5.136667 3.5636666 8773
. . 1992 .3033333 0 5.026333 6.107666 6.742333 2.216 2.1326666 1.42 5.222667 2.1016667 5.632667 22.645666 8774
. . 1993 9.065 .7706667 1.514 5.204667 6.955667 3.021667 1.141 .6413333 .9396667 2.112 3.218 3.8443334 8775
. . 1994 .7153333 4.4413333 .776 7.871333 2.64 2.076 .7453333 .9516667 .978 6.500667 16.229 4.5403333 8776
. . 1995 2.065 3.154667 3.2736666 14.724 5.763333 1.458 2.573333 1.1346667 2.2543333 6.01 4.0076666 9.613 8777
. . 1996 2.0266666 .14633334 5.690667 3.0566666 2.899333 3.713667 3.418333 .9423333 2 3.474 10.172667 1.564 8778
. . 1997 .009 0 11.995667 10.67 5.765 6.803667 8.142333 .654 .52533334 11.094666 11.466333 7.27 8779
. . 1998 11.464 3.6473334 7.121 12.502666 14.387667 4.6246667 1.8333334 1.1816666 .665 1.9603333 4.541 3.420667 8780
. . 1999 .09166667 .071 5.272333 5.265333 3.8296666 .7873333 .974 1.1723334 1.3766667 1.5906667 12.194667 6.475667 8781
. . 2000 .4153333 0 2.35 5.469666 .618 1.2346667 .4863333 1.1843333 1.3303334 1.1636667 2.648 10.262 8782
. . 2001 4.996666 .8046666 6.886333 8.973333 2.9733334 2 .641 1.64 1.7616667 5.620333 6.786334 .9943333 8783
. . 2002 3.446667 .202 9.471 10.289333 32.854668 3.888 .5606667 1.2083334 .7863333 4.751 7.201667 2.6526666 8784
. . 2003 .3383333 .24266666 4.907333 8.920667 5.576667 2.3826666 .3926667 1.0916667 1.1096667 9.725333 5.489 2.7276666 8785
. . 2004 3.459 3.025667 2.076 6.327333 2.4776666 1.7623333 .3416667 .595 1.0226667 8.3 8.776334 5.023334 8786
. . 2005 .436 .372 4.7413335 8.508 4.996 1.6343334 1.679 1.9386667 .944 4.581 5.976666 1.8686666 8787
. . 2006 10.156667 3.7216666 5.795333 9.250667 7.016 .657 1.355 .971 1.019 6.308667 8.502334 10.238 8788
. . 2007 1.6063334 1.358 2.525667 9.753 8.108 .6466666 .432 3.301 .839 5.138333 12.685333 1.3093333 8789
. . 2008 8.115 .6743333 8.016 15.617333 1.6546667 .4736667 2.0376666 .7833334 2.1996667 5.438 33.662334 .5986667 8790
. . 2009 2.2443333 1.0093334 2.3756666 5.330667 7.859667 .6323333 .4493333 .8126667 1.196 7.168667 2.254 4.241667 8791
. . 2010 1.601 2.886667 7.950333 9.4 11.393666 2.955333 1.139 1.2226666 .864 6.084333 9.118667 1.1993333 8792
. . 2011 .207 3.391 1.921 14.374333 7.494 2.6756666 3.099 8.056666 1.3686666 14.279333 7.532667 5.850667 8793
. . 2012 .003333333 .17166667 .167 14.851 8.788 2.2706666 2.901 3.0006666 .8046666 7.932 11.877334 3.775 8794
. . 2013 .8726667 1.0893333 12.926 10.932 4.1376667 2.3843334 1.4693333 1.4716667 1.8683333 2.144 7.462667 1.7716666 8795
end
format %td Days
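A hedged sketch of one approach, assuming a daily dataset with a date variable (here called day) exists alongside this yearly file: reshape the monthly means to long form, then merge on year and month. All names other than v4, jan-dec, and rainhhmx are hypothetical.

```stata
* yearly file: one row per year with monthly means in jan..dec
preserve
keep v4 jan feb mar apr may jun jul aug sep oct nov dec
rename (jan feb mar apr may jun jul aug sep oct nov dec) ///
       (m1 m2 m3 m4 m5 m6 m7 m8 m9 m10 m11 m12)
reshape long m, i(v4) j(month)
rename (v4 m) (year monthmean)
tempfile monthly
save `monthly'
restore

* daily file: derive year and month from the date, then pull in the mean
use daily, clear
generate year  = year(day)
generate month = month(day)
merge m:1 year month using `monthly', keep(master match) nogenerate
replace rainhhmx = monthmean if missing(rainhhmx)
```

This avoids looping entirely: every day of January 1984 picks up the 1984 jan value, every day of February 1984 the feb value, and so on.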

Merging with formulae

Dear all,
My data are unfortunately confidential, but I am stuck with a question on merging. I'll explain with a dummy example:
  • I have one dataset containing information on sold products and the date on which they were sold (master with product_id)
  • I have another dataset containing multiple entries for each product_id where each entry specifies the date starting from which a certain discount percentage was implemented (using dataset with multiple entries by product_id).
I need to find a way to import, for each sold product, the discount policy that was in effect for that product on the date it was sold. That is, within the corresponding product_id group, I need to import the line from the using file with the most recent date before the selling date.
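A hedged sketch, with hypothetical names sale_id, sale_date, discount_date, and a using file discounts.dta: pair every sale with every discount entry for its product via -joinby-, then keep the most recent entry dated on or before the sale.

```stata
* master in memory: one row per sale (product_id, sale_id, sale_date)
joinby product_id using discounts
* keep only policies already in effect at the sale date
keep if discount_date <= sale_date
* within each sale, keep the most recent such policy
bysort sale_id (discount_date): keep if _n == _N
```

(-rangejoin- from SSC is another route for this kind of interval matching; the sketch above uses only built-in commands.)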

Any ideas?
Thank you very much!

Using -matchit- with Chinese characters

This is just to post in the forum the following question by Liu Mengdi (刘梦迪):

I am trying to use the "matchit" package in Stata to match firm names in Chinese. I have tried it and it seems to work well, but I still have some concerns about its applicability.
Do you think I can use it for Chinese characters? (If yes it will be great!!)
...and my reply:

First, out of full disclosure, I have almost zero knowledge about Chinese language and characters.

However, matchit is coded to be utf8 compatible (as much as Stata14+ and Mata are). This means that Chinese characters should be compared as such. Probably, using the option sim(ngram,1) for Chinese names is equivalent to using sim(token) for Latin names. You will handle permutations of Chinese characters but not misspellings.
Best,

Julio

Firm fixed effects or industry fixed effects

Hello,

I have a large panel dataset containing firm-level data (firm characteristics, accounting data plus some macroeconomic indicators).

I would like to estimate a linear regression where I regress an accounting variable (e.g. ROE) on a set of firm characteristics (e.g. employee turnover) and some macroeconomic indicators (e.g. unemployment rate). Due to significant differences across the firms and, in particular the industries, it seems important to include some sort of fixed effects in the analysis. What is the better choice: firm fixed effects or industry fixed effects? What are the pros and cons of the two types of fixed effects?

Thanks a lot for any help on this.

Kind regards,
Ingo

Cluster analysis: problem of memory

Hello,

I'm working on panel data with 3,306 variables and 240,000 observations (Stata/SE 13.0). When I run cluster analysis with "cluster wardslinkage", I receive the message "insufficient memory for ClusterMatrix r(950);". I dropped 1,000 variables, but I receive the same message.
clustering code :
Code:
cluster wardslinkage PTA, measure(L2)
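A hedged back-of-the-envelope suggests why dropping variables does not help: hierarchical methods such as Ward's linkage need the pairwise dissimilarity matrix across observations, whose size depends only on the number of observations N, not on the number of variables:

```latex
N^2 \times 8 \text{ bytes} = 240{,}000^2 \times 8 \approx 4.6 \times 10^{11} \text{ bytes} \approx 430 \text{ GB}
```

Reducing the number of observations (or switching to a partition method such as -cluster kmeans-, which does not build this matrix) is what shrinks the memory requirement.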
Thanks