Channel: Statalist

Using "foreach" to merge multiple .dta within one folder

I have multiple .dta files from one dataset located in a folder. I would like to create a loop that creates a unique "id" variable within each .dta and then merges all .dta files on that variable.
Presently, I am coding:
use "H16A_R.dta"
rename *, lower
egen id=concat(hhid pn)
save a_r, replace

use "H16B_R"
rename *, lower
egen id=concat(hhid pn)
merge 1:1 id using a_r
drop _merge
save 2016_v2, replace


.....
And so on...



I would like to do something like this:
foreach x.dta in "E:\filename\subfilename\foldername\subfoldername" {
rename *, lower
egen id=concat(hhid pn)
merge 1:1 id
drop _merge
save 2016_v2, replace
}


However, when I execute the code immediately above, I receive an r(198) "invalid name" error.

Is there any way to do what I'm trying, or must it be done the old-fashioned way?
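For reference, a minimal sketch of one way such a loop could look. It reuses the placeholder path above and assumes every file in the folder contains hhid and pn and that the concatenated id uniquely identifies observations in each file; treat it as a sketch rather than tested code.

Code:
* loop over all .dta files in one folder, build id, and merge them together
local folder "E:\filename\subfilename\foldername\subfoldername"
local files : dir "`folder'" files "*.dta"

local first = 1
foreach f of local files {
    use "`folder'/`f'", clear
    rename *, lower
    egen id = concat(hhid pn)
    if `first' {
        save 2016_v2, replace
        local first = 0
    }
    else {
        merge 1:1 id using 2016_v2, nogenerate
        save 2016_v2, replace
    }
}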



Thank you all,
Trent

How to merge biannual variable into annual one

Hello,

I'm currently working with two datasets. My problem is that the variable I need from one of them is measured biannually and coded as string variables: requests sent to a firm (requests) by a country (country) on a given date (date).
I have encoded the requests variable and the country variable, but I can't find a working format for the date variable, which is coded like 2013-12-31 or 2013-06-30, i.e. always the end of December or June.
For my calculation I have to combine the two semi-annual requests values for each country into a single annual observation.

I thought maybe with dummy variables but I haven't found a way.
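In case it helps, a minimal sketch of one possible approach. It assumes date is a string such as "2013-12-31" and that requests is a genuine numeric count to be summed within country and year (note that encode assigns arbitrary codes to string values, so a count variable may need to come from destring or real() instead):

Code:
gen ddate = daily(date, "YMD")             // convert the string date
format ddate %td
gen year = year(ddate)
collapse (sum) requests, by(country year)  // one annual observation per country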

Thanks in advance for your time.

Confidence interval for proportion (p) and population total (N) for survey data like NHANES

Hello, guys:

I have a question about confidence intervals for the proportion (p) and for the population total (N) using Stata 15. This is my problem:

I am estimating the distribution of the population (age 60-80) over gender using the NHANES datasets for the 2011-2012 and 2013-2014 cycles. To do this, I run the following commands:
  • svyset sdmvpsu [pweight=wtint4yr], strata(sdmvstra) vce(linearized) singleunit(missing)
  • svy, subpop(if ridageyr >= 60 & ridageyr <= 80): proportion riagendr
  • svy, subpop (if ridageyr >= 60 & ridageyr <= 80): tabulate riagendr, count ci se format(%30.6g)
By doing this I get the following results for the proportion (p):
  • For male: p = 0.4883276, se = 0.0044925, lb = 0.4791817, ub = 0.4974813.
  • For female: p = 0.5116724, se = 0.0044925, lb = 0.5025187, ub = 0.5208183.
While I get the following results for the total (N):
  • For male: N = 26 890 724, se = 1 696 495, lb = 23 435 076, ub = 30 346 373.
  • For female: N = 32 768 839, se = 1 750 499, lb = 29 203 190, ub = 36 334 489.
  • Size of subpopulation: N = 26 890 724 + 32 768 839 = 59 659 564.
So far so good. However, when I multiply the subpopulation total by the confidence bounds for the proportion, I don't get the same results:
  • The lb for male: 59 659 564 * 0.4791817 = 28 587 771 ≠ 23 435 076.
  • The ub for male: 59 659 564 * 0.4974813 = 29 679 517 ≠ 30 346 373.
The same goes for females. Why is this happening? Why don't the results match? Where is the mistake?

I'd be very grateful if you could help me. Thank you very much.

creating correlations using rangestat or some alternative

Hello,

I'd like to create correlations using rangestat or some alternative command. It seems to me that rangestat can't handle more than 2 variables from which to create correlations. I have data that looks something like this:

time ret_a ret_b ret_c ret_d ret_e date
03feb2020 03:05:00 -.0012849 -.0007816 -.0047389 -.0014184 -.0004325 03feb2020
03feb2020 03:10:00 -.0002031 -.0010812 -.0022801 -.0005348 -.00159 03feb2020
03feb2020 03:15:00 .0008296 .0010747 .0001326 .0012698 .0005761 03feb2020
03feb2020 03:20:00 -.0016748 -.0009866 .0000749 -.0018456 -.0016535 03feb2020
03feb2020 03:25:00 .0006948 .0014839 .0019909 .0012688 .002196 03feb2020
03feb2020 03:30:00 -.0018966 -.0016989 -.0007764 -.0017675 -.0014679 03feb2020
03feb2020 03:35:00 .0016966 .0016097 -.0003482 .0027163 .0022473 03feb2020
03feb2020 03:40:00 .0007114 .0009351 -.0003281 .0007936 .0008968 03feb2020
03feb2020 03:45:00 .0003723 .0004799 .0006825 .0003141 .0011547 03feb2020
03feb2020 03:50:00 -.0001184 .00025 .001908 .0004152 .0003027 03feb2020
03feb2020 03:55:00 -.0001522 -.0005204 .0009306 -.0004699 -.0012612 03feb2020
03feb2020 04:00:00 -.0005755 -.0005181 -.0013373 -.000755 -.0007022 03feb2020
03feb2020 04:05:00 -.0001354 -.0000996 -.0007097 -.0007144 -.0009515 03feb2020
03feb2020 04:10:00 .0008298 .000978 .0006154 .0008507 .0013792 03feb2020
03feb2020 04:15:00 .0010999 .0005664 -.0003449 .0006731 .0003825 03feb2020

I'd like to get day-by-day correlations for each pair of {ret_a, ret_b, ret_c, ret_d, ret_e}, and each day/pair combination should have its own observation in the resulting dataset. What is the best way of going about this?
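For what it's worth, a minimal sketch of one brute-force alternative: loop over the variable pairs and use statsby with correlate for each day. It assumes date identifies the day, uses the variable names above, and re-reads the data for every pair, so it is simple rather than fast.

Code:
local rets ret_a ret_b ret_c ret_d ret_e
tempfile base results
save `base'

local first = 1
foreach x of local rets {
    foreach y of local rets {
        if "`x'" >= "`y'" continue             // keep each unordered pair once
        use `base', clear
        statsby r_`x'_`y' = r(rho), by(date) clear: correlate `x' `y'
        if `first' {
            save `results'
            local first = 0
        }
        else {
            merge 1:1 date using `results', nogenerate
            save `results', replace
        }
    }
}
use `results', clear    // one observation per day, one variable per pair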

Thank you very much,
Stan



case-control groups not responding

Hi Clyde Schechter, first of all thank you for the very helpful codes for the case-control problem. https://www.statalist.org/forums/for...06#post1357406

I am trying to create case-control matching on age, hospital length of stay (LOS), and Charlson comorbidity index: a breast cancer case group and a matched control group. Stata shows a "not responding" message. How can I solve this issue? Thank you so much in advance.


Code:
preserve
keep if breast_cancer == 0
rename * *_control
rename AGE_control AGE
rename LOS_control LOS
rename charlindex_control charlindex
 
tempfile controls
save `controls'
 
restore
keep if breast_cancer == 1
rename * *_case
rename AGE_case AGE
rename LOS_case LOS
rename charlindex_case charlindex
joinby AGE LOS charlindex using `controls'
I am using the HCUP NIS dataset.

Code:
input int AGE byte(DIED FEMALE) long LOS byte PAY1 int PL_NCHS byte RACE double TOTCHG byte ZIPINC_QRTL float(age charlindex) byte cancer float breast_cancer byte(surgery chemotherapy radiation)
50 0 0  2 3 2 1  48056 4 3 1 0 0 0 0 0
90 0 1  6 1 2 1  14233 3 5 1 0 0 0 0 0
77 0 1  2 1 4 2  12238 2 5 1 0 0 0 0 0
39 0 0  1 2 3 3  13503 3 2 1 0 0 0 0 0
 0 0 0  1 2 5 1   2953 2 0 0 0 0 0 0 0
65 0 0  5 6 6 2  11388 1 5 0 0 0 0 0 0
 0 0 1  2 3 1 .   1705 2 0 0 0 0 0 0 0
58 0 1  8 2 6 1  47049 1 4 2 0 0 0 0 0
 0 0 1  2 2 1 .   2482 4 0 0 0 0 0 0 0
68 0 0  1 1 1 2  46439 4 5 4 0 0 0 0 0
32 0 1 16 3 5 1  19062 1 1 0 0 0 0 0 0
62 0 1  2 1 1 1  48083 1 4 2 0 0 0 0 0
73 0 1  3 1 5 1  35484 2 5 0 0 0 0 0 0
27 0 1  1 3 5 1  18764 1 1 1 0 0 0 0 0
 0 0 0  2 2 2 1   3133 1 0 0 0 0 0 0 0
82 0 0  3 1 2 2  27698 4 5 5 0 0 0 0 0
 0 0 1  2 3 3 1   3354 3 0 0 0 0 0 0 0
75 0 1  6 1 5 3 191812 1 5 4 0 0 0 0 0
38 0 0 12 2 2 2  21444 4 2 3 0 0 0 0 0
50 0 0  6 2 2 1  89409 4 3 4 0 0 0 0 0
56 0 0  3 6 5 1  30585 2 4 2 0 0 0 0 0
61 0 0  2 3 2 1  21252 4 4 1 0 0 0 0 0
78 0 1  4 1 1 2  12327 1 5 4 0 0 0 0 0
 0 0 0  2 4 1 1   5376 1 0 0 0 0 0 0 0
61 0 1  3 3 6 1  31288 2 4 0 0 0 0 0 0
48 0 1  2 3 1 1  13333 2 3 0 0 0 0 0 0
58 0 1  1 3 3 1  35613 1 4 2 0 0 0 0 0
82 0 0  2 1 3 1  21410 1 5 0 0 0 0 0 0
61 0 0 14 1 1 4  56138 4 4 0 0 0 0 0 0
51 0 1  2 2 3 2  49494 3 3 1 0 0 0 0 0
69 0 0  1 1 3 1  27115 4 5 0 0 0 0 0 0
82 0 0 10 1 1 3  60281 4 5 4 0 0 0 0 0
24 0 1  2 2 4 .   5065 1 1 0 0 0 0 0 0
 5 0 0  2 3 6 6   4946 1 0 0 0 0 0 0 0
90 0 1  4 1 4 1  31494 3 5 0 0 0 0 0 0
21 0 0 26 2 1 3  36954 1 1 0 0 0 0 0 0
62 0 1  2 2 2 1   6969 3 4 2 0 0 0 0 0
32 0 0 19 6 2 1 448187 3 1 0 0 0 0 0 0
71 0 0  4 1 6 1   7872 3 5 1 0 0 0 0 0
61 0 0  1 1 3 1   8336 1 4 1 0 0 0 0 0
60 0 1  6 3 3 1  36042 2 4 0 0 0 0 0 0
18 0 1  1 2 2 1   8853 3 1 0 0 0 0 0 0
60 0 1  2 . 2 1  33844 3 4 0 0 0 0 0 0
10 0 1  1 2 1 2  35487 1 0 1 0 0 0 0 0
57 0 0  2 3 3 1  17532 3 4 1 0 0 0 0 0
46 0 1  2 6 5 1  13259 2 3 0 0 0 0 0 0
 0 0 0  2 3 1 2   9198 3 0 0 0 0 0 0 0
78 0 1  1 1 4 1  21978 2 5 1 1 1 0 0 0
59 0 1 16 1 2 1 153665 4 4 4 0 0 0 0 0
22 0 0  3 2 2 3   9125 2 1 0 0 0 0 0 0
37 0 1  3 3 3 1  21362 2 2 0 0 0 0 0 0
74 0 0  1 1 1 3  11087 3 5 2 0 0 0 0 0
30 0 1  2 3 1 2  30757 1 1 0 0 0 0 0 0
64 0 0  2 3 1 1  31235 4 4 0 0 0 0 0 0
86 0 1  4 1 1 1  13188 3 5 0 1 1 0 0 0
50 0 1  3 4 1 3  54408 1 3 4 0 0 0 0 0
82 0 0  3 1 1 1  26950 3 5 2 0 0 0 0 0
29 0 1  3 3 2 1  12157 4 1 0 0 0 0 0 0
60 0 0  2 3 6 1  13036 1 4 1 0 0 0 0 0
59 0 1  2 3 1 .  19777 3 4 3 0 0 0 0 0
61 0 0  2 2 3 3  19634 1 4 2 0 0 0 0 0
44 0 0  3 4 3 1   6550 1 2 1 0 0 0 0 0
72 0 1 19 1 1 1  84500 1 5 2 0 0 0 0 0
74 0 0  2 1 1 2  28648 2 5 3 0 0 0 0 0
40 0 1  2 3 5 .  33212 . 2 0 0 0 0 0 0
 0 0 1  2 2 2 2   1790 2 0 0 0 0 0 0 0
31 0 1  1 3 3 1   8459 4 1 0 0 0 0 0 0
42 0 0  4 3 1 1  17066 4 2 0 0 0 0 0 0
 0 0 1  1 3 4 2   2592 1 0 0 0 0 0 0 0
40 0 1  3 3 2 1  31548 4 2 0 0 0 0 0 0
24 0 0  2 3 4 2   9637 2 1 1 0 0 0 0 0
27 0 0  2 3 1 3  14604 1 1 0 0 0 0 0 0
47 0 0  8 1 5 2  49385 1 3 6 0 0 0 0 0
59 0 0  0 1 6 2   3509 1 4 1 0 0 0 0 0
 0 0 0  2 4 1 1   2585 3 0 0 0 0 0 0 0
40 0 1  0 3 1 2   5863 3 2 0 0 0 0 0 0
64 0 0  9 2 1 6      . 1 4 5 0 0 0 0 0
67 0 0  2 1 1 1  16022 3 5 2 0 0 0 0 0
72 0 0  9 1 3 1  55798 1 5 3 0 0 0 0 0
55 0 0  2 1 3 1  25758 3 4 5 0 0 0 0 0
75 0 0  2 1 3 1  38468 3 5 1 0 0 0 0 0
39 0 1 12 2 1 2  91554 1 2 0 0 0 0 0 0
11 0 0  2 2 1 3  11072 1 0 1 0 0 0 0 0
42 0 1  1 4 1 2  19229 1 2 1 0 0 0 0 0
65 0 1  2 1 3 1  17485 2 5 1 0 0 0 0 0
87 0 0  2 1 3 3  48250 1 5 2 0 0 0 0 0
53 0 0  1 1 6 1   1709 1 3 0 0 0 0 0 0
61 0 0  2 3 5 1  11092 1 4 3 0 0 0 0 0
15 0 1  4 2 3 2   8971 1 0 0 0 0 0 0 0
49 0 1  1 2 1 3  29514 1 3 1 0 0 0 0 0
44 0 0  3 1 3 2   4107 2 2 2 0 0 0 0 0
67 0 0  1 1 2 2  32078 4 5 0 0 0 0 0 0
80 0 1  3 1 2 1  29185 4 5 3 0 0 0 0 0
 0 0 0  2 3 5 1   2786 2 0 0 0 0 0 0 0
89 0 0  2 1 4 1  23633 3 5 9 0 0 0 0 0
22 0 1  2 3 1 .  11977 2 1 0 0 0 0 0 0
66 0 0  1 1 2 1  74956 4 5 0 0 0 0 0 0
 0 0 0  2 2 1 2  18049 1 0 0 0 0 0 0 0
57 0 0  2 3 1 1  12366 3 4 1 0 0 0 0 0
44 0 1  3 3 3 1  22879 1 2 1 0 0 0 0 0
end
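Not an answer, but a quick diagnostic sketch that may explain the "not responding" message: joinby forms every case-control pair within each AGE/LOS/charlindex cell, so with a dataset the size of the NIS the result can be enormous. Something along these lines, run on the full dataset before the preserve/rename steps, estimates how many observations joinby would try to create:

Code:
preserve
gen byte case = breast_cancer == 1
collapse (sum) n_cases = case (count) n_total = case, by(AGE LOS charlindex)
gen double n_controls = n_total - n_cases
gen double n_pairs = n_cases * n_controls      // pairs joinby creates in this cell
summarize n_pairs
display "joinby would create about " %20.0fc r(sum) " observations"
restore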

95% confidence interval for lroc?

Hi,

I am using lroc after different logistic regression models to estimate the area under the ROC curve. I was wondering if there is a way to include/calculate a 95% confidence interval for the AUC...?

Thanks,
Robin

Code:
logistic y c.var1 i.var2 i.var3base

lroc, nograph
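In case it is useful, a minimal sketch of one way to get a confidence interval for the AUC: save the predicted probabilities and feed them to roctab, which reports the ROC area with a standard error and an asymptotic confidence interval (the model specification is copied from the example above):

Code:
logistic y c.var1 i.var2 i.var3base
predict double phat, pr       // predicted probability of a positive outcome
roctab y phat                 // ROC area with std. err. and 95% CI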

Simulating a population from aggregate observed data

Hi all,

I want to create individual-level data that reproduces the distributions from observed aggregate data. I have a set of regressions for 10 Y values that I want to simulate, and I want to make sure their covariances are taken into account as well. For simplicity, let's say there are two Ys (blood pressure and heart rate) and only a couple of covariates. I first run the regressions, generate a new variable indicating the estimation sample, get the distribution of the outcome variable for the observed sample, and store the means and standard deviations. I am stuck on what to do next: do I use the simulate command (Monte Carlo simulation)?

Any help would be greatly appreciated!

Code:
 
reg bloodpressure i.post##i.evertreated delta_bp i.gender i.obs_count i.IMD, vce (cluster IMD) 

gen sample_bp = e(sample) 

*get the distribution of bp for the observed 

sum bloodpressure if sample_bp == 1
gen mean_bp = r(mean) if sample_bp == 1 
gen std_bp = r(sd) if sample_bp == 1 

reg heartrate i.post##i.evertreated delta_hr i.gender i.obs_count i.IMD, vce (cluster IMD) 

gen sample_hr = e(sample) 

*get the distribution of hr for the observed 

sum heartrate if sample_hr == 1
gen mean_hr = r(mean) if sample_hr == 1 
gen std_hr = r(sd) if sample_hr == 1
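A minimal sketch of one way to continue from here, using drawnorm to generate correlated draws. It assumes joint normality is an acceptable approximation; the number of simulated individuals and the seed are arbitrary.

Code:
* store the observed moments as scalars
summarize bloodpressure if sample_bp
scalar m_bp = r(mean)
scalar s_bp = r(sd)
summarize heartrate if sample_hr
scalar m_hr = r(mean)
scalar s_hr = r(sd)
correlate bloodpressure heartrate if sample_bp & sample_hr
matrix Corr = r(C)                       // observed correlation matrix

* simulate individual-level draws with the same means, SDs, and correlation
clear
matrix m  = (m_bp, m_hr)
matrix sd = (s_bp, s_hr)
drawnorm sim_bp sim_hr, n(10000) means(m) sds(sd) corr(Corr) seed(12345)
summarize sim_bp sim_hr
correlate sim_bp sim_hr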
Any help would be greatly appreciated!

Surya

Multiple plots from different datasets on same graph?

Hello Statalist members,

I'd like to use two different datasets for the same graph. I am using Stata 16 for Mac. I am wondering whether it is possible to use frames to add an additional plot from a different dataset to an xtline graph. Or is it possible to add an additional plot to an existing graph?
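In case it helps while waiting for better answers, a minimal sketch of one common workaround in Stata 16: load the second dataset in another frame, copy the series you need into the current frame with frlink/frget, and overlay everything in a single twoway call. The names time, y_main, y_other, and second.dta are placeholders, and the link assumes time uniquely identifies observations in both frames.

Code:
frame create other
frame other: use "second.dta"
frlink 1:1 time, frame(other)        // creates the link variable -other-
frget y_other = y, from(other)       // copy the series from the other frame
twoway (line y_main time, sort) (line y_other time, sort)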


Thank you for your insights!
Brennan

Time trend in xtunitroot ht test

Hi everyone,
I run the Harris-Tzavalis unit-root test with time trend as follows:


xtunitroot ht index7, trend demean

Harris-Tzavalis unit-root test for index7
-----------------------------------------
Ho: Panels contain unit roots        Number of panels  = 3315
Ha: Panels are stationary            Number of periods =   24

AR parameter: Common                 Asymptotics: N -> Infinity
Panel means:  Included                            T Fixed
Time trend:   Included               Cross-sectional means removed
------------------------------------------------------------------------------
                 Statistic          z          p-value
------------------------------------------------------------------------------
 rho                0.5518      -50.7588        0.0000
------------------------------------------------------------------------------

My question is: how can I get the estimation of the coefficient that weights time?


Thank you in advance.

Dunn Test Bonferroni Correction gives p-value of 1. Error?

I did the Dunn test using the following code:

Code:
dunntest rating, by(outcome) ma(bonferroni)
I ended up getting a p-value of 1 for some pairwise comparisons, which I assume is an error since the data is not the same for any of the groups compared. How do I fix this error?

Dunn Test Median Comparison Error ?

I might be doing this wrong, but I named my groups with numbers, so "7", "8" and "9" are separate groups. When I ran the Dunn test code, the Col Mean-Row Mean entry for "Group 8 mean - Group 8 mean" is 0.382100 with a p-value of 1.0000 (shown below). Why would the Dunn test give me the z-score for a group subtracted from itself?

Dunn's Pairwise Comparison of rating by outcome (Bonferroni)

Code:
Col Mean-|
Row Mean |          7          8
---------+----------------------
       8 |  -3.883837   0.382100
         |     0.0019     1.0000
         |
       9 |  -3.829106   0.054730
         |     0.0023     1.0000

Labelling variable based on existing label of another variable

Hello,

I was wondering whether there is a command that enables you to label a variable based on another variable's existing label.
e.g.
I have a variable X and I use the command:

label variable x "apples"

Now I want to run a loop generating new variables from x, and within that loop I want to label them "apples", or whatever the existing label of variable x is.
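One way is the extended macro function `: variable label`, which retrieves the existing label; a minimal sketch, with placeholder generated variables:

Code:
local xlab : variable label x
forvalues i = 1/3 {
    generate newvar`i' = x * `i'           // placeholder transformation
    label variable newvar`i' "`xlab'"      // reuse x's existing label
}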

TIA!

Cheers,
Jess

Generate Hourly Time

Dear Everyone,

This is a set of zip codes with the date of the policy. How can I generate new variables such that, for each zip code, I have an hourly time stamp (hour and minute, 00:00 to 23:00) for the 3rd of April? Below is a sample of the data set I have.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str85 zipcode str19 speedlimit str10 bdate
" 90425" "30" "03.04.2019"
" 90552" "30" "03.04.2019"
" 90765" "30" "03.04.2019"
" 95445" "30" "03.04.2019"
"63739"   "30" "03.04.2019"
"63739"   "30" "03.04.2019"
"63785"   "30" "03.04.2019"
end
I hope my question is clear. Please let me know if I must explain more.
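If I have understood correctly, a minimal sketch of one way to do it: convert bdate to a Stata date, expand each row to 24 copies, and build an hourly %tc timestamp. The new variable names are illustrative.

Code:
gen long obsid = _n                        // remember the original row
gen pdate = daily(bdate, "DMY")            // "03.04.2019" -> 3 Apr 2019
format pdate %td
expand 24
bysort obsid: gen double hourclock = dhms(pdate, _n - 1, 0, 0)   // hours 0..23
format hourclock %tcDDmonCCYY_HH:MM
sort obsid hourclock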

best regards,
Amir

"Pre Trends" for DiD with continuous treatment

Dear all,

I have a Difference-in-Differences (DiD) setup in which treatment intensity is clearly continuous rather than binary. I also have data on many pre and many post periods.

In a binary-treatment setup, the standard procedure for assessing the plausibility of the assumption that, absent treatment, the trends of the two groups would have been parallel seems to be to plot the trends of the two groups over several pre-treatment periods ("pre-trends analysis"). However, with a continuous treatment the number of treatment groups is not two, but can be essentially as large as the number of units.

I am trying to figure out what is the best practice to examine the parallel trends assumption in this setup. While I see a lot of material on having more than 2 periods, I have not found good material on having more than 2 treatment levels.

Who knows more about this?

Thanks so much, PM

What does cumul [fweight] really do?

I am trying to understand the role of [fweight] in the following command

Code:
cumul [fweight]
https://www.stata.com/help10.cgi?weight says that fweights, or frequency weights, are weights that indicate the number of duplicated observations. Given this explanation, let us consider the following:

Code:
set obs 150
set seed 12345

gen z = rnormal()
gen w = runiformint(1,100)

cumul z [fweight=w], gen(cum_z)

gen zw = z*w
cumul zw, gen(cum_zw)
I would expect that cum_z and cum_zw are identical, or at least very close. But they are very different, as can be seen from the attached graph.

What does the option [fweight] really do?
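For what it is worth, my understanding (hedged, not authoritative): [fweight=w] tells cumul to treat each observation as if it appeared w times, which is not the same as transforming z into z*w. A minimal sketch of the analogy with expand; the equal option is used so that tied values get the same cumulative in both runs:

Code:
clear
set obs 150
set seed 12345
gen z = rnormal()
gen w = runiformint(1,100)

cumul z [fweight=w], gen(cum_fw) equal

preserve
expand w                               // physically duplicate each row w times
cumul z, gen(cum_expanded) equal
list z cum_expanded in 1/5             // the original 150 rows come first
restore

list z cum_fw in 1/5                   // compare with the expanded version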

Your help would be much appreciated.

Stata/IC 16.1 for Mac (64-bit Intel)


[attached graph omitted]


transpose putdocx table (memtable)

Dear Statalist members,
I have a question about manipulating a table created with putdocx table ... = data(), memtable.
I have generated tables with the putdocx command (the code below). Everything is fine, but sometimes the varlist is too long and I have to transpose the table in the Word file for better readability. I found the memtable option in putdocx. Is it possible to create a table with putdocx (memtable), transpose this table, and then export it to Word?
Code:
putdocx clear
putdocx begin
putdocx paragraph
collapse (mean) price mpg trunk weight length turn, by( foreign)
putdocx table tbl1 = data(" foreign price mpg trunk weight length turn"), varnames border(start, nil) border(insideV, nil) border(end, nil) memtable
putdocx table tbl1(1,2)=("`: var label price'")
putdocx table tbl1(1,3)=("`: var label mpg'")
putdocx table tbl1(1,4)=("`: var label trunk'")
putdocx table tbl1(1,5)=("`: var label weight'")
putdocx table tbl1(1,6)=("`: var label length'")
putdocx table tbl1(1,7)=("`: var label turn'")

putdocx save myreport.docx, replace
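Not a direct transpose of a memtable, but one possible workaround, sketched here: transpose the collapsed data with xpose first and then build the table from the transposed data. After collapse by foreign there are two groups, so xpose creates v1 and v2 plus _varname; the output file name is a placeholder.

Code:
sysuse auto, clear
collapse (mean) price mpg trunk weight length turn, by(foreign)
xpose, clear varname                   // rows become columns; old names kept in _varname
order _varname
putdocx clear
putdocx begin
putdocx paragraph
putdocx table tbl2 = data(_varname v1 v2)
putdocx save myreport_transposed.docx, replace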
Any suggestions, Thanks Jörg

FMB Newey SE error

Hi,

I'm trying to run the command asreg dv ivs, fmb newey(int) on my data, but I get an error message (see attached). The command works with other data I have, so what could be wrong with my data set (see attachment)?

Thank you in advance!

Camilla

ARCH effect after fitting GARCH(1,1)

I am trying to measure the volatility of returns on a stock index using GARCH(1,1), but the ARCH-LM test results show that there is still an ARCH effect in the residuals after fitting the GARCH(1,1).
I also tried GARCH(1,2) and GARCH(2,1), but the ARCH effect remains. How can I solve this problem in Stata? Many thanks for any help!
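It is hard to say more without the output, but a minimal sketch of the kind of diagnostic that could be run on the standardized residuals after fitting the model; the return variable name and lag choices are illustrative.

Code:
* assumes the data are -tsset- and the return series is named ret
arch ret, arch(1) garch(1)
predict double h, variance             // conditional variance
predict double e, residuals
gen double zres = e / sqrt(h)          // standardized residuals
regress zres                           // constant-only regression of the residuals
estat archlm, lags(1/5)                // test for remaining ARCH effects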

reshape 4-levels wide format to 2X2 long format

Hello,

I have designed a 2×2 within-subject design, which gives a four-level repeated measure. How can I reshape the data from wide format to a 2×2 factor long format?
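A minimal sketch under assumed variable names: suppose each subject has y11 y12 y21 y22, where the first digit indexes factor A and the second digit factor B; then one reshape plus two generate statements gives the 2×2 long layout.

Code:
clear
input subject y11 y12 y21 y22
1 10 12 14 16
2 11 13 15 17
end
reshape long y, i(subject) j(cell)
gen A = floor(cell/10)       // first within-subject factor
gen B = mod(cell, 10)        // second within-subject factor
drop cell
list, sepby(subject)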

Thanks.

Help with Stochastic Frontier Analysis/Translog Cost Function in Stata 15, bc95 model

Dear Stata users,

I am using Stata 15 and I am quite new to the software. I am working on a profit and cost efficiency analysis of 309 tourist firms for the period 2008-2017. I have chosen to apply the Battese & Coelli (1995) model (translog function). I have also gone through the Stata Journal article by Belotti et al. (2013). I took the ln of some of the inputs and outputs. The variables I have chosen are: lnTotal Costs and EBIT as dependent variables, for the cost and profit efficiency functions respectively; independent output variable: lnSales Revenue; independent input variables: lnLabour costs, lnMaterial costs, lnPhysical Capital costs; explanatory variables for the inefficiencies: 3*category, 4*category, 5*category and Tourism Specialization. Every time I try to run the cost efficiency model I receive these results:

the Code:. sfpanel lTC lSR lpL lpM lpPhC Year, model(bc95) dist(tn) emean(category3 category4 category5 Tourismspecialization ) ort(o)

the Results:
initial: Log likelihood = -7224.8875
Iteration 0: Log likelihood = -7224.8875
Iteration 1: Log likelihood = -6065.1666 (backed up)
Iteration 2: Log likelihood = -5868.4579 (backed up)
Iteration 3: Log likelihood = -5714.9868 (backed up)
Iteration 4: Log likelihood = -5693.4116 (backed up)
Iteration 5: Log likelihood = -5635.5574 (backed up)
Iteration 6: Log likelihood = -5627.7123 (backed up)
Iteration 7: Log likelihood = -5576.0618 (backed up)
Iteration 8: Log likelihood = -5492.4664 (backed up)
Iteration 9: Log likelihood = -5170.6388 (backed up)
Iteration 10: Log likelihood = -4820.927 (backed up)
Iteration 11: Log likelihood = -4654.692 (backed up)
Iteration 12: Log likelihood = -4582.0477
Iteration 13: Log likelihood = -3682.9697
Iteration 14: Log likelihood = -2918.4183
Iteration 15: Log likelihood = -1668.3032
Iteration 16: Log likelihood = -1573.0303
Iteration 17: Log likelihood = -1407.2167
Iteration 18: Log likelihood = -1199.2455
Iteration 19: Log likelihood = -1111.6501
Iteration 20: Log likelihood = -1036.1432
Iteration 21: Log likelihood = -994.09753
Iteration 22: Log likelihood = -961.96257
Iteration 23: Log likelihood = -914.4405
BFGS stepping has contracted, resetting BFGS Hessian
Iteration 24: Log likelihood = -890.15886
Iteration 25: Log likelihood = -890.15698 (backed up)
Iteration 26: Log likelihood = -887.35603 (backed up)
(iterations 27-97 omitted)

Iteration 98: Log likelihood = -845.8587
Iteration 99: Log likelihood = -841.51001
Iteration 100: Log likelihood = -841.36681

Inefficiency effects model (truncated-normal) Number of obs = 3370
Group variable: UIC Number of groups = 348
Time variable: Year Obs per group: min = 1
avg = 9.7
max = 10

Prob > chi2 = 0.0000
Log likelihood = -841.3668 Wald chi2(5) = 29977.96

---------------------------------------------------------------------------------------
lTC | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------------+----------------------------------------------------------------
Frontier |
lSR | .8265737 .0056998 145.02 0.000 .8154022 .8377451
lpL | .3862649 .0144812 26.67 0.000 .3578823 .4146475
lpM | .3463463 .0080424 43.07 0.000 .3305836 .3621091
lpPhC | .0036081 .0061366 0.59 0.557 -.0084194 .0156357
Year | -.0242052 .0021223 -11.41 0.000 -.0283649 -.0200455
_cons | 49.74526 4.263296 11.67 0.000 41.38935 58.10117
----------------------+----------------------------------------------------------------
Mu |
category3 | -.0477896 .0180303 -2.65 0.008 -.0831284 -.0124509
category4 | -.0969461 .0202058 -4.80 0.000 -.1365486 -.0573435
category5 | -6.000944 .4847824 -12.38 0.000 -6.9511 -5.050788
Tourismspecialization | -.1345739 .2149034 -0.63 0.531 -.5557769 .2866291
_cons | .2507753 .0318595 7.87 0.000 .1883317 .3132188
----------------------+----------------------------------------------------------------
Usigma |
_cons | -5.155248 .2810498 -18.34 0.000 -5.706096 -4.604401
----------------------+----------------------------------------------------------------
Vsigma |
_cons | -2.400803 .0278476 -86.21 0.000 -2.455383 -2.346223
----------------------+----------------------------------------------------------------
sigma_u | .0759542 .0106735 7.12 0.000 .0576683 .1000385
sigma_v | .3010733 .0041921 71.82 0.000 .2929681 .3094028
lambda | .2522782 .0128424 19.64 0.000 .2271076 .2774489
------------------------------------------------------------------------------

Can you explain to me what the message 'BFGS stepping has contracted, resetting BFGS Hessian' means?
Should I take some extra steps with my data, or am I using the wrong syntax?

When I tried to run the profit efficiency function, I received this error message:

the code:. sfpanel EBIT lSR lpL lpM lpPhC Year, model(bc95) dist(tn) emean(category3 category4 category5 Tourismspecialization ) ort(o)

the Results:
initial: Log likelihood = -3.816e+09
Iteration 0: Log likelihood = -3.816e+09
could not calculate numerical derivatives -- flat or discontinuous region encountered
could not calculate numerical derivatives -- flat or discontinuous region encountered


Inefficiency effects model (truncated-normal) Number of obs = 3370
Group variable: UIC Number of groups = 348
Time variable: Year Obs per group: min = 1
avg = 9.7
max = 10

Prob > chi2 = .
Log likelihood = -7.346e+05 Wald chi2(0) = .

---------------------------------------------------------------------------------------
EBIT | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------------+----------------------------------------------------------------
Frontier |
lSR | 1086.5 . . . . .
lpL | -840.1726 . . . . .
lpM | -991.0253 . . . . .
lpPhC | -122.3459 . . . . .
Year | 18.04479 . . . . .
_cons | -26225.4 . . . . .
----------------------+----------------------------------------------------------------
Mu |
category3 | 1.016451 . . . . .
category4 | 1.024072 . . . . .
category5 | 1.004471 . . . . .
Tourismspecialization | 1.001218 . . . . .
_cons | 1.047579 . . . . .
----------------------+----------------------------------------------------------------
Usigma |
_cons | 22.82486 . . . . .
----------------------+----------------------------------------------------------------
Vsigma |
_cons | 434.1319 . . . . .
----------------------+----------------------------------------------------------------
sigma_u | 90438.91 . . . . .
sigma_v | 1.86e+94 . . . . .
lambda | 4.85e-90 . . . . .
------------------------------------------------------------------------------

I cannot understand what the message 'could not calculate numerical derivatives -- flat or discontinuous region encountered' means.
The dependent variable is EBIT, which can take negative values; that is why I did not transform it into logs. Do you think the problem is the negative values for some units, or something else?

I will be so grateful if anyone can help me.
Thank you in advance.
