panel data with three variables

February 1, 2020, 1:13 am

Hi everybody.

I have panel data with three identifying variables: year, country, and product. I want to run a logit model, so first of all I have to declare the panel structure. As I am a beginner, I don't know how to set up panel data with three such variables.
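
One common approach, sketched here on the assumption that each country-product pair is the panel unit and is observed once per year, is to combine country and product into a single identifier with egen's group() function and then declare the panel with xtset. The toy data, the outcome y, the regressor x, and the name panelid are all made up for illustration.

```stata
clear
input str2 country str1 product int year byte y double x
"DE" "A" 2017 0 1.2
"DE" "A" 2018 1 1.5
"DE" "A" 2019 1 1.9
"DE" "B" 2017 0 0.7
"DE" "B" 2018 0 0.9
"DE" "B" 2019 1 1.4
"FR" "A" 2017 1 2.1
"FR" "A" 2018 1 2.0
"FR" "A" 2019 0 1.1
"FR" "B" 2017 0 0.4
"FR" "B" 2018 1 1.8
"FR" "B" 2019 0 0.6
end

* One numeric id per country-product combination (hypothetical name)
egen panelid = group(country product)

* Panel variable first, time variable second
xtset panelid year

* A panel logit would then follow, for example:
* xtlogit y x, re
```

If the panel unit is really the country alone, with several products per country-year, the data are not uniquely identified by country and year, and xtset with a time variable will complain about repeated time values.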

↧

I am working on systematic review and meta-analysis on clinical data( i.e. Antimicrobial resistance).

February 1, 2020, 1:18 am

I have already completed the manuscript. Its primary outcome is the pooled estimate of the different clinical data describing antimicrobial resistance in Ethiopia.
Unfortunately, my attempt to obtain the estimate with "metaprop" through the dialog box went wrong, and the manuscript was ultimately rejected.
I need help running "metaprop" from the dialog box so that I can present the results in a forest plot.

Thanks for your assistance!

↧

window fopen DIRECTORY

February 1, 2020, 1:44 am

Stata/MP 16.0 for Windows (64-bit x86-64) Revision 08 Jan 2020
Microsoft Windows [Version 10.0.17763.973]

When window fopen is used in a program, the dialog seems to open in the directory of the ado-file and not in the current directory. (Have I missed some obvious option?)

Code:

prog define fopen , nclass

window fopen macroname "title" "*.*"

end

The above will "list" the files from the directory of the ado, not from the current directory.

↧

MIMIC (SEM) Models with panel (longitudinal) data

February 1, 2020, 1:55 am

Is it possible to estimate a MIMIC (SEM) model for a panel of countries and years? Could I use Stata to replicate Table 6 of this paper: Dybka, P., Kowalczuk, M., Olesiński, B., Torój, A., & Rozkrut, M. (2019). Currency demand and MIMIC models: towards a structured hybrid method of measuring the shadow economy. International Tax and Public Finance, 26(1), 4-40.
Thanks

I

↧

Replace some observation for String variable

February 1, 2020, 1:59 am

Hello,
Assume I have two variables, Region and Oceania, with a large number of observations. Example:

Region	Oceania
North America	0
Asia	0
Europe	0
Australia	1
New Zealand	1

I'd like to change "Australia" and "New Zealand" to "Oceania" using this command:
replace Region = "Oceania" if Oceania = 1.

But this doesn't work. Can somebody help me with this problem?
Thank You
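
A likely cause is the single equals sign in the if qualifier: in Stata, = assigns a value and == tests for equality. A minimal sketch on the example rows from the post (the storage types are assumptions):

```stata
clear
input str13 Region byte Oceania
"North America" 0
"Asia" 0
"Europe" 0
"Australia" 1
"New Zealand" 1
end

* == tests equality; a single = inside -if- is a syntax error
replace Region = "Oceania" if Oceania == 1
list
```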

↧

Phillips and Sul technique

February 1, 2020, 4:34 am

Dear all,

I am using the Phillips and Sul (2007) technique for convergence test and club identification for my PhD research. My problem is that sometimes, for one of the clubs identified (with the psecta and default options), the TStat in the output table is below the threshold value of -1.65, whereas if I use the adjusted method (Schnurbus, 2017) with the same dataset, therefore including the 'adjust' command specification, the results are ok. On the contrary, when using a different dataset, the same problem (with the TStat value below -1.65 for one of the clubs) happens when using the adjusted method, while the other method is ok.

How can I handle the issue? Why does it happen? Should I use some specific options or just discard the anomalous clustering obtained and choose the alternative method? How can I motivate this in my PhD research?

Thank you very much in advance for your reply, I really hope you can help me!

↧

Cluster standard error for random effect logit model - without vce(bootstrap)?

February 1, 2020, 5:22 am

Hello everyone,
I have an issue with Stata and I would be grateful for your support.

I'm working with unbalanced panel data and use a random-effects logit model.
By that I mean, I'm using the following command:
xtlogit dep_var indep_var, re vce(bootstrap, rep(50) bca)

My issue is that with the vce(bootstrap) option, Stata takes forever to produce any output. Is there another way to get clustered standard errors for this -xtlogit, re- command?

Thank you in advance.

Best regards,
Yasemin
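
One option that may help, as far as I know available for the random-effects estimator in recent Stata releases, is vce(cluster clustvar), which computes cluster-robust standard errors analytically instead of by resampling. The sketch below uses the union example dataset that webuse downloads (so it needs an internet connection); the choice of regressors is only illustrative.

```stata
webuse union, clear
xtset idcode year

* Cluster-robust standard errors, clustered on the panel identifier
xtlogit union age grade not_smsa south, re vce(cluster idcode)
```

Whether clustering on the panel identifier is the right level is a modeling decision that depends on the data at hand.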

↧

How to resolve numeric overflow while performing xtset,fe in stata?

February 1, 2020, 6:36 am

Dear all,
I am getting error r(1400), "combinations results in numeric overflow; computations cannot proceed", while running xtlogit, fe in Stata with 5738 observations (about 1900 individuals x 3 rounds).
Please consider the following sample data set for this purpose:

Code:

input str3 ID byte(round hi acc inf shock)
IN1 1 1 0 1 1
IN1 2 1 1 1 1
IN1 3 0 0 1 1
IN2 1 1 1 0 1
IN2 2 0 0 1 0
IN2 3 1 0 0 0
end

. list

     +--------------------------------------+
     |  ID   round   hi   acc   inf   shock |
     |--------------------------------------|
  1. | IN1       1    1     0     1       1 |
  2. | IN1       2    1     1     1       1 |
  3. | IN1       3    0     0     1       1 |
  4. | IN2       1    1     1     0       1 |
  5. | IN2       2    0     0     1       0 |
     |--------------------------------------|
  6. | IN2       3    1     0     0       0 |
     +--------------------------------------+

I set up the panel as follows:

Code:

encode ID, gen(ID1)
drop ID
rename ID1 ID
xtset round ID

However, when I ran

Code:

xtlogit hi inf shock, fe

I got the following

Code:

1,913 (group size) take 1,640 (# positives) combinations results in numeric overflow; computations cannot proceed r(1400)

from the original data set

The same regression with

Code:

xtlogit acc inf shock, fe

returned the regression results in my original data set.

I am confused as to why with only 5738 observations I'm getting numeric overflow. Also, please suggest a way to resolve this problem.

Thanks and Regards
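
A possible explanation, offered as a sketch and not a confirmed diagnosis: xtset expects the panel variable first and the time variable second, so xtset round ID declares round as the group. That would make each of the three rounds one "group" of roughly 1,900 observations, which matches the group size of 1,913 in the error message. The example below rebuilds the posted sample and declares the panel the other way round; ID1 is the encoded identifier from the post.

```stata
clear
input str3 ID byte(round hi acc inf shock)
IN1 1 1 0 1 1
IN1 2 1 1 1 1
IN1 3 0 0 1 1
IN2 1 1 1 0 1
IN2 2 0 0 1 0
IN2 3 1 0 0 0
end

encode ID, generate(ID1)

* Panel variable (individual) first, time variable (round) second
xtset ID1 round
xtdescribe

* On the full data the model would then be fitted with:
* xtlogit hi inf shock, fe
```

The six sample rows are too few for the fixed-effects logit itself, so the estimation command is left as a comment.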

↧

i. vs c.

February 1, 2020, 7:03 am

Could someone explain the difference between i.variable and c.variable?
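
In factor-variable notation, i. tells Stata to treat a variable as categorical and create one indicator per level (minus a base level), while c. tells it to treat the variable as continuous. The c. prefix matters mostly inside interactions, where variables are otherwise assumed to be categorical. A short illustration with the auto dataset that ships with Stata:

```stata
sysuse auto, clear

* rep78 enters as a set of indicators, mpg as a single slope
regress price i.rep78 c.mpg

* Interaction: separate intercept and mpg slope for foreign cars
regress price i.foreign##c.mpg
```

Dropping the c. in the second model would make Stata treat every distinct value of mpg as its own category.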

↧

ml maximize, technique(bhhh): option technique() not allowed

February 1, 2020, 7:36 am

Hello,

I have some problems using the maximum likelihood command of Stata to estimate a probit model.
Here is a simplified example of my problem: I am interested in estimating the effect of past experienced stock market returns of households on their stock market participation (controlling for other household characteristics).

This is my likelihood function:
-----------------------------------------------------------------------------------------------------
capture program drop nlprobitlf_stock
program nlprobitlf_stock
version 11
args lnfj xb b k
tempvar xnl

global lambda = `k'

rmatcalc_stock

quietly generate double `xnl' = w
drop w

quietly replace `lnfj' = lnnormal(`xb'+`b'*`xnl') if ($ML_y1 == 1)
quietly replace `lnfj' = lnnormal(-`xb'-`b'*`xnl') if ($ML_y1 == 0)

end
----------------------------------------------------------------------------------------------------
where rmatcalc_stock is a function that calculates the weighted sum of the stock returns a household experienced during its lifetime. The weights depend on k, which should also be estimated within the probit model.

This is my code for the optimization problem:
--------------------------------------------------------------------------------------------------------------------------------
ml model lf nlprobitlf_stock (`dependent_var' = `controls') /b /k [pweight = wgt]
ml search /// search initial value
ml maximize, difficult technique (bhhh) nonrtolerance
---------------------------------------------------------------------------------------------------------------------------------
If I run this, I get the following error message:

initial: log pseudolikelihood = -2.388e+08
rescale: log pseudolikelihood = -3395333.6
rescale eq: log pseudolikelihood = -3385030.8
Iteration 0: log pseudolikelihood = -3385030.8
Iteration 1: log pseudolikelihood = -2737459.7 (not concave)
Iteration 2: log pseudolikelihood = -2685007.4
Iteration 3: log pseudolikelihood = -2681058.8 (not concave)
Iteration 4: log pseudolikelihood = -2680772 (not concave)
Iteration 5: log pseudolikelihood = -2680771.5
Iteration 6: log pseudolikelihood = -2680524.6
Iteration 7: log pseudolikelihood = -2680520.2
Iteration 8: log pseudolikelihood = -2680520.2 (backed up)
option technique() not allowed

Does anyone know where this error may come from? Thank you very much in advance!

↧

Different output from estimates table when using stored estimates

February 1, 2020, 8:09 am

How do I preserve value labels when running estimates table with stored estimates?

If I run estimates table immediately after I fit a model, the output shows the correct value labels; however, if I store the estimate, fit another model, and then try to use estimates table with the stored estimate corresponding to the first model, the value labels are lost.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double ped_slight_inj float season byte weather_conditions
0 2 1
0 2 1
0 2 1
0 2 1
0 2 1
0 2 1
0 2 1
0 2 1
0 2 1
0 2 1
0 1 1
1 1 1
0 1 2
0 1 1
0 1 1
0 1 1
0 1 2
0 1 1
0 1 1
0 1 1
end
label values season season
label def season 1 "Winter", modify
label def season 2 "Spring", modify
label values weather_conditions weather_conditions
label def weather_conditions 1 "Fine", modify
label def weather_conditions 2 "Raining", modify

// First regression
qui logit ped_slight_inj i.season

// Store first regression
est store season

// First estimates table
estimates table season

// Second regression
qui logit ped_slight_inj i.weather_conditions

// Second estimates table from stored result of first regression
estimates table season

↧

How to save my y and residual errors from a regression loop

February 1, 2020, 8:22 am

Could you tell me why I cannot save my y or residuals here? Everything works except the last part, and I don't know how to store y.
In addition, the loop over values does not seem to run as intended: it only handles the 10 observations specified in local mc = 10.

Thanks much!
clear
local mc = 10

set obs `mc'

g data_store_x3 =.

g data_store_x2 =.

g data_store_con= .

g data_store_y =.

quietly{
forvalues i = 1(1) `mc' {
if floor((`i'-1)/100) == ((`i'-1)/100) {
noisily display "Working on `i' out of `mc' at $S_TIME"
}

preserve
clear
set obs 10
g x2 = rnormal()
g x3 = rnormal()
g e = runiform()
g y = 1 -3*x2 + 2*x3 + e
reg y x2 x3
local x2coeff = _b[x2]
local x3coeff = _b[x3]
local const = _b[_cons]
restore
replace data_store_x3 = `x3coeff' in `i'
replace data_store_x2 = `x2coeff' in `i'
replace data_store_con = `const' in `i'
}
}
summ data_store_con data_store_x2 data_store_x3 data_store_y
display e(rmse)
predict res, resid
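
One way to restructure this, sketched with postfile instead of preserve/restore, is to post the statistics of each replication to a results file and to compute residuals inside the loop, while the simulated y, x2 and x3 still exist. After restore those variables are gone, which is a plausible reason why predict fails at the end. Since y has one value per simulated observation, it does not fit in a single cell per replication; this sketch stores the RMSE instead. The seed and the names of the posted variables are assumptions.

```stata
clear all
set seed 12345
local mc = 10

tempname sim
tempfile results
postfile `sim' b_x2 b_x3 b_cons rmse using `results'

forvalues i = 1/`mc' {
    quietly {
        drop _all
        set obs 10
        generate x2 = rnormal()
        generate x3 = rnormal()
        generate e  = runiform()
        generate y  = 1 - 3*x2 + 2*x3 + e
        regress y x2 x3
        * Residuals are available here, before the data are replaced
        predict double res, residuals
    }
    post `sim' (_b[x2]) (_b[x3]) (_b[_cons]) (e(rmse))
}
postclose `sim'

use `results', clear
summarize
```

The loop runs mc times by construction, so raising local mc is what increases the number of replications.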

↧

Calculating the Employment-Weighted Mean Differential

February 1, 2020, 8:29 am

I am currently investigating wage differentials by industry in Germany. My dataset gives me cross-sectional data per individual, with information such as industry, wage, etc. To find the uncontrolled wage differentials per industry, I ran the following command:

reg lnwage i.industry

Stata then returns coefficients for each industry (which represent the wage differential of that industry). I now wish to calculate the employment-weighted mean differential. I obtain the number of employed people in each industry with the following command:

tab industry

Now, I want to weight each coefficient obtained from the regression by its frequency, to get the weighted average differential across all industries. For example, if the industry "Farming" has a coefficient of 0.4 and employs 30% of individuals, while "Mining" has a coefficient of -0.2 and employs 70% of the individuals, then the employment-weighted mean differential is (0.4 * 0.3) + (-0.2 * 0.7) = -0.02

Ideally, I then want to present the difference between each industry's coefficient and the employment-weighted mean differential in a table.

Does anybody have an idea how I can execute this?
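
A possible route, sketched on the nlsw88 dataset shipped with Stata because the original data are not available: with only industry indicators in the model, the linear prediction minus the constant equals each person's industry coefficient (zero for the base industry), so its sample mean is the employment-weighted mean differential. The variable names differential and deviation are made up for illustration.

```stata
sysuse nlsw88, clear
generate lnwage = ln(wage)

regress lnwage i.industry

* Industry coefficient attached to every individual in the sample
predict double xb if e(sample), xb
generate double differential = xb - _b[_cons]

* Mean over individuals = employment-weighted mean of the coefficients
summarize differential, meanonly
scalar wmean = r(mean)
display "Employment-weighted mean differential: " wmean

* Each industry's coefficient relative to the weighted mean
generate double deviation = differential - wmean
tabstat differential deviation, by(industry) statistics(mean n)
```

By construction the deviations average to zero over individuals, which is a quick check that the weighting is by employment and not by industry.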

↧

Difference between using l1.var and previously generated lagged variable

February 1, 2020, 8:43 am

Dear Stata users,

I am using Stata 16 on Windows 10 and I'm working on a quarterly dataset of over 10,000 companies.

Code:

xtset
       panel variable:  gvkey (unbalanced)
        time variable:  fyearq_, 1996q2 to 2008q2, but with gaps
                delta:  1 quarter

I was looking at a variable for the average assets of a company in a given quarter and I noticed something strange. For my work the variable has to be created like this:
'Average assets = ((Total assets) + (lagged Total assets)) / 2'. The strange thing is that the variable "Average assets" differs depending on whether I use l1.[Total assets] or a previously generated variable for "lagged Total assets". I provide sample data and the code I used. I will explain at the end why I didn't create more straightforward variable names.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double gvkey float fyearq_ double atq2 float(t2_atq_L1 t2_avg_assets t3_avg_assets)
1004 146 449.645       .        .        .
1004 147  468.55 449.645 459.0975 459.0975
1004 148 523.852  468.55  496.201  496.201
1004 149 529.584 523.852  526.718  526.718
1004 150 542.819 529.584 536.2015 536.2015
1004 151 587.136 542.819 564.9775 564.9775
1004 152 662.345 587.136 624.7405 624.7405
1004 153 670.559 662.345  666.452  666.452
1004 154 707.695 670.559  689.127  689.127
1004 155 737.416 707.695 722.5555 722.5555
1004 156 708.218 737.416  722.817  722.817
1004 157  726.63 708.218  717.424  717.424
1004 158 718.913  726.63 722.7715 722.7715
1004 159 747.043 718.913  732.978  732.978
1004 160 753.755 747.043  750.399  750.399
1004 161 740.998 753.755 747.3765 747.3765
1004 162 747.543 740.998 744.2705 744.2705
1004 163 772.941 747.543  760.242  760.242
1004 164 754.718 772.941 763.8295 763.8295
1004 165 701.854 754.718  728.286  728.286
1004 166 758.503 701.854 730.1785 730.1785
1004 167 714.208 758.503 736.3555 736.3555
1004 168 690.681 714.208 702.4445 702.4445
1004 169 710.199 690.681   700.44   700.44
1004 170 722.944 710.199 716.5715 716.5715
1004 171 727.776 722.944   725.36   725.36
1004 172 723.019 727.776 725.3975 725.3975
1004 173 686.621 723.019   704.82   704.82
1004 174 676.345 686.621  681.483  681.483
1004 175 666.178 676.345 671.2615 671.2615
end
format %tq fyearq_

Now to really explain the issue, here is the code I used and the output. The variable for "Total assets" is atq2

Code:

gen t2_avg_assets=((atq2)+(l1.atq2))/2
(15,545 missing values generated)

. gen t2_atq_L1 = l1.atq2
(14,933 missing values generated)

. gen t3_avg_assets=((atq2)+(t2_atq_L1))/2
(15,545 missing values generated)

. * t2_avg_assets and t3_avg_assets should be same, but they aren't:

. compare t2_avg_assets t3_avg_assets

                                        ---------- difference ----------
                            count       minimum      average     maximum
------------------------------------------------------------------------
t2_avg_~s<t3_avg_~s         14814     -.0078125    -.0000578   -2.33e-10
t2_avg_~s=t3_avg_~s        217381
t2_avg_~s>t3_avg_~s         14735      2.33e-10     .0000563    .0039063
                       ----------
jointly defined            246930     -.0078125    -1.06e-07    .0039063
jointly missing             15545
                       ----------
total                      262475

First I create the 'Average assets' variable using the lag operator L. Then I create a one-period lag of atq2, again using the lag operator L. Finally I create a second 'Average assets' variable, this time using the stored lagged variable instead of the lag operator. To me the two variables should be identical, but the compare command shows that they aren't. So my question is: why are these two 'Average assets' variables not identical?

In preparation for this post I created variables with easier to understand names. But by doing this another question emerged.

Code:

gen assetstotalqtly = atq2
(741 missing values generated)

. gen assetstotalqtly_L1 = l1.assetstotalqtly
(14,933 missing values generated)

. gen averageassets = ((assetstotalqtly)+(assetstotalqtly_L1))/2
(15,545 missing values generated)

. gen test_averageassets = ((assetstotalqtly)+(l1.assetstotalqtly))/2
(15,545 missing values generated)

. compare averageassets test_averageassets

                                        ---------- difference ----------
                            count       minimum      average     maximum
------------------------------------------------------------------------
average~s=test_av~s        246930
                       ----------
jointly defined            246930             0            0           0
jointly missing             15545
                       ----------
total                      262475

compare assetstotalqtly atq2

                                        ---------- difference ----------
                            count       minimum      average     maximum
------------------------------------------------------------------------
assetst~y<atq2             123741       -.00625    -.0000155   -5.96e-11
assetst~y=atq2              12729
assetst~y>atq2             125264      2.61e-11     .0000156      .00625
                       ----------
jointly defined            261734       -.00625     1.46e-07      .00625
jointly missing               741
                       ----------
total                      262475

How can assetstotalqtly and atq2 not be identical when I created the first by telling Stata it is equal to the latter? And why doesn't the issue described above occur here?

I hope I described everything well enough, if not feel free to let me know. Thank you in advance!
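
A likely explanation is storage precision: generate creates float variables unless told otherwise, while atq2 is stored as double and expressions such as (atq2 + l1.atq2)/2 are evaluated in double precision. A lag saved without the double keyword is rounded to float, so using it later gives slightly different results. The sketch below reproduces this on three values from the post; the names t, lag_f, lag_d and avg_* are illustrative.

```stata
clear
input double atq2
449.645
468.55
523.852
end
generate t = _n
tsset t

generate lag_f = L.atq2          // float by default, lag gets rounded
generate double lag_d = L.atq2   // double keeps the full precision

generate double avg_op = (atq2 + L.atq2) / 2
generate double avg_f  = (atq2 + lag_f) / 2
generate double avg_d  = (atq2 + lag_d) / 2

compare avg_op avg_f    // differences expected
compare avg_op avg_d    // should match exactly
```

The same reasoning would explain the second puzzle: assetstotalqtly is a float copy of atq2, so it differs from atq2 itself, while both averages built from the float copy use identically rounded inputs and therefore agree.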

↧

keeping people with the longest duration

February 1, 2020, 12:18 pm

Hi all,

I am working with some data and I am trying to keep people with the longest duration from 3 groups (dating, cohabiting, and married). I am only trying to keep people with the longest relationships.

↧

Urgent STATA query please help

February 1, 2020, 12:55 pm

I am a complete beginner with Stata and am trying to do a subgroup meta-analysis, but it won't let me create the subgroups, or maybe I am doing it wrong. I am also struggling to create a funnel plot for the data. The data are continuous and compare a control group with an intervention group. Please help, I have a deadline on Monday morning. Thanks in advance!

↧

Adding the scores of imputed group variable

February 1, 2020, 1:53 pm

Hello Statalist,

I have a multiply imputed dataset that looks like this

Country A B C _mi_m
1 2 2 1 0
1 2 2 1 0
1 2 2 1 0
3 5 3 5 0
3 5 3 5 0
3 5 3 5 0
1 8 8 8 1
1 8 8 8 1
1 8 8 8 1
3 4 4 4 1
3 4 4 4 1
3 4 4 4 1
(Note that the real dataset has more observations per country and more imputations. The rows above only show the format.)

I'd like to generate a new variable X by adding the value of C in country1, country3, and etc together. According to the example above, I basically would like to have X=1+5 for the dataset _mi_m=0 and X=1+1 for the dataset _mi_m=1. Please how can I achieve this?
I use Stata 16.0

Best
Ikenna Egwu
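
A minimal sketch, assuming C is constant within each country and imputation and treating the rows as an ordinary stacked dataset (mi-specific tools such as mi passive are not covered here): flag one row per country within each imputation, then total C over the flagged rows. On the rows exactly as posted this gives 1+5 = 6 for _mi_m = 0 and 8+4 = 12 for _mi_m = 1. The helper variable first is made up for illustration.

```stata
clear
input byte(Country A B C _mi_m)
1 2 2 1 0
1 2 2 1 0
1 2 2 1 0
3 5 3 5 0
3 5 3 5 0
3 5 3 5 0
1 8 8 8 1
1 8 8 8 1
1 8 8 8 1
3 4 4 4 1
3 4 4 4 1
3 4 4 4 1
end

* Mark the first row of every country within each imputation
bysort _mi_m Country: generate byte first = (_n == 1)

* Sum C over one row per country, separately by imputation
egen X = total(C * first), by(_mi_m)

list _mi_m Country C X, sepby(_mi_m)
```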

↧

Taking more time in loop (foreach/forvalues)?

February 1, 2020, 4:06 pm

Dear All, I notice that if I run 100 regressions in a loop and save some statistics, every regression takes, say, 0.1 second. However, when I run 30,000 regressions, the average time per additional regression grows longer and longer. Does anyone know why this happens (is it because of memory)? Thanks.

↧

How do I do two j for reshape command? Use reshape twice?

February 1, 2020, 5:50 pm

Hello,
I have a wide data set that contains variables:
PUBID (individuals ID)
startdate__njob_year

Ideally, I would want to have two j (year Njob)
so that the data could look like:
Array

what should I do? My professor suggests that I could do reshape twice.
Thanks in advance!

↧

How to match two datasets with constraints?

February 1, 2020, 7:23 pm

Hello,

This is my first time posting on this forum so thank you to everyone in advance.

The simplest way for me to describe my problem is as follows: I have two datasets. The first dataset consists of a list of suppliers and their capacities. It looks like

Code:

<supplierid> <capacity>
A 20
B 30
C 10
D 15

The second dataset consists of buyers, how much they want to buy, and who they want to buy from (preferences).

Code:

<buyerid> <quantity> <preference> <supplierid>
1 15 1 A
1 15 2 C
2 10 1 A
2 10 2 C
3 20 1 B
3 20 2 A

To explain a little further, each buyer wants to buy a fixed quantity. Each buyer also has preferences over whom to buy from (in general, buyers are not willing to buy from all suppliers). A buyer must buy the entire quantity from one supplier, i.e. it is not possible for a buyer to buy 1 unit from supplier A and 1 unit from supplier C. Suppliers don't have preferences and don't care who they sell to.

What I want to do is make my way down the list of buyers and assign a supplier to each buyer (to the extent that I can). So for example, buyer 1 will buy 15 units from supplier A. Buyer 2 will buy 10 units from supplier C (since supplier A won't have 10 units of capacity left after buyer 1 has purchased 15 units), and buyer 3 will buy 20 units from supplier B. I want my final dataset to look as follows:

Code:

<buyerid> <quantity> <sellerid>
1 15 A
2 10 C
3 20 B

Total capacity is much lower than total quantity demanded so, at the end, most buyers will not be able to buy anything.

Any help will be greatly appreciated.

Thanks!
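
The procedure described above can be sketched as a sequential pass over the buyer list, assuming buyers are served in order of buyerid and each buyer's options are tried in order of preference. The indicator chosen and the file handle suppliers are made-up names, and the loop is written for clarity, not speed, so it may be slow on a very long list.

```stata
clear
input str1 supplierid capacity
A 20
B 30
C 10
D 15
end
tempfile suppliers
save `suppliers'

clear
input buyerid quantity preference str1 supplierid
1 15 1 A
1 15 2 C
2 10 1 A
2 10 2 C
3 20 1 B
3 20 2 A
end

* Attach each supplier's capacity to every buyer-preference row
merge m:1 supplierid using `suppliers', keep(master match) nogenerate
sort buyerid preference
generate byte chosen = 0

local N = _N
forvalues i = 1/`N' {
    local b = buyerid[`i']
    local s = supplierid[`i']
    * Skip rows of buyers that already have a supplier
    quietly count if buyerid == `b' & chosen == 1
    if r(N) == 0 & quantity[`i'] <= capacity[`i'] {
        quietly replace chosen = 1 in `i'
        * Reduce the remaining capacity on all rows of this supplier
        quietly replace capacity = capacity - quantity[`i'] if supplierid == "`s'"
    }
}

keep if chosen
keep buyerid quantity supplierid
list
```

On the example data this leaves buyer 1 with supplier A, buyer 2 with supplier C, and buyer 3 with supplier B, as in the desired output. Buyers whose preferred suppliers are all exhausted simply drop out.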
