Replacing string using regexm/regexs

March 24, 2020, 11:51 am

≫ Next: Is the percentage significantly higher than another?

≪ Previous: Custom labels in a coefplot.

Hi,

My data consists of a list of viral mutations, separated by a comma. Here is some dummy data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str33 nrti
"K65R,Y115F,M184V"           
"D67N,K70R,M184MV,K219E"      
"D67N,K70E,M184V"            
"D67DN,K70R,M184V,T215I,K219E"
"D67DN,K70E,M184V,K219KR"    
"K70Q,M184V"                 
"M184V"                      
end

The format for each mutation (separated by a comma, no space), should be capital string, 1-3 numbers, followed by one capital string (eg, K65R). However, sometimes there are two string characters at the end (eg, K65KR). I want to replace this so that the first of the two string characters at the end is removed (eg, K65KR -> K65R).

I am trying to achieve this using the regexm/regexs string functions. I can identify the issue using regexr to replace the errors with a different text (repeating the code to identify cases where there are more than one problem mutation in a cell).

Code:

gen dup = nrti
replace dup = regexr(dup, "[A-Z][0-9]+[A-Z][A-Z]","issue")

But this isn't exactly what I want to do. I am trying various iterations using regexs but can't quite seem to get there. Does anyone have any advice on how I could achieve this?

I really appreciate your any help on this.

Bryony

↧

Is the percentage significantly higher than another?

March 24, 2020, 12:47 pm

≫ Next: Graph: label x-axis (years) from "2001" to "2001/2002"

≪ Previous: Replacing string using regexm/regexs

when I want to compare the difference of two percentages, I input the value in the code:
prtesti 1196 273 1549 313, count

the result shows as below:
Two-sample test of proportions x: Number of obs = 1196
y: Number of obs = 1549
------------------------------------------------------------------------------
| Mean Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .2282609 .0121363 .2044742 .2520475
y | .2020658 .0102024 .1820694 .2220623
-------------+----------------------------------------------------------------
diff | .026195 .0158549 -.0048801 .0572701
| under Ho: .0157729 1.66 0.097
------------------------------------------------------------------------------
diff = prop(x) - prop(y) z = 1.6608
Ho: diff = 0

Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(Z < z) = 0.9516 Pr(|Z| > |z|) = 0.0968 Pr(Z > z) = 0.0484

P value is 0.097, >0.05, which means we don't reject Ho. That means there's no difference between two percentages.
But Pr(Z > z) = 0.0484<0.05, which seems that we reject Ho. That means percentage of x is higher than percentage of y.

Please help correct it! Thanks

↧

Graph: label x-axis (years) from "2001" to "2001/2002"

March 24, 2020, 1:37 pm

≫ Next: Covid19 ICUs prediction

≪ Previous: Is the percentage significantly higher than another?

Dear all:

I am writing to ask a very quick question for the Stata graphs. As you could see from the simple code below, I would like to draw a variable against the year (x-axis). However, the year is actually refer to a "year combo". For example, "2001" is "2001-2001". I am wondering how I could label the axis as "2001/2002".

Code:

clear
set obs 10
gen year = _n + 2000
gen var   = _n
twoway scatter var year, msize(tiny)

Thank you, and I look forward to hearing from you!

Best,
Long

↧

Covid19 ICUs prediction

March 24, 2020, 2:15 pm

≫ Next: Converting string date to date

≪ Previous: Graph: label x-axis (years) from "2001" to "2001/2002"

Dear all
I am trying to help a local hospital in Italy in predicting the number of intensive care units (ICUs) which will be required in the next few days. I know the number of admitted patients but also the number of positives (including those not admitted at hospital but quarantined at home). I first tried a linear regression of ICUs as a function of admitted patients (nr of observation n=28) (p<0.0001 and R-square=0.997) then a quadratic polynomial regression of ICUs as a function of total positives (again p<0.0001 and R-square=0.998). Similarly, I did a regression of ICUs vs day_number (1...28). Now I assume I can try to predict ICUs. I did it for the sample (n=1...28), but I do not know how to do it for the next few days (29, 30...) and for more admitted patients. Any suggestions? I attach here my data. Thanks

↧

Converting string date to date

March 24, 2020, 2:23 pm

≫ Next: Creating dummies for whether value of a variable for a given obs has appeared before in any of multiple variables

≪ Previous: Covid19 ICUs prediction

Dear stata people,

I am currently trying to merge two datasets. We'll call dataset one: X and dataset two: Y. The problem is occurring in X. The way I am trying to match each row and merge the two datasets is by using my date variable from each to correctly place the right rows together. The only problem is in X the date variable is a string when it should be a float/date. This is what one observation looks like for date variable in X: 22/08/14. The format is right, but stata reads as string. I've scoured the internet trying to find a way to destring this and format as a float but can't seem to find the right solution. When I tried to just destring the observation, I get the error: "date: contains nonnumeric characters; no generate". I also tried this: generate date2 = date(date, "DMY") but get zero observations from that code. I then changed my approach and so far this is the best I've come up with to try and reformat the observation as a float/date:
encode date, gen(date2)
recast float date2
format %tdDD/NN/YY date2
The only problem with this is that now stata, for some reason, reads 22/08/14 as 12feb1960 when I need it to read it correctly as 22aug2014.

I guess my underlying question is, is there a way to format 22/08/14 as a float/date?

Thank you in advance for the help.

↧

Creating dummies for whether value of a variable for a given obs has appeared before in any of multiple variables

March 24, 2020, 2:43 pm

≫ Next: Creating dummies for whether value of a variable for a given obs has appeared before in any of multiple variables

≪ Previous: Converting string date to date

Hi there,
I have a dataset on patents and their respective subclasses. Over a large number of inventors and years, my sample totals nearly 4 million patents. Each of them can have various subclasses.
I want to identify with a dummy variable when a patent subclass first appears in an inventor's portfolio.
Here is an example with two inventors. Each row corresponds to a different patent. The data has been sorted by date (which is in Stata format), showing the first 3 subclasses for each patent.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str11 inventor_id str8 patent_id float(date_stata class_subclass1 class_subclass2 class_subclass3 wanted_1 wanted_2 wanted_3)
"3930300-1" "4255207"  7038 132.0609 132.1129 132.0469 1 1 1
"3930300-1" "4272753"  7230 193.0214        .        . 1 . .
"3930300-1" "4255209"  7294 132.0541 132.0609 132.0529 1 0 1
"3930300-1" "4670769"  7602 132.0469 132.1401  132.047 0 1 1
"3930300-1" "4900689" 10569 132.0541 132.0609 132.1393 0 0 1
"3930309-1" "4211003"  6781 166.0079 104.0072 443.0042 1 1 1
"3930309-1" "D257612"  6797 448.0214        .        . 1 . .
"3930309-1" "4203167"  6909 119.0399  81.0251  81.0435 1 1 1
"3930309-1" "4265017"  7189 166.0079        .        . 0 . .
"3930309-1" "4442559"  7853  166.007 385.0057        . 1 1 .
"3930309-1" "4389775"  7873  166.007 119.0399 448.0214 0 0 0
end

The last three variables are the ones I am trying to create.
For example, in the last observation (i.e. most recent patent for the second inventor) the three dummy variables take the value of 0 because:
- for dummy wanted_1: subclass 166.007 showed up in the inventor's previous patent under class_subclass1
- for dummy wanted_2: subclass 119.0399 showed up in the inventor's third patent (4203167) under class_subclass1
- for dummy wanted_3: subclass 448.0214 showed up in the inventor's second patent as its only subclass.

The complexity here is due to the fact that the code needs to search over the three variables in all of the inventor's previous patents, then assign 0 or 1 in a specific cell.
I have tried running all sorts of loops and using countmatch/rangestat, to no avail.

Thanks for your help

↧

Creating dummies for whether value of a variable for a given obs has appeared before in any of multiple variables

March 24, 2020, 2:50 pm

≫ Next: calculating cronbach's a in a batch

≪ Previous: Creating dummies for whether value of a variable for a given obs has appeared before in any of multiple variables

Hi there,
I have a dataset on patents and their respective subclasses. Over a large number of inventors and years, my sample totals nearly 4 million patents. Each of them can have various subclasses - usually 5-6, but it can reach up to over 200.
I want to identify with a dummy variable when a patent subclass first appears in an inventor's portfolio.
Here is an example with two inventors. Each row corresponds to a different patent. The data has been sorted by date (which is in Stata format), showing the first 3 subclasses for each patent.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str11 inventor_id str8 patent_id float(date_stata class_subclass1 class_subclass2 class_subclass3 wanted_1 wanted_2 wanted_3)
"3930300-1" "4255207"  7038 132.0609 132.1129 132.0469 1 1 1
"3930300-1" "4272753"  7230 193.0214        .        . 1 . .
"3930300-1" "4255209"  7294 132.0541 132.0609 132.0529 1 0 1
"3930300-1" "4670769"  7602 132.0469 132.1401  132.047 0 1 1
"3930300-1" "4900689" 10569 132.0541 132.0609 132.1393 0 0 1
"3930309-1" "4211003"  6781 166.0079 104.0072 443.0042 1 1 1
"3930309-1" "D257612"  6797 448.0214        .        . 1 . .
"3930309-1" "4203167"  6909 119.0399  81.0251  81.0435 1 1 1
"3930309-1" "4265017"  7189 166.0079        .        . 0 . .
"3930309-1" "4442559"  7853  166.007 385.0057        . 1 1 .
"3930309-1" "4389775"  7873  166.007 119.0399 448.0214 0 0 0
end

↧

calculating cronbach's a in a batch

March 24, 2020, 3:27 pm

≫ Next: Multiple ARMA model orders

≪ Previous: Creating dummies for whether value of a variable for a given obs has appeared before in any of multiple variables

I have a variable, whcih contains the combination of items, such the item1 item2 and item3 in the first observation, and item1 item3 in the second observation.
How to calculate the cronbach's a in a batch? How to write the loop syntax.
Thanks!

↧

Multiple ARMA model orders

March 24, 2020, 4:38 pm

≫ Next: Issues implementing a multivariate tobit model of 7 equations

≪ Previous: calculating cronbach's a in a batch

I apologize if this has already been answered, however, I either can't find the answer or I've seen it and don't understand it. I need to estimate an ARMA model order from (0,0) to (5,5). Is there a command that would do this range or do I have to individually type (0,0,0) all the way to (5,0,5) This is the code I have now with just this order:

Code:

 arima (variable), arima(0,0,0)

Thank you.

↧

Issues implementing a multivariate tobit model of 7 equations

March 24, 2020, 5:34 pm

≫ Next: multiplying each value from a long format variable by corresponding value from wide format variables

≪ Previous: Multiple ARMA model orders

Has anyone ever implemented successfully a mvtobit? I read though all the questions about mvtobit, but I still can't figure out how to solve my issues.

I am running an mvtobit of 7 equations: 7 DV and several IVs. The IVs are almost the same for each equation.

Here is what I got from running my syntax:

Maximum # of censored equations is 7

Draws (Halton/pseudo random) are being made:

Created 32 Shuffled Randomized Halton draws per equation for 7 dimensions. Number of initial draws dropped per
dimension = 0 . Primes used:

2 3 5 7 11 13 17

After this, the model runs for ever....
I tried to provide starting values as suggested in the help. I used:

matrix m = (value)
matrix colnames m = atrho12:_cons

and included atrho0(m) in the options when writing the syntax.

Any help?

↧

multiplying each value from a long format variable by corresponding value from wide format variables

March 24, 2020, 5:49 pm

≫ Next: Laptop Configuration for Big Data

≪ Previous: Issues implementing a multivariate tobit model of 7 equations

Hello,

I have a question that is a bit complicated for me to solve.
First of all, here's the example data set.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(hotel_id time price booking_1 booking_2 booking_3 booking_4 booking_5 booking_6 booking_7 booking_8 booking_9 booking_10)
1  1  25 1 1 1 1 0 0 0 0 0 1
1  2  25 1 1 1 0 0 0 0 0 1 1
1  3  27 1 1 0 0 0 0 0 1 1 0
1  4  28 1 0 0 0 0 0 1 1 0 0
1  5  28 0 0 0 0 0 1 1 0 0 1
1  6  28 0 0 0 0 1 1 0 0 1 1
1  7  28 0 0 0 1 1 0 0 1 1 1
1  8  29 0 0 1 1 0 0 1 1 1 1
1  9  29 0 1 1 0 0 1 1 1 1 0
1 10  29 1 1 0 0 1 1 1 1 0 0
2  1 100 1 1 0 0 1 1 0 0 0 0
2  2 110 1 0 0 1 1 0 0 0 0 1
2  3 110 0 0 1 1 0 0 0 0 1 1
2  4 110 0 1 1 0 0 0 0 1 1 0
2  5 110 1 1 0 0 0 0 1 1 0 0
2  6 115 1 0 0 0 0 1 1 0 0 1
end

Please note that this is a mock data set.

hotel_id: unique id assigned to each hotel.
time: simply put, let's say it's a daily variable.
price: price for each day for each hotel.
booking_i: dummy variable equals to 1 if a hotel is booked on that day or 0 if not booked.
(Why do I have more than booking_1? -> In this data set, booking information were collected up to future 10 days)

Goal:
My goal is to construct a new variable that represents the average revenue(price*booking) on a 5-day rolling basis (something like moving average).
For example,
the value for hotel 1 in time 1 will be, 25*1+25*1+27*1+28*1+28*0
the value for hotel 1 in time 2 will be, 25*1+27*1+28*1+28*0+28*0
...
the value for hotel 1 in time 10 will be, 29*1+29*1+29*0+29*0+29*1 ...(future price is not achievable for some of the latter observations, so simply put, let's just assume that we can impute the last price.)

the value for hotel 2 in time 1 will be, 100*1+110*1+110*0+110*0+110*1
the value for hotel 2 in time 2 will be, 110*1+110*0+110*0+110*1+110*1
the value for hotel 2 in time 3 will be, 110*0+110*0+110*1+110*1+115*0
the value for hotel 2 in time 4 will be, 110*0+110*1+115*1+115*0+115*0
the value for hotel 2 in time 5 will be, 110*1+115*1+115*0+115*0+115*0
the value for hotel 2 in time 6 will be, 115*1+115*0+115*0+115*0+115*0

Thinking in a simple way, what I can do is to multiply the first 5 observations of price variable with corresponding booking_i variables.
I was thinking of using reshape command so that I can convert price variable to wide format but I can't think of the next steps and just hangs there for a couple of hours..
Is there any better way to implement this?
I would greatly appreciate for any helps.

↧

Laptop Configuration for Big Data

March 24, 2020, 9:00 pm

≫ Next: Poststratification video tutorial

≪ Previous: multiplying each value from a long format variable by corresponding value from wide format variables

Dear Stata List

I am working with a large trade dataset (17 g). The dataset has three dimensions : 5000 products, 190 countries and 3 time periods. I use reghdfe command to run a linear model with multidimensional fixed effects (country-time, country-product and product-time). On my laptop it takes about 2 hrs to estimate the model.

Can you recommend how I can update the configuration to estimate such a model in a reasonable time frame?

My laptop configuration is below:

8 g RAM, Intel Dual Core i5-7200U processor and 1 TB hard drive.

Thank you

Rohit

↧

Poststratification video tutorial

March 24, 2020, 9:17 pm

≫ Next: Decile groups based on MPCE

≪ Previous: Laptop Configuration for Big Data

In the following YouTube video tutorial Chuck Huber (StataCorp) mentions at 4:25 that the poststratification options are going to be described in a different video.

I have found a reference to: https://www.youtube.com/embed/lWXhGeT8u5M

from withinthe PDF help file: https://www.stata.com/manuals13/svyp...tification.pdf

Yet the video is marked as private. Is there a public version of it?

Thank you, Sergiy

↧

Decile groups based on MPCE

March 24, 2020, 10:58 pm

≫ Next: Making predictions on spatial panel data

≪ Previous: Poststratification video tutorial

Hello,
I'm trying to re-create a table given in one of the technical reports on housing. The data set includes household level data on housing condition and other socio-economic parameters. The table that I want to recreate basically divides the households into decile groups based on monthly per capita consumption expenditure (MPCE). I tried different approaches including:

Method 1:

egen MPCErank= rank( MPCE ), unique
egen decile =cut(MPCErank), group(10)

Method 2:
sort MPCE
sumdist MPCE [aweight==weight], n(10)

But yet failed to recreate them. Please tell me where I'm going wrong.

↧

Making predictions on spatial panel data

March 24, 2020, 11:13 pm

≫ Next: Loop to create new variables

≪ Previous: Decile groups based on MPCE

Hello!

I am trying to make a model for estimating the demand for city-bikes. For this, I have created a spatial panel regression (using spxtregress). I run the estimation on a subsample (one month of data) for all my models, everything works fine that way. But when I want to do a prediction the following month it seems impossible to do so.

I noticed that the documentation mentions "These statistics are only available in a subset of the estimation sample.", which indeed seems to be true as using

Code:

predict departures if e(sample)

does the trick, but then the following month is just missing data.

Is there any way to work around this and actually do the prediction for data outside the estimation sample?

Code:

spxtregress ln_dep at rain metro restaurants bikepaths capacity weekend students work men if time_id < 836 + (24 * 28), re dvarlag(W)
estat ic
outreg2 using departures.doc, replace
estat impact
predict p_dep_3

Kind regards,
Didrik

↧

Loop to create new variables

March 25, 2020, 1:57 am

≫ Next: Subgroup analysis in stcox

≪ Previous: Making predictions on spatial panel data

Hello,

I have data with a date variable and some other variables. For each date, there are multiple observations. I am looking to iterate over a range of variables and for each of these variables create a new mean variable, which indicates the mean of that variable for that date. This is the code I have so far:

These means will be used to calculate z-scores per observation per date.

Code:

foreach var in varXX-varXX{

    egen `var'_mean = mean(`var'), by(datadate)

    egen `var'_sd = sd(`var'), by(datadate)

    gen `var'_z = (`var'-`var'_mean)/`var'_sd

    drop `var'_mean `var'_sd

}

I cannot get it to work. Any tips? I am all new to Stata.

↧

Subgroup analysis in stcox

March 25, 2020, 2:39 am

≫ Next: Test of difference between means

≪ Previous: Loop to create new variables

Dear Statalisters,

In this first post on Statlist I apologize in advance if the format or details are not appropriate), I'd like to ask guidance for subgroup analysis in a Cox regression model using Stata v16.
I am analyzing the association between a gene variant (rsXX, with three potential genotypes: rsXX_num 1 rsXX_num 2 rsXX_num 3) and time to a composite outcome. In both uni and multivariate models, HR is significantly increased in rsXX_num 3 (vs rsXX_num 1, ref). Details are provided below:

Code:

. stcox i.rsXX_num, nolog

Code:

. stcox DoPcreat Age Gender BMI CVD Diabetes i.rsXX_num, nolog

Code:

stcox i.rsXX_num#i.Cohort_number , nolog

appropriate to test this question?

Based on the results provided below, am I allowed to conclude that HR is higher in rsXX_num 3 vs rsXX_num 1 in cohorts 1/2/3/5 (not 4) ?

Thank you very much for your comments and for your help.

Johann

Code:

. stcox i.rsXX_num#i.Cohort_number , nolog

failure _d: Outcomerisk_combined == 8
analysis time _t: Time_PD_outcome_dayscorr
id: PatGen_ID

Cox regression -- Breslow method for ties

No. of subjects = 756 Number of obs = 756
No. of failures = 363
Time at risk = 674992
LR chi2(14) = 89.22
Log likelihood = -1992.4261 Prob > chi2 = 0.0000

--------------------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
---------------------------+----------------------------------------------------------------
rsXX_num#Cohort_number |
1 2 | .9009012 .4122355 -0.23 0.820 .3674373 2.208875
1 3 | 2.570452 .7543683 3.22 0.001 1.446111 4.568961
1 4 | 1.021353 .3630243 0.06 0.953 .5088959 2.049851
1 5 | 3.380038 .8142878 5.06 0.000 2.107938 5.419825
2 1 | 1.633623 .4229501 1.90 0.058 .983497 2.713505
2 2 | .9060641 .3421188 -0.26 0.794 .4322704 1.899163
2 3 | 2.632542 .8024708 3.18 0.001 1.448458 4.784591
2 4 | 1.409253 .4038504 1.20 0.231 .8036365 2.471259
2 5 | 3.397648 .8171751 5.09 0.000 2.120577 5.443805
3 1 | 2.292504 .9024723 2.11 0.035 1.059801 4.959022
3 2 | 3.106154 1.688919 2.08 0.037 1.070031 9.016742
3 3 | 2.831661 1.399257 2.11 0.035 1.075035 7.458648
3 4 | 1.113009 .484567 0.25 0.806 .4741479 2.612663
3 5 | 5.122214 1.407804 5.94 0.000 2.988897 8.77818
--------------------------------------------------------------------------------------------

.

Code:

. stcox DoPcreat Age Gender BMI CVD Diabetes i.rsXX_num#i.Cohort_number , nolog

failure _d: Outcomerisk_combined == 8
analysis time _t: Time_PD_outcome_dayscorr
id: PatGen_ID

Cox regression -- Breslow method for ties

No. of subjects = 628 Number of obs = 628
No. of failures = 302
Time at risk = 577213.5
LR chi2(20) = 144.90
Log likelihood = -1564.1734 Prob > chi2 = 0.0000

--------------------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
---------------------------+----------------------------------------------------------------
DoPcreat240 | 1.953969 .9840493 1.33 0.183 .728181 5.243196
Age | 1.011828 .0044122 2.70 0.007 1.003217 1.020513
Gender | 1.203049 .1542438 1.44 0.149 .9357293 1.546737
BMI | .9882223 .0141289 -0.83 0.407 .9609146 1.016306
CVD | 1.58262 .2104134 3.45 0.001 1.219571 2.053743
Diabetes | 1.793188 .2476533 4.23 0.000 1.367943 2.350625
|
rsXX_num#Cohort_number |
1 2 | .8634405 .4086258 -0.31 0.756 .3415063 2.183062
1 3 | 2.18686 .6931283 2.47 0.014 1.174983 4.070151
1 4 | 1.223346 .4918156 0.50 0.616 .5563458 2.690008
1 5 | 3.524228 .9636173 4.61 0.000 2.062163 6.022891
2 1 | 1.409677 .4095051 1.18 0.237 .7977179 2.491094
2 2 | .8402843 .3326644 -0.44 0.660 .3867584 1.82563
2 3 | 2.474652 .8202406 2.73 0.006 1.292342 4.738608
2 4 | 1.638635 .5346087 1.51 0.130 .8645211 3.105909
2 5 | 3.860905 1.034039 5.04 0.000 2.284114 6.526199
3 1 | 2.722979 1.122167 2.43 0.015 1.214104 6.107071
3 2 | 4.102141 2.328255 2.49 0.013 1.348632 12.4775
3 3 | 2.906612 1.618698 1.92 0.055 .975778 8.658109
3 4 | 1.436237 .705047 0.74 0.461 .548747 3.759066
3 5 | 5.211703 1.59731 5.39 0.000 2.858241 9.502992
--------------------------------------------------------------------------------------------

.

↧

Test of difference between means

March 25, 2020, 3:12 am

≫ Next: ppml cluster - Gravity on product level with only one importer

≪ Previous: Subgroup analysis in stcox

Hi,

I have data where I have compared the means and from visualising it I can see that one group have lower values than the other, however I am not sure how to quantify this so that I can say whether group 1 has slightly lower scores or much lower scores.

Here is my table:

Age	Group 1 (males)	Group 2 (males)	Group 1 (females)	Group 2 (females)
40-44	37.3	50.3	25.8	30.7
45-49	36.6	48.8	20.1	29.9
50-54	35.9	47.6	18.5	28.7
55-59	34.3	46.2	21.2	27.5
60-64	30.3	44.6	19.2	26.5
65-69	31.5	42.3	18.8	25.3
70	23.6	39.1	10.5	23.5

↧

ppml cluster - Gravity on product level with only one importer

March 25, 2020, 3:43 am

≫ Next: Exporting Tables from Stata

≪ Previous: Test of difference between means

Dear all

I am evaluating the effect on american imports of the trade war dispute between China and the US.

I am using the reghdfe and ppmlhdfe and my question relates to in which dimensions I should cluster my standard errors.

Question 1:

Firstly, I want to evaluate the effects of the different tariff rounds for american imports from China (Products subject to a tariff compared to those not subject to tarif). E.i my dimensions in the panel data are product and time. Additional I control for seasonality of products by interacting product fixed effect with month fixed effect (i.product#i.month)
Should I then cluster my standard errors on both time, products and seasonality or only products with seasonality or products without seasonality?

To clarify, which one of these are the most appropriate and why?

A) ppmlhdfe Y X, abs(i.product#i.month i.time) vce(cluster i.product#i.month i.time)
B) ppmlhdfe Y X, abs(i.product#i.month i.time) vce(cluster i.product#i.month)
C) ppmlhdfe Y X, abs(i.product#i.month i.time) vce(cluster i.product)

Y = American imports from China
X = Tariff rounds

Question 2:

Secondly, I want to evaluate the "indirect" effect of the different tariff rounds for american imports from EU countries. My panel data have now three dimensions: Exporter, Product and Time.
I still control for seasonality for a product in a specific country. (i.exporter#i.product#i.month)
Again how should I correctly cluster my standard errors and why:

A) ppmlhdfe Y X, abs(i.exporter#i.product#i.month i.time) vce(cluster i.exporter#i.product#i.month i.time)
B) ppmlhdfe Y X, abs(i.exporter#i.product#i.month i.time) vce(cluster i.exporter#i.product#i.month)
C) ppmlhdfe Y X, abs(i.exporter#i.product#i.month i.time) vce(cluster i.exporter#i.product)

Y = American imports from EU country i
X = Tariff rounds

Dear Joao Santos Silva I have been following you in the attempt to find an answer on these questions. But unfortunately have not been able to find it so far. However, I believe that you might be able to answer this question?

Thank you in advance

Best
Rasmus

↧

Exporting Tables from Stata

March 25, 2020, 3:46 am

≫ Next: Combining Monthly and Yearly Data

≪ Previous: ppml cluster - Gravity on product level with only one importer

Hello. I am currently writing my thesis and I have finished all my analysis. However, I am now struggling to export all my results to excel and word. For instance, I have performed a factor analysis with over 70 variables resulting in 16 factors. Unfortunately, copying all the tables to excel by hand doesn't work since it is all being copied into one excel-column. I have read about the putexcel function, but so far I couldn't make it work. I would greatly appreciate some help.

Thanks and best regards, Steve.

↧

Latest Images