Channel: Statalist

Exporting multiple tabs to Excel with putexcel

Hi! I have this code, which allows me to export a simple tabulation of the variable "Edad" to Excel:

tabulate Edad, matcell(freq) matrow(names)

putexcel set "Example 1", modify
putexcel A1=("Pregunta 1") B1=("Frecuencia") C1=("%") D1=("% Acumulado")

local rows = rowsof(names)
local row = 2
local cum_percent = 0

forvalues i = 1/`rows' {

local val = names[`i',1]
local val_lab : label (Edad) `val'

local freq_val = freq[`i',1]

local percent_val = `freq_val'/`r(N)'*100
local percent_val : display %9.2f `percent_val'

local cum_percent : display %9.2f (`cum_percent' + `percent_val')

putexcel A`row'=("`val_lab'") B`row'=(`freq_val') C`row'=(`percent_val') ///
D`row'=(`cum_percent')
local row = `row' + 1
}

putexcel A`row'=("Total") B`row'=(r(N)) C`row'=(100.00)


It works perfectly. The thing is that I want to apply this same procedure to the other 35 variables that I have, and ideally export all the tables to the same Excel file, or even the same worksheet. I could run the command 35 times, changing the name of the variable each time, but there must be an easier way to do this. Can you point me in the right direction or give me advice on how to proceed?
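For what it's worth, here is a sketch of how the block above might be wrapped in a loop over all the variables, writing each table below the previous one in the same worksheet. The variable list is a placeholder for your 35 variables, so treat this as untested:

Code:
putexcel set "Example 1", modify
local row = 1
foreach v of varlist Edad var2 var3 {    // list your 35 variables here
    tabulate `v', matcell(freq) matrow(names)
    local N = r(N)
    putexcel A`row'=("`v'") B`row'=("Frecuencia") C`row'=("%") D`row'=("% Acumulado")
    local ++row
    local cum_percent = 0
    local rows = rowsof(names)
    forvalues i = 1/`rows' {
        local val = names[`i',1]
        local val_lab : label (`v') `val'
        local freq_val = freq[`i',1]
        local percent_val : display %9.2f `freq_val'/`N'*100
        local cum_percent : display %9.2f (`cum_percent' + `percent_val')
        putexcel A`row'=("`val_lab'") B`row'=(`freq_val') C`row'=(`percent_val') D`row'=(`cum_percent')
        local ++row
    }
    putexcel A`row'=("Total") B`row'=(`N') C`row'=(100.00)
    local row = `row' + 2    // leave a blank row between tables
}

Note that r(N) is saved into a local right after tabulate, since later commands may overwrite it.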

Thank you very much!

Tariff data to use in Stata

Hello,

I am new to this forum and econometrics in general, so I'm not sure this is an appropriate line of questioning for this forum. I am constructing a dataset to use in Stata/SE 15, where my aim is to look at the effect of tariffs on the U.S. manufacturing sector.

I am using the aggregation created by Li (2018), found here: https://www.card.iastate.edu/china/trade-war-data/ (point C). The data is already aggregated to GTAP commodity level with corresponding trade values as weights. My goal is to further use this data to create two monthly variables.

What I did was:
1. remove GTAP commodities not related to manufacturing, i.e. remove all GTAP sectors except 27-47 (https://www.gtap.agecon.purdue.edu/d...iledsector.asp). This was done using VLOOKUP in Excel. Then I summed all the tariff increases and divided by the number of rows. From this I got two variables: 1) the tariff increase that the U.S. put on imports from the rest of the world, and 2) the tariff placed on exports coming from the U.S.

When I ran these against my dependent variable (production) in Stata, I found that the "export tariff" has a positive effect, which sounds unlikely. I tried just summing the tariff increases without dividing by the number of rows (since they are already weighted), without any luck.

Does anyone have a way to use the data found on CARD to create monthly independent variables for tariff increases?

Would appreciate any help and thank you in advance.

PS: I am studying at the bachelor level and am not yet very advanced in Stata, so the regression I will use is very basic. Therefore a model like the one used in the research paper "Disentangling the Effects of the 2018-2019 Tariffs on a Globally Connected U.S. Manufacturing Sector" by Aaron Flaaen and Justin Pierce will be out of reach.

egen numeric variable

Dear Stata users,

Please take a look at this problem: I wish to generate a unique identifier for a panel data set. The id is supposed to be composed of values drawn from three existing variables. Therefore I used this code: egen [float] ID = concat(var1 var2 var3)

The result is almost correct. Unfortunately, the new variable ID is created as a string variable, despite the float specification in my code. Moreover, the values all look like this: 6.70e+071020 which makes it impossible to use destring as this format contains non-numeric characters. I have tried other type specifications but none of them work.

Does anyone know what to do in order to get an output that looks like this: 670007181020 ?
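In case it helps the discussion: concat() always produces a string variable (the [float] is ignored), and the 6.70e+071020 pattern arises because the large numeric component is converted using its default display format. A sketch of one workaround, where the digit widths are guesses that you would replace with the true widths of var1-var3:

Code:
* zero-padded fixed formats avoid scientific notation and internal spaces
gen str12 ID_str = string(var1, "%08.0f") + string(var2, "%02.0f") + string(var3, "%02.0f")
gen double ID = real(ID_str)
format ID %12.0f

For a pure identifier, keeping the string version is often safer than a 12-digit numeric.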

Thank you!

Reshape long Datastream equity data

Hi,

I have the following type of wide panel data, downloaded directly from Datastream. The number of stocks "m" included in the sample and the number of variables "n" will change:
Date; STOCK1 - VAR1 ; STOCK1 - VAR2 ; STOCK1 - VARn ; .... STOCK2 - VAR1 ; STOCK2 - VAR2 ; ... ; STOCK2 - VARn ; ...... ; STOCKm - VAR1 ; STOCKm - VARn

Here is an example:

Code:
 
Date AMAZON.COM - TOT RETURN IND AMAZON.COM - MARKET VALUE AMAZON.COM - DIVIDEND YIELD AMAZON.COM - PER AMAZON.COM - UNADJUSTED PRICE AMAZON.COM - BOOK VALUE PER SHARE AMAZON.COM - COMMON SHARES OUTSTANDING ABBOTT LABORATORIES - TOT RETURN IND ABBOTT LABORATORIES - MARKET VALUE ABBOTT LABORATORIES - DIVIDEND YIELD ABBOTT LABORATORIES - PER ABBOTT LABORATORIES - UNADJUSTED PRICE ABBOTT LABORATORIES - BOOK VALUE PER SHARE ABBOTT LABORATORIES - COMMON SHARES OUTSTANDING AES - TOT RETURN IND AES - MARKET VALUE
4/21/2017 45882.38 429475.1 0 183.4 898.53 57.25 484000 49507.01 75536.38 2.44 45.1 43.53 17.72 1743602 414.52 7502.99
5/21/2017 49013.11 458779.9 0 181.2 959.8401 57.25 484000 49086.21 74894.38 2.46 44.7 43.16 17.72 1743602 412.3 7393.61
6/21/2017 51177.7 479041.2 0 189.2 1002.23 57.25 484000 55239.05 84282.19 2.18 50.3 48.57 17.72 1743602 429.6 7703.88
7/21/2017 52374.63 492710.8 0 193.6 1025.67 57.25 484000 58137.11 88331.5 2.08 52.6 50.84 17.72 1743602 420.4 7538.84
8/21/2017 48678.64 457940.9 0 242.4 953.29 57.25 484000 55895.8 84926.13 2.17 68.3 48.88 17.72 1743602 417.83 7414.68
9/21/2017 49258.71 463398 0 245.3 964.6499 57.25 484000 58811.79 89356.63 2.06 71.8 51.43 17.72 1743602 414.11 7348.65
10/21/2017 50191.14 473636.9 0 250 982.9099 57.25 484000 64715.63 98030.63 1.88 44 56.32 17.72 1743602 414.11 7348.65
11/21/2017 58186.72 549088.5 0 290.1 1139.49 57.25 484000 64485.8 97682.5 1.89 43.8 56.12 17.72 1743602 399.97 7019.91
12/21/2017 59987.74 566084.1 0 299.1 1174.76 57.25 484000 65416.55 99092.38 1.97 44.4 56.93 17.72 1743602 399.6 7013.3
1/21/2018 66106.19 623822.1 0 329.6 1294.58 88.694 491000 68475 103235 1.89 46.3 59.31 17.386 1755619 442.12 7759.54
2/21/2018 75723.56 717891.8 0 325.8 1482.92 88.694 491000 68105.56 103016.3 1.9 53.3 58.99 17.386 1755619 389.25 6756.39
3/21/2018 80775.81 765789.4 0 347.5 1581.86 88.694 491000 72065.63 109006.2 1.79 56.4 62.42 17.386 1755619 410.94 7133.32
I would like the data to be reshaped as follows:
Date; STOCK ; VAR1 ; VAR2 ; VARn

I include the data in a csv file if that can help.
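For reference, the usual recipe is to rename the imported variables so that each one ends in a numeric stock index, and then reshape. The stub names below (tri, mv, dy, per, up, bvps, shout) and the renaming step are placeholders that depend on how the import mangles the Datastream column names:

Code:
* after renaming: tri1 mv1 dy1 ... tri2 mv2 dy2 ... (one numeric suffix per stock)
reshape long tri mv dy per up bvps shout, i(Date) j(stock)

A separate stock-number-to-name mapping can then be merged back in afterwards.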

Thanks a lot in advance!

How to add extra column to summary statistics with asdoc?

Dear all,

I am using the asdoc tool to export my tables to word.
However, I am struggling to add an extra column to my summary statistics, where I can add the VIF of the variables manually in word.
Does anyone know the option for this?

Kind regards

PPMLHDFE - Regression not starting

Dear Statalist forum,

I am currently trying to run a gravity equation using the PPMLHDFE command and saving the estimates for my fixed effects, as I have to use these in a further study. The regression I am running is the following:

ppmlhdfe trade, abs( i#k j#k i#j, savefe)


Where:

i= Exporter
j= Importer
k= sector (HS6 - 5200 different in the dataset).

I have bilateral trade for 99 countries and therefore in total have 50,440,698 observations. I am using Stata/IC and running the regression on a computer with 32 GB RAM and an i7 processor.
The problem is that the regression never seems to begin; the furthest I have gotten is this:

ppmlhdfe v, abs(i#k j#k i#j, savefe)
(dropped 1.84e+07 observations that are either singletons or separated by a fixed effect)

It never begins undertaking the iterations.

My question is: am I simply demanding too much of my system? Or is there something I am doing wrong in terms of the coding? I have looked through several forums and can't seem to find an answer for this.


After having read some of PPMLHDFE: Fast Poisson Estimation with High-Dimensional Fixed Effects, I would expect one of the following to be able to answer this:

Sergio Correia, Tom Zylkin, Paulo Guimaraes


However, advice from anyone would be highly appreciated!

How do I insert two weights in one tabulate command?

Hi Statalist,

I have a survey with two sample populations. One is representative of the general population (frequency 2029), one is representative of a specific target population (frequency 5934). I have to create very simple tabulations of variables that capture questions asked of BOTH the general and target populations. However, I have never done such tabulations including two different weights. Any advice on how I can integrate two different weights into one tabulate command?


Code:
. tab sample

Data Only Variable: |
             Sample |      Freq.     Percent        Cum.
--------------------+-----------------------------------
        General Pop |      2,029       25.48       25.48
         Target Pop |      5,934       74.52      100.00
--------------------+-----------------------------------
              Total |      7,963      100.00
Naturally, each sample population comes with its own weights, namely 'weight_gp' for the general population and 'weight_target' for the target population.

Code:
. codebook weight_gp // for general pop analyses

---------------------------------------------------------------------------------------------
weight_gp                                    Post-Stratification weight for Gen Pop (n=2,029)
---------------------------------------------------------------------------------------------

                  type:  numeric (double)

                 range:  [.212,3.052]                 units:  .0001
         unique values:  1,129                    missing .:  5,934/7,963

                  mean:         1
              std. dev:   .328805

           percentiles:        10%       25%       50%       75%       90%
                             .6517     .7857     .9459    1.1671    1.4248


. codebook weight_target // for target pop analyses

---------------------------------------------------------------------------------------------
weight_target              Post-Stratification weight for Qualified Targeted Sample (n=5,934)
---------------------------------------------------------------------------------------------

                  type:  numeric (double)

                 range:  [.0655,7.6575]               units:  .0001
         unique values:  1,647                    missing .:  2,029/7,963

                  mean:         1
              std. dev:   1.27466

           percentiles:        10%       25%       50%       75%       90%
                             .1665      .293     .5625    1.1167    2.1893
I am now looking at a survey question that was asked of both the general and target populations. The question is captured in the variable 'mleave_pew'. codebook & dataex:

Code:
mleave_pew   Following the birth or adoption of a child, do you think MOTHERS should be able 
---------------------------------------------------------------------------------------------

                  type:  numeric (byte)
                 label:  Q14A

                 range:  [1,99]                       units:  1
         unique values:  3                        missing .:  0/7,963

            tabulation:  Freq.   Numeric  Label
                         7,802         1  Yes, mothers should be able to
                                          take leave
                           115         2  No, mothers should not be able
                                          to take leave
                            46        99  Refused


* Example generated by -dataex-. To install: ssc install dataex
clear
input byte mleave_pew
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
end
label values mleave_pew Q14A
label def Q14A 1 "Yes, mothers should be able to take leave", modify
label def Q14A 2 "No, mothers should not be able to take leave", modify
Again, my question is: how can I create a simple weighted table using both weights?
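Since every respondent has exactly one non-missing weight here (the two missing counts are complementary: 5,934 and 2,029), one possibility, sketched without any claim that pooling the two samples this way is defensible for your survey design, is to combine the weights into a single variable:

Code:
gen double weight_all = weight_gp
replace weight_all = weight_target if missing(weight_all)
tab mleave_pew [aweight = weight_all]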

Referring to year_t-1 to construct in- and outmigration rates

Dear all,

I am currently working with a survey dataset that follows people over time and looks as follows:
person-id year state inmigration outmigration
1 1991 1 0 0
1 1992 1 0 0
1 1993 1 0 0
2 1991 1 0 0
2 1992 1 0 0
2 1993 2 1 1
3 1991 1 0 0
3 1992 1 0 0
3 1993 2 1 1
My task is to build inmigration and outmigration rates for every state and year. Therefore, I thought that I could generate a variable "inmigration" that equals 1 if state_t ≠ state_t-1. As a next step I could sum the observations by state and year to get the number of people migrating to a state. In the example above this would equal 2 for state 2 in year 1993 and 0 for all other years and states.
However, I also need to compute outmigration rates by state and year. This variable should also equal 2 for the year 1993, but should be reported for state 1, because these people are migrating from state 1 to state 2 and are therefore leaving state 1. Does anyone know how to address this problem?
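A sketch of one approach, assuming the person identifier is called person_id (adjust names as needed):

Code:
xtset person_id year
gen byte moved  = (state != L.state) & !missing(L.state)
gen origin = L.state if moved
* inmigration: count moves in the destination state
egen in_count = total(moved), by(state year)
* outmigration: count the same moves, attributed to the origin state
preserve
keep if moved
collapse (count) out_count = moved, by(origin year)
rename origin state
tempfile outm
save `outm'
restore
merge m:1 state year using `outm', nogenerate
replace out_count = 0 if missing(out_count)

Dividing these counts by each state-year's population then gives the rates.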

I hope my question is clear.

Thanks for your help and best regards,

Leandro

How to produce 6-way tabulate with only percentages?

Dear statalisters,

I have the following data structure.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str50 id float(club1 club2 club3 club4 club5 club6)
"ab1"           1 0 0 1 0 0
"ab2"           1 1 0 0 0 0
"ab3"           0 0 1 1 1 0
"ba3"           0 1 0 0 0 1
"ce4"           1 0 1 0 0 1
"kl7"            1 1 0 0 1 0
"gb5"           0 0 0 0 1 0
"tv2"            1 0 0 1 0 0
end
where id is a unique user id for each observation, and club1 ... club6 are dummies for whether the id is a member of the club or not. I want to make a 6-way tabulation of the percentage within each club that is also a member of each of the other clubs. Perhaps the data should be restructured, I'm not sure. So what I want is something that looks like the following:
club1 club2 club3 club4 club5 club6
club1 -
club2 % of members of club 1 who are also members of club 2 -
club3 etc -
club4 -
club5 -
club6 -
Total nr of members club 1 nr of members club 2 nr of members club 3 nr of members club 4 nr of members club 5 nr of members club 6
Any advice would be very helpful, as I now spend a lot of time manually assembling it from two-way tabulations. (Bonus: if anybody has a solution that can be automatically exported to an MS Word format, that would be amazing.)
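A sketch of one way to fill such a matrix automatically with only standard commands (column = club whose members are being described, row = other club):

Code:
matrix P = J(6, 6, .)
forvalues c = 1/6 {
    quietly count if club`c' == 1
    local denom = r(N)
    forvalues r = 1/6 {
        if `r' != `c' {
            quietly count if club`c' == 1 & club`r' == 1
            matrix P[`r', `c'] = 100 * r(N)/`denom'
        }
    }
}
matrix rownames P = club1 club2 club3 club4 club5 club6
matrix colnames P = club1 club2 club3 club4 club5 club6
matrix list P, format(%6.1f)

putexcel can write the matrix out in one line; for Word, community tools such as esttab can export a matrix, though I have not checked the exact options.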

Thanks,

Maximilan

Identifying and keeping the three highest-paid waitresses

Hi,
I would like to identify the three highest-paid waitresses in terms of "total compensation" in each hotel-year, and to keep only those three waitresses along with the receptionists. Note that, in some cases, the third and fourth highest-paid waitresses have the exact same compensation. In such a case, I would like to keep the one with the higher "base salary". If both have the exact same base salary and total compensation, I would like to keep either one of them. Also, I would like to ignore observations with missing compensation data.

I am new to Stata and I would like the simplest way to do it. Thanks in advance.

Code:
ssc install dataex
clear
input int year str6 staff_id str5 hotel_id byte(waitress receptionist) base_salary total_compensation
2009 "124665" "23453" 1 0 40 112
2009 "455543" "23453" 0 1 60 111
2009 "334532" "23453" 1 0 55 222
2009 "888976" "23453" 1 0 80 90
2009 "903454" "23453" 1 0 88 90
2009 "457888" "23453" 1 0 . 90

2010 "124665" "23453" 1 0 53 90
2010 "455543" "23453" 0 1 45 88
2010 "334532" "23453" 1 0 33 79
2010 "556333" "23453" 1 0 60 60
2010 "299211" "23453" 1 0 60 60
2010 "235987" "23453" 1 0 60 .

2011 "124665" "23453" 1 0 40 67
2011 "877776" "23453" 0 1 34 89
2011 "666755" "23453" 1 0 12 99
2011 "556333" "23453" 1 0 50 66
2011 "563222" "23453" 1 0 50 66
2011 "967656" "23453" 1 0 50 66
2011 "343434" "23453" 1 0 13 .
end
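A sketch of one approach (if base_salary and total_compensation arrive as strings, destring them first):

Code:
* flag waitresses with non-missing compensation, rank them within hotel-year
gen byte eligible = (waitress == 1) & !missing(total_compensation)
gsort hotel_id year -eligible -total_compensation -base_salary
by hotel_id year: gen byte top3 = eligible & (_n <= 3)
keep if top3 | receptionist == 1

Because observations tied on both sort keys end up in arbitrary order, exact ties on base salary and total compensation keep an arbitrary one of the tied waitresses, which matches your requirement.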

Calculate ratio by date

Dear All,

I'm trying to compute a ratio based on a date (def_date) in the dataset. I try to give a basic working example to explain the problem. Thank you for the help.

Code:
clear
input float(def_date def date_start date_end)
195 1 180 252
195 1 181 270
196 1 170 260 
197 0 165 222 
198 1 159 223
201 0 198 224
202 0 187 225
203 0 199 225
205 1 199 219
205 1 150 240
. 0 140 219
. 0 199 219
. 0 201 219
. 1 205 218
. 0 187 217
. 1 177 216
end

format %tq def_date date_start date_end
gen id_loan = _n
order id
browse



gen active = 0
replace active = 1 if date_start <= date_end

bysort def_date : gen def_count_q = sum(active)

bysort def_date : egen def_count_max_q = max(def_count_q) if def_date != .

twoway (line def_count_max_q def_date), title(Number of default per quarter of default)

* for each def_date
* goal: compute for each def_date number of def=1 loans and divide it by total number of active loans in that def_date
* problem: def_date is available only for loans with def=1 status
* solution:
* numerator: number of def=1 loans for each def_date, e.g. in 2008 = 2 (done! see variable def_count_max_q) 
* denominator = calculate number of loans active in that particular def_date, meaning that date_start < def_date < date_end
*    -> e.g. in 2008q4 I have active id_loan 3,4,5,6,7,8,9,10,11,15,16 = 11 loans
*     -> e.g. in 2011q2 I have active id_loan 11,12,13,14,15,16 = 7
* last operation: compute per each def_date the def_rate = n.default / tot.active loans : 2/11 in 2008q4

* I was thinking to solve it with a loop over each def_date
qui: levelsof def_date, local(levels)
foreach l of local levels {
    di %tq `l'
    count if date_start <= `l' & `l' <= date_end
}
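Building on that loop idea, a sketch of the full ratio computation (adjust the strict/weak inequalities to your definition of "active"):

Code:
gen def_rate = .
qui levelsof def_date, local(levels)
foreach l of local levels {
    qui count if def == 1 & def_date == `l'
    local num = r(N)
    qui count if date_start <= `l' & `l' <= date_end
    local den = r(N)
    qui replace def_rate = `num'/`den' if def_date == `l'
}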

Using gsem with ordered categorical mediators

Hi,

I want to run a mediation analysis looking at the direct and indirect effect of adolescent personality (three separate but related measures of personality) on all-cause mortality in adulthood. I have a range of mediators I want to include, most of which are continuous variables. However, I have several ordered categorical mediators, including highest qualification and self-reported health. I know gsem allows me to run ordered logit models for the effect of personality on these ordered categorical mediators. However, does the command treat them as continuous when mortality is the outcome and they are independent variables? Is it possible to include them as dummy variables when they are acting as independent variables?

Thanks,
Rose

Accuracy of a new test using the diagti and sf(0) adjusted 95% CI

I have used the immediate command in Stata to calculate the accuracy and predictive values of a new test against a gold standard. Where one or more cells are zero, I have used the sf(0) option to adjust the 95% CI. What method does this use? Many thanks, Lesley

Cross sectional dependence/ xtpcse

Dear Members,

I would like to know how to deal with cross-sectional dependence.
I have started doing something, after some reading, as follows:

After running a fixed-effects model I ran "xtcsd, pesaran" to test for cross-sectional dependence. My data indicate that there is cross-sectional dependence.

I have unbalanced panel dataset where N>T (N=56 and T=16).

Is xtpcse suitable?
If yes, I would like to know whether it is obligatory to add an option. (I found an example like this: xtpcse variables, c(psar1))

And if it is obligatory, how do I know which option I must apply?


Thanks in advance

Panel data: the share of inter and intra variabilities in total variabilities for each variable

Hello. I have panel data. How could we calculate the share of inter and intra variabilities in total variabilities? A reviewer of an econometrics journal recommends reporting such share. Stata’s “xtsum” reports the standard deviations for overall, between, and within. However, the sum of the between (inter) and within (intra) variances is not equal to the overall variance, and the standard deviation for between can be larger than the standard deviation for overall, as follows.

https://www.stata.com/manuals13/xtxtsum.pdf

. use http://www.stata-press.com/data/r13/nlswork
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)

. xtsum hours

Variable         |      Mean   Std. Dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
hours    overall |  36.55956   9.869623          1        168 |     N =   28467
         between |             7.846585          1       83.5 |     n =    4710
         within  |             7.520712  -2.154726   130.0596 | T-bar = 6.04395

. xtsum birth_yr

Variable         |      Mean   Std. Dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
birth_yr overall |  48.08509   3.012837         41         54 |     N =   28534
         between |             3.051795         41         54 |     n =    4711
         within  |                    0   48.08509   48.08509 | T-bar = 6.05689

Given this, it seems we cannot calculate a meaningful share of the between or within variability in the overall variability from the xtsum statistics. Is there any way to obtain such a share? I am using Stata/SE 16.1. I would appreciate your help.
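For what it's worth, shares that do sum to one can be obtained from an ANOVA-style decomposition of the total sum of squares (total SS = between SS + within SS holds exactly when the grand mean is the overall mean of the observations). A sketch using the same data:

Code:
use http://www.stata-press.com/data/r13/nlswork, clear
egen xbar_i = mean(hours), by(idcode)
quietly summarize hours
local grand = r(mean)
gen double between_dev = (xbar_i - `grand')^2
gen double within_dev  = (hours - xbar_i)^2
gen double total_dev   = (hours - `grand')^2
foreach s in between within total {
    quietly summarize `s'_dev
    local SS_`s' = r(sum)
}
display "between share = " %6.3f `SS_between'/`SS_total'
display "within share  = " %6.3f `SS_within'/`SS_total'

Whether this is the decomposition your reviewer has in mind is of course a judgment call.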

Expand sample using levels of a variable

Hello statalist,

I am using Stata 15.1. I would like to expand a sample using the levels of one variable. The objective is that for my ID unit (country) I record whether each measure has been taken or not.

Here the data:

input str3 iso3 str22 Country long level1
"AFG" "Afghanistan" 1
"AFG" "Afghanistan" 1
"AFG" "Afghanistan" 1
"AFG" "Afghanistan" 1
"AFG" "Afghanistan" 1
"AFG" "Afghanistan" 3
"AGO" "Angola" 3
"ALB" "Albania" 1
"ALB" "Albania" 1
"ALB" "Albania" 1
"ALB" "Albania" 3
"ALB" "Albania" 3


Basically, I have a list of countries. I have a variable that records for each country a "level1" measure taken. "level1" can take up to 5 values. Level1 measures can be repeated (as seen above, AFG has 5 observations taking the value 1 for "level1").

I would like to add observations to the data set so that each country has a maximum of 5 level1 measures. For countries that do not report a specific measure (e.g., Afghanistan in the example below not recording level1=2, level1=4, level1=5), I would just like to add one line for each missing measure. Then I would create a new variable ("Level1_taken_fictious") that takes the value 1 if a specific "level1" measure has been taken, and 0 otherwise. Below I report an example just for AFG to illustrate the new data set I would like to create.

input str3 iso3 str22 Country long level1 float Level1_taken_fictious
"AFG" "Afghanistan" 1 1
"AFG" "Afghanistan" 1 1
"AFG" "Afghanistan" 1 1
"AFG" "Afghanistan" 1 1
"AFG" "Afghanistan" 1 1
"AFG" "Afghanistan" 2 0
"AFG" "Afghanistan" 3 1
"AFG" "Afghanistan" 4 0
"AFG" "Afghanistan" 5 0

I hope you can help me.

Davide

Scalar multiplication problem

Hi, Stata community!

I am trying to do a simple multiplication of scalars inside two loops and am having a problem with it. The new scalar, which is the product of two existing scalars, is empty.
The loops run over people ("k") and countries ("j"). Let's see:

Code:
  foreach k in `people' {    
    foreach j in `co_list' {
      scalar v_`k'_co_`j' =  scalar(v_all_`k'_co_`j') * scalar(share_`k'_co_`j')
    }
  }
Note: the two scalars: v_all_`k'_co_`j' and share_`k'_co_`j' are checked to contain values.

Any suggestion is welcome.

Thanks.

Delete observations if not complete

Hi,

I have a question about data for my thesis and I'm new with Stata!

I have to observe whether or not there occurred a rotation of the auditor. To do that, I have data on which audit firm each company had in 2017 and 2018, to decide whether or not there has been a change. For some of the companies there is no data for 2018, which means that those companies need to be excluded. Does anyone know how to exclude such observations? (In the attached screenshot, you can see that there is no data for company 0000033073 for 2018, for example.)

PS: after that, I want to create a dummy equal to 1 if the audit firm in 2018 is different from the one in 2017; I don't know what to use for that either.
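A sketch of both steps, where company_id and auditor stand in for whatever your variables are actually called:

Code:
* keep only companies observed in both years
bysort company_id: egen byte has2017 = max(year == 2017)
bysort company_id: egen byte has2018 = max(year == 2018)
keep if has2017 & has2018

* dummy = 1 if the audit firm changed between 2017 and 2018
bysort company_id (year): gen byte rotation = (auditor != auditor[_n-1]) if _n > 1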

I hope that someone can help!

Kind regards,

Oussama

How can I perform a one sample ttest in Stata with weights?

Hey,

I want to test if the value for a specific subgroup is different from the (weighted) mean value for all respondents (in the example below it's 0.88).

Without weights this works perfectly well
ttest var== 0.88 if group==1

However, I cannot use weights with ttest (I want to use a pweight). For a two-sample ttest I would simply run a regression with group as the independent variable, but I cannot think of how to do this for a one-sample ttest. What would you suggest?
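One possibility, sketched here without having tested it on data like yours, is to use mean, which does accept pweights, and then test the stored coefficient against the hypothesized value:

Code:
mean var [pweight = myweight] if group == 1
test _b[var] = 0.88

Here myweight is a placeholder for your pweight variable.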

Thank you!

Descriptive analysis scatterplots - panel data

Background of question
I am an economics student, currently writing my bachelor thesis, and quite inexperienced with Stata. I would be grateful for any help!

The purpose of my research is to analyse the drivers of export sophistication of Malaysian exports.

The dependent variable is the natural logarithm of the export sophistication index, more specifically the export sophistication of Malaysian exports to 171 countries.

The independent variables are:
  • Foreign Direct Investment (FDI) proxied by the stock and flow of FDI inflow, FDIS and FDIF respectively
  • Research and Development (R&D) proxied by Gross Domestic Expenditure on R&D as a percentage of GDP and Number of researchers per thousand in the labour force, GDE and RES respectively
Control variables are Malaysia’s GDP per capita PPP (current international $) proxying for the level of economic development (GDPc); Malaysia’s total population proxying for the country size (POPc); Malaysia’s gross enrolment ratio of the tertiary education segment proxying for Malaysia’s human capital (HCc); and the rule of law proxying for Malaysia's institutional quality (INSc).

Important here is that the data for the independent and control variables do not vary between the countries (id), only throughout the years since the data is specific to Malaysia.


My question
Whilst doing the descriptive analysis, I have encountered problems plotting the dependent against the independent variables. I simply used the scatter command. My aim is to check for the regression assumptions of linearity and homoscedasticity, but unfortunately, I am not able to draw any conclusions from the graphs.
I presume this is due to the fact that the data is the same throughout the ids…
Please find the graphs attached.

Please let me know if you need any clarification, I would be grateful for any advice/hint.
Kind regards,
Julie