Channel: Statalist

How to handle a Categorical Variable with 1200 Levels in Stata BE 17.0?

Hello everyone,

I am currently working on an analysis of newspaper articles using the LIWC-22 software to extract various linguistic categories. I am using Stata BE 17.0 for my analysis. One of the key variables in my dataset is "author", which contains the names of 1200 different authors. I would like to investigate the effect of artificial intelligence discussions on several LIWC categories (e.g., moral) while controlling for the author effect.

The dataset does not have a strict panel structure, since there can be more than one observation per day (a website and/or an author may publish more than one article on the same day). Here is an overview of my data:

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str16 website str10 publication_date str145 author float(moral AIBM)
"Analytic Insight" "01/05/2024" "sumedha"           0  3.02
"Analytic Insight" "01/05/2024" "Shiva Ganesh"      0   6.2
"Analytic Insight" "01/05/2024" "sumedha"           0  1.81
"Analytic Insight" "01/05/2024" "P.Sravanthi"       0  8.15
"Analytic Insight" "01/05/2024" "sumedha"           0   5.7
"Analytic Insight" "01/05/2024" "Rachana Saha"    .12     5
"Analytic Insight" "01/05/2024" "Gayathri"        .19  6.24
"Analytic Insight" "01/05/2024" "Harshini"        .24  3.18
"Analytic Insight" "30/04/2024" "sumedha"         .34  2.16
"Analytic Insight" "30/04/2024" "Harshini"          0  6.58
"Analytic Insight" "29/04/2024" "P.Sravanthi"       0  5.47
"Analytic Insight" "29/04/2024" "P.Sravanthi"     .12   8.2
"Analytic Insight" "29/04/2024" "Parvin Mohmad"  1.38  3.31
"Analytic Insight" "29/04/2024" "Prathima"        .31  4.39
"Analytic Insight" "28/04/2024" "Nitesh Kumar"      0  7.03
"Analytic Insight" "28/04/2024" "Sai Chaitanya"   .57  9.77
"Analytic Insight" "28/04/2024" "Sai Chaitanya"   .36  5.81
"Analytic Insight" "28/04/2024" "Nitesh Kumar"    .96  4.05
"Analytic Insight" "27/04/2024" "Rachana Saha"    .26 12.81
"Analytic Insight" "27/04/2024" "sumedha"         .11 13.19
"Analytic Insight" "27/04/2024" "Prathima"          0  5.16
"Analytic Insight" "27/04/2024" "Nitesh Kumar"   1.07  2.68
"Analytic Insight" "27/04/2024" "Sai Chaitanya"     0 12.71
"Analytic Insight" "26/04/2024" "Shiva Ganesh"    .72  8.16
"Analytic Insight" "26/04/2024" "sumedha"           0  6.76
"Analytic Insight" "26/04/2024" "Supraja"         .19  7.03
"Analytic Insight" "26/04/2024" "P.Sravanthi"     .53  6.44
"Analytic Insight" "26/04/2024" "Parvin Mohmad"    .4   4.8
"Analytic Insight" "26/04/2024" "Prathima"          0  1.63
"Analytic Insight" "25/04/2024" "P.Sravanthi"     .12  1.48
"Analytic Insight" "25/04/2024" "Prathima"          0  7.42
"Analytic Insight" "25/04/2024" "S Akash"        1.45  7.79
"Analytic Insight" "25/04/2024" "Market Trends"   .29  4.82
"Analytic Insight" "25/04/2024" "Prathima"          0  5.92
"Analytic Insight" "25/04/2024" "Rachana Saha"    .13  2.59
"Analytic Insight" "24/04/2024" "Supraja"         .09  9.16
"Analytic Insight" "24/04/2024" "Shiva Ganesh"      0  7.44
"Analytic Insight" "24/04/2024" "Prathima"         .1  1.37
"Analytic Insight" "24/04/2024" "P.Sravanthi"       0  9.09
"Analytic Insight" "24/04/2024" "Parvin Mohmad"  3.42  4.56
"Analytic Insight" "24/04/2024" "Harshini"        .08  2.02
"Analytic Insight" "24/04/2024" "sumedha"         .95  8.21
"Analytic Insight" "23/04/2024" "Supraja"           0     0
"Analytic Insight" "23/04/2024" "Prathima"         .1  6.18
"Analytic Insight" "23/04/2024" "Supraja"         .14  6.04
"Analytic Insight" "23/04/2024" "P.Sravanthi"       0  8.57
"Analytic Insight" "23/04/2024" "Prathima"        .22  6.11
"Analytic Insight" "23/04/2024" "Shiva Ganesh"    .12  9.29
"Analytic Insight" "23/04/2024" "P.Sravanthi"       0  4.41
"Analytic Insight" "22/04/2024" "P.Sravanthi"     .51  6.11
"Analytic Insight" "22/04/2024" "Rachana Saha"      0  7.28
"Analytic Insight" "21/04/2024" "Pardeep Sharma"    0  6.39
"Analytic Insight" "21/04/2024" "Nitesh Kumar"   1.67  7.31
"Analytic Insight" "21/04/2024" "Pardeep Sharma"    0  6.21
"Analytic Insight" "21/04/2024" "Nitesh Kumar"   1.33  6.87
"Analytic Insight" "20/04/2024" "Sai Chaitanya"     0  6.29
"Analytic Insight" "20/04/2024" "Rachana Saha"      0  3.02
"Analytic Insight" "19/04/2024" "S Akash"           0  3.18
"Analytic Insight" "19/04/2024" "Supraja"           0  6.13
"Analytic Insight" "19/04/2024" "IndustryTrends"   .5  5.62
"Analytic Insight" "19/04/2024" "Rachana Saha"      0  4.43
"Analytic Insight" "18/04/2024" "Prathima"          0  7.41
"Analytic Insight" "18/04/2024" "Prathima"          0  7.57
"Analytic Insight" "18/04/2024" "Shiva Ganesh"    .17  3.28
"Analytic Insight" "18/04/2024" "P.Sravanthi"     .16   .47
"Analytic Insight" "17/04/2024" "Pardeep Sharma" 1.48   5.7
"Analytic Insight" "16/04/2024" "Parvin Mohmad"   .32   4.2
"Analytic Insight" "16/04/2024" "P.Sravanthi"     .37  5.97
"Analytic Insight" "16/04/2024" "S Akash"           0  6.89
"Analytic Insight" "16/04/2024" "greeshmitha"     .67  4.86
"Analytic Insight" "15/04/2024" "Parvin Mohmad"   .17  4.35
"Analytic Insight" "15/04/2024" "Parvin Mohmad"     0   .56
"Analytic Insight" "14/04/2024" "Nitesh Kumar"      0  2.82
"Analytic Insight" "14/04/2024" "greeshmitha"       0  3.74
"Analytic Insight" "14/04/2024" "Nitesh Kumar"      0  1.18
"Analytic Insight" "14/04/2024" "Rachana Saha"      0   6.3
"Analytic Insight" "14/04/2024" "sumedha"         .88 10.68
"Analytic Insight" "13/04/2024" "Parvin Mohmad"   1.1  6.62
"Analytic Insight" "13/04/2024" "Pardeep Sharma"    0  6.97
"Analytic Insight" "13/04/2024" "Pardeep Sharma"  .31  6.61
"Analytic Insight" "13/04/2024" "Harshini"          0  4.71
"Analytic Insight" "13/04/2024" "Prathima"          0  6.55
"Analytic Insight" "12/04/2024" "P.Sravanthi"     .88  6.93
"Analytic Insight" "12/04/2024" "greeshmitha"       0  2.68
"Analytic Insight" "12/04/2024" "Pardeep Sharma"    0  9.67
"Analytic Insight" "11/04/2024" "Shiva Ganesh"   1.49  6.39
"Analytic Insight" "11/04/2024" "P.Sravanthi"       0  2.96
"Analytic Insight" "11/04/2024" "P.Sravanthi"       0  1.07
"Analytic Insight" "11/04/2024" "P.Sravanthi"     .14  2.03
"Analytic Insight" "11/04/2024" "greeshmitha"     .18   4.2
"Analytic Insight" "11/04/2024" "Rachana Saha"    .96  6.12
"Analytic Insight" "11/04/2024" "sumedha"         .42  4.66
"Analytic Insight" "10/04/2024" "P.Sravanthi"     .31  5.33
"Analytic Insight" "10/04/2024" "P.Sravanthi"       0  2.98
"Analytic Insight" "10/04/2024" "Shiva Ganesh"    .21  5.76
"Analytic Insight" "10/04/2024" "Harshini"          0  4.38
"Analytic Insight" "10/04/2024" "P.Sravanthi"       0  2.99
"Analytic Insight" "09/04/2024" "greeshmitha"     .57  4.38
"Analytic Insight" "09/04/2024" "P.Sravanthi"     .31  5.16
"Analytic Insight" "09/04/2024" "Rachana Saha"      0  6.42
end

Given the large number of unique authors, creating dummy variables for each author is not feasible (Stata BE only supports matrices with up to 800 rows or columns). I am considering using a mixed-effects model to account for the variability between authors.
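A minimal sketch of what such a mixed model could look like with the variables from the excerpt above (this is only an illustration: author must first be encoded, since mixed expects a numeric grouping variable):

```stata
* Hedged sketch: a random intercept for author avoids creating
* 1200 dummy variables (variable names taken from the dataex excerpt)
encode author, generate(author_id)
mixed moral AIBM || author_id:, reml
```

Alternatively, a fixed-effects absorption approach such as areg with absorb(author_id), or the community-contributed reghdfe, sidesteps the dummy-variable matrix entirely.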

I have a few questions and would appreciate any advice or suggestions:
  1. Is the mixed-effects model the best approach to control for the author effect given the large number of authors?
  2. Are there any other methods or best practices in Stata that could handle this situation more effectively?
  3. Any recommendations on model diagnostics or validation techniques to ensure the robustness of my results?
Thank you in advance for your help!

Examine correlations within groups

Hi,

I have a conceptual question regarding panel data analysis.

Consider the following panel data where Firm_ID identifies a firm, CEO_ID identifies the respective CEO in place during the fiscal year (CC_FY). Each firm in my sample underwent exactly one CEO change. NL represents a psychological construct of the CEO, and NL_Median refers to the median of this construct over the CC_FY for a given CEO in a firm. I want to understand how NL is correlated within a firm across different CEOs based on NL_Median.

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input double Firm_ID int CC_FY long CEO_ID float(NL NL_Median)
4295899290 2010 18101 -1.0617353 -1.0617353
4295899290 2010 18101 -1.0617353 -1.0617353
4295899290 2015 35166   .7557002  1.1048557
4295899290 2013 35166  1.4971353  1.1048557
4295899290 2013 35166  1.4971353  1.1048557
4295899290 2013 35166  1.4971353  1.1048557
4295899290 2013 35166  1.4971353  1.1048557
4295899290 2012 35166  -1.130444  1.1048557
4295899290 2014 35166  1.1048557  1.1048557
4295899290 2012 35166  -1.130444  1.1048557
4295899290 2014 35166  1.1048557  1.1048557
4295899290 2014 35166  1.1048557  1.1048557
4295899290 2014 35166  1.1048557  1.1048557
4295899290 2012 35166  -1.130444  1.1048557
4295899323 2011 30552    -.66134    -.66134
4295899323 2011 30552    -.66134    -.66134
4295899323 2011 30552    -.66134    -.66134
4295899323 2017 31112 -2.9109335 -2.9109335
4295899323 2015 31112  -2.444447 -2.9109335
4295899323 2014 31112  -2.857669 -2.9109335
4295899323 2017 31112 -2.9109335 -2.9109335
4295899323 2016 31112   -3.69725 -2.9109335
4295899323 2014 31112  -2.857669 -2.9109335
4295899323 2016 31112   -3.69725 -2.9109335
4295899323 2016 31112   -3.69725 -2.9109335
4295899323 2017 31112 -2.9109335 -2.9109335
4295899323 2018 31112 -2.3553092 -2.9109335
end
Please ignore that some observations refer to the same CC_FY, producing duplicates.

My initial idea was to create a variable NL_Median_First, which holds the NL_Median values for the first CEO, and a variable NL_Median_Second, which holds the NL_Median values for the second CEO of each firm. Then I could calculate the correlation between these two variables by Firm_ID, i.e., within each specific firm. This would give me a set of correlation coefficients.

However, I'm unsure how to proceed from here. I feel that simply taking the average of all these correlation coefficients might not be sufficient.
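Since NL_Median is constant within a CEO's spell, a correlation computed inside a single firm would be a correlation between two constants and is undefined; one alternative is a single correlation across firms between the first and second CEO's medians. A hedged sketch of that idea, using the variable names from the excerpt:

```stata
* Hedged sketch: order CEOs within each firm by first appearance,
* keep one NL_Median per CEO, reshape wide, and correlate across firms
bysort Firm_ID CEO_ID (CC_FY): gen byte new_ceo = _n == 1
bysort Firm_ID (CC_FY): gen ceo_order = sum(new_ceo)
collapse (first) NL_Median, by(Firm_ID ceo_order)
reshape wide NL_Median, i(Firm_ID) j(ceo_order)
correlate NL_Median1 NL_Median2
```

Whether a cross-firm correlation answers the underlying question depends on what "correlated within a firm" is meant to capture.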

Any advice on how to properly analyze this correlation would be greatly appreciated.

Thank you!


IV regression, no observations, within and between group

Dear community,

I have tried many things, but because of an error in my reasoning I cannot find the solution. I would be very grateful if you could help me.

I want to analyze the following equation:

redistribution = β0 + β1 · intergenerational mobility1 + u

According to Alesina et al. (2018), intergenerational mobility is significantly correlated with redistribution, so they use an experiment to generate exogenous variation for intergenerational mobility and use this as an IV (Instrumental Variable).

I have coded the experiment so that I have the variables for the control group and the treatment group:
  • Control group: mobility1 & redistribution
  • Treatment group: mobilitytreatment1 & redistributiontreatment
When I stay within one of the groups, I get a regression output:

Code:
reg redistribution mobility1

Source | SS df MS Number of obs = 34
-------------+---------------------------------- F(1, 32) = 0.02
Model | .019916327 1 .019916327 Prob > F = 0.8862
Residual | 30.5977307 32 .956179085 R-squared = 0.0007
-------------+---------------------------------- Adj R-squared = -0.0306
Total | 30.6176471 33 .927807487 Root MSE = .97784

------------------------------------------------------------------------------
redistribu~n | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
mobility1 | .0027527 .019073 0.14 0.886 -.0360977 .041603
_cons | 6.24398 .2207855 28.28 0.000 5.794255 6.693705
------------------------------------------------------------------------------

But when I try to perform an IV regression, I get an error:

Code:
ivregress 2sls redistribution (mobility1 = mobilitytreatment1)
no observations
r(2000);

When I run another regression to check and stay within the control group, I get another regression output:

Code:
ivregress 2sls redistribution (mobility1 = mobility2)

Instrumental variables 2SLS regression Number of obs = 34
Wald chi2(1) = 0.51
Prob > chi2 = 0.4742
R-squared = .
Root MSE = .95729

------------------------------------------------------------------------------
redistribu~n | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
mobility1 | .0173522 .0242436 0.72 0.474 -.0301644 .0648687
_cons | 6.134054 .2455078 24.99 0.000 5.652868 6.615241
------------------------------------------------------------------------------
Instrumented: mobility1
Instruments: mobility2

Conclusion: within a group the regression works; across groups it does not.


What am I doing wrong? Many thanks in advance!
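For what it is worth, a common reason for r(2000) in this setup is that the control-group and treatment-group variables live in separate columns, so no single observation has non-missing values for both the outcome and the instrument. A hedged sketch of restacking the two groups into one long sample with a treatment indicator (variable names are taken from the post; whether the assignment dummy is a valid instrument depends on the experimental design):

```stata
* Hedged sketch: stack control and treatment observations in long form,
* then instrument mobility with the randomized treatment indicator
preserve
keep redistribution mobility1
generate byte treated = 0
tempfile control
save `control'
restore
keep redistributiontreatment mobilitytreatment1
rename redistributiontreatment redistribution
rename mobilitytreatment1 mobility1
generate byte treated = 1
append using `control'
ivregress 2sls redistribution (mobility1 = treated)
```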

Stata 18 Makespline vs mkspline (updates make it error prone for omitting spline vars and misspecifying models)

I just upgraded to Stata 18 and was adding splines into a model using the new makespline command, and I've come across two issues that make it less user-friendly than the old mkspline version. I'm writing this post as a general heads-up to users of Stata 18 (and a request to StataCorp to consider adding the old features back into makespline). It took me a long (and painful) afternoon to figure out why I was getting different results between the two commands with the same number of knots, so I'm posting in case anyone else encounters a similar issue.

1) In the newer Stata 18 makespline, the basis(varname) option doesn't include the first basis variable, whereas the older mkspline includes it by default. Without the first basis variable automatically included (e.g., varname_1, ..., varname_k), it's easy when adding the splines to a regression model to just enter varname*, miss the first one, and misspecify your splines (i.e., drop one out of the model and get different overall predictions). If you specify varname*, the model coefficients will run over varname_2, ..., varname_k.

2) mkspline has an option to display the points at which the knots lie. This is helpful for adding to a table footnote and for understanding how the splines are specified and what effect the harrell option has on knot placement. I can't find this feature in the new version.


Here's an example of the issue below (base code modified from a post by Maarten Buis, linked here: https://www.stata.com/statalist/arch.../msg00311.html).

Code:
log using statalist_ex_splines
sysuse nlsw88, clear
gen ln_w = ln(wage)

*Old version mkspline (restricted cubic spline with 5 knots; the display shows that the knots are correctly placed at the Harrell (2001) percentiles)
mkspline ten_mk = tenure, cubic nknots(5) displayknots
reg ln_w ten_mk*
predict yhat_mk
label var yhat_mk "Older - Mkspline"

*Newer makespline, specified as above: restricted cubic spline with 5 knots, placed as per Harrell (2001)
makespline rcs tenure, knots(5) basis(ten_make) order(3) harrell
reg ln_w ten_make* //nb: only 3 betas included vs 4 when using mkspline with the same number of knots
predict yhat_make
label var yhat_make "Stata 18 - Makespline"

* Output plot of predicted values based on splines
sort tenure
twoway line yhat_mk tenure, lcolor(blue) lpattern(dash) sort || ///
line yhat_make tenure, lcolor(orange) sort ///
title(Fitted values for the effect of tenure on ln(wage)) subtitle(mkspline and makespline confusion)
*** Really not clear why restricted cubic splines from mkspline and makespline differ !!

*Updated model, now including original base variable "tenure"
reg ln_w tenure ten_make*
predict yhat_make_basevar
label var yhat_make_basevar "Stata 18 - Makespline with basevar"

twoway line yhat_mk tenure, lcolor(blue) lpattern(dash) sort || ///
line yhat_make tenure, lcolor(orange) sort || ///
line yhat_make_basevar tenure, lcolor(red) lpattern(dash) sort ///
title(Fitted values for the effect of tenure on ln(wage)) subtitle(mkspline vs makespline demo)
*** Now the makespline version matches the mkspline output.

Hurdle Poisson Model Post Estimation - Marginal Effects

I'm looking for code to calculate marginal effects separately for the logit part and the count part after fitting a hurdle Poisson model.
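In the absence of a single canned command, one hedged sketch is to exploit the fact that the hurdle Poisson likelihood separates: fit a logit for any-versus-none and a zero-truncated Poisson for the positives, then run margins after each part (y, x1, x2 are placeholders, not variables from the post):

```stata
* Hedged sketch: the two parts of a hurdle Poisson model can be fit
* separately, with -margins- run after each (y x1 x2 are placeholders)
generate byte any_y = y > 0 if !missing(y)
logit any_y x1 x2
margins, dydx(*)                    // marginal effects, participation part
tpoisson y x1 x2 if y > 0, ll(0)    // zero-truncated Poisson, count part
margins, dydx(*)                    // marginal effects, intensity part
```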

How to merge without unique ID


Hi group members,

I want to merge two datasets: one is at the household level and the other at the individual level. The household-level data has details about expenditure on around 42 items, and the individual-level dataset has sociodemographic information on the individuals in a particular household. Both datasets have a household id (hhid), but this variable is not unique, which is why the data are not merging. Please suggest how I can merge these files:
Data set 1
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str38 hhid int item_code
"HCES2022655621010121713017 101111 301" 139
"HCES2022655621010121713017 101111 310" 139
"HCES2022655621010121713017 101111 311" 139
"HCES2022655621010121713017 101111 313" 139
"HCES2022655371010122023014 201111 202" 139
"HCES2022655261010122023016 101112 201" 139
"HCES2022655261010122023016 101112 301" 139
"HCES2022655261010122023016 101112 303" 139
"HCES2022655261010122023016 101112 306" 139
"HCES2022655261010122023016 101112 307" 139
"HCES2022655261010122023016 101112 308" 139
"HCES2022655261010122023016 101112 309" 139
"HCES2022655261010122023016 101112 310" 139
"HCES2022655261010122023016 101112 311" 139
Data set 2
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str38 hhid float person_no
"HCES2022310002282831212025 228111 301" 1
"HCES2022310002282831212025 228111 302" 5
"HCES2022310002282831212025 228111 302" 3
"HCES2022310002282831212025 228111 302" 4
"HCES2022310002282831212025 228111 302" 2
"HCES2022310002282831212025 228111 302" 1
"HCES2022310002282831212025 228111 303" 2
"HCES2022310002282831212025 228111 303" 5
"HCES2022310002282831212025 228111 303" 4
"HCES2022310002282831212025 228111 303" 1
"HCES2022310002282831212025 228111 303" 3
"HCES2022310002282831212025 228111 304" 2
"HCES2022310002282831212025 228111 304" 1
"HCES2022310002282831212025 228111 305" 1
"HCES2022310002282831212025 228111 306" 1
"HCES2022310002282831212025 228111 307" 1

Generating a mean with time-series operators

Hi everyone,
I am a beginner in Stata and am stuck on an issue. I am working with World Bank data, where I have Gini coefficients for different country-years.
country year gini
Albania 2008 30
Albania 2009 .
Albania 2010 .
Albania 2011 .
Albania 2012 29
Albania 2013 .
Albania 2014 34.6
Albania 2015 32.8
Albania 2016 33.7
Albania 2017 33.1
Albania 2018 30.1
Albania 2019 30.1
Albania 2020 29.4


For each year, I would like to generate the mean of all the values in the timeframe between the previous five years and the following five years. For example, for 2013, I would like to generate the mean of the Gini coefficients for Albania between 2008 and 2018.

I tried with the following code, but the problem is that the mean is not calculated as soon as there is a missing value in the timeframe. However, I would like missings to just be ignored.

Code:
by country: gen sum_gini = gini[_n-5] + gini[_n-4] + gini[_n-3] + gini[_n-2] + gini[_n-1] + gini + gini[_n+1] + gini[_n+2] + gini[_n+3] + gini[_n+4] + gini[_n+5]
by country: gen count_nonmissing = !missing(gini[_n-5]) + !missing(gini[_n-4]) + !missing(gini[_n-3]) + !missing(gini[_n-2]) + !missing(gini[_n-1]) + !missing(gini) + !missing(gini[_n+1]) + !missing(gini[_n+2]) + !missing(gini[_n+3]) + !missing(gini[_n+4]) + !missing(gini[_n+5])
by country: gen gini2 = sum_gini / count_nonmissing
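One way to get exactly this kind of window mean while ignoring missing values is the community-contributed rangestat (from SSC), which computes statistics over an interval of another variable:

```stata
* Hedged sketch: mean of gini over the inclusive window [year-5, year+5]
* within each country, with missing values ignored automatically
ssc install rangestat
rangestat (mean) gini, interval(year -5 5) by(country)
```

Note that interval() is defined on the values of year itself, so gaps in the calendar are handled correctly, unlike subscripting with [_n-5].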
Thank you
Anna

Comparing wilcoxon signed rank tests and t-tests.

Hi, I'm doing research on differences in sustainability practices between sectors. I have data at time t-1 and t+2 for different firms and want to check whether some practices are more prevalent in some sectors. I want to test whether a practice is prevalent in a sector with either a t-test or a Wilcoxon signed rank test: if the variable changes significantly from t-1 to t+2, this indicates that the practice is used in that sector. I was wondering whether it is valid to compare different t-tests or Wilcoxon signed rank tests across sectors by comparing their significance, so that I can conclude that a practice is more prevalent in one sector than another if it is significant for the first sector but not for the second.
Kind regards,
Berend

histograms of categorical variables

Hello everyone,
first post in the forum, so apologies if something is misspecified or does not follow all posting guidelines.

I have an issue with the following code

Code:
twoway (histogram quality_8 if treatment == 0, discrete frequency width(0.4) start(0) color(blue%50)) ///
(histogram quality_8 if treatment == 1, discrete frequency width(0.4) start(2) color(red%50)), ///
xlabel(1 "Outside covered" 2 "Outside not covered" 3 "Inside covered" 4 "Inside not covered", angle(45)) ///
xtitle("") ///
ytitle("Frequency") ///
legend(order(1 "Control" 2 "Treatment") position(6)) ///
title("Distribution of quality_8 by Treatment Status") ///
ylabel(, angle(horizontal))

It should create a histogram for a categorical variable (quality_8) that has 5 categories, sorted by treatment. (I chose the histogram approach because not every category is present in the data, and a bar graph does not display categories with no observations on the x axis.)
The issue is that the bars are overlapping, even if I set two different starts.

Any way to keep the same code and have side by side bars?
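The two start() values don't offset the bars, because start() only sets where binning begins, not where the bars are drawn. A hedged sketch of one workaround: shift the treatment group's values by one bar width so the two sets of bars sit next to each other (this assumes quality_8 is integer-coded):

```stata
* Hedged sketch: offset the treatment group's values by the bar width
* so control and treatment bars appear side by side
generate quality_8_off = quality_8 + 0.4 if treatment == 1
twoway (histogram quality_8 if treatment == 0, discrete frequency width(0.4) color(blue%50)) ///
       (histogram quality_8_off if treatment == 1, discrete frequency width(0.4) color(red%50)), ///
       xlabel(1 "Outside covered" 2 "Outside not covered" 3 "Inside covered" 4 "Inside not covered", angle(45)) ///
       xtitle("") ytitle("Frequency") ///
       legend(order(1 "Control" 2 "Treatment") position(6))
```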

Many thanks in advance.

Error merging two files

Below is a section of my do-file. I am trying to merge two files, HW and KR. The code runs well up to this line:

merge_kr_hw, KR("`kr2003'") HW("`hw2003'")

which returns the error message "option kr() required". Can anyone please help resolve the issue?

// Function to merge KR and HW files
program define merge_kr_hw, rclass
syntax, KR(string) HW(string)

clear
// Load KR file
use "`KR'", clear
describe
// Check for duplicate keys in KR file
duplicates report caseid midx
if r(unique_obs) < r(N) {    // -duplicates report- returns r(N) and r(unique_obs)
di as error "Error: Duplicate keys found in `kr'"
exit 1
}

// Load HW file
di "Merging with HW file: `HW'"
merge 1:1 caseid midx using "`HW'"

// Check for duplicate keys in HW file
duplicates report caseid midx
if r(unique_obs) < r(N) {
di as error "Error: Duplicate keys found in `hw'"
exit 1
}

// Handle merge results
drop if _merge == 2 | _merge == 3    // note: this keeps only KR records without a match in HW
drop _merge

// Save merged file
tempfile merged
save `"`merged'"', replace

return local merged_file `"`merged'"'
end

// Ensure KR and HW file paths are defined
local kr2003 "${path}/NG_2003_DHS_07022024_1257_196132/NGKR4BDT/NGKR4BFL.dta"
local hw2003 "${path}/NG_2003_DHS_07022024_1257_196132/NGHW4BDT/NGHW4BFL.dta"
local kr1990 "${path}/NG_1990_DHS_02072024_824_196132/NGKR21DT/NGKR21FL.dta"
local hw1990 "${path}/NG_1990_DHS_02072024_824_196132/NGHW21DT/NGHW21FL.dta"

// Merge KR and HW for each year
merge_kr_hw, kr("`kr2003'") hw("`hw2003'")    // option names are case sensitive: kr(), not KR()
local merged2003 `r(merged_file)'

merge_kr_hw, kr("`kr1990'") hw("`hw1990'")
local merged1990 `r(merged_file)'

Handling attrition and change in eligibility variable value in a repeated measure GEE model using xtgee

Hello,

I need your advice on a repeated-measures analysis (baseline-endline) using xtgee. This is my first time applying GEE. The participants are subdivided into pregnant and non-pregnant women (75% to 25% ratio). Some questions were general, while others depended on whether the respondent was pregnant.
My questions are as follows:
1. Where there is attrition between baseline and endline, do I need to drop those who were lost to follow-up? How does xtgee handle them if they are not dropped?
2. Some women who were non-pregnant at baseline (and responded to the non-pregnant and general questions) became pregnant by endline (and responded to the pregnant and general questions). Is it OK to include only women with a consistent pregnancy status in the model, or how does xtgee handle it if consistency of pregnancy status is not made a requirement for inclusion? My thinking was that they should be left out, since they no longer have repeated measures for the questions they answered at baseline, but I am not sure about this line of thought.

The panel variable is at the individual level.
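On question 1, a hedged sketch of what an unbalanced two-wave setup might look like: xtgee estimates with whatever waves each woman contributes, so women lost to follow-up need not be dropped by hand, although validity under dropout rests on the GEE assumption that data are missing completely at random (variable names id, wave, outcome, group are placeholders):

```stata
* Hedged sketch: unbalanced panel with baseline/endline as the time
* variable (id wave outcome group are placeholders)
xtset id wave
xtgee outcome i.wave##i.group, family(binomial) link(logit) ///
    corr(exchangeable) vce(robust)
```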

thanks.

Strange errors about missing variables with threshold command

Hi all,

I am receiving strange error messages about missing variables when running the threshold command. Two examples are:

Code:
variable siffra not found
or
Code:
variable __00001H not found
Neither of these variables is specified in my command or present in the data set used.

Unfortunately, I have a hard time creating a reproducible example as, for example, in the following code, the threshold command runs without error:

Code:
clear

input str25 country int year float trust_most
"ARGENTINA" 1984 .2607261
"ARGENTINA" 1991 .2330905
"ARGENTINA" 1995 .1827731
"ARGENTINA" 1999 .1587811
"ARGENTINA" 2006 .1688708
"ARGENTINA" 2013 .2350472
"ARGENTINA" 2017 .207304
"ARMENIA" 1997 .2427184
"ARMENIA" 2008 .2046477
"ARMENIA" 2011 .0908174
"ARMENIA" 2018 .2547236
"ARMENIA" 2021 .0780018
"AUSTRIA" 1990 .3184615
"AUSTRIA" 1999 .3342756
"AUSTRIA" 2008 .3643755
"AUSTRIA" 2018 .4894811
"AUSTRALIA" 1981 .4874776
"AUSTRALIA" 1995 .4018568
"AUSTRALIA" 2005 .480949
"AUSTRALIA" 2012 .5461432
"AUSTRALIA" 2018 .5419474
end

threshold trust_most, threshvar(year)
But if I run the following, with exactly the same data set:

Code:
use "/Users/MG/Dropbox/TimeUse/Data/WVS/Trust.dta", clear
keep country year trust_most
keep if country=="ARGENTINA" | country=="AUSTRALIA"  | country=="ARMENIA" | country=="AUSTRIA"
threshold trust_most, threshvar(year)

I am getting the
Code:
variable siffra not found
message.

I have run
Code:
update all
in the hope that this will solve the issue, but the issue persists. (I am using Stata/SE 17.0)
Thank you for your help.

Spatially lagged endogenous variable

Dear All,

I estimate a spatial model (specifically a SARAR). One of the variables in the model is likely to be endogenous. Therefore I use the command spivreg. Additionally, I believe that this endogenous variable is also spatially correlated.

When reading the user manual about spivreg, I found:

"You cannot include in the model spatial lags of the endogenous regressors or spatial lags of the excluded exogenous regressors."

Now my question is whether this is impossible in principle or whether Stata simply cannot do it. I searched online but did not find a conclusive answer about whether a model with a spatially lagged endogenous variable can be estimated.

Any suggestion would be highly appreciated.

Best regards,

Dario

Predictive performance of mixclogit

Dear all,

I would like to assess the in-sample and out-of-sample predictive performance of mixclogit.

Could you advise me on how to do this?

Thank you.

Best regards.

Issues with Infinite Loop in Calculating Complex Ownership Structures (Pyramids, Cross-Holding, Circular, Dual Cross-Holding)

$
0
0
Hello,
I am a PhD student working on a project that involves identifying ownership structure patterns, which I will subsequently use to calculate cash flow rights and control rights. I am particularly following the methodology outlined in the paper

Aslan, H., & Kumar, P. (2012). Strategic ownership structure and the cost of debt. The Review of Financial Studies, 25(7), 2257-2299.

On page 15 of this paper, the authors categorize ownership structures into four distinct types: Pyramids, Cross-Holding, Circular, and Dual Cross-Holding. These classifications are precisely what I need for my research. Please refer to the accompanying image or the main text for further details.


I have a dataset that I'm working with, and I would like to share a few rows for your reference. Please let me know if these sample rows are sufficient or if you need additional data.
I am using the dataset from the following paper, which is available on their website:
Schwartz-Ziv, Miriam, and Ekaterina Volkova. "Is blockholder diversity detrimental?." Management Science (2024).

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long blockholder_cik str149 blockholder_name long company_cik str110 company_name int year float position
   9015 "BABSON DAVID L & CO INC"                     20 "K TRON INTERNATIONAL INC" 1997 10.69
   9015 "BABSON DAVID L & CO INC"                     20 "K TRON INTERNATIONAL INC" 1998 10.69
   9015 "BABSON DAVID L & CO INC"                     20 "K TRON INTERNATIONAL INC" 1999 11.29
   9015 "BABSON DAVID L & CO INC"                     20 "K TRON INTERNATIONAL INC" 2000   4.9
  50341 "FLEETBOSTON FINANCIAL CORP"                  20 "K TRON INTERNATIONAL INC" 2000   5.8
  50341 "FLEETBOSTON FINANCIAL CORP"                  20 "K TRON INTERNATIONAL INC" 2001  6.03
  50341 "FLEETBOSTON FINANCIAL CORP"                  20 "K TRON INTERNATIONAL INC" 2002  5.97
  50341 "FLEETBOSTON FINANCIAL CORP"                  20 "K TRON INTERNATIONAL INC" 2003  6.06
  70858 "BANK OF AMERICA CORP /DE/"                   20 "K TRON INTERNATIONAL INC" 2004     5
  80255 "PRICE T ROWE ASSOCIATES INC /MD/"            20 "K TRON INTERNATIONAL INC" 1995   6.4
  80255 "PRICE T ROWE ASSOCIATES INC /MD/"            20 "K TRON INTERNATIONAL INC" 1996   7.9
  80255 "PRICE T ROWE ASSOCIATES INC /MD/"            20 "K TRON INTERNATIONAL INC" 1997   7.8
  80255 "PRICE T ROWE ASSOCIATES INC /MD/"            20 "K TRON INTERNATIONAL INC" 1998   8.2
  80255 "PRICE T ROWE ASSOCIATES INC /MD/"            20 "K TRON INTERNATIONAL INC" 1999   9.1
  80255 "PRICE T ROWE ASSOCIATES INC /MD/"            20 "K TRON INTERNATIONAL INC" 2000  10.5
  80255 "PRICE T ROWE ASSOCIATES INC /MD/"            20 "K TRON INTERNATIONAL INC" 2001  10.5
  80255 "PRICE T ROWE ASSOCIATES INC /MD/"            20 "K TRON INTERNATIONAL INC" 2002  10.5
  80255 "PRICE T ROWE ASSOCIATES INC /MD/"            20 "K TRON INTERNATIONAL INC" 2003  10.5
  80255 "PRICE T ROWE ASSOCIATES INC /MD/"            20 "K TRON INTERNATIONAL INC" 2004  10.4
  80255 "PRICE T ROWE ASSOCIATES INC /MD/"            20 "K TRON INTERNATIONAL INC" 2005   9.9
  80255 "PRICE T ROWE ASSOCIATES INC /MD/"            20 "K TRON INTERNATIONAL INC" 2006   9.9
  80255 "PRICE T ROWE ASSOCIATES INC /MD/"            20 "K TRON INTERNATIONAL INC" 2007   9.4
  80255 "PRICE T ROWE ASSOCIATES INC /MD/"            20 "K TRON INTERNATIONAL INC" 2008   9.4
  80255 "PRICE T ROWE ASSOCIATES INC /MD/"            20 "K TRON INTERNATIONAL INC" 2009     9
 354204 "DIMENSIONAL FUND ADVISORS INC"               20 "K TRON INTERNATIONAL INC" 1998  5.04
 354204 "DIMENSIONAL FUND ADVISORS INC"               20 "K TRON INTERNATIONAL INC" 1999  4.82
 354204 "DIMENSIONAL FUND ADVISORS INC"               20 "K TRON INTERNATIONAL INC" 2000  5.72
 354204 "DIMENSIONAL FUND ADVISORS INC"               20 "K TRON INTERNATIONAL INC" 2001  5.71
 354204 "DIMENSIONAL FUND ADVISORS INC"               20 "K TRON INTERNATIONAL INC" 2002     5
 354204 "DIMENSIONAL FUND ADVISORS INC"               20 "K TRON INTERNATIONAL INC" 2003     5
 888002 "AXA FINANCIAL INC"                           20 "K TRON INTERNATIONAL INC" 2007   5.2
 904571 "GOLDMAN SACHS GROUP LP"                      20 "K TRON INTERNATIONAL INC" 1998   7.8
 906304 "ROYCE & ASSOCIATES LLC"                      20 "K TRON INTERNATIONAL INC" 2007  4.84
 929372 "PARADIGM CAPITAL MANAGEMENT INC /NY/ /ADV"   20 "K TRON INTERNATIONAL INC" 1998   5.7
 937394 "HEARTLAND ADVISORS INC"                      20 "K TRON INTERNATIONAL INC" 1999   6.7
 937394 "HEARTLAND ADVISORS INC"                      20 "K TRON INTERNATIONAL INC" 2000  11.4
 937394 "HEARTLAND ADVISORS INC"                      20 "K TRON INTERNATIONAL INC" 2001  11.3
 937394 "HEARTLAND ADVISORS INC"                      20 "K TRON INTERNATIONAL INC" 2002  11.4
 937394 "HEARTLAND ADVISORS INC"                      20 "K TRON INTERNATIONAL INC" 2003  11.3
 937394 "HEARTLAND ADVISORS INC"                      20 "K TRON INTERNATIONAL INC" 2004  7.73
 937394 "HEARTLAND ADVISORS INC"                      20 "K TRON INTERNATIONAL INC" 2005   5.7
 944808 "LIBERTY INVESTMENT MANAGEMENT INC/"          20 "K TRON INTERNATIONAL INC" 1997  6.15
1037792 "PARADIGM CAPITAL MANAGEMENT INC/NY"          20 "K TRON INTERNATIONAL INC" 2001   9.7
1037792 "PARADIGM CAPITAL MANAGEMENT INC/NY"          20 "K TRON INTERNATIONAL INC" 2002     8
1037792 "PARADIGM CAPITAL MANAGEMENT INC/NY"          20 "K TRON INTERNATIONAL INC" 2003   7.2
1037792 "PARADIGM CAPITAL MANAGEMENT INC/NY"          20 "K TRON INTERNATIONAL INC" 2004   7.3
1037792 "PARADIGM CAPITAL MANAGEMENT INC/NY"          20 "K TRON INTERNATIONAL INC" 2005   6.7
1037792 "PARADIGM CAPITAL MANAGEMENT INC/NY"          20 "K TRON INTERNATIONAL INC" 2006     5
1088084 "GOLDMAN SACHS ASSET MANAGEMENT/"             20 "K TRON INTERNATIONAL INC" 1999  10.5
1105838 "ROBOTTI ROBERT"                              20 "K TRON INTERNATIONAL INC" 2001   7.5
1105838 "ROBOTTI ROBERT"                              20 "K TRON INTERNATIONAL INC" 2002   8.8
1105838 "ROBOTTI ROBERT"                              20 "K TRON INTERNATIONAL INC" 2003   9.5
1105838 "ROBOTTI ROBERT"                              20 "K TRON INTERNATIONAL INC" 2004   9.5
1105838 "ROBOTTI ROBERT"                              20 "K TRON INTERNATIONAL INC" 2005   9.4
1105838 "ROBOTTI ROBERT"                              20 "K TRON INTERNATIONAL INC" 2006   9.4
1105838 "ROBOTTI ROBERT"                              20 "K TRON INTERNATIONAL INC" 2007   7.8
1145949 "CLOUES EDWARD B II"                          20 "K TRON INTERNATIONAL INC" 2001   9.3
1145949 "CLOUES EDWARD B II"                          20 "K TRON INTERNATIONAL INC" 2002   9.3
1145949 "CLOUES EDWARD B II"                          20 "K TRON INTERNATIONAL INC" 2003   9.3
1145949 "CLOUES EDWARD B II"                          20 "K TRON INTERNATIONAL INC" 2004   9.5
1145949 "CLOUES EDWARD B II"                          20 "K TRON INTERNATIONAL INC" 2005   9.5
1145949 "CLOUES EDWARD B II"                          20 "K TRON INTERNATIONAL INC" 2006  11.1
1145949 "CLOUES EDWARD B II"                          20 "K TRON INTERNATIONAL INC" 2007  11.1
1145949 "CLOUES EDWARD B II"                          20 "K TRON INTERNATIONAL INC" 2008  11.1
1145949 "CLOUES EDWARD B II"                          20 "K TRON INTERNATIONAL INC" 2009   8.9
1145949 "CLOUES EDWARD B II"                          20 "K TRON INTERNATIONAL INC" 2010   7.5
1328618 "Nichols James William"                       20 "K TRON INTERNATIONAL INC" 2005   6.2
1620275 "Paradice Investment Management LLC"          63 "FNW BANCORP INC"          2020   5.3
  38777 "FRANKLIN RESOURCES INC"                    1750 "AAR CORP"                 2012   9.8
  38777 "FRANKLIN RESOURCES INC"                    1750 "AAR CORP"                 2013   9.3
  38777 "FRANKLIN RESOURCES INC"                    1750 "AAR CORP"                 2014   9.7
  38777 "FRANKLIN RESOURCES INC"                    1750 "AAR CORP"                 2015  10.3
  38777 "FRANKLIN RESOURCES INC"                    1750 "AAR CORP"                 2016  12.3
  38777 "FRANKLIN RESOURCES INC"                    1750 "AAR CORP"                 2017  10.9
  70858 "BANK OF AMERICA CORP /DE/"                 1750 "AAR CORP"                 2009   6.4
  70858 "BANK OF AMERICA CORP /DE/"                 1750 "AAR CORP"                 2010     5
  72971 "NORWEST CORP"                              1750 "AAR CORP"                 1994   6.7
  72971 "NORWEST CORP"                              1750 "AAR CORP"                 1995   8.7
  72971 "NORWEST CORP"                              1750 "AAR CORP"                 1996     5
 102909 "VANGUARD GROUP INC"                        1750 "AAR CORP"                 2011  5.04
 102909 "VANGUARD GROUP INC"                        1750 "AAR CORP"                 2012  5.75
 102909 "VANGUARD GROUP INC"                        1750 "AAR CORP"                 2013  5.93
 102909 "VANGUARD GROUP INC"                        1750 "AAR CORP"                 2014  6.23
 102909 "VANGUARD GROUP INC"                        1750 "AAR CORP"                 2015  7.06
 102909 "VANGUARD GROUP INC"                        1750 "AAR CORP"                 2016  8.06
 102909 "VANGUARD GROUP INC"                        1750 "AAR CORP"                 2017  8.85
 102909 "VANGUARD GROUP INC"                        1750 "AAR CORP"                 2018  9.58
 102909 "VANGUARD GROUP INC"                        1750 "AAR CORP"                 2019  9.96
 102909 "VANGUARD GROUP INC"                        1750 "AAR CORP"                 2020  9.45
 315066 "FMR CORP"                                  1750 "AAR CORP"                 2003 10.11
 315066 "FMR CORP"                                  1750 "AAR CORP"                 2004 10.11
 315066 "FMR CORP"                                  1750 "AAR CORP"                 2005  8.35
 315066 "FMR CORP"                                  1750 "AAR CORP"                 2006 10.85
 354204 "DIMENSIONAL FUND ADVISORS INC"             1750 "AAR CORP"                 1999  7.06
 354204 "DIMENSIONAL FUND ADVISORS INC"             1750 "AAR CORP"                 2000  7.56
 354204 "DIMENSIONAL FUND ADVISORS INC"             1750 "AAR CORP"                 2001  7.69
 354204 "DIMENSIONAL FUND ADVISORS INC"             1750 "AAR CORP"                 2002     5
 354204 "DIMENSIONAL FUND ADVISORS INC"             1750 "AAR CORP"                 2003   6.7
 354204 "DIMENSIONAL FUND ADVISORS INC"             1750 "AAR CORP"                 2004  8.25
 354204 "DIMENSIONAL FUND ADVISORS INC"             1750 "AAR CORP"                 2005  8.48
end
Listed 100 out of 518009 observations

and I found the following Statalist posts very helpful:
https://www.statalist.org/forums/for...s-observations
https://www.statalist.org/forums/for...d-observations

Based on these discussions, I tried using the following code to identify the ownership structures in my dataset:

Code:
clear all
import delimited "C:\Users\Downloads\ownership.csv"
* Create a temporary file for the data
tempfile data
save `data'
* Generate list of largest owner by company
bys company_name (position): gen level1 = blockholder_name[_N]
* Tag ties
bys company_name (position): gen tie = position[_N] == position[_N-1]
* Data set of largest shareholders
contract company_name level1
drop _freq
tempfile level1
sort level1
save `level1'
clear
use `data'
merge m:1 company_name using `level1'
drop if _merge == 2
drop _merge
sort level1
save `data', replace
* Initialize the iteration
local continue = 1
local iter = 1
while `continue' {
    local next_iter = `iter' + 1
    
    * Identify next level of ownership
    use `level1'
    rename (*) (level`iter' level`next_iter')
    sort level`iter'
    save `level1', replace
    
    use `data'
    merge m:1 level`iter' using `level1'
    drop if _merge == 2
    drop _merge
    
    * Check if any new levels were added
    count if !missing(level`next_iter')
    local new_levels = r(N)
    
    * Save the updated data
    save `data', replace
    
    * Update iteration counter
    local iter = `next_iter'
    
    * Stop if no new levels were added
    if `new_levels' == 0 local continue = 0
}
* Generate the ultimate parent variable
gen ultimate_parent = level`iter'
* Replace missing ultimate parents with the highest known level
forval i = `=`iter'-1'(-1)1 {
    replace ultimate_parent = level`i' if missing(ultimate_parent)
}
* Save the results
save "calculated_cash_flow_control_rights_pyramidal.dta", replace


However, my code is still running; it appears that each firm holds another, producing a long and far-reaching chain of ownership levels, and I am concerned that I am stuck in an infinite loop rather than producing the desired results. I need to identify all four types of ownership structures: pyramids, cross-holdings, circular holdings, and dual cross-holdings. I would also appreciate an indicator (flag) for each type so that I can use them afterwards when computing cash-flow rights and control rights.

Can anyone please help me resolve this issue?
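For what it may be worth to other readers, here is a sketch of a cycle-safe variant of the loop above. Hedged: `maxiter`, the `circular` flag, and the `crossholding` flag are my own additions (not part of the original code) and have not been run on the full data; the merge step is elided and must be filled in from the original loop.

Code:
* --- Guarded ultimate-parent loop (sketch; builds on the code above) ---
gen byte circular = 0                 // hypothetical flag for circular chains
local maxiter = 20                    // hard cap so circular chains cannot loop forever
local continue = 1
local iter = 1
while `continue' & `iter' < `maxiter' {
    local next_iter = `iter' + 1
    * ... merge in level`next_iter' exactly as in the original loop ...
    * If the newly found owner already appears lower in the chain, the
    * structure is circular: flag it and cut the chain so the loop ends.
    forvalues j = 1/`iter' {
        replace circular = 1 if level`next_iter' == level`j' & level`next_iter' != ""
        replace level`next_iter' = "" if level`next_iter' == level`j'
    }
    count if !missing(level`next_iter')
    if r(N) == 0 local continue = 0
    local iter = `next_iter'
}

* --- Cross-holding flag: A holds B and B holds A (a 2-cycle) ---
preserve
keep blockholder_name company_name
duplicates drop
rename (blockholder_name company_name) (company_name blockholder_name)
tempfile rev
save `rev'
restore
merge m:1 blockholder_name company_name using `rev', keep(master match)
gen byte crossholding = _merge == 3
drop _merge

The cross-holding block works by reversing every owner–company pair and self-merging: a match means the reverse edge also exists, i.e. the two firms hold each other. Dual cross-holdings could be flagged the same way on pairs of blockholders of a common company.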


These are the current results of the code:
Code:
   Result                      Number of obs
    -----------------------------------------
    Not matched                       499,088
        from master                   464,200  (_merge==1)
        from using                     34,888  (_merge==2)

    Matched                            53,809  (_merge==3)
    -----------------------------------------
(34,888 observations deleted)
  53,809
file C:\Users\Thea\AppData\Local\Temp\ST_535c_000001.tmp saved as .dta format
file C:\Users\Thea\AppData\Local\Temp\ST_535c_000002.tmp saved as .dta format

    Result                      Number of obs
    -----------------------------------------
    Not matched                       499,088
        from master                   464,200  (_merge==1)
        from using                     34,888  (_merge==2)

    Matched                            53,809  (_merge==3)
    -----------------------------------------
(34,888 observations deleted)
  53,809
file C:\Users\Thea\AppData\Local\Temp\ST_535c_000001.tmp saved as .dta format
file C:\Users\Thea\AppData\Local\Temp\ST_535c_000002.tmp saved as .dta format
The merge output repeats identically on every iteration, which suggests the loop is cycling without converging, but I'm not sure.



Regards,
Thea

Validity of my results - Gravity model of trade - Difference-in-Hansen tests of exogeneity missing


I have a dynamic panel of 73 countries over 20 years. I am estimating my regression equation with the system-GMM command; however, the Difference-in-Hansen tests of exogeneity of instrument subsets are missing from the output. Is my model still valid? The AR(1), AR(2), Sargan, and Hansen tests all look fine.
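For readers with the same question: assuming the system-GMM command used is the community-contributed xtabond2, the Difference-in-Hansen tests are reported per instrument group, so they typically appear only when the instruments are split into more than one gmm()/iv() group (and are not suppressed with nodiffsargan). A minimal sketch with hypothetical variable names:

Code:
* Hypothetical dynamic gravity-style specification; y, x1, x2 are placeholders.
xtabond2 y L.y x1 x2, ///
    gmm(L.y, lag(2 4) collapse) ///  GMM-style instruments for the lagged DV
    iv(x1 x2) ///                    standard instruments for exogenous regressors
    twostep robust small
* With two or more instrument groups, xtabond2 prints the overall Hansen
* test plus a Difference-in-Hansen test for each group.

If all instruments sit in a single group, or a subset is exactly identified, there is no difference test to compute, which is one benign reason for the missing output.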



Changing Colors of Individual Bars with marginsplot

Hi Everyone,

I'm having an oddly difficult time trying to graph something basic in Stata.

I want to produce a graph from marginsplot that shows each bar in its own color. In the example below there are four regions, and I would like the bars for NE, N Cntrl, South, and West each to have a different color.

Code:
sysuse citytemp, clear 
regress tempjan tempjuly i.region cooldd 
margins, over(region) atmeans
*below only uses the first color
marginsplot, recast(bar) plotopts(color(red green blue orange))
*below produces an error
marginsplot, name(`dv', replace) recast(bar) plotopts(bar(1, color(green)) bar(2, color(red)) bar(3, color(orange)) bar(4, color(yellow)))
I'm kinda at a loss - how do I change individual bar colors following marginsplot?

Cheers,

David.
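Not from the original thread, but one workaround is to pull the margins out of r(table) and draw the bars yourself with twoway, one series per region; another option is Ben Jann's community-contributed coefplot. A sketch for the example above:

Code:
sysuse citytemp, clear
regress tempjan tempjuly i.region cooldd
margins, over(region) atmeans

* Move the margins into variables and draw one bar per region "by hand"
matrix M = r(table)'             // transpose: one row per region
preserve
clear
svmat double M, names(col)       // creates b, se, ll, ul, ...
gen region = _n
twoway (bar b region if region==1, color(red)    barwidth(0.8)) ///
       (bar b region if region==2, color(green)  barwidth(0.8)) ///
       (bar b region if region==3, color(blue)   barwidth(0.8)) ///
       (bar b region if region==4, color(orange) barwidth(0.8)) ///
       (rcap ll ul region, color(black)), ///
       xlabel(1 "NE" 2 "N Cntrl" 3 "South" 4 "West") ///
       legend(off) ytitle("Predicted tempjan")
restore

The rcap overlay reproduces the confidence intervals that marginsplot would have drawn; drop it if only the bars are wanted.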

DAGs with multiple levels (and a dichotomous outcome)

I am currently working on a paper that involves a multilevel analysis examining the relationship between perceptions of neighborhood-based violence and leisure-time physical activity across Chicago neighborhoods following the COVID-19 stay-at-home order.

The paper has gone through a few rounds of peer review, and one reviewer is advocating for something I am struggling to address: a Directed Acyclic Graph (DAG) that explores the mediating effect of a level-2 continuous variable, neighborhood safety rate (the percentage of adults who report feeling safe in their neighborhood all or most of the time), on the (significant) effect of a level-1 independent variable (neighborhood violence, "low" or "high") on the level-1 dependent variable (physical activity in the past month, "yes" or "no").

I have not been able to find any clear guidance on whether mediation models/DAGs are possible with multilevel modeling that involves a dichotomous dependent variable (physical activity in the past month – either “yes” or “no”).

Can anyone provide guidance on how such an analysis may be carried out in Stata?
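One route readers could explore is gsem, which fits multilevel logit models and lets the mediator have its own equation; the indirect effect can then be assembled with nlcom on the log-odds scale. A sketch under stated assumptions, with hypothetical variable names (active = past-month activity 0/1, violence = low/high exposure, safety = level-2 safety rate, nbhd = neighborhood identifier):

Code:
* Hypothetical 1-2-1 mediation sketch: level-1 exposure, level-2 mediator,
* level-1 binary outcome, with neighborhood random intercepts.
gsem (active <- i.violence c.safety M1[nbhd], logit) ///
     (safety <- i.violence M2[nbhd])
* Indirect-effect approximation on the latent (log-odds) scale:
nlcom _b[safety:1.violence]*_b[active:safety]

Whether a level-2 mediator can sensibly be regressed on a level-1 exposure is a substantive question; aggregating perceived violence to the neighborhood level before fitting the mediator equation may be more defensible, and this sketch does not settle that.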


Plotting the Nonlinear Regression with Moderation Using xtdpdqml Quasi-Maximum Likelihood (QML)

Hello Statalist,

I'm using xtdpdqml to estimate a quasi-maximum likelihood model of a nonlinear relationship between "lc" and "zsc", with "pol_stab" moderating this relationship. The following Stata code runs the regression:

Code:
xtdpdqml zsc lc lc*lc lc*pol_stab lc*lc*pol_stab capi size dep roe costincome gdp gdpcap inf pol_stab y1 y2 y3 y4 y5 y6 y7 y8 y9 y10 y11, fe vce(robust) nolog

Could someone please advise on how to visualize this nonlinearity after xtdpdqml? I tried marginsplot but ran into the error "default prediction is a function of possibly stochastic quantities other than e(b)".


Thank you for your assistance!
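Since margins cannot handle xtdpdqml's default prediction, one fallback is to trace the fitted curve directly from the coefficients in e(b). A sketch, assuming the squared and interaction terms were generated beforehand under these hypothetical names (the varlist terms like lc*lc in the command above would otherwise not be accepted):

Code:
* Hypothetical pre-generated terms:
*   gen lc2    = lc^2
*   gen lc_ps  = lc*pol_stab
*   gen lc2_ps = lc^2*pol_stab
* After: xtdpdqml zsc lc lc2 lc_ps lc2_ps ... , fe vce(robust)

summarize pol_stab, detail
local ps_lo = r(p25)
local ps_hi = r(p75)

preserve
clear                               // data cleared; e(b) stays in memory
set obs 100
range lc 0 1                        // adjust to the observed range of lc
gen z_lo = _b[lc]*lc + _b[lc2]*lc^2 + _b[lc_ps]*lc*`ps_lo' + _b[lc2_ps]*lc^2*`ps_lo'
gen z_hi = _b[lc]*lc + _b[lc2]*lc^2 + _b[lc_ps]*lc*`ps_hi' + _b[lc2_ps]*lc^2*`ps_hi'
twoway (line z_lo lc) (line z_hi lc), ///
    legend(order(1 "pol_stab = p25" 2 "pol_stab = p75")) ///
    ytitle("Partial fitted zsc") xtitle("lc")
restore

This plots only the part of the linear index that varies with lc at low and high political stability; the other covariates contribute an additive constant, so the shape of the nonlinearity and the moderation are unaffected by omitting them.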