Dear all,
For those of you who use Stata to cluster pairwise distance matrices derived from sequence analysis (or other sources): you may not be aware that Stata's built-in clustermat suite does not calculate stopping rules as one might expect. It operates on squared Euclidean distances computed from the variables named in the variables() option, and not on the pairwise distance matrix.
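To make the distinction concrete, here is a minimal sketch in Python (not Stata, and not the code of any command mentioned in this post) of a Calinski-Harabasz pseudo-F calculated directly from a pairwise distance matrix. It assumes the distances can be treated like Euclidean distances, so that a sum of squares about a centroid equals the sum of squared pairwise distances divided by the group size. The function name pseudo_f and the toy data are hypothetical, for illustration only.

```python
def pseudo_f(dist, labels):
    """Calinski-Harabasz pseudo-F from a pairwise distance matrix.

    Assumption: distances behave like Euclidean distances, so the sum of
    squares of a group of m cases is (1/m) * sum over pairs i<j of d_ij^2.
    """
    n = len(labels)
    groups = sorted(set(labels))
    k = len(groups)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of cases")

    def sum_of_squares(members):
        # Sum squared distances over each unordered pair once, then divide by m.
        m = len(members)
        total = sum(dist[i][j] ** 2
                    for a, i in enumerate(members)
                    for j in members[a + 1:])
        return total / m

    ss_total = sum_of_squares(list(range(n)))
    ss_within = sum(
        sum_of_squares([i for i in range(n) if labels[i] == g])
        for g in groups
    )
    ss_between = ss_total - ss_within
    if ss_within == 0:
        return float("inf")  # perfectly tight clusters
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data: four cases on a line; only the distance matrix is passed on.
points = [0, 1, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]

good = pseudo_f(dist, [0, 0, 1, 1])  # separates {0, 1} from {10, 11}
bad = pseudo_f(dist, [0, 1, 0, 1])   # mixes near and far cases
print(good, bad)
```

In this toy example the sensible partition gets a far larger pseudo-F than the mixed one, and the calculation uses nothing but the distance matrix and the cluster labels.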
I have recently uploaded two utilities to SSC (thanks to Kit Baum for his help with installation) that correctly calculate stopping rules based on the pairwise distance matrices: calinski and dudahart.
To install:
. ssc install calinski
. ssc install dudahart
For more information see my short paper at https://osf.io/rjqe3
You may also find the related discrepancy and silhouette commands useful. They calculate Studer's discrepancy measure and cluster silhouette widths, respectively, and are also available from SSC.
If anyone is interested in "Partitioning Around Medoids" clustering from pairwise distance matrices in Stata, let me know, and I can make a preliminary version of a pam command available for testing.
Regards,
Brendan