Estimates the optimal number of clusters (k) using various methods.
mt_cluster_k(
  data,
  use = "sp_trajectories",
  dimensions = c("xpos", "ypos"),
  kseq = 2:15,
  compute = c("stability", "gap", "jump", "slope"),
  method = "hclust",
  weights = rep(1, length(dimensions)),
  pointwise = TRUE,
  minkowski_p = 2,
  hclust_method = "ward.D",
  kmeans_nstart = 10,
  n_bootstrap = 10,
  model_based = FALSE,
  n_gap = 10,
  na_rm = FALSE,
  verbose = FALSE
)
data  a mousetrap data object created using one of the mt_import
functions (see mt_example for details). Alternatively, a trajectory
array can be provided directly (in this case 

use  a character string specifying which trajectory data should be used. 
dimensions  a character vector specifying which trajectory variables should be used. Can be of length 2 or 3, for two-dimensional or three-dimensional trajectories respectively. 
kseq  a numeric vector specifying the set of candidates for k. Defaults to
2:15, implying that all values of k within that range are compared using
the metrics specified in 
compute  a character vector specifying the measures to be computed. Can
be any subset of 
method  character string specifying the type of clustering procedure
for the stability-based method. Either 
weights  numeric vector specifying the relative importance of the
variables specified in 
pointwise  boolean specifying the way in which dissimilarity between
the trajectories is measured. If 
minkowski_p  an integer specifying the distance metric for the cluster
solution. 
hclust_method  character string specifying the linkage criterion used.
Passed on to the 
kmeans_nstart  integer specifying the number of reruns of the k-means
procedure. Larger numbers reduce the risk of settling on a local minimum. Passed
on to the 
n_bootstrap  an integer specifying the number of bootstrap comparisons
used by 
model_based  boolean specifying whether the model-based or the
model-free method should be used by 
n_gap  integer specifying the number of simulated datasets used by

na_rm  logical specifying whether trajectory points containing NAs should be removed. Removal is done column-wise. That is, if any trajectory has a missing value at, e.g., the 10th recorded position, the 10th position is removed for all trajectories. This is necessary to compute the distance between trajectories. 
verbose  logical indicating whether the function should report its progress. 
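The column-wise removal described for na_rm can be illustrated with a minimal base R sketch. It assumes a toy matrix in which rows are trajectories and columns are recorded positions; the object names (xpos, keep, xpos_clean) are made up for illustration and this is not the package's internal code.

```r
# Toy data (hypothetical): 3 trajectories (rows) x 5 recorded positions (columns)
xpos <- matrix(c(0, 1,  2, 3, 4,
                 0, 1, NA, 3, 4,
                 0, 2,  3, 4, 5),
               nrow = 3, byrow = TRUE)

# Keep a position only if no trajectory has a missing value there
keep <- colSums(is.na(xpos)) == 0

# Drop the affected position for all trajectories
xpos_clean <- xpos[, keep, drop = FALSE]

dim(xpos_clean)
```

Because a single NA removes that position for every trajectory, all trajectories keep the same length, which is what a pairwise distance computation needs.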
A list containing two lists that store the results of the different methods. kopt contains the estimated k for each of the methods specified in compute. paths contains the values for each k in kseq as computed by each of the methods specified in compute. The values in kopt are optima for each of the vectors in paths.
mt_cluster_k estimates the number of clusters (k) using four commonly used k-selection methods (specified via compute): cluster stability (stability), the gap statistic (gap), the jump statistic (jump), and the slope statistic (slope).
Cluster stability methods select k as the number of clusters for which the assignment of objects to clusters is most stable across bootstrap samples. This function implements the model-based and model-free methods described by Haslbeck & Wulff (2016). See references.
The remaining three methods select k as the value that optimizes the gap statistic (Tibshirani, Walther, & Hastie, 2001), the jump statistic (Sugar & James, 2013), and the slope statistic (Fujita, Takahashi, & Patriota, 2014), respectively.
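For readers who want to see the gap statistic in isolation, the following sketch uses clusGap() and maxSE() from the cluster package on toy data. This is a stand-in for illustration, assuming the cluster package is installed; it is not the code path used by mt_cluster_k, and the number of simulated reference datasets (B) is only loosely comparable to n_gap.

```r
library(cluster)  # assumption: the 'cluster' package is available

set.seed(1)

# Toy data (hypothetical): two well-separated groups in two dimensions
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 2))

# Gap statistic for k = 1..5, based on 20 simulated reference datasets;
# nstart is passed on to kmeans()
gap <- clusGap(x, FUNcluster = kmeans, K.max = 5, B = 20,
               verbose = FALSE, nstart = 10)

# Select k with the rule proposed by Tibshirani et al. (2001)
k_gap <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"],
               method = "Tibs2001SEmax")
k_gap
```

The gap values in gap$Tab play the same role as a paths vector, and k_gap the role of the corresponding kopt entry.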
For clustering trajectories, it is often useful that the endpoints of all trajectories share the same direction, e.g., that all trajectories end in the top-left corner of the coordinate system (mt_remap_symmetric or mt_align can be used to achieve this). Furthermore, it is recommended to use spatialized trajectories (see mt_spatialize; Wulff et al., 2019; Haslbeck et al., 2018).
Haslbeck, J., & Wulff, D. U. (2016). Estimating the Number of Clusters via Normalized Cluster Instability. arXiv preprint arXiv:1608.07494.
Wulff, D. U., Haslbeck, J. M. B., Kieslich, P. J., Henninger, F., & Schulte-Mecklenbeck, M. (2019). Mouse-tracking: Detecting types in movement trajectories. In M. Schulte-Mecklenbeck, A. Kühberger, & J. G. Johnson (Eds.), A Handbook of Process Tracing Methods (pp. 131-145). New York, NY: Routledge.
Haslbeck, J. M. B., Wulff, D. U., Kieslich, P. J., Henninger, F., & Schulte-Mecklenbeck, M. (2018). Advanced mouse- and hand-tracking analysis: Detecting and visualizing clusters in movement trajectories. Manuscript in preparation.
Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), 411-423.
Sugar, C. A., & James, G. M. (2013). Finding the number of clusters in a dataset. Journal of the American Statistical Association, 98(463), 750-763.
Fujita, A., Takahashi, D. Y., & Patriota, A. G. (2014). A non-parametric method to estimate the number of clusters. Computational Statistics & Data Analysis, 73, 27-39.
mt_distmat for more information about how the distance matrix is computed when the hclust method is used.
mt_cluster for performing trajectory clustering with a specified number of clusters.
if (FALSE) {
# Spatialize trajectories
KH2017 <- mt_spatialize(KH2017)

# Find k
results <- mt_cluster_k(KH2017, use="sp_trajectories")

# Retrieve results
results$kopt
results$paths
}