Estimates the optimal number of clusters (`k`) using various methods.

```
mt_cluster_k(
  data,
  use = "ln_trajectories",
  dimensions = c("xpos", "ypos"),
  kseq = 2:15,
  compute = c("stability", "gap", "jump", "slope"),
  method = "hclust",
  weights = rep(1, length(dimensions)),
  pointwise = TRUE,
  minkowski_p = 2,
  hclust_method = "ward.D",
  kmeans_nstart = 10,
  n_bootstrap = 10,
  model_based = FALSE,
  n_gap = 10,
  na_rm = FALSE,
  verbose = FALSE
)
```

- data
a mousetrap data object created using one of the mt_import functions (see mt_example for details). Alternatively, a trajectory array can be provided directly (in this case `use` will be ignored).

- use
a character string specifying which trajectory data should be used.

- dimensions
a character vector specifying which trajectory variables should be used. Can be of length 2 or 3, for two-dimensional or three-dimensional trajectories respectively.

- kseq
a numeric vector specifying the set of candidates for `k`. Defaults to `2:15`, implying that all values of `k` within that range are compared using the metrics specified in `compute`.

- compute
a character vector specifying the measures to be computed. Can be any subset of `c("stability", "gap", "jump", "slope")`.

- method
a character string specifying the type of clustering procedure for the stability-based method. Either `hclust` or `kmeans`.

- weights
a numeric vector specifying the relative importance of the variables specified in `dimensions`. Defaults to a vector of 1s, implying equal importance. Technically, each variable is rescaled so that its standard deviation matches the corresponding value in `weights`. To use the original variables, set `weights = NULL`.

- pointwise
a boolean specifying how dissimilarity between the trajectories is measured. If `TRUE` (the default), `mt_distmat` measures the average dissimilarity and then sums the results. If `FALSE`, `mt_distmat` measures dissimilarity once (by treating the various points as independent dimensions). This is only relevant if `method` is "hclust". See mt_distmat for further details.

- minkowski_p
an integer specifying the distance metric for the cluster solution. `minkowski_p = 1` computes the city-block distance, `minkowski_p = 2` (the default) computes the Euclidean distance, `minkowski_p = 3` the cubic distance, etc. Only relevant if `method` is "hclust". See mt_distmat for further details.

- hclust_method
a character string specifying the linkage criterion used. Passed on to the `method` argument of hclust. Defaults to `ward.D`. Only relevant if `method` is "hclust".

- kmeans_nstart
an integer specifying the number of reruns of the kmeans procedure. Larger numbers reduce the risk of ending up in a poor local minimum. Passed on to the `nstart` argument of kmeans. Only relevant if `method` is "kmeans".

- n_bootstrap
an integer specifying the number of bootstrap comparisons used by `stability`. See cStability.

- model_based
a boolean specifying whether the model-based or the model-free approach should be used by `stability` when `method` is `kmeans`. See cStability and Haslbeck & Wulff (2020).

- n_gap
an integer specifying the number of simulated datasets used by `gap`. See Tibshirani et al. (2001).

- na_rm
logical specifying whether trajectory points containing NAs should be removed. Removal is done column-wise. That is, if any trajectory has a missing value at, e.g., the 10th recorded position, the 10th position is removed for all trajectories. This is necessary to compute distance between trajectories.

- verbose
logical indicating whether the function should report its progress.
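
The effect of `na_rm`, `weights`, and `minkowski_p` can be illustrated on toy data in base R. The following is a minimal sketch, not the package's internal code: the matrices `xpos` and `ypos`, the helper `rescale()`, and the choice to compute the standard deviation over all values of a dimension are assumptions made for illustration.

```r
# Toy data: 3 trajectories (rows) x 4 recorded positions (columns),
# one matrix per dimension. All values are made up for illustration.
xpos <- rbind(c(0,  1,  2,  3),
              c(0,  2, NA,  6),
              c(0, -1, -2, -3))
ypos <- rbind(c(0, 10, 20, 30),
              c(0, 10, 20, 30),
              c(0,  5, 10, 15))

# na_rm: column-wise removal. A position is dropped for ALL trajectories
# as soon as any trajectory has a missing value at that position.
keep <- colSums(is.na(xpos) | is.na(ypos)) == 0
xpos <- xpos[, keep, drop = FALSE]
ypos <- ypos[, keep, drop = FALSE]

# weights: rescale each dimension so that its standard deviation
# equals the corresponding weight (hypothetical helper).
weights <- c(1, 1)
rescale <- function(m, w) m / sd(as.vector(m)) * w
xs <- rescale(xpos, weights[1])
ys <- rescale(ypos, weights[2])

# minkowski_p: the same pair of points under different values of p.
pts <- rbind(c(0, 0), c(3, 4))
d1 <- as.numeric(dist(pts, method = "minkowski", p = 1))  # city-block
d2 <- as.numeric(dist(pts, method = "minkowski", p = 2))  # Euclidean
d3 <- as.numeric(dist(pts, method = "minkowski", p = 3))  # cubic
```

After the removal step only three positions remain, because the third position was missing for one trajectory. After rescaling, both dimensions contribute on a comparable scale even though the raw `ypos` values are much larger than the raw `xpos` values.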

A list containing two lists that store the results of the different
methods. `kopt` contains the estimated `k` for each of the methods
specified in `compute`. `paths` contains the values for each `k` in `kseq`
as computed by each of the methods specified in `compute`. The values in
`kopt` are the optima for each of the vectors in `paths`.

`mt_cluster_k` estimates the number of clusters (`k`) using four
commonly used k-selection methods (specified via `compute`): cluster
stability (`stability`), the gap statistic (`gap`), the jump
statistic (`jump`), and the slope statistic (`slope`).

Cluster stability methods select `k` as the number of clusters for which
the assignment of objects to clusters is most stable across bootstrap
samples. This function implements the model-based and model-free methods
described by Haslbeck & Wulff (2020). See references.

The remaining three methods select `k` as the value that optimizes the
gap statistic (Tibshirani, Walther, & Hastie, 2001), the jump statistic
(Sugar & James, 2003), and the slope statistic (Fujita, Takahashi, &
Patriota, 2014), respectively.
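
To make the idea of "optimizing a statistic over `k`" concrete, here is a minimal base R sketch of the jump statistic on simulated data. It is an illustration under stated assumptions, not the implementation used by `mt_cluster_k`: the simulated points, the use of `kmeans`, the candidate range `1:8`, and the transformation power `p / 2` are all choices made for this example.

```r
set.seed(1)

# Simulated data (assumption): three well-separated 2-D Gaussian clusters.
centers <- rbind(c(0, 0), c(10, 0), c(0, 10))
x <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, centers[i, 1], 1), rnorm(50, centers[i, 2], 1))
}))

kseq <- 1:8
p <- ncol(x)

# Distortion for each k: within-cluster sum of squares,
# averaged over observations and dimensions.
distortion <- sapply(kseq, function(k) {
  kmeans(x, centers = k, nstart = 20)$tot.withinss / (nrow(x) * p)
})

# Transform the distortions with power -p/2 and take successive
# differences; the transformed distortion for k = 0 is defined as 0.
transformed <- c(0, distortion^(-p / 2))
jumps <- diff(transformed)

# The estimate is the k with the largest jump.
k_hat <- kseq[which.max(jumps)]
k_hat
```

The vector `jumps` plays the role of one entry in `paths` (one value per candidate `k`), and `k_hat` plays the role of the corresponding entry in `kopt`.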

For clustering trajectories, it is often useful that the endpoints of all trajectories share the same direction, e.g., that all trajectories end in the top-left corner of the coordinate system (mt_remap_symmetric or mt_align can be used to achieve this). Furthermore, it is recommended to use length normalized trajectories (see mt_length_normalize; Wulff et al., 2019).

Haslbeck, J. M. B., & Wulff, D. U. (2020). Estimating the Number
of Clusters via a Corrected Clustering Instability. *Computational
Statistics, 35*, 1879–1894.

Wulff, D. U., Haslbeck, J. M. B., Kieslich, P. J., Henninger, F., &
Schulte-Mecklenbeck, M. (2019). Mouse-tracking: Detecting types in movement
trajectories. In M. Schulte-Mecklenbeck, A. Kühberger, & J. G. Johnson
(Eds.), *A Handbook of Process Tracing Methods* (pp. 131-145). New
York, NY: Routledge.

Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of
clusters in a data set via the gap statistic. *Journal of the Royal
Statistical Society: Series B (Statistical Methodology), 63*(2), 411-423.

Sugar, C. A., & James, G. M. (2003). Finding the number of clusters in a
dataset. *Journal of the American Statistical Association, 98*(463),
750-763.

Fujita, A., Takahashi, D. Y., & Patriota, A. G. (2014). A non-parametric
method to estimate the number of clusters. *Computational Statistics &
Data Analysis, 73*, 27-39.

mt_distmat for more information about how the distance matrix is computed when the hclust method is used.

mt_cluster for performing trajectory clustering with a specified number of clusters.

```
if (FALSE) {
  # Wrapped in if (FALSE) so that it is not run automatically
  library(mousetrap)

  # Length normalize trajectories
  KH2017 <- mt_length_normalize(KH2017)

  # Find k
  results <- mt_cluster_k(KH2017, use = "ln_trajectories")

  # Retrieve results
  results$kopt
  results$paths
}
```