% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/quality_control.R
\name{tof_assess_clusters_knn}
\alias{tof_assess_clusters_knn}
\title{Assess a clustering result by comparing each cell's cluster assignment to
that of its K nearest neighbors.}
\usage{
tof_assess_clusters_knn(
  tof_tibble,
  cluster_col,
  marker_cols = where(tof_is_numeric),
  num_neighbors = min(10, nrow(tof_tibble)),
  distance_function = c("euclidean", "cosine", "l2", "ip"),
  augment = FALSE
)
}
\arguments{
\item{tof_tibble}{A `tof_tbl` or `tibble`.}

\item{cluster_col}{An unquoted column name indicating which column in `tof_tibble`
stores the id of the cluster to which each cell belongs.
Cluster labels can be produced by any method the user chooses, including manual gating
or any of the functions in the `tof_cluster_*` function family.}

\item{marker_cols}{Unquoted column names indicating which columns in `tof_tibble`
should be interpreted as markers to be used in the nearest-neighbor distance calculation.
Defaults to all numeric columns. Supports tidyselection.}

\item{num_neighbors}{An integer indicating how many neighbors should be found
during the nearest-neighbor calculation. Defaults to 10 (or to the number of rows
in `tof_tibble`, if that is smaller).}

\item{distance_function}{A string indicating which distance function should
be used to perform the k-nearest-neighbor calculation.
Options are "euclidean" (the default), "cosine", "l2", and "ip".}

\item{augment}{A boolean value indicating whether the computed columns for each cell
(see below) should be column-bound to `tof_tibble` and returned (TRUE) or returned
on their own as a new tibble (FALSE, the default).}
}
\value{
If augment = FALSE (the default), a tibble with 2 columns: ".knn_cluster"
(a character vector indicating which cluster received the majority vote of each
cell's k nearest neighbors) and "flagged_cell" (a boolean value indicating if
the cell's cluster assignment matched the majority vote (TRUE) or not (FALSE)).
If augment = TRUE, the same 2 columns are column-bound to
`tof_tibble` and the resulting tibble is returned.
}
\description{
This function evaluates the result of a clustering procedure by finding each cell's
K nearest neighbors, determining which cluster the majority of them are assigned to,
and checking whether this matches the cell's own cluster assignment. If the cluster
assignment of the majority of a cell's nearest neighbors does not match the
cell's own cluster assignment, the cell is flagged as potentially anomalous.
}
\examples{
sim_data <-
    dplyr::tibble(
        cd45 = c(rnorm(n = 1000, sd = 1.5), rnorm(n = 1000, mean = 2), rnorm(n = 1000, mean = -2)),
        cd38 = c(rnorm(n = 1000, sd = 1.5), rnorm(n = 1000, mean = 2), rnorm(n = 1000, mean = -2)),
        cd34 = c(rnorm(n = 1000, sd = 1.5), rnorm(n = 1000, mean = 2), rnorm(n = 1000, mean = -2)),
        cd19 = c(rnorm(n = 1000, sd = 1.5), rnorm(n = 1000, mean = 2), rnorm(n = 1000, mean = -2)),
        cluster_id = c(rep("a", 1000), rep("b", 1000), rep("c", 1000))
    )

knn_result <-
    sim_data |>
    tof_assess_clusters_knn(
        cluster_col = cluster_id,
        num_neighbors = 10
    )
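
# A minimal base-R sketch of the majority-vote check described above, run on
# a small toy dataset. The toy data and variable names are illustrative only.
# The sketch makes three assumptions: exact Euclidean distances, a cell is
# not counted among its own neighbors, and ties go to the first label.
# tof_assess_clusters_knn() itself may find neighbors and break ties differently.
set.seed(2023)
toy <- data.frame(
    cd45 = c(rnorm(50, mean = 2), rnorm(50, mean = -2)),
    cd38 = c(rnorm(50, mean = 2), rnorm(50, mean = -2)),
    cluster_id = rep(c("a", "b"), each = 50)
)

# Full pairwise Euclidean distance matrix (only practical for small data)
toy_dist <- as.matrix(dist(toy[, c("cd45", "cd38")]))
# Exclude each cell from its own neighbor set
diag(toy_dist) <- Inf

k <- 10
majority_cluster <- apply(toy_dist, 1, function(distances) {
    # Indices of the k closest cells
    neighbors <- order(distances)[seq_len(k)]
    # Count neighbor labels; which.max() returns the first label on a tie
    votes <- table(toy$cluster_id[neighbors])
    names(votes)[which.max(votes)]
})

# TRUE where the neighbor majority disagrees with the label of the cell
is_flagged <- majority_cluster != toy$cluster_id
table(is_flagged)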

}
