% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/upsample.R
\name{tof_upsample_distance}
\alias{tof_upsample_distance}
\title{Upsample cells into the closest cluster in a reference dataset}
\usage{
tof_upsample_distance(
  tof_tibble,
  reference_tibble,
  reference_cluster_col,
  upsample_cols = where(tof_is_numeric),
  parallel_cols,
  distance_function = c("mahalanobis", "cosine", "pearson"),
  num_cores = 1L,
  return_distances = FALSE
)
}
\arguments{
\item{tof_tibble}{A `tibble` or `tof_tbl` containing cells to be upsampled
into their nearest reference subpopulation.}

\item{reference_tibble}{A `tibble` or `tof_tbl` containing cells that have
already been clustered or manually gated into subpopulations.}

\item{reference_cluster_col}{An unquoted column name indicating which column in
`reference_tibble` contains the subpopulation label (or cluster id) for
each cell in `reference_tibble`.}

\item{upsample_cols}{Unquoted column names indicating which columns in `tof_tibble` to
use when computing the distances for upsampling. Defaults to all numeric columns
in `tof_tibble`. Supports tidyselect helpers.}

\item{parallel_cols}{Optional. Unquoted column names indicating which columns in `tof_tibble` to
use to split the data into groups so that the upsampling can be parallelized
with `foreach` on a `doParallel` backend.
Supports tidyselect helpers.}

\item{distance_function}{A string indicating which distance function should
be used to perform the upsampling. Options are "mahalanobis" (the default),
"cosine", and "pearson".}

\item{num_cores}{An integer indicating the number of CPU cores used to parallelize
the upsampling. Defaults to 1 (a single core).}

\item{return_distances}{A boolean value. If FALSE (the default), the result
contains only one column: the cluster id assigned to each row of `tof_tibble`.
If TRUE, the result also includes one column per reference subpopulation giving
the distance between each row of `tof_tibble` and that subpopulation's
centroid.}
}
\value{
If `return_distances = FALSE`, a tibble with one column named
`.upsample_cluster`, a character vector of length `nrow(tof_tibble)`
indicating the id of the reference cluster to which each cell
(i.e. each row) in `tof_tibble` was assigned.

If `return_distances = TRUE`, a tibble with `nrow(tof_tibble)` rows and
`num_clusters + 1` columns, where `num_clusters` is the number of clusters in
`reference_tibble`. Each row represents a cell from `tof_tibble`. The first
`num_clusters` columns give the distance between the cell and each reference
subpopulation's centroid, and the final column gives the id of the reference
subpopulation whose centroid is closest to the cell.
}
\description{
This function performs distance-based upsampling on CyTOF data
by assigning single cells (passed into the function as `tof_tibble`) to
their most phenotypically similar cell subpopulation in a reference dataset
(passed into the function as `reference_tibble`). It does so by calculating
the distance (Mahalanobis, cosine, or Pearson) between each cell in
`tof_tibble` and the centroid of each cluster in `reference_tibble`, then
assigning each cell to the cluster whose centroid is closest.
}
\examples{
# simulate single-cell data, plus reference data with clusters to
# upsample into
sim_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 1000),
        cd38 = rnorm(n = 1000),
        cd34 = rnorm(n = 1000),
        cd19 = rnorm(n = 1000)
    )

reference_data <-
    dplyr::tibble(
        cd45 = rnorm(n = 200),
        cd38 = rnorm(n = 200),
        cd34 = rnorm(n = 200),
        cd19 = rnorm(n = 200),
        cluster_id = c(rep("a", times = 100), rep("b", times = 100))
    )

# upsample using Mahalanobis distance (the default)
tof_upsample_distance(
    tof_tibble = sim_data,
    reference_tibble = reference_data,
    reference_cluster_col = cluster_id
)
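
# The nearest-centroid logic described above can be sketched in base R.
# This is a minimal illustration, not the internal implementation of tidytof.
# It assumes each reference cluster is summarized by its centroid (column
# means) and its own covariance matrix. The helper name
# sketch_upsample_mahalanobis() is hypothetical.
sketch_upsample_mahalanobis <- function(cells, reference, cluster_ids) {
    clusters <- sort(unique(cluster_ids))
    # one column of squared Mahalanobis distances per reference cluster;
    # squaring does not change which centroid is closest
    distances <- do.call(
        cbind,
        lapply(clusters, function(cl) {
            ref <- reference[cluster_ids == cl, , drop = FALSE]
            stats::mahalanobis(
                cells,
                center = colMeans(ref),
                cov = stats::cov(ref)
            )
        })
    )
    colnames(distances) <- clusters
    # assign each cell to the cluster with the smallest distance
    clusters[apply(distances, 1, which.min)]
}

# two well-separated toy clusters and one query cell near each centroid
set.seed(1)
toy_reference <- rbind(
    matrix(rnorm(200, mean = 0), ncol = 2),
    matrix(rnorm(200, mean = 5), ncol = 2)
)
toy_ids <- rep(c("a", "b"), each = 100)
toy_cells <- rbind(c(0, 0), c(5, 5))
sketch_upsample_mahalanobis(toy_cells, toy_reference, toy_ids)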

# upsample using cosine distance
tof_upsample_distance(
    tof_tibble = sim_data,
    reference_tibble = reference_data,
    reference_cluster_col = cluster_id,
    distance_function = "cosine"
)
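
# Cosine and Pearson distances between cells and a single centroid can also
# be sketched in base R. Assumption, not taken from the tidytof source: each
# is defined as 1 minus the corresponding similarity, so smaller values mean
# more similar profiles. The helper name sketch_centroid_distance() is
# hypothetical.
sketch_centroid_distance <- function(cells, centroid,
                                     method = c("cosine", "pearson")) {
    method <- match.arg(method)
    cells <- as.matrix(cells)
    if (method == "pearson") {
        # Pearson correlation equals the cosine similarity of mean-centered
        # vectors, so center each cell (row) and the centroid first
        cells <- cells - rowMeans(cells)
        centroid <- centroid - mean(centroid)
    }
    # row-wise dot product of each cell with the centroid
    dot <- rowSums(cells * rep(centroid, each = nrow(cells)))
    norms <- sqrt(rowSums(cells^2)) * sqrt(sum(centroid^2))
    1 - dot / norms
}

toy_profiles <- rbind(c(1, 2, 3), c(3, 2, 1))
toy_centroid <- c(2, 4, 6)
sketch_centroid_distance(toy_profiles, toy_centroid, method = "cosine")
sketch_centroid_distance(toy_profiles, toy_centroid, method = "pearson")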

}
