% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/scDD.R
\name{scDD}
\alias{scDD}
\title{scDD}
\usage{
scDD(SCdat, prior_param = list(alpha = 0.1, mu0 = 0, s0 = 0.01, a0 = 0.01, b0
  = 0.01), permutations = 0, testZeroes = TRUE, adjust.perms = FALSE,
  param = bpparam(), parallelBy = c("Genes", "Permutations"),
  condition = "condition", min.size = 3, min.nonzero = NULL,
  level = 0.05, categorize = TRUE)
}
\arguments{
\item{SCdat}{An object of class \code{SingleCellExperiment} that contains 
normalized single-cell expression and metadata. The \code{assays} 
  slot contains a named list of matrices, where the normalized counts are 
  housed in the one named \code{normcounts}.  This matrix should have one
   row for each gene and one column for each sample.  
  The \code{colData} slot should contain a data.frame with one row per 
  sample and columns that contain metadata for each sample.  This data.frame
  should contain a variable that represents biological condition, which is 
  in the form of numeric values (either 1 or 2) that indicate which 
  condition each sample belongs to (in the same order as the columns of 
  \code{normcounts}).  Optional additional metadata about each cell can also
  be contained in this data.frame, and additional information about the 
  experiment can be contained in the \code{metadata} slot as a list.}

\item{prior_param}{A list of prior parameter values to be used when modeling
each gene as a Dirichlet Process mixture of normals.  Default 
  values are given that specify a vague prior distribution on the 
  cluster-specific means and variances.}

\item{permutations}{The number of permutations to be used in calculating 
empirical p-values.  If set to zero (default),
  the full Bayes Factor permutation test will not be performed.  Instead, 
  a fast procedure to identify the genes with significantly different
  expression distributions will be performed using the nonparametric 
  Kolmogorov-Smirnov test, which tests the null hypothesis that 
  the samples are generated from the same continuous distribution.  
  This test will yield
  slightly lower power than the full permutation testing framework 
  (this effect is more pronounced at smaller sample 
  sizes, and is more pronounced in the DB category), but is orders of 
  magnitude faster.  This option
  is recommended when compute resources are limited.  The remaining 
  steps of the scDD framework will remain unchanged
  (namely, categorizing the significant DD genes into patterns that 
  represent the major distributional changes, 
  as well as the ability to visualize the results with violin plots 
  using the \code{sideViolin} function).}

\item{testZeroes}{Logical indicating whether or not to test for a 
difference in the proportion of zeroes. This will only be done for genes 
that have at least one zero value (genes where all cells have a nonzero value
will have a `zero.pvalue` of NA).}

\item{adjust.perms}{Logical indicating whether or not to adjust the 
permutation tests for the sample
  detection rate (proportion of nonzero values).  If true, the 
  residuals of a linear model adjusted for 
  detection rate are permuted, and new fitted values are 
  obtained using these residuals.}

\item{param}{a \code{MulticoreParam} or \code{SnowParam} object of 
the \code{BiocParallel}
package that defines a parallel backend.  The default option is 
\code{BiocParallel::bpparam()} which automatically creates a cluster 
appropriate for 
the operating system.  Alternatively, the user can specify the number
of cores they wish to use by first creating the corresponding 
\code{MulticoreParam} (for Linux-like OS) or \code{SnowParam} (for Windows)
object, and then passing it into the \code{scDD}
function. This could be done to specify a parallel backend on a Linux-like
OS with, say, 12 
cores by setting \code{param=BiocParallel::MulticoreParam(workers=12)}.}

\item{parallelBy}{For the permutation test (if invoked), the manner in 
which to parallelize.  The default option
 is \code{"Genes"} which will spawn processes that divide up the genes 
 across all cores defined in \code{param}, and then loop through the 
 permutations. 
 The alternate option is \code{"Permutations"} which will
 loop through each gene and spawn processes that divide up the permutations 
 across all cores defined in \code{param}.  
 The default option is recommended when analyzing more genes than the number
  of permutations.}

\item{condition}{A character object that contains the name of the column in 
\code{colData} that represents 
 the biological group or condition of interest (e.g. treatment versus 
 control).  Note that this variable should only contain two 
 possible values since \code{scDD} can currently only handle two-group 
 comparisons.  The default option assumes that there
 is a column named "condition" that contains this variable.}

\item{min.size}{a positive integer that specifies the minimum size of a 
cluster (number of cells) for it to be used
 during the classification step.  Any cluster containing fewer than 
 \code{min.size} cells will be considered an outlier
 cluster and ignored in the classification algorithm.  The default value
  is three.}

\item{min.nonzero}{a positive integer that specifies the minimum number of
nonzero cells in each condition required for the test of differential 
distributions.  If a gene has fewer nonzero cells per condition, it will
still be tested for a differential proportion of zeroes (DZ) if \code{testZeroes} is TRUE. Default value is
NULL (no minimum value is enforced).}

\item{level}{numeric value between 0 and 1 that specifies the alpha level
for significance of a differential gene test (default value 0.05). This is 
used to decide whether to classify a gene into one of the differential
patterns. If `testZeroes` is FALSE and the adjusted p-value for a given gene 
is below `level`, then the gene is categorized. Alternatively, if `testZeroes` 
is TRUE, then the adjusted p-value must be below `level/2` in order to be
considered significant and categorized. This is done to control for multiple
testing since `testZeroes=TRUE` means that each gene is tested for a 
difference in nonzeroes and zeroes separately.}

\item{categorize}{a logical indicating whether to determine which 
categories (DE, DP, DM, DB) each gene belongs to (default = TRUE). This
can only be set to FALSE if `permutations` is set to zero, since the full
model fitting will automatically be carried out if permutations are run.}
}
\value{
A \code{SingleCellExperiment} object that contains the data and 
sample information from the input object, but where the results objects
are now added to the \code{metadata} slot. The metadata slot is now a
list with four items: the first (main results object) is a data.frame 
with the following columns: 
\itemize{
  \item `gene`: gene name (matches rownames of SCdat)
  \item `DDcategory`: name of the DD (DE, DP, DM, DB, DZ) pattern (or NS = not significant) 
  \item `Clusters.combined`: the number of clusters identified overall
  \item `Clusters.C1`: the number of clusters identified in condition 1 alone
  \item `Clusters.C2`: the number of clusters identified in condition 2 alone
  \item `nonzero.pvalue`: permutation (or KS) p-value for testing difference
  in nonzero expression values
  \item `nonzero.pvalue.adj`: Benjamini-Hochberg adjusted version of the 
    `nonzero.pvalue` column
  \item `zero.pvalue`: p-value for test of difference in dropout rate 
  (only if `testZeroes` is TRUE) 
  \item `zero.pvalue.adj`: Benjamini-Hochberg adjusted version of the previous column 
  (only if `testZeroes` is TRUE) 
  \item `combined.pvalue`: Fisher's combined p-value for a difference in nonzero or zero values
  (only if `testZeroes` is TRUE). 
  \item `combined.pvalue.adj`: Benjamini-Hochberg adjusted version of the previous column 
  (only if `testZeroes` is TRUE) 
}
 
The remaining three elements are matrices (first for conditions
  1 and 2 combined, 
 then condition 1 alone, then condition 2 alone) that contain the cluster
  memberships (cluster 1,2,3,...), with samples in columns and
 genes in rows.  Zeroes, which are not involved in the clustering, are
  labeled as zero.  See the \code{results} function for a convenient
  way to extract these results objects.
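
As a reading aid for the \code{combined.pvalue} column, the textbook form 
of Fisher's method for two p-values is sketched here.  This assumes the two 
tests are independent and is not a restatement of the exact computation 
inside \code{scDD}.  Writing \eqn{p_{nonzero}}{p_nonzero} and 
\eqn{p_{zero}}{p_zero} (notation introduced only for this sketch) for the 
\code{nonzero.pvalue} and \code{zero.pvalue} columns, the test statistic is
\deqn{X = -2 (\log p_{nonzero} + \log p_{zero})}{X = -2 * (log(p_nonzero) + log(p_zero))}
Under the null hypothesis, \eqn{X}{X} follows a chi-squared distribution 
with 4 degrees of freedom, and the combined p-value is the upper tail 
probability of \eqn{X}{X}.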
}
\description{
Find genes with differential distributions (DD) across two conditions
}
\details{
Find genes with differential distributions (DD) across two 
conditions.  Models each log-transformed gene as a Dirichlet 
  Process Mixture of normals and uses a permutation test to determine 
  whether condition membership is independent of sample clustering.
  The FDR adjusted (Benjamini-Hochberg) permutation p-value is returned 
  along with the classification of each significant gene 
  (with adjusted p-value less than 0.05, or 0.025 if also testing for a 
   difference in the proportion of zeroes) into one of four categories 
  (DE, DP, DM, DB).  For genes that do not show significant influence 
  of condition on clustering, an optional test of whether the 
  proportion of zeroes (dropout rate) is different across conditions is 
  performed (DZ).
}
\examples{
 
# load toy simulated example SingleCellExperiment object to find DD genes

data(scDatExSim)


# check that this object is a member of the SingleCellExperiment class
# and that it contains 200 samples and 30 genes

class(scDatExSim)
show(scDatExSim)
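
# As a quick first pass (a sketch relying only on the documented defaults),
# scDD can be run with permutations = 0, in which case the faster
# Kolmogorov-Smirnov test is used instead of the full permutation test;
# the name ks_run is an arbitrary choice for this illustration

ks_run <- scDD(scDatExSim, testZeroes=FALSE)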


# set arguments to pass to scDD function
# we will perform 100 permutations on each of the 30 genes

prior_param <- list(alpha=0.01, mu0=0, s0=0.01, a0=0.01, b0=0.01)
nperms <- 100


# call the scDD function to perform permutations, classify DD genes, 
# and return results
# we won't perform the test for a difference in the proportion of zeroes  
# since none exists in this simulated toy example data
# this step will take significantly longer with more genes and/or 
# more permutations

scDatExSim <- scDD(scDatExSim, prior_param=prior_param, permutations=nperms, 
            testZeroes=FALSE)
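

# A sketch of inspecting the output, assuming the call above completed:
# per the Value section, the first element of the metadata slot is the
# main results data.frame (the name res is an arbitrary choice)

res <- S4Vectors::metadata(scDatExSim)[[1]]
head(res)

# genes whose adjusted p-value falls below the default level of 0.05
# (testZeroes=FALSE above, so the threshold is not halved)

res$gene[which(res$nonzero.pvalue.adj < 0.05)]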
}
\references{
Korthauer KD, Chu LF, Newton MA, Li Y, Thomson J, Stewart R, 
Kendziorski C. A statistical approach for identifying differential 
distributions
in single-cell RNA-seq experiments. Genome Biology. 2016 Oct 25;17(1):222. 
\url{https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1077-y}
}
