% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/subsampleClustering.R
\name{subsampleClustering}
\alias{subsampleClustering}
\alias{subsampleClustering,character-method}
\alias{subsampleClustering,ClusterFunction-method}
\title{Cluster subsamples of the data}
\usage{
\S4method{subsampleClustering}{character}(clusterFunction, ...)

\S4method{subsampleClustering}{ClusterFunction}(
  clusterFunction,
  inputMatrix,
  inputType,
  clusterArgs = NULL,
  classifyMethod = c("All", "InSample", "OutOfSample"),
  resamp.num = 100,
  samp.p = 0.7,
  ncores = 1,
  warnings = TRUE,
  ...
)
}
\arguments{
\item{clusterFunction}{a \code{\link{ClusterFunction}} object that defines
the clustering routine. See \code{\link{ClusterFunction}} for required
format of user-defined clustering routines. User can also give a character
value to the argument \code{clusterFunction} to indicate the use of
clustering routines provided in package. Type
\code{\link{listBuiltInFunctions}} at command prompt to see the built-in
clustering routines. If \code{clusterFunction} is missing, the default is
set to "pam".}

\item{...}{arguments passed to mclapply (if ncores>1).}

\item{inputMatrix}{numerical matrix on which to run the clustering or a
\code{\link[SummarizedExperiment]{SummarizedExperiment}},
\code{\link{SingleCellExperiment}}, or \code{\link{ClusterExperiment}}
object.}

\item{inputType}{a character vector defining what type of input is given in
the \code{inputMatrix} argument. Must consist of values "diss", "X", or
"cat" (see details). "X" and "cat" indicate
matrices with features in the rows and samples in the columns; "cat"
corresponds to features that are integers representing
categories, while "X" corresponds to continuous-valued features. "diss"
corresponds to an \code{inputMatrix} that is an NxN dissimilarity matrix.
"cat" is largely used internally for clustering of sets of clusterings.}

\item{clusterArgs}{a list of parameter arguments to be passed to the function
defined in the \code{clusterFunction} slot of the \code{ClusterFunction}
object. For any given \code{\link{ClusterFunction}} object, use function
\code{\link{requiredArgs}} to get a list of required arguments for the
object.}

\item{classifyMethod}{method for determining which samples should be used in
calculating the co-occurrence matrix. "All" = all samples, "OutOfSample" =
those not subsampled, and "InSample" = those in the subsample. See details
for explanation.}

\item{resamp.num}{the number of subsamples to draw.}

\item{samp.p}{the proportion of samples to sample for each subsample.}

\item{ncores}{integer giving the number of cores. If ncores>1, mclapply will
be called.}

\item{warnings}{logical indicating whether to give a warning when arguments
are given that do not match the clustering choices made. Otherwise,
inapplicable arguments will be ignored without warning.}
}
\value{
An \code{n x n} matrix of co-occurrences, i.e. a symmetric matrix with
  [i,j] entries equal to the proportion of subsamples where the ith and jth
  samples were clustered into the same cluster. The proportion is only out of
  those subsamples where the ith and jth samples were both assigned to a
  cluster. If \code{classifyMethod=="All"}, this is all subsamples for all
  i,j pairs. But if \code{classifyMethod=="InSample"} or
  \code{classifyMethod=="OutOfSample"}, then the proportion is only taken over
  those subsamples where the ith and jth samples were both in or both out of
  sample, respectively, relative to the subsample.
}
\description{
Given input data, this function will subsample the samples, cluster the
subsamples, and return an \code{n x n} matrix with the probability of
co-occurrence.
}
\details{
\code{subsampleClustering} is not usually called directly by the
  user. It is exported only so that its arguments can be clearly documented;
  these arguments can be passed via the
  argument \code{subsampleArgs} in functions like \code{\link{clusterSingle}}
  and \code{\link{clusterMany}}.

\code{classifyMethod:} The choice of "All" or "OutOfSample" for
  \code{classifyMethod} requires classifying arbitrary samples, not
  originally in the clustering, into clusters; this is done via the
  classifyFUN provided in the \code{\link{ClusterFunction}} object. If the
  \code{\link{ClusterFunction}} object does not have such a function to
  classify samples that were not in the subsample that
  created the clustering, then \code{classifyMethod} must be
  \code{"InSample"}. Note that if "All" is chosen, all samples will be
  classified into clusters via the classifyFUN, not just those that are
  out-of-sample; depending on the classification function, this could assign
  in-sample samples to different clusters than their original assignment by
  the clustering. If you do not choose "All", it is
  possible to get NAs in the resulting S matrix (particularly when not enough
  subsamples are taken), which can cause errors if you then pass the resulting
  D=1-S matrix to \code{\link{mainClustering}}. For this reason the default is
  "All".
}
\examples{
\dontrun{
#takes a bit of time, not run on checks:
data(simData)
coOccur <- subsampleClustering(inputMatrix=simData, inputType="X",
  clusterFunction="kmeans",
  clusterArgs=list(k=3, nstart=10), resamp.num=100, samp.p=0.7)

#visualize the resulting co-occurrence matrix
plotHeatmap(coOccur)
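
# A hedged sketch, not part of the original example: the details note that
# choosing a classifyMethod other than "All" can leave NAs in the resulting
# matrix. The lines below show one way to check for that before forming the
# dissimilarity D = 1 - S. The names coOccurIn and dissim are illustrative
# only.
coOccurIn <- subsampleClustering(inputMatrix=simData, inputType="X",
  clusterFunction="kmeans",
  clusterArgs=list(k=3, nstart=10),
  classifyMethod="InSample", resamp.num=100, samp.p=0.7)
# TRUE here means some pairs of samples could not be scored
anyNA(coOccurIn)
# dissimilarity of the kind described in the details (D = 1 - S)
dissim <- 1 - coOccurIn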
}
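
# A minimal base-R sketch of the co-occurrence calculation described in the
# value section, for the "InSample" case. It is an illustration under
# simplifying assumptions (stats::kmeans on toy data) and is NOT the
# package's internal implementation; every object name here is hypothetical.
set.seed(1)
# toy data: 2 features in the rows, 40 samples in the columns, two groups
toy <- cbind(matrix(rnorm(40, mean=0), nrow=2),
  matrix(rnorm(40, mean=5), nrow=2))
n <- ncol(toy)
together <- matrix(0, n, n) # times i and j fell in the same cluster
both <- matrix(0, n, n)     # times i and j were both in the subsample
for (b in seq_len(20)) {
  # draw a subsample of 70 percent of the samples and cluster it
  idx <- sample(n, size=round(0.7 * n))
  cl <- kmeans(t(toy[, idx]), centers=2)$cluster
  both[idx, idx] <- both[idx, idx] + 1
  together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
}
# proportion taken only over subsamples that contained both samples; a pair
# that never appeared together in a subsample gives 0/0, i.e. NaN
S <- together / both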
}
