% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/CAM.R
\name{CAM}
\alias{CAM}
\title{Convex Analysis of Mixtures}
\usage{
CAM(data, K = NULL, corner.strategy = 2, dim.rdc = 10,
  thres.low = 0.05, thres.high = 0.95, cluster.method = c("K-Means",
  "apcluster"), cluster.num = 50, MG.num.thres = 20,
  lof.thres = 0.02, quickhull = TRUE, quick.select = NULL,
  sample.weight = NULL, appro3 = TRUE, generalNMF = FALSE,
  cores = NULL)
}
\arguments{
\item{data}{Matrix of mixture expression profiles.
Data frame, SummarizedExperiment or ExpressionSet object will be
internally coerced into a matrix.
Each row is a gene and each column is a sample.
Data should be in non-log linear space with non-negative numerical values
(i.e. >= 0). Missing values are not supported.
All-zero rows will be removed internally.}

\item{K}{The candidate subpopulation number(s), e.g. K = 2:8.}

\item{corner.strategy}{The method to find corners of convex hull.
1: minimum sum of margin-of-errors; 2: minimum sum of reconstruction
errors. The default is 2.}

\item{dim.rdc}{Reduced data dimension;
should be not less than maximum candidate K.}

\item{thres.low}{The lower bound of percentage of genes to keep for CAM
with ranked norm. The value should be between 0 and 1.
The default is 0.05.}

\item{thres.high}{The higher bound of percentage of genes to keep for CAM
with ranked norm. The value should be between 0 and 1.
The default is 0.95.}

\item{cluster.method}{The method to do clustering.
The default "K-Means" will use \code{\link{kmeans}}.
The alternative "apcluster" will use
\code{\link[apcluster]{apclusterK-methods}}.}

\item{cluster.num}{The number of clusters; should be much larger than K.
The default is 50.}

\item{MG.num.thres}{The clusters with the gene number smaller than
MG.num.thres will be treated as outliers.
The default is 20.}

\item{lof.thres}{Remove local outlier using \code{\link[DMwR2]{lofactor}}.
MG.num.thres is used as the number of neighbors in the calculation of
the local outlier factors.
The default value 0.02 will remove top 2\% local outliers.
Zero value will disable lof.}

\item{quickhull}{Perform quickhull to select clusters or not. The default is
True.}

\item{quick.select}{The number of candidate corners kept after quickhull
and SFFS greedy search. If Null, only quickhull is applied.
The default is 20. If this value is larger than the number of candidate
corners after quickhull, greedy search will also not be applied.}

\item{sample.weight}{Vector of sample weights. If NULL, all samples have
the same weights. The length should be the same as sample numbers.
All values should be positive.}

\item{appro3}{Estimate A and S matrix by approach 3 or not. Please see
\code{\link{CAMASest}} for further information. The default is TRUE.}

\item{generalNMF}{If TRUE, the decomposed proportion matrix has no sum-to-one
constraint for each row. The default is FALSE.
TRUE value brings two changes: (1) Without assuming samples are
normalized, the first principal component will not forced to be along
c(1,1,..,1) but a standard PCA will be applied during preprocessing.
(2) Without sum-to-one constraint for each row, the scale ambiguity of
each column vector in proportion matrix will not be removed.}

\item{cores}{The number of system cores for parallel computing.
If not provided, one core for each element in K will be invoked.
Zero value will disable parallel computing.}
}
\value{
An object of class "\code{\link{CAMObj}}" containing the following
components:
\item{PrepResult}{An object of class "\code{\link{CAMPrepObj}}" containing
data preprocessing results from \code{\link{CAMPrep}} function.}
\item{MGResult}{A list of "\code{\link{CAMMGObj}}" objects containing
marker gene detection results from \code{\link{CAMMGCluster}} function for
each K value.}
\item{ASestResult}{A list of "\code{\link{CAMASObj}}" objects containing
estimated proportions, subpopulation-specific expressions and mdl values
from \code{\link{CAMASest}} function for each K value.}
}
\description{
This function performs a fully unsupervised computational deconvolution
    to identify marker genes that define each of the multiple
    subpopulations, and estimate the proportions of these subpopulations in
    the mixture tissues as well as their respective expression profiles.
}
\details{
This function includes three necessary steps to decompose a matrix
of mixture expression profiles: data preprocessing, marker gene cluster
search, and matrix decomposition. They are implemented in
\code{\link{CAMPrep}}, \code{\link{CAMMGCluster}} and
\code{\link{CAMASest}}, separately.
More details can be found in the help document of each function.

For this function, you needs to specify the range of possible
subpopulation numbers and the percentage of low/high-expressed genes to
be removed. Typically, 30\% ~ 50\% low-expressed genes can be removed from
gene expression data. The removal of high-expressed genes has much less
impact on results, and usually set to be 0\% ~ 10\%.

This function can also analyze other molecular expression data, such as
proteomics data. Much less low-expressed proteins need to be removed,
e.g. 0\% ~ 10\%, due to a limited number of proteins without missing values.
}
\examples{
#obtain data
data(ratMix3)
data <- ratMix3$X

#set seed to generate reproducible results
set.seed(111)

#CAM with known subpopulation number
rCAM <- CAM(data, K = 3, dim.rdc = 3, thres.low = 0.30, thres.high = 0.95)
#Larger dim.rdc can improve performance but increase time complexity

\dontrun{
#CAM with a range of subpopulation number
rCAM <- CAM(data, K = 2:5, dim.rdc = 10, thres.low = 0.30, thres.high = 0.95)

#Use "apcluster" to aggregate gene vectors in CAM
rCAM <- CAM(data, K = 2:5, dim.rdc = 10, thres.low = 0.30, thres.high = 0.95,
            cluster.method = 'apcluster')

#CAM with quick selection to reduce time complexity
rCAM <- CAM(data, K = 3, dim.rdc = 10, thres.low = 0.30, thres.high = 0.95,
        quick.select = 20)

#CAM with different sample weights (e.g. adjusted based on sample quality)
rCAM <- CAM(data, K = 3, dim.rdc = 5, thres.low = 0.30, thres.high = 0.95,
        sample.weight = c(rep(10,11),rep(1,10)))

#CAM for general NMF (no sum-to-one contraint for proportion matrix)
rCAM <- CAM(data, K = 3, dim.rdc = 5, thres.low = 0.30, thres.high = 0.95,
       generalNMF = TRUE)
}
}
