% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/block.splsda.R
\name{block.splsda}
\alias{block.splsda}
\alias{wrapper.sgccda}
\title{N-integration and feature selection with Projection to Latent Structures
models (PLS) with sparse Discriminant Analysis}
\usage{
block.splsda(
  X,
  Y,
  indY,
  ncomp = 2,
  keepX,
  design,
  scale = TRUE,
  tol = 1e-06,
  max.iter = 100,
  near.zero.var = FALSE,
  all.outputs = TRUE,
  verbose.call = FALSE
)

wrapper.sgccda(
  X,
  Y,
  indY,
  ncomp = 2,
  keepX,
  design,
  scale = TRUE,
  tol = 1e-06,
  max.iter = 100,
  near.zero.var = FALSE,
  all.outputs = TRUE,
  verbose.call = FALSE
)
}
\arguments{
\item{X}{A named list of data sets (called 'blocks') measured on the same
samples. Each data set should be a matrix of samples x variables, with the
samples in the same order in all data sets.}

\item{Y}{a factor or a class vector for the discrete outcome.}

\item{indY}{To be supplied if \code{Y} is missing. Indicates the position of
the response matrix in the list \code{X}.}

\item{ncomp}{the number of components to include in the model. Defaults to 2.
Applies to all blocks.}

\item{keepX}{A named list of the same length as \code{X}. Each entry gives the
number of variables to select in the corresponding block of \code{X} for each
component. By default all variables are kept in the model.}

\item{design}{numeric matrix of size (number of blocks in X) x (number of
blocks in X) with values between 0 and 1. Each value indicates the strength
of the relationship to be modelled between two blocks; a value of 0
indicates no relationship, 1 is the maximum value. Alternatively, one of
c('null', 'full') indicating a disconnected or fully connected design,
respectively, or a numeric between 0 and 1 which will designate all
off-diagonal elements of a fully connected design (see examples in
\code{block.splsda}). If \code{Y} is provided instead of \code{indY}, the
\code{design} matrix is changed to include relationships to \code{Y}.}

\item{scale}{Logical. If \code{scale = TRUE}, each block is standardized to
zero mean and unit variance (default: TRUE).}

\item{tol}{Positive numeric used as the convergence criterion (tolerance)
during the iterative process. Defaults to \code{1e-06}.}

\item{max.iter}{Integer, the maximum number of iterations. Defaults to 100.}

\item{near.zero.var}{Logical, see the internal \code{\link{nearZeroVar}}
function (should be set to TRUE in particular for data with many zero
values). Setting this argument to FALSE (when appropriate) will speed up the
computations. Default value is FALSE.}

\item{all.outputs}{Logical. Computation can be faster when some specific
(and non-essential) outputs are not calculated. Default = \code{TRUE}.}

\item{verbose.call}{Logical (Default=FALSE), if set to TRUE then the \code{$call}
component of the returned object will contain the variable values for all 
parameters. Note that this may cause large memory usage.}
}
\value{
\code{block.splsda} returns an object of class \code{"block.splsda",
"block.spls"}, a list that contains the following components:

\item{X}{the centered and standardized original predictor matrix.}
\item{indY}{the position of the outcome Y in the output list X.}
\item{ncomp}{the number of components included in the model for each block.}
\item{mode}{the algorithm used to fit the model.}
\item{keepX}{Number of variables used to build each component of each block.}
\item{variates}{list containing the variates of each block of X.}
\item{loadings}{list containing the estimated loadings for the variates.}
\item{names}{list containing the names to be used for individuals and
variables.}
\item{nzv}{list containing the zero- or near-zero predictors information.}
\item{iter}{Number of iterations of the algorithm for each component.}
\item{weights}{Correlation between the variate of each block and the variate
of the outcome. Used to weight predictions.}
\item{prop_expl_var}{Percentage of explained variance for each component and
each block.}
\item{call}{if \code{verbose.call = FALSE}, then just the function call is returned.
If \code{verbose.call = TRUE} then all the input values are accessible via
this component.}
Note that the argument 'scheme' has now been hardcoded to 'horst' and 'init' to 'svd.single'.
}
\description{
Integration of multiple data sets measured on the same samples or
observations to classify a discrete outcome and select features from each
data set, i.e. N-integration with sparse Discriminant Analysis. The method is
partly based on Generalised Canonical Correlation Analysis.
}
\details{
The \code{block.splsda} function fits a horizontal integration PLS-DA model
with a specified number of components per block. A factor indicating the
discrete outcome needs to be provided, either by \code{Y} or by its position
\code{indY} in the list of blocks \code{X}.

\code{X} can contain missing values. Missing values are disregarded during
the cross product computations in the algorithm \code{block.pls}, so rows
with missing data do not have to be deleted. Alternatively, missing data can
be imputed beforehand using the \code{\link{impute.nipals}} function.

The \code{mode} component of the returned object records the PLS algorithm
used to fit the model. Four PLS algorithms exist in the package: PLS
regression \code{("regression")}, PLS canonical analysis
\code{("canonical")}, redundancy analysis \code{("invariant")} and the
classical PLS algorithm \code{("classic")} (see References and \code{?pls}
for more details). Note that \code{mode} is not among the arguments of
\code{block.splsda} listed under Usage.

Note that our method is partly based on sparse Generalised Canonical
Correlation Analysis and differs from the MB-PLS approaches proposed by
Kowalski et al. (1989), J Chemom 3(1), and Westerhuis et al. (1998), J Chemom
12(5), and from the sparse variants of Li et al. (2012), Bioinformatics
28(19); Karaman et al. (2014), Metabolomics 11(2); and Kawaguchi et al.
(2017), Biostatistics.

Variable selection is performed on each component for each block of \code{X}
if specified, via input parameter \code{keepX}.
}
\examples{
# block.splsda
# -------------
data("breast.TCGA")
# this is the X data as a list of mRNA, miRNA and proteins
data = list(mrna = breast.TCGA$data.train$mrna, mirna = breast.TCGA$data.train$mirna,
protein = breast.TCGA$data.train$protein)
# set up a full design where every block is connected
design = matrix(1, ncol = length(data), nrow = length(data),
dimnames = list(names(data), names(data)))
diag(design) =  0
design
# set the number of components (applies to all data sets)
ncomp = 2
# set number of variables to select, per component and per data set (this is set arbitrarily)
list.keepX = list(mrna = rep(8,2), mirna = rep(8,2), protein = rep(8,2))
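## sketch: building the same keepX list programmatically
# Assumptions: the block names are typed out so the snippet stands alone (they
# match names(data)), and the same number of variables (8) is kept in every
# block on each of the 2 components. 'keepX.sketch' is a hypothetical name.
block.names = c('mrna', 'mirna', 'protein')
keepX.sketch = setNames(lapply(block.names, function(b) rep(8, 2)), block.names)
keepX.sketch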


TCGA.block.splsda = block.splsda(X = data, Y = breast.TCGA$data.train$subtype, 
                                 ncomp = ncomp, keepX = list.keepX, design = design)
## use design = 'full'
TCGA.block.splsda = block.splsda(X = data, Y = breast.TCGA$data.train$subtype, 
                                 ncomp = ncomp, keepX = list.keepX, design = 'full')
TCGA.block.splsda$design

plotIndiv(TCGA.block.splsda, ind.names = FALSE)
## use design = 'null'
TCGA.block.splsda = block.splsda(X = data, Y = breast.TCGA$data.train$subtype, 
                                 ncomp = ncomp, keepX = list.keepX, design = 'null')
TCGA.block.splsda$design
## set all off-diagonal elements to 0.5
TCGA.block.splsda = block.splsda(X = data, Y = breast.TCGA$data.train$subtype, 
                                 ncomp = ncomp, keepX = list.keepX, design = 0.5)
TCGA.block.splsda$design
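## sketch: the X-block part of the design implied by design = 0.5
# Assumptions: the block names are typed out so the snippet stands alone (they
# match names(data)); 'design.half' is a hypothetical name. The relationships
# to Y are added by block.splsda itself, so they are not built here.
block.names = c('mrna', 'mirna', 'protein')
design.half = matrix(0.5, nrow = length(block.names), ncol = length(block.names),
                     dimnames = list(block.names, block.names))
diag(design.half) = 0
design.half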
# illustrates coefficient weights in each block
plotLoadings(TCGA.block.splsda, ncomp = 1, contrib = 'max')
plotVar(TCGA.block.splsda, style = 'graphics', legend = TRUE)

## plot markers (selected variables) for mrna and mirna
# mrna: show each selected feature separately
plotMarkers(object = TCGA.block.splsda, comp = 1, block = 'mrna')
# mrna: aggregate all selected features and separate by loadings signs
plotMarkers(object = TCGA.block.splsda, comp = 1, block = 'mrna', global = TRUE)
# proteins
plotMarkers(object = TCGA.block.splsda, comp = 1, block = 'protein')
## do not show violin plots
plotMarkers(object = TCGA.block.splsda, comp = 1, block = 'protein', violin = FALSE)
# show top 5 markers
plotMarkers(object = TCGA.block.splsda, comp = 1, block = 'protein', markers = 1:5)
# show specific markers
my.markers <- selectVar(TCGA.block.splsda, comp = 1)[['protein']]$name[c(1,3,5)]
my.markers
plotMarkers(object = TCGA.block.splsda, comp = 1, block = 'protein', markers = my.markers)
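
## sketch: predict the subtype of the test samples
# Assumptions: this continues from the objects created above; predict() is
# given a list of test blocks named as in the model and is assumed to work
# with a subset of the blocks (the test set used here has no protein block).
# The output element names follow the mixOmics DIABLO vignette.
data.test = list(mrna = breast.TCGA$data.test$mrna, mirna = breast.TCGA$data.test$mirna)
TCGA.predict = predict(TCGA.block.splsda, newdata = data.test)
# weighted vote across blocks, centroids distance, using both components
predicted.subtype = TCGA.predict$WeightedVote$centroids.dist[, 2]
table(truth = breast.TCGA$data.test$subtype, predicted = predicted.subtype)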
}
\references{
On multiple integration with sPLS-DA and 4 data blocks:

Singh A., Gautier B., Shannon C., Vacher M., Rohart F., Tebbutt S. and Lê
Cao K.A. (2016). DIABLO: multi omics integration for biomarker discovery.
BioRxiv available here:
\url{http://biorxiv.org/content/early/2016/08/03/067611}

On data integration:

Tenenhaus A., Philippe C., Guillemot V, Lê Cao K.A., Grill J, Frouin V.
Variable selection for generalized canonical correlation analysis.
\emph{Biostatistics}. kxu001

Gunther O., Shin H., Ng R. T., McMaster W. R., McManus B. M., Keown P. A.,
Tebbutt S.J., Lê Cao K-A. (2014) Novel multivariate methods for
integration of genomics and proteomics data: Applications in a kidney
transplant rejection study, OMICS: A journal of integrative biology, 18(11),
682-95.

mixOmics article:

Rohart F, Gautier B, Singh A, Lê Cao K-A (2017). mixOmics: an R package for
'omics feature selection and multiple data integration. PLoS Comput Biol
13(11): e1005752
}
\seealso{
\code{\link{plotIndiv}}, \code{\link{plotArrow}},
\code{\link{plotLoadings}}, \code{\link{plotVar}}, \code{\link{predict}},
\code{\link{perf}}, \code{\link{selectVar}}, \code{\link{block.plsda}},
\code{\link{block.spls}} and \url{http://www.mixOmics.org/mixDIABLO} for more
details and examples.
}
\author{
Florian Rohart, Benoit Gautier, Kim-Anh Lê Cao, Al J Abadi
}
\keyword{multivariate}
\keyword{regression}
