% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tune.pls.R
\name{tune.pls}
\alias{tune.pls}
\title{Tuning functions for PLS method}
\usage{
tune.pls(
  X,
  Y,
  ncomp,
  validation = c("Mfold", "loo"),
  nrepeat = 1,
  folds,
  measure = NULL,
  mode = c("regression", "canonical", "classic"),
  scale = TRUE,
  logratio = "none",
  tol = 1e-06,
  max.iter = 100,
  near.zero.var = FALSE,
  multilevel = NULL,
  BPPARAM = SerialParam(),
  seed = NULL,
  progressBar = FALSE,
  ...
)
}
\arguments{
\item{X}{numeric matrix of predictors with the rows as individual observations.}

\item{Y}{numeric matrix of response(s) with the rows as individual observations matching \code{X}.}

\item{ncomp}{Positive integer. The number of components to include in the
model.}

\item{validation}{Character. The kind of (internal) validation to use,
matching one of \code{"Mfold"} or \code{"loo"} (leave-one-out). Default is
\code{"Mfold"}.}

\item{nrepeat}{Positive integer. Number of times the Cross-Validation process
should be repeated. \code{nrepeat > 2} is required for robust tuning. See
details.}

\item{folds}{Positive integer. The number of folds in the Mfold cross-validation.}

\item{measure}{The tuning measure to use. Cannot be NULL when applied to sPLS1 object. See details.}

\item{mode}{Character string indicating the type of PLS algorithm to use. One
of \code{"regression"}, \code{"canonical"} or \code{"classic"}. See Details.}

\item{scale}{Logical. If \code{scale = TRUE}, each block is standardized to zero mean and unit variance (default: \code{TRUE}).}

\item{logratio}{Character, one of ('none','CLR') specifies the log ratio transformation to deal with compositional values that may arise from specific normalisation in sequencing data. Default to 'none'. See ?logratio.transfo for details.}

\item{tol}{Positive numeric used as convergence criteria/tolerance during the iterative process. Default to 1e-06.}

\item{max.iter}{Integer, the maximum number of iterations. Default to 100.}

\item{near.zero.var}{Logical, see the internal nearZeroVar function (should be set to TRUE in particular for data with many zero values). Setting this argument to FALSE (when appropriate) will speed up the computations. Default value is FALSE.}

\item{multilevel}{Numeric, design matrix for repeated measurement analysis, where multilevel decomposition is required. For a one-factor decomposition, the repeated measures on each individual, i.e. the individual IDs, are input as the first column. For a two-factor decomposition, the 2nd and 3rd columns indicate those factors. See examples.}

\item{BPPARAM}{A \linkS4class{BiocParallelParam} object indicating the type
of parallelisation. See examples in \code{?tune.spca}.}

\item{seed}{Set a number here if you want the function to give reproducible outputs. 
Not recommended during exploratory analysis. Note that if \code{RNGseed} is set in \code{BPPARAM}, it will be overwritten by \code{seed}.}

\item{progressBar}{Logical. If \code{TRUE} a progress bar is shown as the
computation completes. Default to \code{FALSE}.}

\item{...}{Optional parameters passed to \code{\link{pls}}}
}
\value{
Returns a list with the following components for every repeat:
\item{MSEP}{Mean Square Error Prediction for each \eqn{Y} variable. Only 
applies to objects inherited from \code{"pls"} and \code{"spls"}. Only 
available when in regression (s)PLS.} 
\item{RMSEP}{Root Mean Square Error Prediction for each \eqn{Y} variable. Only 
applies to objects inherited from \code{"pls"} and \code{"spls"}. Only 
available when in regression (s)PLS.} 
\item{R2}{a matrix of \eqn{R^2} values of the \eqn{Y}-variables for models 
with \eqn{1, \ldots ,}\code{ncomp} components. Only applies to objects
inherited from \code{"pls"} and \code{"spls"}. Only available when in 
regression (s)PLS.}
\item{Q2}{if \eqn{Y} contains one variable, a vector of \eqn{Q^2} values,
otherwise a list with a matrix of \eqn{Q^2} values for each \eqn{Y}-variable.
Note that in the specific case of an sPLS model, it is better to look
at the Q2.total criterion. Only applies to objects inherited from
\code{"pls"} and \code{"spls"}. Only available when in regression (s)PLS.} 
\item{Q2.total}{a vector of \eqn{Q^2}-total values for models with \eqn{1, 
\ldots ,}\code{ncomp} components. Only applies to objects inherited from 
\code{"pls"} and \code{"spls"}. Available in both (s)PLS modes.}
\item{RSS}{Residual Sum of Squares across all selected features and the 
components.}
\item{PRESS}{Predicted Residual Error Sum of Squares across all selected 
features and the components.}
\item{features}{a list of features selected across the 
folds (\code{$stable.X} and \code{$stable.Y}) for the \code{keepX} and
\code{keepY} parameters from the input object. Note, this will be \code{NULL} 
if using standard (non-sparse) PLS.} 
\item{cor.tpred, cor.upred}{Correlation between the 
predicted and actual components for X (t) and Y (u)} 
\item{RSS.tpred, RSS.upred}{Residual Sum of Squares between the
predicted and actual components for X (t) and Y (u)}
}
\description{
Computes M-fold or Leave-One-Out Cross-Validation scores on a user-input
grid to determine optimal values for the parameters in \code{pls}.
}
\details{
This tuning function should be used to tune the number of components to retain in PLS models.
}
\section{folds}{
 
During a cross-validation (CV), data are randomly split into \code{M}
subgroups (folds). \code{M-1} subgroups are then used to train submodels,
which are used to compute prediction accuracy statistics for the
held-out (test) data. All subgroups are used as the test data exactly once.
If \code{validation = "loo"}, leave-one-out CV is used where each group
consists of exactly one sample and hence \code{M == N} where N is the number
of samples.
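
As an illustration only, the fold assignment described above can be sketched
in base R. This is not the internal mixOmics implementation; the names
\code{N}, \code{M}, \code{fold.id} and \code{loo.id} are hypothetical and the
sample size is assumed.

\preformatted{
set.seed(1)                          # illustrative seed only
N <- 64                              # number of samples (assumed)
M <- 5                               # number of folds
# randomly assign each sample to one of M folds of near-equal size
fold.id <- sample(rep(seq_len(M), length.out = N))
for (m in seq_len(M)) {
  test.idx  <- which(fold.id == m)   # held-out samples for fold m
  train.idx <- which(fold.id != m)   # samples used to fit the submodel
  # a submodel would be fitted on train.idx and evaluated on test.idx
}
# leave-one-out is the special case M == N: one sample per fold
loo.id <- seq_len(N)
}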
}

\section{nrepeat}{
 
The cross-validation process is repeated \code{nrepeat} times and the
accuracy measures are averaged across repeats. If \code{validation = "loo"},
the process does not need to be repeated as there is only one way to split N
samples into N groups and hence nrepeat is forced to be 1.
}

\section{measure}{
 
\itemize{
\item \bold{For PLS2} Two measures of accuracy are available: Correlation
(\code{cor}, used as default), as well as the Residual Sum of Squares
(\code{RSS}). For \code{cor}, the parameters which would maximise the
correlation between the predicted and the actual components are chosen. The
\code{RSS} measure tries to predict the held-out data by matrix
reconstruction and seeks to minimise the error between actual and predicted
values. For \code{mode='canonical'}, the \code{X} matrix is used to calculate the
\code{RSS}, while for other modes the \code{Y} matrix is used. This measure
gives more weight to any large errors and is thus sensitive to outliers. It
also intrinsically selects fewer features on the \code{Y} block
compared to \code{measure='cor'}. 
\item \bold{For PLS1} Four measures of accuracy are available: Mean Absolute
Error (\code{MAE}), Mean Square Error (\code{MSE}, used as default),
\code{Bias} and \code{R2}. Both MAE and MSE average the model prediction
error. MAE measures the average magnitude of the errors without considering
their direction. It is the average over the fold test samples of the absolute
differences between the Y predictions and the actual Y observations. The MSE
also measures the average magnitude of the error. Since the errors are
squared before they are averaged, the MSE tends to give a relatively high
weight to large errors. The Bias is the average of the differences between
the Y predictions and the actual Y observations and the R2 is the correlation
between the predictions and the observations.
}
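
A minimal base R sketch of the four PLS1 measures as worded above, computed
on hypothetical test-fold vectors \code{y} (observed) and \code{y.hat}
(predicted). The values are made up and the code follows the description in
this section rather than the package source.

\preformatted{
y     <- c(1.0, 2.0, 3.0, 4.0)   # observed Y values (made up)
y.hat <- c(1.5, 1.5, 3.5, 5.0)   # predicted Y values (made up)
err   <- y.hat - y               # signed prediction errors
MAE  <- mean(abs(err))           # average error magnitude
MSE  <- mean(err^2)              # squaring gives large errors more weight
Bias <- mean(err)                # average signed error
R2   <- cor(y.hat, y)            # correlation, as described in this section
}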
}

\section{Optimisation Process}{
 
The optimisation process is data-driven and similar to the process detailed
in (Rohart et al., 2016), where one-sided t-tests assess whether there is a
gain in performance when incrementing the number of features or components in
the model. However, it will assess all the provided grid through pair-wise
comparisons as the performance criteria do not always change linearly with
respect to the added number of features or components.
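
The kind of comparison described above can be sketched with base R
\code{t.test}. The error values are simulated, the names are hypothetical and
the 0.05 threshold is an assumption; the decision rule used by the package
may differ.

\preformatted{
set.seed(1)
# hypothetical error per repeat for models with 1 and 2 components
err.comp1 <- rnorm(10, mean = 1.0, sd = 0.1)
err.comp2 <- rnorm(10, mean = 0.7, sd = 0.1)
# one-sided test: is the error with 2 components lower than with 1?
tt <- t.test(err.comp2, err.comp1, alternative = "less")
gain <- tt$p.value < 0.05   # assumed significance threshold
}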
}

\section{more}{

See also \code{?perf} for more details.
}

\examples{
# set up data
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic

# tune PLS2 model to find optimal number of components
tune.res <- tune.pls(X, Y, ncomp = 10, measure = "cor",
                    folds = 5, nrepeat = 3, progressBar = TRUE)
plot(tune.res) # plot outputs

# PLS1 model example
Y1 <- liver.toxicity$clinic[,1]

tune.res <- tune.pls(X, Y1, ncomp = 10, measure = "cor",
                    folds = 5, nrepeat = 3, progressBar = TRUE)

plot(tune.res)

# Multilevel PLS1 model (Y1 is a single response)
repeat.indiv <- c(1, 2, 1, 2, 1, 2, 1, 2, 3, 3, 4, 3, 4, 3, 4, 4, 5, 6, 5, 5,
                  6, 5, 6, 7, 7, 8, 6, 7, 8, 7, 8, 8, 9, 10, 9, 10, 11, 9, 9,
                  10, 11, 12, 12, 10, 11, 12, 11, 12, 13, 14, 13, 14, 13, 14,
                  13, 14, 15, 16, 15, 16, 15, 16, 15, 16)
design <- data.frame(sample = repeat.indiv)

tune.res <- tune.pls(X, Y1, ncomp = 10, measure = "cor", multilevel = design,
                     folds = 5, nrepeat = 3, progressBar = TRUE)

plot(tune.res)
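
\dontrun{
# Leave-one-out validation and reproducible tuning: a minimal sketch that
# only uses arguments listed in the usage section. With validation = "loo"
# there is a single possible split, so nrepeat is left at its default.
tune.loo <- tune.pls(X, Y, ncomp = 3, validation = "loo")

# 'seed' fixes the random fold assignment so that repeated calls should
# give the same output (the value 42 is arbitrary)
tune.seed <- tune.pls(X, Y, ncomp = 3, folds = 5, nrepeat = 3, seed = 42)
}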
}
\references{
mixOmics article:

Rohart F, Gautier B, Singh A, Lê Cao K-A (2017). mixOmics: an R package for
'omics feature selection and multiple data integration. PLoS Comput Biol
13(11): e1005752.

PLS and PLS criteria for PLS regression: Tenenhaus, M. (1998). La regression
PLS: theorie et pratique. Paris: Editions Technip.

Chavent, Marie and Patouille, Brigitte (2003). Calcul des coefficients de
regression et du PRESS en regression PLS1. Modulad, 30, 1-11. (This is the
formula we use to calculate the Q2 in perf.pls and perf.spls.)

Mevik, B.-H., Cederkvist, H. R. (2004). Mean Squared Error of Prediction
(MSEP) Estimates for Principal Component Regression (PCR) and Partial Least
Squares Regression (PLSR). Journal of Chemometrics 18(9), 422-429.

Sparse PLS regression mode:

Lê Cao, K. A., Rossouw D., Robert-Granie, C. and Besse, P. (2008). A sparse
PLS for variable selection when integrating Omics data. Statistical
Applications in Genetics and Molecular Biology 7, article 35.

One-sided t-tests (suppl material):

Rohart F, Mason EA, Matigian N, Mosbergen R, Korn O, Chen T, Butcher S,
Patel J, Atkinson K, Khosrotehrani K, Fisk NM, Lê Cao K-A&, Wells CA&
(2016). A Molecular Classification of Human Mesenchymal Stromal Cells. PeerJ
4:e1845.
}
\seealso{
\code{\link{splsda}}, \code{\link{predict.splsda}}, and http://www.mixOmics.org for more details.
}
\author{
Kim-Anh Lê Cao, Al J Abadi, Benoit Gautier, Francois Bartolo and Florian Rohart.
}
\keyword{multivariate}
\keyword{regression}
