% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Classification.R
\name{DaMiR.EnsembleLearning}
\alias{DaMiR.EnsembleLearning}
\title{Build Classifier using 'Stacking' Ensemble Learning strategy.}
\usage{
DaMiR.EnsembleLearning(
  data,
  classes,
  variables,
  fSample.tr = 0.7,
  fSample.tr.w = 0.7,
  iter = 100,
  cl_type = c("RF", "kNN", "SVM", "LDA", "LR", "NB", "NN", "PLS")
)
}
\arguments{
\item{data}{A transposed data frame of normalized expression data.
Rows and columns should be, respectively, observations and features}

\item{classes}{A class vector with \code{nrow(data)} elements.
Each element represents the class label for each observation.
More than two different class labels are handled.}

\item{variables}{An optional data frame containing other variables
(but without 'class' column). Each column represents a different
covariate to be considered in the model}

\item{fSample.tr}{Fraction of samples to be used as training set;
default is 0.7}

\item{fSample.tr.w}{Fraction of samples of training set to be used
during weight estimation; default is 0.7}

\item{iter}{Number of iterations to assess classification accuracy;
default is 100}

\item{cl_type}{List of weak classifiers that will compose the
meta-learners. Only "RF", "kNN", "SVM", "LDA", "LR", "NB", "NN", "PLS"
are allowed. Default is c("RF", "LR", "kNN", "LDA", "NB", "SVM")}
}
\value{
A list containing:
\itemize{
  \item A matrix of accuracies of each classifier in each iteration.
  \item A matrix of weights used for each classifier in each iteration.
  \item A list of all models generated in each iteration.
  \item A violin plot of model accuracy obtained for each iteration.
}
}
\description{
This function implements a 'Stacking' ensemble learning
strategy.
Users can provide heterogeneous features (other than genomic features)
which will be taken into account during
classification model building.
}
\details{
To assess the robustness of a set of predictors, a specific 'Stacking'
strategy has been implemented. First, a training set (TR1) and a test
set (TS1) are generated by 'bootstrap' sampling. Then, by sampling again
from the TR1 subset, another pair of training (TR2) and test (TS2) sets
is obtained. TR2 is used to train Random Forest (RF), Naive Bayes (NB),
Support Vector Machines (SVM), k-Nearest Neighbour (kNN), Linear
Discriminant Analysis (LDA) and Logistic Regression (LR) classifiers,
whereas TS2 is used to test their accuracy and to calculate weights.
The decision rule of the 'Stacking' classifier is a linear combination
of the predictions (Pr) of each classifier, each multiplied by its
weight (w); for each sample k, the prediction is computed by:
 \deqn{Pr_{k, Ensemble} = w_{RF} * Pr_{k, RF} + w_{NB} * Pr_{k, NB} +
  w_{SVM} * Pr_{k, SVM} + w_{kNN} * Pr_{k, kNN} +
 w_{LDA} * Pr_{k, LDA} + w_{LR} * Pr_{k, LR}}
 \deqn{Pr_{k, Ensemble} = \sum_{i=1}^{N} w_{i} * Pr_{k, i}}{Pr[k, Ensemble] = sum(w[i] * Pr[k, i]), i = 1, ..., N}
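As a purely illustrative case with hypothetical numbers (not taken from
the package or its data), consider N = 3 classifiers with weights 0.5,
0.3 and 0.2 whose predictions for a sample k are 1, 0 and 1,
respectively. Under these assumed values the ensemble prediction would be:
 \deqn{Pr_{k, Ensemble} = 0.5 * 1 + 0.3 * 0 + 0.2 * 1 = 0.7}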
The performance of the 'Stacking' classifier is evaluated on TS1. This
process is repeated \code{iter} times (default: 100).
}
\examples{
# use example data:
data(selected_features)
data(df)
set.seed(1)
# only for the example:
# speed up the process by setting a low 'iter' argument value;
# for real data sets, use the default 'iter' value (i.e. 100) or higher:
# Classification_res <- DaMiR.EnsembleLearning(selected_features,
# classes=df$class, fSample.tr=0.6, fSample.tr.w=0.6, iter=3,
#  cl_type=c("RF","kNN"))
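
# Illustrative sketch only (not the package's internal code): a base-R
# walk-through of the nested sampling and weighted combination described
# in Details. Sampling without replacement, the toy accuracies, the toy
# predictions and the accuracy-proportional weights are all assumptions
# made for this illustration.
n_obs <- 20
all_idx <- seq_len(n_obs)
# first split: training set TR1 and test set TS1 (fraction 0.7 assumed)
idx_tr1 <- sample(all_idx, size = round(0.7 * n_obs))
idx_ts1 <- setdiff(all_idx, idx_tr1)
# second split inside TR1: TR2 to train, TS2 to estimate weights
idx_tr2 <- sample(idx_tr1, size = round(0.7 * length(idx_tr1)))
idx_ts2 <- setdiff(idx_tr1, idx_tr2)
# hypothetical accuracies of two weak classifiers measured on TS2
acc_ts2 <- c(RF = 0.9, kNN = 0.6)
# assumed weighting scheme: accuracies rescaled so that they sum to 1
w <- acc_ts2 / sum(acc_ts2)
# hypothetical 0/1 predictions on TS1 (rows: samples, columns: classifiers)
pr_ts1 <- cbind(RF = rep(1, length(idx_ts1)),
                kNN = rep(0, length(idx_ts1)))
# ensemble score for each sample k: sum over classifiers of w[i] * Pr[k, i]
pr_ensemble <- pr_ts1[, "RF"] * w[["RF"]] + pr_ts1[, "kNN"] * w[["kNN"]]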

}
\author{
Mattia Chiesa, Luca Piacentini
}
