% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/factors.R
\name{runICA}
\alias{runICA}
\title{Run standard or stabilised Independent Component Analysis}
\usage{
runICA(
  X,
  nc,
  use_stability = FALSE,
  resample = FALSE,
  method = "fast",
  stability_threshold = NULL,
  center_X = TRUE,
  scale_X = FALSE,
  reorient_skewed = TRUE,
  scale_components = TRUE,
  scale_reduced = TRUE,
  n_runs = 30,
  BPPARAM = BiocParallel::SerialParam(),
  ...
)
}
\arguments{
\item{X}{Either a \link[SummarizedExperiment]{SummarizedExperiment} object
or a matrix containing data to be subject to ICA. \code{X} should have rows as
features and columns as samples.}

\item{nc}{The number of components to be identified. See
\link[ReducedExperiment]{estimateStability} for a method to estimate the
optimal number of components.}

\item{use_stability}{Whether to use a stability-based approach to estimate
factors. See \code{details} for further information.}

\item{resample}{If \code{TRUE}, a bootstrap approach is used to estimate factors
and quantify stability. Else, random initialisation of ICA is employed.
Ignored if \code{use_stability} is \code{FALSE}.}

\item{method}{The ICA method to use, passed to \link[ica]{ica}. The options
are \code{"fast"}, \code{"imax"} and \code{"jade"}.}

\item{stability_threshold}{A stability threshold for pruning factors. Factors
with a stability below this threshold will be removed. If used, the threshold
can lead to fewer factors being returned than that specified by \code{nc}.}

\item{center_X}{If \code{TRUE}, X is centered (i.e., features / rows are transformed
to have a mean of 0) prior to ICA. Generally recommended.}

\item{scale_X}{If \code{TRUE}, X is scaled (i.e., features / rows are transformed
to have a standard deviation of 1) before ICA.}

\item{reorient_skewed}{If \code{TRUE}, factors are reorientated to ensure that the
loadings of each factor (i.e., the source signal matrix) have positive skew.
Helps ensure that the most influential features for each factor are
positively associated with it.}

\item{scale_components}{If \code{TRUE}, the loadings are standardised (to have a
mean of 0 and standard deviation of 1).}

\item{scale_reduced}{If \code{TRUE}, the reduced data (mixture matrix) are
standardised (to have a mean of 0 and standard deviation of 1).}

\item{n_runs}{The number of times to run ICA to estimate factors and quantify
stability. Ignored if \code{use_stability} is \code{FALSE}.}

\item{BPPARAM}{A \code{BiocParallelParam} object specifying how parallel
evaluation is carried out. Uses \link[BiocParallel]{SerialParam} by default,
which runs a single ICA computation at a time. Ignored if
\code{use_stability} is \code{FALSE}.}

\item{...}{Additional arguments to be passed to
\link[ica]{ica}.}
}
\value{
A list containing the following:
\describe{
\item{M}{The mixture matrix (reduced data) with samples as rows and factors
as columns.}
\item{S}{The source signal matrix (loadings) with rows as features and
columns as factors.}
\item{stab}{A vector indicating the relative stability of each factor, as
described in \code{details}. Only included in the list if
\code{use_stability} is \code{TRUE}.}
}
}
\description{
Runs ICA through \link[ica]{ica}. If \code{use_stability} is \code{FALSE},
\code{X} is passed directly to \link[ica]{ica} and a standard ICA analysis is
performed. If \code{use_stability} is \code{TRUE}, the stabilised ICA
procedure is carried out instead (see \code{details}).
}
\details{
This function performs ICA on a data matrix. If \code{use_stability} is
\code{TRUE}, ICA is run multiple times with either i) random initialisation
(the default) or ii) bootstrap resampling of the data (if \code{resample} is
\code{TRUE}).

Note that the seed must be set if reproducibility is needed. Specifically,
one can use \code{set.seed} prior to running standard ICA
(\code{use_stability = FALSE}) or set the \code{RNGseed} argument of \code{BPPARAM} when
running stabilised ICA (\code{use_stability = TRUE}).

The stability-based ICA algorithm is similar to the ICASSO approach
(\url{https://www.cs.helsinki.fi/u/ahyvarin/papers/Himberg03.pd}) that is
implemented in the stabilized-ica Python package
(\url{https://github.com/ncaptier/stabilized-ica/tree/master}).

In short, the stability-based algorithm consists of:
\itemize{
\item Running ICA multiple times with either random initialisation or
bootstrap resampling of the input data.
\item Clustering the resulting factors across all runs based on the
signature matrix.
\item Calculating, for each cluster, the average intra-cluster similarity
(aics) and the average extra-cluster similarity (aecs), and defining the
cluster stability as \code{aics - aecs}.
\item Selecting each cluster's centrotype, the factor with the highest
intra-cluster similarity, as the representative factor for that cluster.
\item Optionally removing factors below a specified stability threshold
(\code{stability_threshold}).
}
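
The steps above can be illustrated with a simplified base R sketch. This is
not the code used internally by \code{runICA}. It assumes that the similarity
between two factors is the absolute Pearson correlation of their source
signals and that factors are grouped by average-linkage \code{hclust}, and it
follows the ICASSO definitions of aics, aecs and the centrotype. The helper
\code{stability_sketch} and the simulated runs are hypothetical.
\preformatted{
# Sketch only (hypothetical helper): assumes absolute correlation as the
# similarity measure and average-linkage clustering, as in ICASSO.
# S_all: source signals (features x factors) from all runs, bound by column.
stability_sketch <- function(S_all, nc) {
    sim <- abs(cor(S_all))
    clusters <- cutree(hclust(as.dist(1 - sim), method = "average"), k = nc)

    stab <- numeric(nc)
    centrotypes <- matrix(NA_real_, nrow = nrow(S_all), ncol = nc)
    for (k in seq_len(nc)) {
        idx <- which(clusters == k)
        # Average intra-cluster similarity (diagonal included)
        aics <- mean(sim[idx, idx])
        # Average similarity between cluster members and all other factors
        # (requires nc >= 2)
        aecs <- mean(sim[idx, -idx])
        stab[k] <- aics - aecs
        # Centrotype: the member with the largest summed within-cluster similarity
        best <- idx[which.max(rowSums(sim[idx, idx, drop = FALSE]))]
        centrotypes[, k] <- S_all[, best]
    }
    list(S = centrotypes, stab = stab)
}

# Simulate 10 runs that each recover the same 3 sources, with random column
# order, random sign flips and a small amount of noise
set.seed(1)
S_true <- matrix(rnorm(200 * 3), nrow = 200, ncol = 3)
S_all <- do.call(cbind, lapply(seq_len(10), function(run) {
    signs <- sample(c(-1, 1), 3, replace = TRUE)
    S_run <- sweep(S_true[, sample(3)], 2, signs, "*")
    S_run + matrix(rnorm(200 * 3, sd = 0.1), nrow = 200)
}))

res <- stability_sketch(S_all, nc = 3)
res$stab
}
Every simulated source is recovered in every run, so all three clusters are
tight and their stabilities are high. Factors that appear in only some runs
would tend to form looser clusters with lower stability.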

Results from this function should be broadly similar to those generated by
other implementations of stabilised ICA, although they will not be identical.
Notable differences include:
\describe{
\item{ICA algorithm}{Differences in the underlying implementation of
ICA.}
\item{Stability threshold}{The \code{stability_threshold} argument, if
specified, removes unstable components. Such a threshold is not
used by stabilized-ica.}
\item{Mixture matrix recovery}{ICA is generally formulated as
\code{X = MS}, where \code{X} is the input data, \code{M} is the mixture matrix
(reduced data) and \code{S} is the source signal matrix (feature loadings).
The stabilised ICA approach first calculates a source signal matrix
before recovering the mixture matrix. To do this, other implementations,
including that of the stabilized-ica package, multiply \code{X} by the
pseudo-inverse of \code{S}. Such an operation is implemented in the \code{ginv}
function of the \code{MASS} R package. During the development of
ReducedExperiment, we found that computing the pseudo-inverse of \code{S}
often failed, particularly when factors were correlated. We therefore
calculate the mixture matrix as \code{M = XS}. After standardisation of
\code{M}, both approaches return near-identical results, provided that the
pseudo-inverse can be computed.}
}
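
The agreement between the two formulations can be checked numerically. The
base R sketch below is illustrative and is not the implementation used by
\code{runICA}. In it, \code{X} has features as rows and samples as columns,
and the loadings \code{S} have features as rows and factors as columns. For a
full-column-rank \code{S}, the pseudo-inverse equals
\code{solve(crossprod(S), t(S))}. The pseudo-inverse approach is therefore the
projection \code{crossprod(X, S)} multiplied on the right by the inverse of
\code{crossprod(S)}. When the columns of \code{S} are orthogonal, this is a
pure rescaling of each factor, so the two approaches agree exactly after
standardisation. Correlated factors make them diverge. The helper functions
are hypothetical.
\preformatted{
# Sketch only: compares two ways of recovering the mixture matrix
set.seed(1)
n_features <- 100
n_samples <- 20
X <- matrix(rnorm(n_features * n_samples), nrow = n_features)

# Loadings with orthogonal but not unit-length columns
Q <- qr.Q(qr(matrix(rnorm(n_features * 3), nrow = n_features)))
S_orth <- sweep(Q, 2, c(1, 5, 0.2), "*")

# Loadings in which factors 1 and 2 are strongly correlated
S_corr <- cbind(Q[, 1], Q[, 1] + Q[, 2], Q[, 3])

# Projection, the formulation described above: M = t(X) S (samples x factors)
project <- function(X, S) crossprod(X, S)

# Pseudo-inverse, the formulation of other implementations: M = t(pinv(S) X),
# where pinv(S) = solve(crossprod(S), t(S)) for full-column-rank S
pseudo_inverse <- function(X, S) t(solve(crossprod(S), crossprod(S, X)))

# Largest absolute difference after standardising each factor (column)
max_diff <- function(X, S) {
    max(abs(scale(project(X, S)) - scale(pseudo_inverse(X, S))))
}

max_diff(X, S_orth)  # numerically zero
max_diff(X, S_corr)  # clearly non-zero
}
ICA seeks statistically independent sources, so the estimated loadings tend
to be close to orthogonal. This helps explain why the two approaches
typically give near-identical results.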
}
\examples{
# Get a random matrix with rnorm, with 100 rows (features)
# and 20 columns (observations)
X <- ReducedExperiment:::.makeRandomData(100, 20, "feature", "obs")

# Run standard ICA on the data with 5 components
set.seed(1)
ica_res <- runICA(X, nc = 5, use_stability = FALSE)

# Run stabilised ICA on the data with 5 components (few runs, to keep the example fast)
ica_res_stab <- runICA(X, nc = 5, use_stability = TRUE, n_runs = 5,
                        BPPARAM = BiocParallel::SerialParam(RNGseed = 1))

}
\seealso{
\code{\link[ica:ica]{ica::ica()}}, \code{\link[=estimateStability]{estimateStability()}}
}
\author{
Jack Gisby
}
