% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/topological-integration.R
\name{topologicalAnalysis}
\alias{topologicalAnalysis}
\title{Perform a topologically-aware integrative pathway analysis (TAIPA)}
\usage{
topologicalAnalysis(
  mirnaObj,
  pathways,
  pCutoff = 0.05,
  pAdjustment = "max-T",
  nPerm = 10000,
  progress = FALSE,
  tasks = 0,
  BPPARAM = bpparam()
)
}
\arguments{
\item{mirnaObj}{A \code{\link[=MirnaExperiment-class]{MirnaExperiment}} object
containing miRNA and gene data}

\item{pathways}{A \code{list} of miRNA-augmented pathways returned by the
\code{\link[=preparePathways]{preparePathways()}} function}

\item{pCutoff}{The adjusted p-value cutoff to use for statistical
significance. The default value is \code{0.05}}

\item{pAdjustment}{The p-value correction method for multiple testing. It
must be one of: \code{max-T} (default), \code{fdr}, \code{BH}, \code{none}, \code{holm}, \code{hochberg},
\code{hommel}, \code{bonferroni}, \code{BY}}

\item{nPerm}{The number of permutations used for assessing the statistical
significance of each pathway. Default is 10000. See the \emph{details} section
for additional information}

\item{progress}{Logical, whether to show a progress bar during p-value
calculation. Default is \code{FALSE} (no progress bar). Please
note that setting \code{progress = TRUE} with high values of \code{tasks} leads to
less efficient parallelization. See the \emph{details} section for additional
information}

\item{tasks}{An integer between 0 and 100 that specifies how frequently the
progress bar is updated. The default, 0, simply splits the computation
among the workers. High values of \code{tasks} can lead to 15-30\% slower p-value
calculation. See the \emph{details} section for additional information}

\item{BPPARAM}{The desired parallel computing behavior. This parameter
defaults to \code{BiocParallel::bpparam()}, but it can be changed. See
\code{\link[BiocParallel:register]{BiocParallel::bpparam()}} for information on parallel computing in R}
}
\value{
An object of class
\code{\link[=IntegrativePathwayAnalysis-class]{IntegrativePathwayAnalysis}} that stores
the results of the analysis. See the corresponding help page for further details.
}
\description{
This function performs an integrative pathway analysis that aims
to identify the biological networks that are most affected by miRNomic
and transcriptomic dysregulations. It takes miRNA-augmented
pathways, created by the \code{\link[=preparePathways]{preparePathways()}} function, and calculates
a score that estimates the degree of impairment of each pathway. The
statistical significance of each score is then assessed through a permutation
test. The main advantages of this method are that it does not require matched
samples and that it integrates miRNA and mRNA data in a pathway analysis that
takes into account the topology of biological networks. See the \emph{details}
section for additional information.
}
\details{
\subsection{Topologically-Aware Integrative Pathway Analysis (TAIPA)}{

This analysis aims to identify the biological pathways that are affected
by miRNA and mRNA dysregulations. In this analysis, biological pathways are
retrieved from a pathway database such as KEGG, and the interplay between
miRNAs and genes is then added to the networks. Each network is defined as
a graph \eqn{G(V, E)}, where \eqn{V} represents nodes, and \eqn{E}
represents the relationships between nodes.

Then, nodes that are not significantly differentially expressed are assigned
a weight \eqn{w_i = 1}, whereas differentially expressed nodes are assigned
a weight \eqn{w_i = \left| \Delta E_i \right|}, where \eqn{\Delta E_i} is
the linear fold change of the node. Moreover, to consider the biological
interaction between two nodes, namely \eqn{i} and \eqn{j}, we define an
interaction parameter \eqn{\beta_{i \rightarrow j} = 1} for activation
interactions and \eqn{\beta_{i \rightarrow j} = -1} for repression
interactions. Subsequently, the concordance coefficient
\eqn{\gamma_{i \rightarrow j}} is defined as:

\deqn{\gamma_{i \rightarrow j} = \begin{cases} \beta_{i \rightarrow j}
&\text{if } sign(\Delta E_i) = sign(\Delta E_j) \\ - \beta_{i \rightarrow j}
&\text{if } sign(\Delta E_i) \not= sign(\Delta E_j) \end{cases}\,.}

Next, a breadth-first search (BFS) algorithm is applied to
topologically sort pathway nodes so that each node occurs after
all its upstream nodes. Nodes within cycles are considered leaf nodes. At
this point, a node score \eqn{\phi} is calculated for each pathway node
\eqn{i} as:

\deqn{\phi_i = w_i + \sum_{j=1}^{U} \gamma_{i \rightarrow j} \cdot k_j\,,}

where \eqn{U} represents the number of upstream nodes,
\eqn{\gamma_{i \rightarrow j}} denotes the concordance coefficient, and
\eqn{k_j} is a propagation factor defined as:

\deqn{k_j = \begin{cases} w_j &\text{if } \phi_j = 0 \\ \phi_j &\text{if }
\phi_j \not = 0 \end{cases}\,.}

Finally, the pathway score \eqn{\Psi} is calculated as:

\deqn{\Psi = \frac{1 - M}{N} \cdot \sum_{i=1}^{N} \phi_i\,,}

where \eqn{M} represents the proportion of miRNAs in the pathway, and
\eqn{N} represents the total number of nodes in the network.
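
As a purely illustrative worked example (the numbers are invented and not
taken from any real pathway), consider a network with \eqn{N = 4} nodes, one
of which is a miRNA, so that \eqn{M = 0.25}, and whose node scores sum to 8.
The pathway score would then be:

\deqn{\Psi = \frac{1 - 0.25}{4} \cdot 8 = 1.5\,.}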

To assess the statistical significance of each pathway score, a
permutation procedure is applied. Both observed and permuted pathway scores
are then standardized by subtracting the mean of the
permuted scores \eqn{\mu_{\Psi_P}} and dividing by the standard deviation
of the permuted scores \eqn{\sigma_{\Psi_P}}.
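
In symbols, and assuming that the subscript \eqn{N} in the p-value formula of
this section marks a standardized (normalized) score rather than the number
of nodes, the standardized observed score can be sketched as:

\deqn{\Psi_N = \frac{\Psi - \mu_{\Psi_P}}{\sigma_{\Psi_P}}\,,}

with each permuted score transformed in the same way.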

Finally, the p-value is based on the fraction of permutations that
yield a standardized pathway score at least as high as the observed one.
To prevent p-values equal to zero, p-values are defined as:

\deqn{p = \frac{\sum_{n=1}^{N_p} \left[ \Psi_{P_N} \ge \Psi_N \right] + 1}
{N_p + 1}\,.}
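
As an illustration with invented numbers, reading \eqn{N_p} as the number of
permutations and the bracket as 1 when the condition holds and 0 otherwise,
suppose that 499 out of \eqn{N_p = 10000} permutations produce a standardized
score at least as high as the observed one. The p-value would then be:

\deqn{p = \frac{499 + 1}{10000 + 1} \approx 0.05\,.}

Under the same reading, the smallest attainable p-value with 10000
permutations is \eqn{1 / 10001}, which is roughly 0.0001.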

In the end, p-values are corrected for multiple testing either through the
max-T procedure (default option), which is particularly suited for
permutation tests, or through standard multiple testing approaches.
}

\subsection{Implementation details}{

For computational efficiency, pathway score computation has been implemented
in C++. Moreover, to define the statistical significance of each
network, a permutation test is applied with the number of permutations
specified by \code{nPerm}. The default setting is to perform 10000 permutations.
The higher the number of permutations, the more stable the calculated
p-values, although the running time increases. Since
computing pathway scores for 10000 permuted networks for each pathway is
computationally intensive, parallel computing is employed to reduce
running time. The user can modify the parallel computing behavior by
specifying the \code{BPPARAM} parameter. See \code{\link[BiocParallel:register]{BiocParallel::bpparam()}} for
further details. A progress bar can also be shown to report the
completion percentage by setting \code{progress = TRUE}. The user can
define how frequently the progress bar is updated through the \code{tasks}
parameter. When using \code{progress = TRUE}, setting \code{tasks} to 100 tells the
function to update the progress bar 100 times, so that the user can see
increases of 1\%. Setting \code{tasks} to 50 instead means that the progress bar
is updated every 2\% of completion. However, keep in mind that \code{tasks}
values from 50 to 100 lead to 15-30\% slower p-value calculation due to
increased data transfer to the workers. Lower \code{tasks} values, such as
20, result in less frequent progress updates but are only slightly less
efficient than not including a progress bar.
}
}
\examples{

\donttest{
# load example MirnaExperiment object
obj <- loadExamples()

# perform integration analysis with default settings
obj <- mirnaIntegration(obj)

# retrieve pathways from KEGG and augment them with miRNA-gene interactions
paths <- preparePathways(obj)

# perform the integrative pathway analysis with 1000 permutations
ipa <- topologicalAnalysis(obj, paths, nPerm = 1000)
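
# a hedged sketch of the same call with explicit parallel and progress
# settings; SerialParam() is a BiocParallel back-end that runs sequentially,
# while the object names and the value tasks = 20 are illustrative assumptions
ipaSerial <- topologicalAnalysis(obj, paths,
    nPerm = 1000,
    BPPARAM = BiocParallel::SerialParam()
)

# show a progress bar that is updated about 20 times (every 5 percent)
ipaProgress <- topologicalAnalysis(obj, paths,
    nPerm = 1000,
    progress = TRUE,
    tasks = 20
)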

# access the results of pathway analysis
integratedPathways(ipa)

# create a dotplot of integrated pathways
integrationDotplot(ipa)

# explore a specific biological network
visualizeNetwork(ipa, "Thyroid hormone synthesis")
}

}
\references{
Peter H. Westfall and S. Stanley Young. Resampling-Based Multiple Testing:
Examples and Methods for p-Value Adjustment. John Wiley & Sons.
ISBN 978-0-471-55761-6.
}
\author{
Jacopo Ronchi, \email{jacopo.ronchi@unimib.it}
}
