\name{calcDenovo}
\alias{calcDenovo}
\title{
  Estimate expression of gene splicing variants de novo
}
\description{
  \code{calcDenovo} estimates expression of gene splicing variants,
  considering both known variants and variants that have not been
  previously described.
}
\usage{
calcDenovo(distrs, targetGenomeDB, knownGenomeDB=targetGenomeDB, pc,
readLength, islandid, priorq=3, mprior, minpp=0.001, selectBest=FALSE,
searchMethod="submodels", niter, exactMarginal=TRUE,
integrateMethod="plugin", verbose=TRUE, mc.cores=1)
}
\arguments{
\item{distrs}{ List of fragment distributions as generated by the \code{getDistrs} function.}
\item{targetGenomeDB}{ \code{annotatedGenome} object with the isoforms we wish to
  quantify. By default \code{knownGenomeDB} is set equal to \code{targetGenomeDB},
  but more typically \code{targetGenomeDB} is imported from a .gtf file
  produced by some isoform prediction software.}
\item{knownGenomeDB}{ \code{annotatedGenome} object with known isoforms,
  e.g. from UCSC or GENCODE annotations. Used to set the prior
  probability that any given isoform is expressed. \code{knownGenomeDB}
  should be the same genome annotation used to create argument
  \code{mprior} (when provided).}
\item{pc}{Named vector of exon path counts as returned by \code{pathCounts}.}
\item{readLength}{ Read length in bp, e.g. in a paired-end experiment where
  75bp are sequenced on each end one would set \code{readLength=75}.}
\item{islandid}{Name of the gene island to be analyzed. If not specified, all
  gene islands are analyzed.}
\item{priorq}{Parameter of the prior distribution on the proportion of
  reads coming from each variant. The prior is Dirichlet with prior
  sample size for each variant equal to \code{priorq}.
  We recommend \code{priorq=3}, as this defines a non-local
  prior that penalizes falsely predicted isoforms that show low expression.}
\item{mprior}{Prior on the model space returned by
  \code{modelPrior}, used to favor isoforms
  consistent with \code{knownGenomeDB}. If left
  missing it is estimated from \code{knownGenomeDB}.
  See details.}
\item{minpp}{Models (i.e. splicing configurations) with posterior probability less than \code{minpp}
  are not reported. This argument can substantially reduce the
  memory required to store the results.}
\item{selectBest}{If set to \code{TRUE} only the model with highest
  posterior probability is reported. While this can save memory, we do
  not recommend this option as it may ignore a substantial amount of
  uncertainty.}
\item{searchMethod}{Method used to perform the model search. 
  \code{"allmodels"} enumerates all possible models (warning:
  this is not feasible for genes with >5 exons). \code{"rwmcmc"} uses a
  random-walk MCMC scheme to focus on models with high posterior
  probability.
  \code{"submodels"} considers that some isoforms in \code{targetGenomeDB}
  may not be expressed, but does not search for new variants.
  \code{"auto"} uses \code{"allmodels"} for genes with up to 5
  exons and \code{"rwmcmc"} for longer genes. See details.}
\item{niter}{Number of MCMC iterations.}
\item{exactMarginal}{Set to \code{FALSE} to estimate posterior model
  probabilities as the proportion of MCMC visits. Set to \code{TRUE} to
  use the integrated likelihoods (default). See details.}
\item{integrateMethod}{Method to compute integrated likelihoods. The default
  (\code{'plugin'}) evaluates likelihood*prior at the posterior mode and
  is the fastest option. Set to \code{'Laplace'} for Laplace approximations
  and to \code{'IS'} for Importance Sampling. The latter substantially
  increases the computational cost.}
\item{verbose}{Set to \code{TRUE} to display progress information.}
\item{mc.cores}{Number of processors to be used for parallel
  computation. Can only be used if the package \code{multicore} is
  available for your system.
Warning: using multiple processors substantially increases the memory
requirements, so set this value carefully.}

}
\details{
  \code{calcDenovo} explores which subset of the isoforms indicated in
  \code{targetGenomeDB} are truly expressed.
  It also adds new isoforms when some reads follow an exon path that
  is not possible under any of the isoforms in \code{targetGenomeDB}.
  \code{calcDenovo} computes the posterior probability of each model
  (i.e. configuration of expressed variants) via Bayes' theorem:

  \deqn{P(\mbox{model} \mid y) \propto m(y \mid \mbox{model}) P(\mbox{model})}{P(model|y) "proportional to" m(y|model) P(model)}

  where m(y|model) is the integrated likelihood and P(model) is the
  prior probability of the model.
  For example, a gene with 20 predicted isoforms in \code{targetGenomeDB}
  gives rise to 2^20 - 1 possible models (configurations of expressed isoforms).
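
  As a worked illustration of the count and of the proportionality above
  (a sketch that only restates Bayes' theorem; the symbols K, j and k are
  introduced here purely for notation), a gene with K candidate isoforms has

  \deqn{\sum_{j=1}^{K} {K \choose j} = 2^K - 1}{sum_{j=1}^K choose(K,j) = 2^K - 1}

  non-empty subsets of isoforms, that is 2^20 - 1 = 1,048,575 models when
  K=20. Normalizing over the models under consideration gives

  \deqn{P(\mbox{model}_k \mid y) = \frac{m(y \mid \mbox{model}_k) P(\mbox{model}_k)}{\sum_{j} m(y \mid \mbox{model}_j) P(\mbox{model}_j)}}{P(model_k|y) = m(y|model_k) P(model_k) / sum_j m(y|model_j) P(model_j)}

  where the sum in the denominator runs over all models being compared.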

  Importantly, P(model) can be set by analyzing available genome
  annotations in \code{knownGenomeDB}.
  For instance, genes with 20 exons have isoforms that tend
  to use most of the 20 exons. They also tend to express more
  isoforms than genes with 5 exons. The function \code{modelPrior}
  analyzes \code{knownGenomeDB} to set reasonable values for P(model).

  An exhaustive enumeration of all possible models is
  not feasible unless the gene is very short (e.g. around 5 exons).
  For longer genes we use computational strategies to search a subset of
  "interesting" models. This is controlled by the argument \code{searchMethod}
  (see above).

  In order to compute P(model|y) one can either use the computed
  m(y|model) P(model) (option \code{exactMarginal=TRUE}) or the
  proportion of MCMC visits (option \code{exactMarginal=FALSE}). Unless
  \code{niter} is large, the former option typically provides more
  precise estimates.
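
  In symbols (a notational sketch: T, t and the model index k are
  introduced here for illustration only, and T is assumed to be the number
  of MCMC iterations retained), the proportion of MCMC visits is

  \deqn{\hat{P}(\mbox{model}_k \mid y) = \frac{1}{T} \sum_{t=1}^{T} \mbox{I}(\mbox{model}^{(t)} = \mbox{model}_k)}{P.hat(model_k|y) = (1/T) sum_{t=1}^T I(model^(t) = model_k)}

  where model^(t) denotes the model visited at iteration t and I() is the
  indicator function. This estimate carries Monte Carlo error that
  decreases as T grows, whereas renormalizing m(y|model) P(model) only
  requires that each model of interest has been evaluated, not that it has
  been visited in proportion to its posterior probability.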
}
\value{
  A \code{denovoGenomeExpr} object.
}
\references{
Rossell D, Stephan-Otto Attolini C, Kroiss M, Stocker A. Quantifying Alternative Splicing from Paired-End
RNA-sequencing data. Annals of Applied Statistics, 8(1):309-330.
}
\author{
  Camille Stephan-Otto Attolini, Manuel Kroiss, David Rossell
}
\seealso{
\code{\link{denovoExpr}} to obtain expression estimates from the
  \code{calcDenovo} output.
\code{plotExpr} to produce a plot with splicing variants and estimated expression.
}
\examples{
## See help(denovoExpr)
}
\keyword{ models }
