\name{callVariantsSingle}
\alias{callVariantsSingle}
\title{Single sample variant calling}
\description{
A simple single-sample variant calling function (calling SNVs and deletions).
}
\usage{
callVariantsSingle( data, sampledata, samples = sampledata$Sample,
  errorRate = 0.001, minSupport = 2, minAF = 0.05, minStrandSupport = 1,
  mergeDels = TRUE, aggregator = mean )
}
\arguments{
  \item{data}{A \code{list} with elements
  \code{Counts} (a 4d \code{integer} array of size [1:12, 1:2, 1:k, 1:n]),
  \code{Coverages} (a 3d \code{integer} array of size [1:2, 1:k, 1:n]),
  \code{Deletions} (a 3d \code{integer} array of size [1:2, 1:k, 1:n]),
  \code{Reference} (a 1d \code{integer} vector of size [1:n]) -- see Details.}
  \item{sampledata}{A \code{data.frame} with \code{k} rows (one for each
  sample) and columns \code{Column} and \code{Sample}.
  The tally file should contain this information as a group attribute; see \code{getSampleData} for an example.
  }
  \item{samples}{
  The samples on which variants should be called. By default, all samples specified in \code{sampledata} are used.
  }
  \item{errorRate}{The expected error rate of the sequencing technology that was used; for Illumina this should be \code{1/1000}}
  \item{minSupport}{minimal support required for a position to be considered variant}
  \item{minAF}{minimal allele frequency for an allele at a position to be considered a variant}
  \item{minStrandSupport}{minimal per-strand support for a position to be considered variant}
  \item{mergeDels}{Boolean flag specifying whether adjacent deletion calls
  should be merged}
  \item{aggregator}{Aggregator function for merging statistics of adjacent deletion calls;
  defaults to \code{mean}, which means that a deletion larger than 1bp
  will be annotated with the means of the counts, coverages, etc.}
}
\details{

  \code{data} is a list of datasets that must contain at least
  \code{Counts} and \code{Coverages} for SNV calling, and additionally
  \code{Deletions} for deletion calling (if \code{Deletions} is not present, no deletion calls will be made).
  This list will usually be generated by a call to the \code{h5dapply} function, in which the tally
  file, chromosome, datasets and regions within the datasets are
  specified. See \code{\link{h5dapply}} for specifics.

  \code{callVariantsSingle} implements a simple single-sample variant calling approach for SNVs and deletions (the latter only if \code{Deletions} is a dataset present in the \code{data} parameter). The function applies three essential filters to the provided data, requiring:
  
  \itemize{
    \item \code{minSupport} total support for the variant at the position
    \item \code{minStrandSupport} support for the variant on each strand
    \item an allele frequency of at least \code{minAF} (for pure diploid samples this can be set relatively high, e.g. 0.3; for calling potentially homozygous variants a value of 0.8 or higher might be used)
  }
  
  Calls are annotated with the p-value of a \code{\link{binom.test}} of the observed support and coverage given the error rate provided in the \code{errorRate} parameter; no filtering is done on this annotation.
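
  As a rough illustration of the filters and the error annotation described above, the following sketch applies them to made-up counts for a single candidate position. It is not the internal implementation of \code{callVariantsSingle}: all numeric values are hypothetical, and the choice of \code{alternative = "greater"} in the call to \code{binom.test} is an assumption of this sketch.

  \preformatted{
# Hypothetical per-strand counts for one candidate SNV (illustrative values only)
supFwd <- 6;  supRev <- 4     # reads supporting the variant on each strand
covFwd <- 20; covRev <- 18    # total coverage on each strand
errorRate <- 0.001; minSupport <- 2; minStrandSupport <- 1; minAF <- 0.05

support  <- supFwd + supRev
coverage <- covFwd + covRev
af       <- support / coverage

# The three filters: total support, per-strand support, allele frequency
passes <- support >= minSupport &&
  min(supFwd, supRev) >= minStrandSupport &&
  af >= minAF

# Annotation only, no filtering: binomial test of support versus coverage
# at the given error rate (alternative = "greater" is an assumption here)
pError <- binom.test(support, coverage, p = errorRate,
                     alternative = "greater")$p.value
  }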
  
  If \code{mergeDels} is \code{TRUE}, adjacent deletion calls are merged and their statistics are aggregated with the function supplied in the \code{aggregator} parameter.
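
  The merging of adjacent deletion calls might look roughly like the sketch below. This is a hedged illustration on a hypothetical \code{data.frame} of single-basepair deletion calls, not the code used by the package; the column subset and all values are made up for the example.

  \preformatted{
# Hypothetical single-basepair deletion calls (illustrative values only)
dels <- data.frame(
  Start    = c(100, 101, 102, 110),
  Support  = c(4, 6, 5, 3),
  Coverage = c(20, 22, 21, 30)
)
aggregator <- mean

# Consecutive positions fall into the same run; a gap starts a new run
runs <- split(dels, cumsum(c(TRUE, diff(dels$Start) != 1)))

# One merged call per run, with statistics combined by the aggregator
merged <- do.call(rbind, lapply(runs, function(d) data.frame(
  Start    = min(d$Start),
  End      = max(d$Start),
  Support  = aggregator(d$Support),
  Coverage = aggregator(d$Coverage)
)))
  }

  With \code{mean} as the aggregator, the three adjacent calls at positions 100 to 102 become a single 3bp deletion annotated with the mean support and coverage of its positions, while the isolated call at position 110 is left unchanged.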
}
\value{
This function returns a \code{data.frame} containing annotated calls with the following columns:
  \item{Chrom}{The chromosome the potential variant / deletion is on}
  \item{Start}{The starting position of the variant / deletion}
  \item{End}{The end position of the variant / deletion (equal to Start for SNVs and single basepair deletions)}
  \item{Sample}{The sample in which the variant was called}
  \item{altAllele}{The alternate allele for SNVs (deletions will have a \code{"-"} in that column)}
  \item{refAllele}{The reference allele for SNVs. Deletions will have the deleted sequence here, as extracted from the \code{Reference} dataset. If the tally file contains a sparse representation of the reference (i.e. only positions with mismatches show a reference value), the missing values are substituted with \code{"N"}s. It is strongly suggested to write the whole reference into the tally file prior to deletion calling; see \code{\link{writeReference}} for details.}
  \item{SupFwd}{Support for the variant in the sample on the forward strand}
  \item{SupRev}{Support for the variant in the sample on the reverse strand}
  \item{CovFwd}{Coverage of the variant position in the sample on the forward strand}
  \item{CovRev}{Coverage of the variant position in the sample on the reverse strand}
  \item{AF_Fwd}{Allele frequency of the variant in the sample on the forward strand}
  \item{AF_Rev}{Allele frequency of the variant in the sample on the reverse strand}
  \item{Support}{Total Support of the variant - i.e. \code{SupFwd + SupRev}}
  \item{Coverage}{Total Coverage of the variant position - i.e. \code{CovFwd + CovRev}}
  \item{AF}{Total allele frequency of the variant, i.e. \code{Support / Coverage}}
  \item{fBackground}{Background frequency of the variant in all samples but the one the variant is called in}
  \item{pErrorFwd}{Probability of the observed support and coverage given the error rate on the forward strand}
  \item{pErrorRev}{Probability of the observed support and coverage given the error rate on the reverse strand}
  \item{pError}{Probability of the observed support and coverage given the error rate on both strands combined}
  \item{pStrand}{p-value of a \code{\link{fisher.test}} on the contingency matrix \code{matrix(c(CovFwd,CovRev,SupFwd,SupRev), nrow = 2)} at this position; low values could indicate strand bias}
  }
\author{
Paul Pyl
}
\examples{
  library(h5vc) # loading library
  tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
  sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
  position <- 29979629
  windowsize <- 1000
  vars <- h5dapply( # Calling Variants
    filename = tallyFile,
    group = "/ExampleStudy/16",
    blocksize = 500,
    FUN = callVariantsSingle,
    sampledata = sampleData,
    names = c("Coverages", "Counts", "Reference", "Deletions"),
    range = c(position - windowsize, position + windowsize)
  )
  vars <- do.call( rbind, vars ) # merge the results from all blocks by row
  vars # We did find a variant
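
  # A minimal sketch of the strand-bias annotation (pStrand) on hypothetical
  # counts. The values below are made up for illustration and are not output
  # of callVariantsSingle; the matrix layout follows the description of the
  # pStrand column.
  CovFwd <- 20; CovRev <- 18 # hypothetical coverage per strand
  SupFwd <- 6;  SupRev <- 4  # hypothetical variant support per strand
  pStrand <- fisher.test( matrix( c(CovFwd, CovRev, SupFwd, SupRev), nrow = 2 ) )$p.value
  pStrand # a low value could indicate strand bias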
}
