% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/reads_simulator.R
\name{sim_read_count}
\alias{sim_read_count}
\title{Synthetic reads generator for genetic variants}
\usage{
sim_read_count(
  Config,
  D,
  Psi = NULL,
  means = c(0.002, 0.45),
  vars = c(100, 1),
  wise0 = "element",
  wise1 = "variant",
  cell_num = 300,
  permute_D = FALSE,
  sample_cell = TRUE,
  doublet = 0
)
}
\arguments{
\item{Config}{A matrix of binary values. The clone-variant configuration
(variants by clones), which encodes the phylogenetic tree structure and the
genotype of each clone.}

\item{D}{A matrix of integers. Sequencing depth for N variants (rows) across
cells (columns), ideally more than 100 cells. NA is treated as 0.}

\item{Psi}{A vector of floats. The fraction of cells in each clone. If NULL,
a uniform distribution over clones is used.}

\item{means}{A vector of two floats. The mean of theta0 (false positive
rate) and the mean of theta1 (true positive rate).}

\item{vars}{A vector of two floats. The variance of theta0 and theta1.}

\item{wise0}{A string. The specificity of the beta-binomial parameter
theta0: one of \code{"global"}, \code{"variant"} or \code{"element"}.}

\item{wise1}{A string. The specificity of the beta-binomial parameter
theta1: one of \code{"global"}, \code{"variant"} or \code{"element"}.}

\item{cell_num}{An integer. The number of cells to generate.}

\item{permute_D}{A logical value. If TRUE, permute the variants in D.}

\item{sample_cell}{A logical value. If TRUE and \code{cell_num} is larger
than \code{ncol(D)}, sample cells from D.}

\item{doublet}{A float between 0 and 1. The rate of doublets.}
}
\value{
A list containing \code{A_sim}, a matrix of alteration read counts;
\code{D_sim}, a matrix of total read counts; \code{I_sim}, a matrix of
clonal labels; \code{H_sim}, a matrix of genotypes; \code{theta0}, a matrix
of expected false positive rates; \code{theta1}, a matrix of expected true
positive rates; \code{theta0_binom}, theta0 as a binomial parameter;
\code{theta1_binom}, theta1 as a binomial parameter; and \code{is_doublet},
a logical vector indicating whether each cell is a doublet.
}
\description{
The simulated read counts for variants in single cells are generated in the
following steps:
\enumerate{
\item Given the clonal genotype and the clonal prevalence, each cell is
assigned a genotype (i.e., a clone) following a multinomial distribution.
Note that a cell may contain variants from two clones when it is a doublet.
\item Given the distribution of read coverage (variant specific), e.g., a
matrix of read coverage from real data, the total reads of each variant are
generated by random sampling. Note that the missing rate is governed by this
matrix.
\item The allelic frequency of each variant is generated from a beta
distribution parameterized by its mean and variance.
\item Given the genotype of a cell, if the mutation exists in the cell, the
alteration read counts are generated from a binomial distribution
parameterized by the allelic frequency sampled in step 3.
\item Given the genotype of a cell, if the mutation does not exist in the
cell, the alteration read counts are generated from a binomial distribution
parameterized by the technical error rate.
}
}
\examples{
data(simulation_input)
D2 <- sample_seq_depth(D_input, n_cells = 500, n_sites = nrow(tree_4clone$Z))
simu <- sim_read_count(tree_4clone$Z, D2, Psi = NULL, cell_num = 500)
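
# Illustrative sketch only, using a hypothetical helper that is not part of
# the package: a minimal base-R version of steps 1-5 in the Description, to
# show the generative process. It is not the code used by sim_read_count().
# Assumptions of this sketch: theta0 and theta1 are drawn once per variant
# from Beta(m * s, (1 - m) * s) with mean m and concentration s, used here in
# place of a variance; depth columns are resampled with replacement; doublets
# are ignored.
sketch_sim_reads <- function(Z, D, psi = NULL, n_cell = 100,
                             mean0 = 0.002, mean1 = 0.45,
                             conc0 = 100, conc1 = 1) {
  n_var <- nrow(Z)
  n_clone <- ncol(Z)
  if (is.null(psi)) psi <- rep(1 / n_clone, n_clone)
  # Step 1: assign each cell to a clone with probabilities psi
  clone <- sample.int(n_clone, n_cell, replace = TRUE, prob = psi)
  H <- Z[, clone, drop = FALSE]  # genotypes, variants by cells
  # Step 2: total reads by resampling cells (columns) of D; NA means 0
  D_sim <- D[, sample.int(ncol(D), n_cell, replace = TRUE), drop = FALSE]
  D_sim[is.na(D_sim)] <- 0
  # Step 3: per-variant error rate theta0 and allelic frequency theta1
  theta0 <- rbeta(n_var, mean0 * conc0, (1 - mean0) * conc0)
  theta1 <- rbeta(n_var, mean1 * conc1, (1 - mean1) * conc1)
  # Steps 4 and 5: binomial alteration reads with rate theta1 where the
  # variant is present (H == 1) and theta0 where it is absent
  rate <- ifelse(H == 1, theta1, theta0)
  A_sim <- matrix(rbinom(length(D_sim), size = D_sim, prob = rate),
                  nrow = n_var)
  list(A_sim = A_sim, D_sim = D_sim, I_sim = clone, H_sim = H)
}

# Toy usage: 3 variants by 4 clones, and a depth matrix for 50 cells
set.seed(1)
Z_toy <- cbind(c(0, 0, 0), c(1, 0, 0), c(1, 1, 0), c(1, 0, 1))
D_toy <- matrix(rpois(3 * 50, lambda = 20), nrow = 3)
toy <- sketch_sim_reads(Z_toy, D_toy, n_cell = 200)
# Alteration reads never exceed total reads
stopifnot(all(toy$A_sim <= toy$D_sim))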
}
