% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/methyLImp2.R
\name{methyLImp2}
\alias{methyLImp2}
\title{Impute missing values in a methylation dataset}
\usage{
methyLImp2(
  input,
  which_assay = NULL,
  type = c("450K", "EPIC", "user"),
  annotation = NULL,
  groups = NULL,
  range = NULL,
  skip_imputation_ids = NULL,
  BPPARAM = BiocParallel::bpparam(),
  minibatch_frac = 1,
  minibatch_reps = 1,
  overwrite_res = TRUE
)
}
\arguments{
\item{input}{either a numeric data matrix with missing values to be imputed, 
with named samples in rows and variables (probes) in named columns, or a 
SummarizedExperiment object whose assay has variables in rows and samples 
in columns, as is standard.}

\item{which_assay}{a character string specifying the name of the assay of the 
SummarizedExperiment object to impute. By default, the first assay is imputed.}

\item{type}{the type of data: "450K", "EPIC" or "user". The type is used to 
split CpGs across chromosomes. For "450K" and "EPIC", the mapping of CpGs to 
chromosomes is taken from the ChAMPdata package. If you wish to provide your 
own mapping, specify "user" in this argument and provide a data frame in the 
\code{annotation} argument.}

\item{annotation}{a data frame with a user-provided mapping between CpG sites 
and chromosomes. It must contain two columns, \code{cpg} and \code{chr}. 
Set \code{type = "user"} to be able to supply this annotation.}

\item{groups}{a vector of the same length as the number of samples that 
identifies the group to which each sample belongs, e.g. \code{c(1, 1, 2, 3)}
or \code{c("group1", "group1", "group2", "group3")}. The unique elements of 
the vector are treated as groups, and the data are split accordingly. 
Each group is imputed separately, one after another. 
The default is \code{NULL}, so all samples are treated as one group.}

\item{range}{a vector of two numbers, \eqn{min} and \eqn{max}, 
specifying the range of values in the data. 
Since we assume the beta-value representation of the methylation data, 
the default range is \eqn{[0, 1]}. 
However, if you wish to apply the method to another kind of data, 
you can change the range in this argument.}

\item{skip_imputation_ids}{a numeric vector of the ids of columns with NAs 
that should \emph{not} be imputed. If \code{NULL} (the default), all columns 
are considered for imputation.}

\item{BPPARAM}{a set of options for parallelization through the BiocParallel 
package; for details, we refer to its documentation. The option most users 
will probably wish to customize is the number of cores (workers). By default, 
it is set to the number of available cores minus 2. To change it, supply 
\code{BPPARAM = BiocParallel::SnowParam(workers = ncores)} with your desired 
\code{ncores}. If the default or user-specified number of workers is higher 
than the number of chromosomes, it will be overridden. 
We also recommend setting \code{exportglobals = FALSE}, since it can help 
reduce the running time.}

\item{minibatch_frac}{a number between 0 and 1 giving the fraction of samples 
to use for mini-batch computation. If your data has several groups, 
mini-batch is applied to each group separately but with the same fraction, 
so choose it accordingly. However, if the chosen fraction is too small 
relative to the matrix dimensions of some groups, mini-batch is simply ignored. 
We advise using mini-batch only if you have a large number of samples, 
on the order of hundreds. The default is 1 (i.e., 100\% of samples are used, 
no mini-batch).}

\item{minibatch_reps}{the number of times to repeat the computations with 
the fraction of samples specified above (more repetitions give better 
imputation performance but longer runtime). The default is 1 (as a companion 
to the default fraction of 100\%, i.e. no mini-batch).}

\item{overwrite_res}{a logical value specifying whether to overwrite the imputed 
assay of the SummarizedExperiment object or to add a new assay with the 
imputed data. The default is \code{TRUE}, which keeps the object size smaller.}
}
\value{
A numeric matrix with imputed data if \code{input} is a matrix, or a 
SummarizedExperiment object if \code{input} is a SummarizedExperiment object.
}
\description{
This function performs missing value imputation specific to DNA methylation 
data. The method is based on linear regression, since methylation levels 
show a high degree of inter-sample correlation. The implementation is 
parallelised over chromosomes to improve the running time.
}
\examples{
data(beta)
beta_with_nas <- generateMissingData(beta, lambda = 3.5)$beta_with_nas
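# Sketch of a SummarizedExperiment input (assumes the SummarizedExperiment
# package is installed). The assay needs probes in rows and samples in
# columns, hence t(). The assay name "beta" is an arbitrary choice for this
# example. overwrite_res = FALSE keeps the original assay and adds the
# imputed data as a new assay. exportglobals = FALSE follows the
# recommendation given for BPPARAM.
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  se <- SummarizedExperiment::SummarizedExperiment(
    assays = list(beta = t(beta_with_nas)))
  se_imputed <- methyLImp2(input = se, which_assay = "beta", type = "EPIC",
                           overwrite_res = FALSE,
                           BPPARAM = BiocParallel::SnowParam(
                             workers = 1, exportglobals = FALSE))
}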
beta_imputed <- methyLImp2(input = beta_with_nas, type = "EPIC", 
                          minibatch_frac = 0.5, 
                          BPPARAM = BiocParallel::SnowParam(workers = 1))
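# A simplified, self-contained sketch of regression-based imputation on toy
# data. It is NOT the methyLImp2 implementation. impute_probe_sketch() is a
# hypothetical helper written for illustration only. The helper fits probe j
# on the fully observed probes, using the samples in which probe j is
# observed. It uses minimum-norm least squares via the SVD pseudo-inverse,
# then predicts the missing entries and clips them to the data range.
impute_probe_sketch <- function(dat, j, range = c(0, 1), tol = 1e-8) {
  miss <- is.na(dat[, j])              # samples in which probe j is missing
  full <- colSums(is.na(dat)) == 0     # probes without any missing value
  A <- dat[!miss, full, drop = FALSE]  # training samples x predictor probes
  Z <- dat[miss, full, drop = FALSE]   # samples to impute x predictor probes
  b <- dat[!miss, j]                   # observed values of probe j
  s <- svd(A)
  # invert only the non-negligible singular values (pseudo-inverse)
  d_inv <- ifelse(s$d > tol * max(s$d), 1 / s$d, 0)
  w <- d_inv * drop(crossprod(s$u, b))    # D^+ U^T b
  coef <- rowSums(sweep(s$v, 2, w, "*"))  # regression coefficients V D^+ U^T b
  pred <- rowSums(sweep(Z, 2, coef, "*")) # weighted sum of predictor probes
  dat[miss, j] <- pmin(pmax(pred, range[1]), range[2])
  dat
}
set.seed(1)
dat <- matrix(runif(20 * 5, 0.1, 0.9), nrow = 20)  # 20 samples x 5 probes
dat[, 5] <- 0.5 * dat[, 1] + 0.5 * dat[, 2]       # probe 5 is linear in probes 1, 2
truth <- dat[1:3, 5]
dat[1:3, 5] <- NA
imp <- impute_probe_sketch(dat, j = 5)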
}
