% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/make_mat_bsseq.R
\name{make_mat_bsseq}
\alias{make_mat_bsseq}
\title{Make M/beta and coverage matrices from WGBS BED files}
\usage{
make_mat_bsseq(
  bedfiles,
  regions,
  aligner = "biscuit",
  mval = TRUE,
  merged = TRUE,
  sparse = FALSE,
  prealloc = 10000,
  nthreads = NULL
)
}
\arguments{
\item{bedfiles}{A character vector of BED file paths.}

\item{regions}{A character vector, data frame, or GRanges object of genomic regions. See Details.}

\item{aligner}{The aligner used to produce the BED files: one of \code{"biscuit"},
\code{"bismark"}, or \code{"bsbolt"}.}

\item{mval}{Whether to return an M-value or a beta-value matrix alongside the
coverage matrix. Defaults to \code{TRUE} (M-values); set \code{mval = FALSE}
to get a beta-value matrix.}

\item{merged}{Whether the input strands have been merged (collapsed).}

\item{sparse}{Whether to return sparse matrices.}

\item{prealloc}{The number of rows to initialize the matrices with. If the
number of loci is approximately known, setting \code{prealloc} close to it can
reduce runtime because fewer resizes are needed.}

\item{nthreads}{Set the number of threads to use. Overrides the
\code{"iscream.threads"} option. See \code{?set_threads} for more information.}
}
\value{
A named list of
\itemize{
\item a coverage matrix and either an M-value or a beta-value matrix
\item a character vector of chromosomes and a numeric vector of the
corresponding CpG base positions
\item a character vector of the input sample names
}
}
\description{
Queries the CpG/CpH loci in the provided regions and produces M/beta and
coverage matrices along with their genomic positions. Work is parallelized
across files using threads from the \code{"iscream.threads"} option. The output
of \code{make_mat_bsseq} may be used to create a BSseq object: \code{do.call(BSseq, make_mat_bsseq(...))}.
}
\details{
The input regions may be a character vector of the form \code{"chr:start-end"},
a GRanges object, or a data frame with \code{"chr"}, \code{"start"}, and
\code{"end"} columns.
}
\section{Bitpacking limits}{

\code{make_mat_bsseq()} makes two matrices: M-value (or beta-value) and coverage.
For speed and memory efficiency, these two values are bitpacked during matrix
creation so that only one matrix needs to be populated and resized. This
matrix is unpacked into the two required matrices only once all input files
have been queried and the final dimensions are known. Each value is stored as
a 16-bit signed integer (INT16), and the two are packed into one 32-bit
integer (INT32). Because the INT16 maximum is 32,767, coverage values above
32,767 are capped at 32,767. Beta values are capped the same way, but a beta
value that large would indicate a bug in the aligner that produced the data.
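
The packing idea can be illustrated with base R bit operations. This is a
hypothetical sketch, not iscream's internal code: it assumes the first value
occupies the upper 16 bits and the second the lower 16 bits, and the helper
names \code{pack_pair} and \code{unpack_pair} are made up for illustration.
Capping at 32,767 before packing keeps the packed result within the positive
range of a signed 32-bit integer.
\preformatted{
# Hypothetical helpers for illustration only; not part of iscream.
pack_pair <- function(hi, lo) {
  # cap both values at the INT16 maximum (32,767) before packing
  hi <- pmin(as.integer(hi), 32767L)
  lo <- pmin(as.integer(lo), 32767L)
  # assumed layout: hi in the upper 16 bits, lo in the lower 16 bits
  bitwOr(bitwShiftL(hi, 16L), lo)
}
unpack_pair <- function(packed) {
  # recover the upper and lower 16-bit halves
  list(hi = bitwShiftR(packed, 16L), lo = bitwAnd(packed, 65535L))
}
packed <- pack_pair(hi = c(3L, 40000L), lo = c(10L, 40000L))
unpack_pair(packed)  # hi: 3 32767; lo: 10 32767
}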
}

\examples{
bedfiles <- system.file("extdata", package = "iscream") |>
  list.files(pattern = "[a-d][.]bed[.]gz$", full.names = TRUE)
# examine the BED files
col_names <- c("chr", "start", "end", "beta", "coverage")
lapply(bedfiles, function(i) knitr::kable(read.table(i, col.names = col_names)))

# make a vector of regions
regions <- c("chr1:1-6", "chr1:7-10", "chr1:11-14")
mat <- make_mat_bsseq(bedfiles, regions)
# create a BSseq object (requires the bsseq package)
if (requireNamespace("bsseq", quietly = TRUE)) {
  do.call(bsseq::BSseq, mat)
}
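
# Sketch: the same regions in the other input forms described in Details.
# Assumes integer start/end columns are accepted in the data frame form.
regions_df <- data.frame(chr = "chr1", start = c(1L, 7L, 11L), end = c(6L, 10L, 14L))
# mval = FALSE returns a beta-value matrix instead of M-values
beta_mat <- make_mat_bsseq(bedfiles, regions_df, mval = FALSE)
# GRanges input, if GenomicRanges is installed
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  make_mat_bsseq(bedfiles, GenomicRanges::GRanges(regions))
}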
}
