This function is best used to re-count a large number of small peaks (e.g. <= 5000bp) into bins of equal or larger size. The genome is cut either into bins of fixed width (e.g. 50,000bp) or into a user-defined number of bins. Bins are calculated on the canonical chromosomes. Note that if a peak is larger than a bin, or if it overlaps multiple bins, its signal is added to each bin it overlaps. Users can increase the minimum overlap required for a peak to count as overlapping a bin (by default 150bp, the size of a nucleosome) to reduce the number of peaks assigned to multiple bins. Any peak smaller than the minimum overlap threshold is discarded. Therefore, the library size might differ slightly between the peak matrix and the bin matrix, because signal is duplicated when a peak falls into multiple bins and omitted when a peak is smaller than the minimum overlap.
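The assignment rule described above can be illustrated with a minimal base R sketch. This is not ChromSCape's internal code: the helper `overlap_width()`, the toy 150 kb chromosome, and the three peaks and their counts are all hypothetical, chosen only to show one peak kept once, one peak duplicated into two bins, and one peak dropped for being smaller than `minoverlap`.

```r
# Illustrative sketch only, NOT the implementation used by peaks_to_bins().
# Hypothetical helper: overlap in bp between inclusive intervals.
overlap_width <- function(peak_start, peak_end, bin_start, bin_end) {
  pmax(0, pmin(peak_end, bin_end) - pmax(peak_start, bin_start) + 1)
}

bin_width  <- 50000
minoverlap <- 150

# Three fixed-width bins on a toy 150 kb chromosome (assumed for illustration)
bin_start <- seq(1, 150000, by = bin_width)
bin_end   <- bin_start + bin_width - 1

# Toy peaks: one inside a bin, one spanning a bin boundary, one only 100 bp wide
peaks <- data.frame(
  name  = c("inside", "spanning", "tiny"),
  start = c(10000, 49000, 120000),
  end   = c(12000, 51000, 120099)
)

# Overlap width of every peak with every bin (peaks x bins)
ov <- outer(
  seq_len(nrow(peaks)), seq_along(bin_start),
  function(i, j) overlap_width(peaks$start[i], peaks$end[i],
                               bin_start[j], bin_end[j])
)

# A peak contributes to every bin it overlaps by at least minoverlap bp
assigned <- ov >= minoverlap
dimnames(assigned) <- list(peaks$name, paste0("bin", seq_along(bin_start)))
n_bins_per_peak <- rowSums(assigned)

# Toy counts for a single cell: each peak's count is added to each assigned bin
counts <- c(3, 2, 5)
binned <- colSums(assigned * counts)

print(n_bins_per_peak)
print(c(peaks = sum(counts), bins = sum(binned)))
```

In this toy case the peak totals and bin totals differ, because the count of the spanning peak is added to two bins while the count of the tiny peak is lost. This is the same effect on library size that the description warns about.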

peaks_to_bins(
  mat,
  bin_width = 50000,
  n_bins = NULL,
  minoverlap = 150,
  verbose = TRUE,
  ref = "hg38"
)

Arguments

mat

A matrix of peaks x cells

bin_width

Width of the bins to produce, in base pairs (minimum 500; default 50000).

n_bins

Number of bins to produce (mutually exclusive with bin_width; default NULL).

minoverlap

Minimum overlap, in base pairs, between a peak and a bin for the peak to be considered as overlapping the bin (default 150).

verbose

Whether to print progress messages (default TRUE).

ref

Reference genome to use (default "hg38").

Value

A sparse matrix of bins x cells, replacing the peaks x cells input.

Examples

mat = create_scDataset_raw()$mat
binned_mat = peaks_to_bins(mat, bin_width = 10e6)
#> ChromSCape::peaks_to_bins - converting 600 peaks into 332 bins of 9302017.63855422 bp in average.
#> [1] "Running aggregation of peaks to bins in parallel"
#> ChromSCape::peaks_to_bins - From peaks to bins in 2.43099999999998 sec.
#> ChromSCape::peaks_to_bins - removed 12 empty bins from the binned matrix.
dim(binned_mat)
#> [1] 320 300