Contents

The DMCFB package is a pipeline to identify differentially methylated cytosine (DMC) in bisulfite sequencing data using Bayesian functional regression models. In what follows we provides some guidelines on how to read your data and analyze them.

1 Reading data

1.1 Reading bisulfite data (using files)

The R-method readBismark() is used to read bisulfite data files that are created by Bismark. Each file must include six columns, with no header, that represent

  • Chromosome
  • Start position in the chromosome
  • End position in the chromosome
  • Methylation level (m/(m+u))
  • Number of methylated reads (m)
  • Number of un-methylated reads (u)

and each row is a cytosine (or a small region) in DNA.

The function readBismark(<files' paths>, <files' names>) has two inputs: ‘the paths of the files’ and ‘the names of the files’. Using this function an object of class BSDMC is created. Extra information about data such as Age, Gender, Group, etc, must be assigned to the object using DataFrame function. As an example, we have provided three files in the package that can be read as follows:

library(DMCFB)
fn <- list.files(system.file("extdata",package = "DMCFB"))
fn.f <- list.files(system.file("extdata",package="DMCFB"), full.names=TRUE)
OBJ <- readBismark(fn.f, fn, mc.cores = 2)
#> 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===================================                                   |  50%
#> 
#> Processing sample blk.BCU1568_BC_BS_1 ... 
#> Read 23710 records
#> 
  |                                                                            
  |======================================================================| 100%
#> 
#> Processing sample blk.BCU173_TC_BS_1 ... 
#> Read 24421 records
#> 
#> Processing sample blk.BCU551_Mono_BS_1 ... 
#> Read 23541 records
#> 
#> Building BSDMC object.
cdOBJ <- DataFrame(Cell = factor(c("BC", "TC","Mono"),
levels = c("BC", "TC", "Mono")), row.names = c("BCU1568","BCU173","BCU551"))
colData(OBJ) <- cdOBJ
OBJ
#> class: BSDMC 
#> dim: 25668 3 
#> metadata(0):
#> assays(3): methReads totalReads methLevels
#> rownames(25668): 1 2 ... 25667 25668
#> rowData names(0):
#> colnames(3): BCU1568 BCU173 BCU551
#> colData names(1): Cell

1.2 Reading bisulfite data (using matrices)

Alternatively, one can use two integer matrices and a DataFrame to create BSDMC object using cBSDMC() function. One matrix includes the read-depth and the other one includes methylation reads. The columns of these matrices represent samples and the rows represent cytosine positions.

Additional information about the genomic positions and covariates must be stored in a DataFrame and then assign to the object.

The following exampel shows the details.

library(DMCFB)
set.seed(1980)
nr <- 1000
nc <- 8
metht <- matrix(as.integer(runif(nr * nc, 0, 100)), nr)
methc <- matrix(rbinom(n=nr*nc,c(metht),prob = runif(nr*nc)),nr,nc)
methl <- methc/metht
r1 <- GRanges(rep('chr1', nr), IRanges(1:nr, width=1), strand='*')
names(r1) <- 1:nr
cd1 <- DataFrame(Group=rep(c('G1','G2'),each=nc/2),row.names=LETTERS[1:nc])
OBJ2 <- cBSDMC(rowRanges=r1,methReads=methc,totalReads=metht,
  methLevels=methl,colData=cd1)
OBJ2
#> class: BSDMC 
#> dim: 1000 8 
#> metadata(0):
#> assays(3): methReads totalReads methLevels
#> rownames(1000): 1 2 ... 999 1000
#> rowData names(0):
#> colnames(8): A B ... G H
#> colData names(1): Group

2 Identifying DMCs

To identify DMCs, one need to use the function findDMCFB() function. The function

library(DMCFB)
start.time <- Sys.time()
path0 <- "..//BCData/" # provide the path to the files
namelist.new <- list.files(path0,pattern="blk",full.names=F)
namelist.new.f <- list.files(path0,pattern="blk",full.names=T)
type <- NULL
for(i in seq_along(namelist.new)){
    type[i] <- unlist(strsplit(namelist.new[i], split=c('_'), fixed=TRUE))[2]
}
type
table(type)
indTC <- which(type=="TC")
indBC <- which(type=="BC")
indMono <- which(type=="Mono")
namelist.new <- namelist.new[c(indBC,indMono,indTC)]
namelist.new.f <- namelist.new.f[c(indBC,indMono,indTC)]
BLKDat <- readBismark(namelist.new.f, namelist.new, mc.cores = 2)
colData1 <- DataFrame(Group = factor(
  c(rep("BC",length(indBC)), rep("Mono",length(indMono)), 
  rep("TC", length(indTC))), levels = c("BC", "Mono", "TC")), 
  row.names = colnames(BLKData))
colData(BLKDat) <- colData1
BLK.BC.Mono.TC <- sort(BLKDat)
DMC.obj = findDMCFB(object = BLKDat, bwa = 30, bwb = 30, nBurn = 300, nMC = 300,
  nThin = 1, alpha = 5e-5, pSize = 500, sfiles = FALSE)

3 Figures

To plot DMCs one can use the plotDMCFB() function to plot an BSDMC object that resulted from running findDMCFB() function. To illustrate use the following example:

library(DMCFB)
set.seed(1980)
nr <- 1000
nc <- 8
metht <- matrix(as.integer(runif(nr * nc, 0, 100)), nr)
methc <- matrix(rbinom(n=nr*nc,c(metht),prob = runif(nr*nc)),nr,nc)
methl <- methc/metht
r1 <- GRanges(rep('chr1', nr), IRanges(1:nr, width=1), strand='*')
names(r1) <- 1:nr
cd1 <- DataFrame(Group=rep(c('G1','G2'),each=nc/2),row.names=LETTERS[1:nc])
OBJ1 <- cBSDMC(rowRanges=r1,methReads=methc,totalReads=metht,
  methLevels=methl,colData=cd1)
OBJ2 = findDMCFB(object = OBJ1, bwa = 30, bwb = 30, nBurn = 10, nMC = 10,
  nThin = 1, alpha = 0.05, pSize = 500, sfiles = FALSE)
#> ------------------------------------------------------------
#> Running Bayesian functional regression model ...
#> The priors's SD = 0.3027, estimated from data ...
#> Number of assigned cores: 14 ...
#> ------------------------------------------------------------
#> Fitted model:
#> logit(MethRead/ReadDepth) ~ F(Group)
#> ------------------------------------------------------------
#> Creating 1 batches of genomic positions ...
#> Running batch 1/1; chr1; 1000 positions; Region [   1, 1000]; Date 2020-09-27 15:57:27
#> ------------------------------------------------------------
#> Combining 1 objects ...
#> Objects are combined.
#> ------------------------------------------------------------
#> Identifying DMCs ...
#> DMCs are identified.
#> ------------------------------------------------------------
#> Percentage of non-DMCs and DMCs:
#> Equal(%)   DMC(%) 
#>     33.7     66.3
#> ------------------------------------------------------------
#> Percentage of hyper-, hypo-, and equal-methylated positions:
#>        Equal(%) Hyper(%) Hypo(%)
#> G2vsG1     33.7     32.4    33.9
#> ------------------------------------------------------------
plotDMCFB(OBJ2, region = c(1,400), nSplit = 2)

4 Session info

sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.7
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
#> [8] methods   base     
#> 
#> other attached packages:
#>  [1] BiocStyle_2.17.1            DMCFB_1.3.1                
#>  [3] BiocParallel_1.23.2         SummarizedExperiment_1.19.6
#>  [5] DelayedArray_0.15.9         matrixStats_0.57.0         
#>  [7] Matrix_1.2-18               Biobase_2.49.1             
#>  [9] GenomicRanges_1.41.6        GenomeInfoDb_1.25.11       
#> [11] IRanges_2.23.10             S4Vectors_0.27.13          
#> [13] BiocGenerics_0.35.4        
#> 
#> loaded via a namespace (and not attached):
#>   [1] minqa_1.2.4              colorspace_1.4-1         ellipsis_0.3.1          
#>   [4] rprojroot_1.3-2          benchmarkme_1.0.4        htmlTable_2.1.0         
#>   [7] XVector_0.29.3           base64enc_0.1-3          fs_1.5.0                
#>  [10] rstudioapi_0.11          roxygen2_7.1.1           remotes_2.2.0           
#>  [13] fansi_0.4.1              xml2_1.3.2               codetools_0.2-16        
#>  [16] splines_4.0.2            doParallel_1.0.15        knitr_1.30              
#>  [19] pkgload_1.1.0            Formula_1.2-3            speedglm_0.3-2          
#>  [22] nloptr_1.2.2.2           Rsamtools_2.5.3          cluster_2.1.0           
#>  [25] png_0.1-7                BiocManager_1.30.10      httr_1.4.2              
#>  [28] compiler_4.0.2           backports_1.1.10         assertthat_0.2.1        
#>  [31] cli_2.0.2                htmltools_0.5.0          prettyunits_1.1.1       
#>  [34] tools_4.0.2              coda_0.19-3              gtable_0.3.0            
#>  [37] glue_1.4.2               GenomeInfoDbData_1.2.3   dplyr_1.0.2             
#>  [40] Rcpp_1.0.5               vctrs_0.3.4              Biostrings_2.57.2       
#>  [43] nlme_3.1-149             rtracklayer_1.49.5       iterators_1.0.12        
#>  [46] xfun_0.17                stringr_1.4.0            ps_1.3.4                
#>  [49] testthat_2.3.2           lme4_1.1-23              lifecycle_0.2.0         
#>  [52] devtools_2.3.2           statmod_1.4.34           XML_3.99-0.5            
#>  [55] zlibbioc_1.35.0          MASS_7.3-53              scales_1.1.1            
#>  [58] RColorBrewer_1.1-2       yaml_2.2.1               memoise_1.1.0           
#>  [61] gridExtra_2.3            ggplot2_3.3.2            rpart_4.1-15            
#>  [64] latticeExtra_0.6-29      stringi_1.5.3            fastDummies_1.6.2       
#>  [67] desc_1.2.0               foreach_1.5.0            checkmate_2.0.0         
#>  [70] boot_1.3-25              pkgbuild_1.1.0           benchmarkmeData_1.0.4   
#>  [73] rlang_0.4.7              pkgconfig_2.0.3          bitops_1.0-6            
#>  [76] evaluate_0.14            arm_1.11-2               lattice_0.20-41         
#>  [79] purrr_0.3.4              GenomicAlignments_1.25.3 htmlwidgets_1.5.1       
#>  [82] processx_3.4.4           tidyselect_1.1.0         bookdown_0.20           
#>  [85] magrittr_1.5             R6_2.4.1                 snow_0.4-3              
#>  [88] generics_0.0.2           Hmisc_4.4-1              pillar_1.4.6            
#>  [91] foreign_0.8-80           withr_2.3.0              survival_3.2-3          
#>  [94] abind_1.4-5              RCurl_1.98-1.2           nnet_7.3-14             
#>  [97] tibble_3.0.3             crayon_1.3.4             rmarkdown_2.3           
#> [100] jpeg_0.1-8.1             usethis_1.6.3            grid_4.0.2              
#> [103] data.table_1.13.0        callr_3.4.4              digest_0.6.25           
#> [106] munsell_0.5.0            sessioninfo_1.1.1