Compute weighted single sample logFCs for each treated sample using normalised logCPM values. Fit a lowess curve on variance of single sample logFCs ~ mean of logCPM, and use it to predict gene-wise weights. The weighted single sample logFCs are then ready for computing perturbation scores.

weight_ss_fc(expreMatrix, metadata = NULL, factor, control)

# S4 method for matrix
weight_ss_fc(expreMatrix, metadata = NULL, factor, control)

# S4 method for data.frame
weight_ss_fc(expreMatrix, metadata = NULL, factor, control)

# S4 method for DGEList
weight_ss_fc(expreMatrix, metadata = NULL, factor, control)

# S4 method for SummarizedExperiment
weight_ss_fc(expreMatrix, metadata = NULL, factor, control)

Arguments

expreMatrix

A matrix or data.frame of logCPM values, or a DGEList/SummarizedExperiment storing gene expression counts and sample metadata. Row names must be gene Entrez IDs, and column names must be sample names.

metadata

Sample metadata data frame as described in the details section.

factor

Factor that defines how samples can be put into matching pairs (e.g. patient).

control

Treatment level that is the control.

Value

A list with two elements: $weight, the gene-wise weights; and $logFC, the weighted single sample logFC matrix.

Details

This function computes weighted single sample logFCs from normalised logCPM values, which are then used for computing single sample perturbation scores. Genes with smaller logCPM tend to have a larger variance among single sample logFCs, so a lowess curve is fitted to estimate the relationship between the variance of single sample logFCs and the mean of logCPM, and this relationship is used to estimate the variance at each gene's mean logCPM value. Gene-wise weights, which are the inverse of the estimated variances, are then multiplied by the single sample logFCs to downweight genes with low counts. It is assumed that genes with extremely low counts have been removed and that the count matrix was normalised before the logCPM matrix was derived. Row names of the matrix must be genes' Entrez IDs. To convert other gene identifiers to Entrez IDs, see example.
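The weighting described above can be sketched in base R. This is an illustrative sketch on simulated data, not the package's internal code: the simulated matrices, the variable names, and the small floor applied to the estimated variances are all assumptions made for the example.

```r
# Illustrative sketch only: simulated data, not the package's internal code.
set.seed(1)
n_genes <- 200
n_pairs <- 3

# Hypothetical mean logCPM per gene; noise shrinks as expression grows
mean_expr <- runif(n_genes, min = 0, max = 10)
noise_sd <- 1.5 / (1 + mean_expr)
simulate <- function() {
  replicate(n_pairs, mean_expr + rnorm(n_genes, sd = noise_sd))
}
control_logCPM <- simulate()  # genes x pairs
treated_logCPM <- simulate()  # genes x pairs, same pair order

# Single sample logFC: treated minus matched control
ss_logFC <- treated_logCPM - control_logCPM

# Per-gene variance of the logFCs and per-gene mean logCPM
fc_var <- apply(ss_logFC, 1, var)
cpm_mean <- rowMeans(cbind(control_logCPM, treated_logCPM))

# Fit the lowess trend of variance against mean logCPM,
# then interpolate it back onto each gene's mean logCPM
fit <- lowess(cpm_mean, fc_var)
predict_var <- approxfun(fit, rule = 2)
# Assumption: floor the estimate so the inverse stays finite
est_var <- pmax(predict_var(cpm_mean), 1e-8)

# Weights are inverse variances; each gene's row is scaled by its weight
weight <- 1 / est_var
weighted_logFC <- ss_logFC * weight
```

Because the weight vector has one entry per gene, multiplying it against the genes-by-samples matrix scales each row by that gene's weight, so lowly expressed genes with noisy logFCs contribute less.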

If an S4 object of class DGEList or SummarizedExperiment is provided as input to expreMatrix, the gene expression matrix will be extracted from it and converted to a logCPM matrix. Sample metadata will also be extracted from the same S4 object unless otherwise specified.

The provided sample metadata should have the same number of rows as the number of columns in the logCPM matrix. The metadata must also have a column called "sample" storing sample names (the column names of the logCPM matrix), and a column called "treatment" storing the treatment of each sample. The control treatment level specified by the control parameter must exist in the treatment column.
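These requirements can be checked before calling the function. The helper below is a hypothetical sketch (check_metadata is not part of the package) that mirrors the conditions listed above on a small made-up data set.

```r
# Hypothetical helper, not part of the package: mirrors the documented
# metadata requirements so problems surface before the main call.
check_metadata <- function(logCPM, metadata, factor, control) {
  stopifnot(is.data.frame(metadata))
  required <- c("sample", "treatment", factor)
  missing_cols <- setdiff(required, colnames(metadata))
  if (length(missing_cols) > 0) {
    stop("Metadata is missing column(s): ",
         paste(missing_cols, collapse = ", "))
  }
  if (nrow(metadata) != ncol(logCPM)) {
    stop("Metadata rows must match the number of logCPM columns")
  }
  if (!setequal(metadata$sample, colnames(logCPM))) {
    stop("Sample names in metadata and logCPM do not match")
  }
  if (!control %in% metadata$treatment) {
    stop("Control level not found in the treatment column")
  }
  invisible(TRUE)
}

# Made-up toy data for illustration
metadata <- data.frame(
  sample = c("s1", "s2", "s3", "s4"),
  treatment = c("Vehicle", "Drug", "Vehicle", "Drug"),
  patient = c("p1", "p1", "p2", "p2")
)
logCPM <- matrix(
  c(5, 6, 5.5, 7,
    2, 2, 3, 2.5),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("g1", "g2"), metadata$sample)
)

ok <- check_metadata(logCPM, metadata, factor = "patient", control = "Vehicle")
```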

This analysis is intended for experimental designs that include matched pairs of samples, such as when tissues collected from the same patient were treated with different treatments to study different treatment effects. The factor parameter tells the function how samples can be put into matching pairs. It must also be included as a column in the metadata.
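To make the pairing concrete, the sketch below computes unweighted single sample logFCs by subtracting each patient's control sample from that patient's treated sample. It is a minimal illustration on made-up data, assuming one control sample per patient; it is not the package's implementation.

```r
# Made-up toy data: two patients, each with a control and a treated sample
metadata <- data.frame(
  sample = c("s1", "s2", "s3", "s4"),
  treatment = c("Vehicle", "Drug", "Vehicle", "Drug"),
  patient = c("p1", "p1", "p2", "p2")
)
logCPM <- matrix(
  c(5, 6, 5.5, 7,
    2, 2, 3, 2.5),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("g1", "g2"), metadata$sample)
)
control <- "Vehicle"

# Split the metadata by the pairing factor, then within each patient
# subtract the control column from every treated column
per_patient <- lapply(split(metadata, metadata$patient), function(m) {
  ctrl <- m$sample[m$treatment == control]
  trt <- m$sample[m$treatment != control]
  # Assumption: exactly one control sample per patient
  stopifnot(length(ctrl) == 1)
  logCPM[, trt, drop = FALSE] - logCPM[, ctrl]
})

# One column per treated sample, named after that sample
ss_logFC <- do.call(cbind, per_patient)
```

The result has one column per treated sample, so control samples do not appear in the logFC matrix.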

Examples

# Inspect metadata data frame to make sure it has treatment, sample and patient columns
data(metadata_example)
data(logCPM_example)
length(setdiff(colnames(logCPM_example), metadata_example$sample)) == 0
#> [1] TRUE
ls <- weight_ss_fc(logCPM_example, metadata = metadata_example,
 factor = "patient", control = "Vehicle")