Plot pathways and genes contained in them as a network

plot_gs2gene(
  normalisedScores,
  gsTopology,
  geneFC = NULL,
  mapEntrezID = NULL,
  colorGS_By = c("robustZ", "pvalue"),
  foldGSname = TRUE,
  foldafter = 2,
  layout = "fr",
  edgeAlpha = 0.8,
  upGS_col = "brown3",
  downGS_col = "steelblue3",
  upGene_col = "pink",
  downGene_col = "lightblue",
  GeneNode_size = 3,
  GeneNode_shape = 17,
  GsNode_size = 2,
  GsNode_shape = 16,
  label_Gene = TRUE,
  GeneName_size = 3,
  GsName_size = 6,
  gene_lg_title = "Changes in Gene Expression",
  gs_lg_title = "Pathway Perturbation",
  arc_strength = 0.5
)

Arguments

normalisedScores

A data.frame derived from the normalise_by_permu() function. Only gene-sets of interest should be included

gsTopology

List of pathway topology matrices generated using function retrieve_topology()

geneFC

An optional named vector of genes' fold changes

mapEntrezID

Optional. A data.frame matching genes' entrez ID to another identifier. Must contain 2 columns: "entrezid" and "mapTo"

colorGS_By

Choose to color nodes by robustZ or pvalue. A column must exist in the normalisedScores data.frame for the chosen parameter

foldGSname

logical. Should long gene-set names be folded into two lines

foldafter

The number of words after which gene-set names should be folded. Defaults to 2

layout

The layout algorithm to apply. Accepts all layouts supported by igraph.

edgeAlpha

Transparency of edges. Defaults to 0.8

upGS_col

Color for activated gene-sets. Only applicable if colorGS_By is set to be "robustZ"

downGS_col

Color for inhibited gene-sets. Only applicable if colorGS_By is set to be "robustZ"

upGene_col

Color for up-regulated genes. Only applicable if geneFC is not NULL

downGene_col

Color for down-regulated genes. Only applicable if geneFC is not NULL

GeneNode_size

Size for gene nodes

GeneNode_shape

Shape for gene nodes

GsNode_size

Size for gene-set nodes

GsNode_shape

Shape for gene-set nodes

label_Gene

logical. Should gene names be plotted

GeneName_size

Size of gene name label

GsName_size

Size of gene-set name label

gene_lg_title

character. Legend title for gene node color

gs_lg_title

character. Legend title for gene-set node color

arc_strength

The bend of edges. 1 approximates a half circle while 0 gives a straight line.

Value

A ggplot2 object

Details

Taking the perturbation scores of a list of gene-sets derived from normalise_by_permu(), this function matches gene-sets to their associated genes using information from pathway topology matrices.

It's optional to provide genes' logFCs as a named vector, where the names must be genes' entrez ID in the format of "ENTREZID:XXXX". This is because pathway topology matrices retrieved through retrieve_topology() always use entrez ID as identifiers. However, it might not be very informative to label genes with their entrez ID, so users can also choose to provide a mapEntrezID data.frame to match genes' entrez ID to their chosen identifiers. The data.frame should contain two columns: "entrezid" and "mapTo". If geneFC is provided, gene nodes will be colored by their direction of change. Otherwise, all gene nodes will be black.
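As a minimal sketch of the two optional inputs described above, the snippet below builds a geneFC vector and a matching mapEntrezID data.frame in base R. The fold-change values are made up for illustration, and the sketch assumes that the "entrezid" column uses the same "ENTREZID:XXXX" format as the names of geneFC.

```r
# Illustrative input only: the fold-change values below are made up.
# Names follow the "ENTREZID:XXXX" format required for geneFC.
geneFC <- c(
  "ENTREZID:7157" = -1.2,
  "ENTREZID:672"  = 0.8,
  "ENTREZID:2099" = 2.1
)

# Mapping from entrez ID to gene symbol, with the two required columns.
# Assumption: "entrezid" carries the same "ENTREZID:" prefix as names(geneFC).
mapEntrezID <- data.frame(
  entrezid = names(geneFC),
  mapTo    = c("TP53", "BRCA1", "ESR1"),
  stringsAsFactors = FALSE
)

# Basic checks on the expected structure before plotting
stopifnot(all(grepl("^ENTREZID:[0-9]+$", names(geneFC))))
stopifnot(identical(colnames(mapEntrezID), c("entrezid", "mapTo")))
```

Objects built this way could then be passed to the geneFC and mapEntrezID arguments of plot_gs2gene().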

Since some gene-sets could contain hundreds of genes, it is not recommended to plot all of those genes. If a mapEntrezID data.frame is provided, only genes included in that data.frame will be used in the plot. Consider filtering for genes with the highest magnitude of change. If all pathway genes have to be plotted, consider setting label_Gene to FALSE to turn off plotting of gene names.
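One way to apply the filtering suggested above is to rank genes by absolute fold-change while keeping the signed values, and to restrict the mapping data.frame to the retained genes. The sketch below uses a toy vector with placeholder entrez IDs and gene names; n_keep, topFC and topMap are hypothetical names introduced only for this illustration.

```r
# Toy input: placeholder IDs, names and fold-changes, for illustration only
geneFC <- c(
  "ENTREZID:1" = 0.1,
  "ENTREZID:2" = -2.5,
  "ENTREZID:3" = 1.7,
  "ENTREZID:4" = -0.3,
  "ENTREZID:5" = 0.9
)
mapEntrezID <- data.frame(
  entrezid = names(geneFC),
  mapTo    = paste0("gene", LETTERS[1:5]),
  stringsAsFactors = FALSE
)

# Number of genes to keep (hypothetical choice)
n_keep <- 2

# Rank by magnitude, but index the original vector so the signs are preserved
top_ids <- names(geneFC)[order(abs(geneFC), decreasing = TRUE)][seq_len(n_keep)]
topFC   <- geneFC[top_ids]

# Restrict the mapping to the retained genes
topMap <- mapEntrezID[mapEntrezID$entrezid %in% top_ids, ]
```

Keeping the signed values matters when gene nodes are to be colored by direction of change; taking abs() of the vector itself would make every gene appear up-regulated.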

Examples

load(system.file("extdata", "gsTopology.rda", package = "sSNAPPY"))
load(system.file("extdata", "normalisedScores.rda", package = "sSNAPPY"))
#Subset pathways significantly perturbed in sample R5020_N2_48
subset <- dplyr::filter(normalisedScores, adjPvalue < 0.05, sample == "R5020_N2_48")

# Color gene-set nodes by robust z-scores.
plot_gs2gene(subset, gsTopology, colorGS_By = "robustZ", label_Gene = FALSE,
    GeneNode_size = 1)

# When genes' fold-changes are not provided, gene nodes are colored in black.

# To color genes by their directions of changes, firstly compute genes' single-sample logFCs
data(logCPM_example)
data(metadata_example)
ls <- weight_ss_fc(logCPM_example, metadata = metadata_example,
 factor = "patient", control = "Vehicle")
# Provide fold-changes of sample R5020_N2_48
plot_gs2gene(subset, gsTopology, geneFC = ls$logFC[,"R5020_N2_48"], colorGS_By = "robustZ",
label_Gene = FALSE)


# There are still a large number of genes, making the plot cumbersome. Here, only the fold-changes of
# the 500 genes with the largest absolute fold-changes are provided, so only pathway genes among
# those 500 genes are plotted. Signs are kept so that genes can be colored by direction of change.
FC <- ls$logFC[,"R5020_N2_48"]
FC <- FC[order(abs(FC), decreasing = TRUE)[1:500]]
plot_gs2gene(subset, gsTopology, geneFC = FC, colorGS_By = "robustZ")


# To make the gene labels more informative, map genes' entrez id to chosen identifiers.
load(system.file("extdata", "entrez2name.rda", package = "sSNAPPY"))
plot_gs2gene(subset, gsTopology, geneFC = FC, mapEntrezID = entrez2name, colorGS_By = "robustZ")