Cancer arises from dysregulated cell proliferation caused by acquired mutations in key driver genes. With the rapid accumulation of cancer genomic alteration data, a major goal of cancer genomics is to distinguish driver mutations that promote tumorigenesis from passenger mutations, which may improve our understanding of the complex processes involved in cancer formation and progression and help tailor personalized therapies to a tumor’s mutational pattern. Numerous algorithms have been developed to uncover genomic mutational signatures, but they are generally limited by high computational complexity, high false-positive rates, and impracticality for clinical application. To elucidate the underlying mechanisms of cancer initiation, we reasoned that algorithms that identify mutation-driven modules while accounting for the impact on patient prognosis and balancing mutation coverage and exclusivity may uncover intricate associations between mutations and survival and provide crucial insights for cancer diagnosis and treatment. This package provides a novel bioinformatics tool, ProgModule, that identifies candidate driver modules for predicting patient prognosis by integrating the exclusive coverage of mutations with clinical characteristics in cancer. The detailed flowchart of this package is shown as follows:
The ProgModule package is a bioinformatics tool to
identify driver modules for predicting the prognosis of cancer patients.
It balances the exclusive coverage of mutations and simultaneously
considers the mutation combination-mediated mechanism in cancer.
ProgModule functions fall mainly into
Analysis and Visualization modules. Each function is summarized
with a short description below:
1. Obtain the non-silent mutation frequency matrix.
2. Identify cohort-specific local subnetworks.
3. Calculate the prognosis-related mutually exclusive mutation (PRMEM) score of a module.
4. Identify the prognosis-related mutually exclusive mutation modules.
5. Visualize results:
5.1 Plot patients’ Kaplan-Meier survival curves based on the mutation status of the driver module.
5.2 Plot mutually exclusive and co-occurrence plots for patient-specific dysfunction pathways and user-interested gene sets.
5.3 Plot waterfall plots of patient-specific dysfunction pathways.
5.4 Plot lollipop plots of genes’ hotspot mutations.
We downloaded patients’ mutation data from the TCGA database in Mutation Annotation Format (MAF). To represent the mutation status of each gene in each sample, we converted the MAF data into a mutation status matrix in which each row represents a gene and each column represents a sample. In our study, we extracted only the non-silent somatic mutations (nonsense mutation, missense mutation, frame-shift indels, splice site, nonstop mutation, translation start site, inframe indels) in protein-coding regions. The function get_mut_status in the ProgModule package implements this process. Taking simulated data as an example, the command lines are as follows:
MAF files contain many fields, ranging from chromosome names to COSMIC annotations. However, most of the analyses in this package use only a few of them; the example below uses Hugo_Symbol, Tumor_Sample_Barcode, and Variant_Classification.
# load the package and locate the example mutation annotation file
library(ProgModule)
maf <- system.file("extdata", "maffile.maf", package = "ProgModule")
maf_data <- read.delim(maf)
# keep only the gene, sample, and variant-class columns
mutvariant <- maf_data[, c("Hugo_Symbol", "Tumor_Sample_Barcode", "Variant_Classification")]
# perform the function 'get_mut_status'
mut_status <- get_mut_status(mutvariant = mutvariant, nonsynonymous = TRUE)
# view the first five rows and columns of the mut_status matrix
mut_status[1:5, 1:5]
#> TCGA-B0-5117-01A-01D-1421-08 TCGA-B0-5109-01A-02D-1421-08
#> ACTR8 1 0
#> PKHD1 1 0
#> MUC17 1 0
#> SMC3 1 0
#> LARP4 1 0
#> TCGA-A3-3367-01A-02D-1421-08 TCGA-B0-5120-01A-01D-1421-08
#> ACTR8 0 0
#> PKHD1 0 0
#> MUC17 0 0
#> SMC3 0 0
#> LARP4 0 0
#> TCGA-CZ-5453-01A-01D-1501-10
#> ACTR8 0
#> PKHD1 0
#> MUC17 0
#> SMC3 0
#> LARP4 0
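To make the conversion concrete, the following is a minimal base-R sketch of turning MAF-like records into a binary gene-by-sample matrix. It is not the implementation of get_mut_status. The toy table, the object names (toy_maf, non_silent, toy_status), the mapping of the mutation types listed above onto standard MAF Variant_Classification labels, and the choice to keep samples that have no non-silent mutation as all-zero columns are all assumptions made for illustration.

```r
# Toy MAF-like table (made-up values, for illustration only)
toy_maf <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "VHL", "PBRM1", "VHL"),
  Tumor_Sample_Barcode = c("S1", "S2", "S1", "S3", "S3"),
  Variant_Classification = c("Missense_Mutation", "Silent", "Nonsense_Mutation",
                             "Frame_Shift_Del", "Intron"),
  stringsAsFactors = FALSE
)

# Assumed non-silent classes, written as standard MAF labels
non_silent <- c("Nonsense_Mutation", "Missense_Mutation", "Frame_Shift_Del",
                "Frame_Shift_Ins", "Splice_Site", "Nonstop_Mutation",
                "Translation_Start_Site", "In_Frame_Del", "In_Frame_Ins")

# Keep only the non-silent records
kept <- toy_maf[toy_maf$Variant_Classification %in% non_silent, ]

# Keep every sample as a column, even one without non-silent mutations
# (a choice made in this sketch, not confirmed package behaviour)
samples <- factor(kept$Tumor_Sample_Barcode,
                  levels = unique(toy_maf$Tumor_Sample_Barcode))

# Cross-tabulate genes by samples, then collapse counts to 0/1 status
counts <- table(kept$Hugo_Symbol, samples)
toy_status <- matrix(as.integer(counts > 0),
                     nrow = nrow(counts),
                     dimnames = list(dimnames(counts)[[1]], dimnames(counts)[[2]]))
toy_status
```

In this toy example the silent TP53 record in S2 and the intronic VHL record in S3 are filtered out, so those cells of toy_status are 0.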
The breadth-first search algorithm was then used to search for
cohort-specific local subnetworks in the protein-protein interaction (PPI)
network. The search starts at each driver gene obtained from the NCG database
(defined as a seed node) and iteratively explores its neighboring mutated
genes until a maximal number of genes is reached (500 in our study);
users can set this maximum size of the local network. The
function get_local_network in the
ProgModule package implements this process.
# load mutation matrix and PPI network
data(mut_status, subnet)
# find the local network of each gene
localnetwork <- get_local_network(network = subnet, freq_matrix = mut_status, max.size = 500)
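The search strategy described above can be sketched in base R as a size-capped breadth-first search. This is an illustration under stated assumptions, not the code behind get_local_network: the toy edge list, the mutated gene set, and the helper bfs_local_network are hypothetical, and the real function works on the package's subnet and mut_status objects.

```r
# Hypothetical toy PPI network as an edge list (not the package's `subnet`)
edges <- data.frame(
  from = c("A", "A", "B", "C", "D", "E"),
  to   = c("B", "C", "D", "D", "E", "F"),
  stringsAsFactors = FALSE
)
# Hypothetical set of genes mutated in the cohort
mutated <- c("A", "B", "C", "D", "F")

# Undirected adjacency list: each gene maps to its direct neighbours
adj <- split(c(edges$to, edges$from), c(edges$from, edges$to))

# Breadth-first search from a seed, visiting only mutated neighbours
# and stopping once max_size genes have been collected
bfs_local_network <- function(seed, adj, mutated, max_size) {
  visited <- seed
  queue <- seed
  while (length(queue) > 0 && length(visited) < max_size) {
    current <- queue[1]
    queue <- queue[-1]
    # Mutated neighbours of the current gene that are not yet visited
    nbrs <- setdiff(intersect(adj[[current]], mutated), visited)
    # Trim the new genes so the size cap is never exceeded
    nbrs <- head(nbrs, max_size - length(visited))
    visited <- c(visited, nbrs)
    queue <- c(queue, nbrs)
  }
  visited
}

bfs_local_network("A", adj, mutated, max_size = 3)
bfs_local_network("A", adj, mutated, max_size = 10)
```

With the cap at 3 the search stops after the seed's direct neighbours. With the cap at 10 it also reaches D but never F, because the only path to F runs through the unmutated gene E.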