A computational approach for prioritization of patient-specific cancer drivers
Abstract
A major challenge in cancer genomics is to distinguish driver mutations, which are causally linked to cancer, from passenger mutations, which are neutral and do not contribute to cancer development. The identification of these driver genes could lead to the development of therapies. Numerous methods have been proposed for this problem; however, the majority of them provide a single driver gene list for the entire cohort of patients. Yet the mutational profiles of cancer patients show a high degree of heterogeneity. Because the set of driver genes can be distinct for each patient, a better approach is to identify patient-specific drivers. The results from such an approach can lead to the development of personalized treatments and therapies. In this thesis, we develop a computational approach that integrates genomic data, biological
pathways, and protein connectivity information to identify patient-specific cancer driver genes. We construct a bipartite graph that relates specific mutated genes and various outliers for each specific patient. For each patient, we rank the mutated genes based on a convex combination of two terms. The first term is a weighted scoring of the number of connections to outlier genes of that patient as well as the outlier genes of other patients.
The second term incorporates the co-occurrences of a mutated gene and an outlier gene within the same pathway. We compare our method against state-of-the-art patient-specific cancer gene prioritization methods on patient and cell line data for colon, lung, and head and neck cancer. We define novel reference gene sets for evaluation of results obtained from
cell line data by utilizing drug sensitivity datasets. Furthermore, we propose and discuss alternative approaches for evaluating the recovery of known cancer drivers when patient-specific drivers are provided. Overall, we show that our method recovers known and rare cancer genes better than other approaches across various reference gene sets. Additionally, we demonstrate the importance of pathway coverage in the identification and ranking of driver genes.
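
The ranking described above can be illustrated with a minimal Python sketch. It is not the implementation developed in this thesis: the function name `rank_mutated_genes`, the parameters `alpha`, `own_weight`, and `other_weight`, the max-normalization of each term, and the toy gene and pathway data are all assumptions made for illustration only. The sketch scores each mutated gene of one patient by a convex combination of a weighted connectivity term (edges to the patient's own outlier genes and to other patients' outlier genes) and a pathway co-occurrence term.

```python
from typing import Dict, List, Set, Tuple


def rank_mutated_genes(
    mutated: Set[str],
    own_outliers: Set[str],
    other_outliers: Set[str],
    neighbors: Dict[str, Set[str]],
    pathways: Dict[str, Set[str]],
    alpha: float = 0.5,         # hypothetical mixing weight in [0, 1]
    own_weight: float = 1.0,    # hypothetical weight for the patient's own outliers
    other_weight: float = 0.5,  # hypothetical weight for other patients' outliers
) -> List[Tuple[str, float]]:
    """Rank one patient's mutated genes (illustrative sketch only)."""
    connectivity: Dict[str, float] = {}
    cooccurrence: Dict[str, float] = {}
    for gene in mutated:
        nbrs = neighbors.get(gene, set())
        # Term 1: weighted count of bipartite edges to outlier genes.
        connectivity[gene] = (
            own_weight * len(nbrs & own_outliers)
            + other_weight * len(nbrs & other_outliers)
        )
        # Term 2: number of the patient's outliers sharing a pathway with the gene.
        shared: Set[str] = set()
        for members in pathways.values():
            if gene in members:
                shared |= members & own_outliers
        cooccurrence[gene] = float(len(shared))

    # Assumed normalization: scale each term to [0, 1] by its maximum.
    max_conn = max(connectivity.values(), default=0.0) or 1.0
    max_cooc = max(cooccurrence.values(), default=0.0) or 1.0

    scores = {
        gene: alpha * connectivity[gene] / max_conn
        + (1.0 - alpha) * cooccurrence[gene] / max_cooc
        for gene in mutated
    }
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy input, invented for illustration; not data from this thesis.
ranked = rank_mutated_genes(
    mutated={"KRAS", "TTN"},
    own_outliers={"MYC", "CCND1"},
    other_outliers={"EGFR"},
    neighbors={"KRAS": {"MYC", "EGFR"}, "TTN": {"CCND1"}},
    pathways={"MAPK": {"KRAS", "MYC", "CCND1"}, "muscle": {"TTN"}},
)
```

In this toy input, the gene that is connected to more outliers and shares a pathway with them ranks first. Setting `alpha` to 1 would rank by connectivity alone, and setting it to 0 would rank by pathway co-occurrence alone.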