From Mapping Cancer Markers at WCG: “Mapping Cancer Markers Team Analyzes Lung Cancer Data”

World Community Grid (WCG)

By: The Mapping Cancer Markers research team
6 Apr 2017

In this project update, the Mapping Cancer Markers team describes how they are analyzing 45 million of the most promising lung cancer data results, and how they have begun to disseminate their early findings.

The Mapping Cancer Markers (MCM) project continues to process work units for the Ovarian Cancer dataset. As we accumulate these results, we continue to analyze MCM results from the previous Lung Cancer dataset. Below, we discuss one direction in which we are pursuing the analysis.

Patterns of gene-family signatures in lung cancer

In cancer, and in human biology in general, multiple biomarkers (genes, proteins, microRNAs, etc.) can have similar patterns of activity. This may be because the genes serve redundant roles, or because the genes (or other molecules) participate together in a group to serve a biological function. A cancer signature composed of one set of specific genes may appear different from another signature composed of different genes, and yet perform equivalently because the genes in each are functionally related. With this problem in mind, postdoctoral fellow Anne-Christin Hauschild is leading a study of frequently occurring patterns (or motifs) of genes present in high-performing lung cancer gene signatures.

Illustration 1: Summary of the analysis workflow

This project looked at the first-phase results from the Lung Cancer MCM analysis, which was a systematic exploration of the entire space of potential fixed-length signatures. We began by selecting 45 million high-performing signatures derived from MCM results computed on World Community Grid. These are the signatures evaluated to carry the most information for lung cancer diagnosis.

Next, we divided all genes in the lung cancer dataset into 180 clusters (gene families), where genes in each family show similar activity in the lung cancer dataset. We then labelled those top signatures with the gene families into which the genes were assigned. This gave us a set of high-performing signatures expressed as gene families instead of genes. This allowed us to treat two different gene signatures as the same gene-family signature, as long as the corresponding genes in each signature are members of the same family.
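The relabelling step can be illustrated with a minimal Python sketch. The gene names, family numbers, and the `to_family_signature` helper below are hypothetical and chosen only for illustration; the sketch also assumes that gene order within a signature does not matter, which the article does not state.

```python
# Hypothetical gene-to-family assignment (illustrative names and IDs only;
# the real analysis used 180 families derived from the lung cancer dataset).
gene_to_family = {
    "GENE_A": 3, "GENE_B": 3,
    "GENE_C": 12, "GENE_D": 12,
    "GENE_E": 57,
}

def to_family_signature(genes):
    """Return the sorted tuple of family IDs for a gene signature.

    Sorting makes the result order-independent while keeping repeats,
    so a family that contributes two genes appears twice.
    """
    return tuple(sorted(gene_to_family[g] for g in genes))

# Two signatures built from different genes...
sig1 = ["GENE_A", "GENE_C", "GENE_E"]
sig2 = ["GENE_B", "GENE_D", "GENE_E"]

# ...collapse to the same gene-family signature.
fam1 = to_family_signature(sig1)
fam2 = to_family_signature(sig2)
print(fam1, fam2, fam1 == fam2)
```

Here both signatures map to the family signature (3, 12, 57), even though they share only one gene.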

To help understand the gene families themselves, we can visualize each one as a word cloud that describes the functions of the genes it contains, or the biological pathways it represents. We draw this information from databases such as Gene Ontology, pathDIP, or other sources.

From there, we looked for patterns in these gene-family signatures: families that appear unusually frequently (or rarely) in high-performing signatures, and families that tend to appear multiple times in the same signature. We used a frequent-itemset mining algorithm to discover specific patterns that occur unusually frequently in good signatures.
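As a rough illustration of the idea behind frequent-itemset mining, the sketch below counts how often each small combination of families occurs across a set of family signatures and keeps those above a support threshold. It is a brute-force enumeration on toy data, not the algorithm the team used (the article does not name one); the function name, threshold, and example signatures are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(signatures, min_support, max_size=3):
    """Return {itemset: support} for family combinations seen often enough.

    Support is the fraction of signatures containing every family in the
    itemset. Repeats within a signature are ignored here.
    """
    counts = Counter()
    for sig in signatures:
        families = sorted(set(sig))
        for k in range(1, max_size + 1):
            for itemset in combinations(families, k):
                counts[itemset] += 1
    n = len(signatures)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

# Toy family signatures (hypothetical data).
signatures = [
    (3, 5, 18),
    (3, 5, 18, 57),
    (12, 18, 57),
    (3, 5, 12),
]

result = frequent_itemsets(signatures, min_support=0.75)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

On this toy input, families 3, 5, and 18 each appear in three of four signatures, and the pair (3, 5) is the only multi-family pattern that reaches the threshold. Enumerating every combination does not scale to millions of signatures; practical miners such as Apriori or FP-Growth prune the search space instead.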

Illustration 2: Some gene families occur multiple times in a single signature with surprising frequency (high or low). Family 109 rarely appears multiple times. Family 12 appears surprisingly often in 9x multiples.
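The kind of tally behind such a multiplicity plot can be sketched in a few lines of Python. The signatures and the `multiplicity_profile` helper are hypothetical; the sketch only counts observed multiplicities and does not attempt the comparison against an expected frequency that would be needed to call a count surprising.

```python
from collections import Counter

def multiplicity_profile(signatures):
    """Map each family to a Counter of {times_in_signature: n_signatures}."""
    profile = {}
    for sig in signatures:
        for family, k in Counter(sig).items():
            profile.setdefault(family, Counter())[k] += 1
    return profile

# Toy family signatures (hypothetical data).
signatures = [
    (12, 12, 12, 109),
    (12, 12, 12, 57),
    (109, 57, 3),
]

profile = multiplicity_profile(signatures)
for family in sorted(profile):
    print(family, dict(profile[family]))
```

In this toy data, family 12 always appears three times within a signature, while family 109 only ever appears once.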

Illustration 3: Several important gene families, characterized by word clouds describing the genes’ molecular function annotations from the Gene Ontology database. Circles group families into common patterns found in high-performing signatures. Patterns often overlap, as in this example: one pattern containing families 3, 5, and 18 intersects with another containing families 12, 18, and 57.

Using databases such as IID or pathDIP, we can take these patterns and examine the relationships between the gene families they contain, so that we can start to understand why certain combinations of families carry so much information about lung cancer. We use NAViGaTOR to visualize and explore these complex sets of relationships.

Illustration 4: Relationship between 11 significant gene families (large circles) within a protein interaction network. Only the most important genes (dots, colour-coded by biological function) in each family are shown.

We presented the preliminary results of this project to Canadian and international cancer researchers this February, in a poster at the Personalizing Cancer Medicine Conference 2017 in Toronto, Ontario. We gained many insights and ideas from discussing this early work, and we continue to develop them.

Some of the additional, related results have been presented in other publications, including:

Pinheiro, M., Drigo, S.A., Tonhosolo, R., Andrade, S.C.S., Marchi, F.A., Jurisica, I., Kowalski, L.P., Achatz, M.I., Rogatto, S.R., HABP2 p.G534E variant in patients with family history of thyroid and breast cancer, Oncotarget, In press.
Citron, F., Armenia, J., Barzan, L., Franchin, G., Polesel, J., Talamini, R., Sulfaro, S., Croce, C.M., Klement, W., Pastrello, C., Jurisica, I., Vecchione, A., Belletti, B., Baldassarre, G., A microRNA signature identifies SP1 and TGFbeta pathways as potential mediators of local recurrences in head and neck squamous carcinomas, Clin Cancer Res, In press.
Sokolina K, Kittanakom S, Snider J, Kotlyar M, Maurice P, Gandía J, Benleulmi-Chaachoua A, Tadagaki K, Wong V, Malty RH, Deineko V, Aoki H, Amin S, Riley L, Yao Z, Morató X, Otasek D, Kobayashi H, Menendez J, Auerbach D, Angers S, Pržulj N, Bouvier M, Babu M, Ciruela F, Jockers R, Jurisica I, and Stagljar I. Systematic protein-protein interaction mapping for clinically-relevant human GPCRs, Mol Sys Biol, In press.
Yao Z, Darowski K, St-Denis N, Wong V, Offensperger F, Villedieu A, Amin S, Malty R, Aoki H, Guo H, Xu Y, Iorio C, Kotlyar M, Emili A, Jurisica I, Babu M, Neel B, Gingras AC, and Stagljar I, A global analysis of the protein phosphatase interactome, Mol Cell, in press.
Petschnigg J, Kotlyar M, Blair L, Jurisica I, Stagljar I, and Ketteler R, Systematic identification of oncogenic EGFR interaction partners, J Mol Biol, in press.
Rahmati, S., Abovsky, M., Pastrello, C., Jurisica, I. pathDIP: An annotated resource for known and predicted human gene-pathway associations and pathway enrichment analysis. Nucl Acids Res, 45(D1): D419-D426, 2016.
Chehade, R., Pettapiece-Phillips, R., Salmena, L., Kotlyar, M., Jurisica, I., Narod, S. A., Akbari, M. R., Kotsopoulos, J. Reduced BRCA1 transcript levels in freshly isolated blood leukocytes from BRCA1 mutation carriers is mutation specific, Breast Cancer Res, 18(1): 87, 2016.
Cierna, Z., Mego, M., Jurisica, I., Machalekova, K., Chovanec, M., Miskovska, V., Svetlovska, D., Hainova, K., Kajo, K., Mardiak, J., Babal, P. Fibrillin-1 (FBN-1) a new marker of germ cell neoplasia in situ, BMC Cancer, 16: 597, 2016.

Thank you to members

This work would not be possible without the participation of World Community Grid Members. Thank you for generously contributing CPU cycles, and for your interest in this and other World Community Grid projects.

“World Community Grid (WCG) brings people together from across the globe to create the largest non-profit computing grid benefiting humanity. It does this by pooling surplus computer processing power. We believe that innovation combined with visionary scientific research and large-scale volunteerism can help make the planet smarter. Our success depends on like-minded individuals – like you.”
WCG projects run on BOINC software from UC Berkeley.

BOINC is a leader in the fields of Distributed Computing, Grid Computing and Citizen Cyberscience. BOINC is more properly the Berkeley Open Infrastructure for Network Computing.

“Download and install secure, free software that captures your computer’s spare power when it is on, but idle. You will then be a World Community Grid volunteer. It’s that simple!” You can download the software at either WCG or BOINC.

Please visit the project pages:

FightAIDS@home Phase II


Rutgers Open Zika

Help Stop TB

Outsmart Ebola Together

Mapping Cancer Markers

Uncovering Genome Mysteries

Say No to Schistosoma

GO Fight Against Malaria

Drug Search for Leishmaniasis

Computing for Clean Water

The Clean Energy Project

Discovering Dengue Drugs – Together

Help Cure Muscular Dystrophy

Help Fight Childhood Cancer

Help Conquer Cancer

Human Proteome Folding




World Community Grid is a social initiative of IBM Corporation