From Mapping Cancer Markers at WCG: “Project roadmap and first phase results from the Mapping Cancer Markers team”

Mapping Cancer Markers

By: The Mapping Cancer Markers research team
10 Jul 2014

The lead researcher for Mapping Cancer Markers presents a roadmap for the project, which will analyze signatures for four types of cancer (lung, ovarian, prostate and sarcoma), along with an update on his team’s progress so far and an invitation to join the research team at a cancer fundraiser in August.

On behalf of the Mapping Cancer Markers team, we want to start by saying thank you! In just 7 months, World Community Grid members have donated over 60,000 years of processing time to support our research. As a result, we are nearly done with the “benchmarking” portion of the project, which determines the characteristics of our search space. Over the coming months and years, we will pursue more targeted approaches to discover relevant gene signatures. Today we want to give you both a high-level roadmap and some further detail about what is happening with the project.

Project roadmap

The project is anticipated to run for two years, and we plan to analyze signatures for 4 different types of cancer. At the moment, we’re enlisting your help to process research tasks for lung cancer, and will move on to ovarian cancer, prostate cancer and sarcoma.

Currently, the Mapping Cancer Markers project has two phases:

In the first phase we have been attempting to set a benchmark for further experiments.
The second phase will be geared towards finding clinically useful molecular signatures, initially focusing on gene signatures that can predict the occurrence of various types of cancer.

We expect a smooth transition between the two phases, with no interruption in work. The “benchmarking” phase of our project is important not only for our own research, but for other researchers around the world. Every year, numerous groups worldwide develop and publish interesting molecular signatures for various diseases, including multiple cancers. One of the challenges of interpreting these findings is that many of the reports are not directly comparable to each other. The benchmarking phase of our project is designed to set a standard benchmark so that we and other groups can estimate how well individual signatures perform.

You can think of this benchmarking phase as a bit like designing an IQ test. By establishing a standard test and scoring system, we can evaluate and compare people on a common scale. The results from the first phase of Mapping Cancer Markers will allow us to create such a test for existing and future gene signatures, so that we can tell which ones have the best predictive ability.


Our preliminary analysis of the work units processed so far (roughly 26 billion gene signatures) is focused on the nature of genes in the signatures, measuring their quality by assessing how accurately they contribute to identifying patients with poor prognosis. On the analytics side, we have also been evaluating the use of a software package to aid with post-processing our results.

One of the goals of the first project phase is to understand whether some genes might have better predictive ability than others. To do this, we took the top 0.1% of the gene signatures and identified the individual genes that make up each signature. For each gene, we counted how many times it occurred within the top-scoring signatures and plotted the scores of those signatures (see figure below). The blue line shows the average of all of the genes together. The red line highlights the worst-performing single gene, while the green line indicates our best-performing gene. The average of all the genes is very similar to the worst single gene. This is not surprising, because most genes are likely to have poor predictive ability. However, we are looking for the few genes that stand out from the field. In other words, if we have 1 million potential gene signatures and look at the top 1,000 scoring signatures (the top 0.1%), we can find genes, such as the one shown in green, that have better predictive ability.
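
The per-gene analysis described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the team’s actual pipeline: it assumes each signature is a tuple of gene names paired with a single score where higher means better predictive ability, and the function name `gene_scores_in_top` and the toy data are hypothetical.

```python
from collections import defaultdict


def gene_scores_in_top(signatures, top_fraction=0.001):
    """Return {gene: (occurrences, mean_score)} over the top-scoring signatures.

    signatures: list of (genes, score) pairs, where genes is a tuple of gene
    names. Assumption (hypothetical): a higher score means a better signature.
    """
    # Rank all signatures from best to worst score.
    ranked = sorted(signatures, key=lambda s: s[1], reverse=True)
    # Keep the top fraction (0.1% by default), but at least one signature.
    n_top = max(1, int(len(ranked) * top_fraction))
    scores_by_gene = defaultdict(list)
    for genes, score in ranked[:n_top]:
        # Record the score of every top signature the gene appears in.
        for gene in genes:
            scores_by_gene[gene].append(score)
    # Occurrence count and average score of the top signatures containing each gene.
    return {g: (len(s), sum(s) / len(s)) for g, s in scores_by_gene.items()}


if __name__ == "__main__":
    toy = [(("A", "B"), 0.9), (("A", "C"), 0.8), (("B", "D"), 0.4), (("C", "D"), 0.3)]
    for gene, (count, avg) in sorted(gene_scores_in_top(toy, top_fraction=0.5).items()):
        print(gene, count, round(avg, 3))
```

Sorting the returned genes by mean score gives a rough analogue of the best (green) and worst (red) genes in the figure; on real data the top 0.1% would be taken over billions of signatures rather than a toy list.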

This information is important because if we know which genes have the best predictive ability, it may help us and other researchers to evaluate the value of other signatures: if an unknown signature has one of the top genes in it, it is likely to be a useful signature for identifying, assessing, predicting or treating a disease.

As a side note, this benchmarking process is why members may have experienced shorter- or longer-than-usual runtimes over the past several months. The core algorithm of the Mapping Cancer Markers engine, used to evaluate each potential gene signature, has a processing time that depends heavily on the statistical characteristics of each signature. The search space targeted by a single work unit can sometimes contain time-consuming signatures, which together lead to a longer total runtime. The same effect explains the variability in the size of Mapping Cancer Markers results. A typical work unit evaluates tens of thousands of potential gene signatures, many of which are of low quality. Signatures below a certain quality threshold are removed from the returned results. However, the search space targeted by a single work unit can sometimes contain a high proportion of high-quality gene signatures. When this happens, the result file is larger than usual.
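
The threshold filtering described above can be illustrated with a minimal Python sketch. It assumes a work unit is simply a list of candidate signatures, a scoring function and a fixed quality threshold; `run_work_unit`, the threshold value and the toy scores are hypothetical, and the real engine’s scoring algorithm is not shown. Two work units of the same size return result lists of very different lengths depending on how many high-quality signatures they happen to contain.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical representation: a signature is a tuple of gene names.
Signature = Tuple[str, ...]


def run_work_unit(
    signatures: Iterable[Signature],
    evaluate: Callable[[Signature], float],
    threshold: float,
) -> List[Tuple[Signature, float]]:
    """Score every candidate signature; return only those at or above threshold."""
    results = []
    for sig in signatures:
        # In the real engine, the cost of this call varies per signature,
        # which is why total runtime differs between work units.
        score = evaluate(sig)
        if score >= threshold:
            results.append((sig, score))
    return results


if __name__ == "__main__":
    # Hypothetical scores standing in for the engine's quality measure.
    toy_scores = {("A", "B"): 0.92, ("A", "C"): 0.55, ("B", "C"): 0.61, ("C", "D"): 0.88}
    unit_low = [("A", "C"), ("B", "C")]   # mostly low-quality signatures
    unit_high = [("A", "B"), ("C", "D")]  # mostly high-quality signatures
    print(len(run_work_unit(unit_low, toy_scores.__getitem__, 0.8)))   # 0
    print(len(run_work_unit(unit_high, toy_scores.__getitem__, 0.8)))  # 2
```

Both work units contain the same number of candidates, yet only the second produces a non-empty result, mirroring why some result files are larger than others.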

Funding & Fundraising

We’re happy to report that there are several potential sources of further funding. Applications are in progress with the Ontario Research Fund, the Canada Foundation for Innovation, and the US Department of Defense. Of course, the free computing power provided by World Community Grid volunteers is absolutely essential to our research. However, additional funding will help us both leverage volunteers’ contributions and make full use of the findings from the Mapping Cancer Markers computations, with a primary focus on lung and ovarian cancer.

Finally, if you will be in Ontario between 15 and 17 August, please consider donating to, or cheering on, the Team Ian Ride from Kingston to Montreal, which raises money for the Ian Lawson Van Toch Cancer Informatics Fund at the Princess Margaret Cancer Centre. (If you are interested in joining the Team Ian ride this year or next, please contact us.) Joining us will give you the chance to meet some of the research team, raise money for a worthy cause and take part in an outstanding event. For more details visit:

Cancer, one of the leading causes of death worldwide, comes in many different types and forms, all marked by uncontrolled cell growth. Unchecked and untreated, cancer can spread from an initial site to other parts of the body and ultimately lead to death. The disease is caused by genetic or environmental changes that interfere with the biological mechanisms that control cell growth. These changes, as well as normal cell activities, can be detected in tissue samples through the presence of unique chemical indicators, such as DNA and proteins, which together are known as “markers.” Specific combinations of these markers may be associated with a given type of cancer.

The pattern of markers can indicate whether an individual is susceptible to developing a specific form of cancer, and may also predict the progression of the disease, helping to suggest the best treatment for a given individual. For example, two patients with the same form of cancer may have different outcomes and respond differently to the same treatment because of differences in their genetic profiles. While several markers are already known to be associated with certain cancers, there are many more to be discovered, because cancer is highly heterogeneous.

Mapping Cancer Markers on World Community Grid aims to identify the markers associated with various types of cancer. The project is analyzing millions of data points collected from thousands of healthy and cancerous patient tissue samples. These include tissues with lung, ovarian, prostate, pancreatic and breast cancers. By comparing these different data points, researchers aim to identify patterns of markers for different cancers and correlate them with different outcomes, including responsiveness to various treatment options.

This project runs on BOINC software. To take part, visit BOINC or WCG, download and install the software, and attach to the project. While you are there, look over the other projects for any that might interest you.
