April 15, 2015
Cancer researchers are using the Abel supercomputer at the University of Oslo in Norway to detect which versions of genes are only found in cancer cells. Every form of cancer, even every tumour, has its own distinct variants.
“This charting may help tailor the treatment to each patient,” says Rolf Skotheim, who is affiliated with the Centre for Cancer Biomedicine and the research group for biomedical informatics at the University of Oslo, as well as the Department of Molecular Oncology at Oslo University Hospital.
His research group is working to identify the genes that cause bowel and prostate cancer, which are both common diseases. There are 4,000 new cases of bowel cancer in Norway every year. Only six out of ten patients survive the first five years. Prostate cancer affects 5,000 Norwegians every year. Nine out of ten survive.
Comparisons between healthy and diseased cells
To identify the genes that lead to cancer, Skotheim and his research group are comparing genetic material in tumours with genetic material in healthy cells. Understanding this process requires a brief introduction to our genetic material:
Our genetic material consists of just over 20,000 genes. Each gene consists of thousands of base pairs, represented by a specific sequence of the four building blocks, adenine, thymine, guanine, and cytosine, popularly abbreviated to A, T, G, and C. The sequence of these building blocks is the very recipe for the gene. Our whole DNA consists of some six billion base pairs.
The DNA strand carries the molecular instructions for activity in the cells. In other words, DNA contains the recipe for proteins, which perform the tasks in the cells. DNA does not, however, produce proteins itself. First, a copy of the DNA is made: this transcript is called RNA, and it is this molecule that is read when proteins are produced.
RNA corresponds to only a small part of DNA: its active constituents. Most of DNA is inactive; only 1–2% of the DNA strand is active.
In cancer cells, something goes wrong with the RNA transcription. There is either too much RNA, which means that far too many proteins of a specific type are formed, or the composition of base pairs in the RNA is wrong. The latter is precisely the area being studied by the University of Oslo researchers.
All genes can be divided into active and inactive parts. A single gene may consist of tens of active stretches of nucleotides (exons). “RNA is a copy of a specific combination of the exons from a specific gene in DNA,” explains Skotheim. There are many possible combinations, and it is precisely this search for all of the possible combinations that is new in cancer research.
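To give a feel for how quickly the number of exon combinations grows, here is a minimal Python sketch. It assumes, purely for illustration, that any non-empty subset of a gene's exons could be joined in their original order; real cells produce only a fraction of these, so the count is an upper bound rather than a description of the researchers' method. The exon names are hypothetical.

```python
from itertools import combinations

def exon_combinations(exons):
    """Yield every non-empty subset of exons, keeping their original order."""
    for size in range(1, len(exons) + 1):
        # combinations() preserves the order of the input sequence.
        for combo in combinations(exons, size):
            yield combo

# Hypothetical gene with four exons (names are illustrative only).
exons = ["E1", "E2", "E3", "E4"]
variants = list(exon_combinations(exons))
print(len(variants))  # 2**4 - 1 = 15
```

A gene with ten exons would already allow 1,023 such combinations, which suggests why an exhaustive search across some 20,000 genes is a job for a computer.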
Different cells can combine the nucleotides in a single gene in different ways. A cancer cell can create a combination that should not exist in healthy cells. And as if that didn’t make things complicated enough, sometimes RNA can be made up of stretches of nucleotides from different genes in DNA. These special, complex genes are called fusion genes.
In other words, researchers must look for errors both inside genes and between the different genes. “Fusion genes are usually found in cancer cells, but some of them are also found in healthy cells,” says Skotheim. In patients with prostate cancer, researchers have found some fusion genes that are only created in diseased cells. These fusion genes may then be used as a starting-point in the detection of and fight against cancer.
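The idea of a fusion gene that is "only created in diseased cells" can be expressed as a simple set difference. The sketch below is an illustration under stated assumptions, not the group's actual pipeline: it assumes fusion genes have already been detected and are represented as pairs of gene names, and the names used here are placeholders.

```python
def cancer_specific_fusions(tumour_fusions, healthy_fusions):
    """Return the fusions seen in tumour tissue but never in healthy tissue."""
    return set(tumour_fusions) - set(healthy_fusions)

# Placeholder gene names; each pair stands for one detected fusion gene.
tumour = {("GENE_A", "GENE_B"), ("GENE_C", "GENE_D")}
healthy = {("GENE_C", "GENE_D")}

specific = cancer_specific_fusions(tumour, healthy)
print(specific)  # only the GENE_A/GENE_B fusion remains
```

In this toy example, the fusion that also appears in healthy tissue is discarded, mirroring the bowel-cell result described in the article.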
The researchers have also found fusion genes in bowel cells, but they were not cancer-specific. “For some reason, these fusion genes can also be found in healthy cells,” adds Skotheim. “This discovery was a let-down.”
There are different RNA errors in the various cancer diseases. The researchers must therefore analyze the RNA errors of each disease.
Among other things, the researchers are comparing RNA in diseased and healthy tissue from 550 patients with prostate cancer. The patients that make up the study do not receive any direct benefits from the results themselves. However, the research is important in order to be able to help future patients.
“We want to find the typical defects associated with prostate cancer,” says Skotheim. “This will make it easier to understand what goes wrong with healthy cells, and to understand the mechanisms that develop cancer. Once we have found the cancer-specific molecules, they can be used as biomarkers.” In some cases, the biomarkers can be used to detect cancer, to determine its severity and risk of spreading, and to decide whether the patient should be given more aggressive treatment.
Even though the researchers find deviations in the RNA, there is no guarantee that there is appropriate, targeted medicine available. “The point of our research is to figure out more of the big picture,” says Skotheim. “If we identify a fusion gene that is only found in cancer cells, the discovery will be so important in itself that other research groups around the world will want to begin working on this straight away. If a cure is found that counteracts the fusion genes, this may have enormous consequences for cancer treatment.”
Recreating RNA is laborious work. The set of RNA molecules consists of about 100 million bases, divided into a few thousand bases from each gene.
The laboratory machine reads millions of short nucleotide fragments, each only 100 bases long. To place them in the right location, the researchers must run large statistical analyses. The RNA analysis of a single patient can take a few days.
All of these fragments must be matched with the DNA strand. Unfortunately, the researchers do not have the DNA strands of each patient. To learn where the fragments come from in the DNA strand, they must therefore use the reference genome of the human species. “This is not ideal, because there are individual differences,” explains Skotheim. The future potentially lies in fully sequencing the DNA of each patient when conducting medical experiments.
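The matching step can be pictured with a toy example. The sketch below indexes a short, made-up reference sequence by its substrings of length k and then looks up where a fragment fits exactly. It is a simplified illustration only: the function names and sequences are hypothetical, and real analysis software must also tolerate mismatches (such as the individual differences Skotheim mentions) and fragments that span two exons, which exact matching cannot do.

```python
from collections import defaultdict

def build_index(reference, k):
    """Map every substring of length k to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index

def map_read(read, reference, index, k):
    """Return reference positions where the whole read matches exactly."""
    hits = []
    # Use the first k bases as a seed, then verify the full read.
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits

# Made-up reference sequence and fragment, for illustration only.
reference = "ACGTTGCAACGTAGCTAGGCTA"
index = build_index(reference, k=4)
positions = map_read("ACGTAG", reference, index, k=4)
print(positions)  # [8]
```

The seed "ACGT" occurs twice in the toy reference, but only the second occurrence extends to the full fragment. Repeating this for millions of fragments against billions of base pairs is what drives the computing demand.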
There is no way this research could be carried out using pen and paper. “We need powerful computers to crunch the enormous amounts of raw data. Even if you spent your whole life on this task, you would not be able to find the location of a single nucleotide. This is a matter of millions of nucleotides that must be mapped correctly in the system of coordinates of the genetic material. Once we have managed to find the RNA versions that are only found in cancer cells, we will have made significant progress. However, the work to get that far requires advanced statistical analyses and supercomputing,” says Skotheim.
The analyses are so demanding that the researchers must use the University of Oslo’s Abel supercomputer, which has a theoretical peak performance of over 250 teraFLOPS. “With the ability to run heavy analyses on such large amounts of data, we have an enormous advantage not available to other cancer researchers,” explains Skotheim. “Many medical researchers would definitely benefit from this possibility. This is why they should spend more time with biostatisticians and informaticians. RNA samples are taken from the patients only once. The types of analyses that can be run are only limited by the imagination.”
“We need to be smart in order to analyze the raw data.” He continues: “There are enormous amounts of data here that can be interpreted in many different ways. We just got started. There is lots of useful information that we have not seen yet. Asking the right questions is the key. Most cancer researchers are not used to working with enormous amounts of data, and how to best analyze vast data sets. Once researchers have found a possible answer, they must determine whether the answer is chance or if it is a real finding. The solution is to find out whether they get the same answers from independent data sets from other parts of the world.”
iSGTW is an international weekly online publication that covers distributed computing and the research it enables.