From Uncovering Genome Mysteries at WCG: “Seven quadrillion comparisons later, Uncovering Genome Mysteries is just getting started”
By: Wim Degrave, Ph.D.
Laboratório de Genômica Funcional e Bioinformática, Instituto Oswaldo Cruz – Fiocruz
26 Feb 2015
The Uncovering Genome Mysteries research team has started analyzing results from their massive ongoing project, which is comparing proteins between diverse organisms from around the world. Better understanding of similarities between proteomes should help scientists develop sustainable technologies, renewable materials, productive crops, and new treatments for stubborn diseases.
Uncovering Genome Mysteries researchers, left-to-right: Wim Degrave – Senior Researcher, Marcos Catanho – Adjunct Researcher and Ana Carolina Guimarães – Adjunct Researcher at the Oswaldo Cruz Foundation
The Uncovering Genome Mysteries (UGM) project started running on World Community Grid on October 16, 2014, with the daunting task of comparing all currently predicted protein sequences encoded in the genomes of a wide variety of living organisms, with special emphasis on microorganisms. The project expects to examine more than 200 million proteins, the majority of which were generated in environmental and ecological studies ranging from bacteria in marine ecosystems in Australia, to Amazon River samples from Brazil. Similarity data from these comparisons will lead to a better understanding of metabolic and structural functions of the predicted proteins in databases, and uncover many new features and cellular processes in microorganisms. Of the expected 20 quadrillion (20,000,000,000,000,000) comparisons in the project, about 36% have been completed thus far, equivalent to almost 8,000 CPU-years of computation.
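As a quick sanity check on the figures above, the short calculation below works out what 36% of 20 quadrillion comparisons comes to and what that implies per CPU-second. It is only back-of-the-envelope arithmetic on the numbers quoted in this article: the variable names are ours, a 365.25-day year is assumed, and the projected total assumes the remaining comparisons cost about the same as those already done, which may not hold.

```python
from fractions import Fraction

# Figures quoted in the article.
TOTAL_COMPARISONS = 20 * 10**15      # 20 quadrillion expected comparisons
FRACTION_DONE = Fraction(36, 100)    # "about 36%" completed
CPU_YEARS_SO_FAR = 8_000             # "almost 8,000 CPU-years"

# Assumption (not from the article): one year = 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Exact integer arithmetic for the comparison counts.
completed = int(TOTAL_COMPARISONS * FRACTION_DONE)
remaining = TOTAL_COMPARISONS - completed

# Rough throughput implied by the quoted figures.
cpu_seconds = CPU_YEARS_SO_FAR * SECONDS_PER_YEAR
comparisons_per_cpu_second = completed / cpu_seconds

# Naive linear projection of the total cost of the project.
projected_total_cpu_years = CPU_YEARS_SO_FAR / float(FRACTION_DONE)

print(f"completed comparisons : {completed:,}")
print(f"remaining comparisons : {remaining:,}")
print(f"comparisons per CPU-s : {comparisons_per_cpu_second:,.0f}")
print(f"projected CPU-years   : {projected_total_cpu_years:,.0f}")
```

The completed count comes out at roughly seven quadrillion, which matches the headline.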
This project involves cooperation between World Community Grid; the laboratory of Dr. Torsten Thomas and his team in the School of Biotechnology and Biomolecular Sciences & Centre for Marine Bio-Innovation at the University of New South Wales, Sydney, Australia; and the laboratory for Functional Genomics and Bioinformatics of Dr. Wim Degrave and his team at the Oswaldo Cruz Foundation – Fiocruz, in Brazil.
Volunteers participating in the UGM project process work units that contain sets of protein sequences predicted from a variety of organisms, and compare those against each other. Every time a significant similarity between two sequences is detected, a line of output is written that contains the coordinates and information on the statistical significance of the similarity. All of the output data together allow us to trace functional predictions of unknown sequences when they are similar to sequences with known functions, and indicate how organisms and their biochemistry, metabolic functions, and other cellular processes relate to one another.
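To make the shape of a work unit concrete, here is a toy sketch of an all-against-all comparison that writes one output line per pair judged similar. It is not the software the project runs: Python's `difflib.SequenceMatcher` stands in for a real protein alignment tool, the `min_ratio` cutoff stands in for a proper statistical significance test, and the function name, sequence IDs and sequences are made up for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def compare_work_unit(sequences, min_ratio=0.6):
    """Compare every pair of sequences in a (hypothetical) work unit.

    `sequences` maps a sequence ID to an amino-acid string. Returns one
    tab-separated line per pair whose similarity ratio reaches `min_ratio`:
    id_a, id_b, start/end of the longest shared block in each sequence,
    and the ratio. The ratio is a crude stand-in, not a significance score.
    """
    lines = []
    for (id_a, seq_a), (id_b, seq_b) in combinations(sequences.items(), 2):
        matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
        ratio = matcher.ratio()
        if ratio < min_ratio:
            continue  # not similar enough: no output line for this pair
        block = matcher.find_longest_match(0, len(seq_a), 0, len(seq_b))
        lines.append(
            f"{id_a}\t{id_b}\t"
            f"{block.a}\t{block.a + block.size}\t"
            f"{block.b}\t{block.b + block.size}\t"
            f"{ratio:.2f}"
        )
    return lines


# Made-up example: protA and protB differ by one residue, protC is unrelated.
work_unit = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQ",
    "protB": "MKTAYIAKQRQISFVKAHFSRQ",
    "protC": "GGGGGGGGGGPPPPPPPPPP",
}
hits = compare_work_unit(work_unit)
for line in hits:
    print(line)
```

Only the protA/protB pair produces a line here. Because the number of pairs grows with the square of the number of sequences, comparing hundreds of millions of proteins this way is only practical when the work is split across many volunteer machines.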
The data resulting from those calculations are starting to be processed at Fiocruz and the University of New South Wales, and will later be presented in a database that will allow researchers to study the relationships between the proteins of all living things, helping to develop a much better understanding of organisms in their (biodiverse) environment. Data of this kind already underpin many applications in health, environment, and agriculture. For example, they have enabled the development of new strategies to fight pathogens that threaten human and animal health, as well as diagnostics, treatments, and preventive measures such as appropriately designed vaccines. But there are many other applications to be discovered, in agriculture, industry or the environment, through the study of the wide variety of proteins and enzymes. For example, some of these might act as insecticides or antibiotics, or as enzymes that can degrade and eliminate waste or industrial pollutants such as oil or organic chemicals. Enzymes can aid in the synthesis and production of “green chemicals” and biotransformation systems, and also in the production of renewable energy such as bio-alcohols, or in more sophisticated systems through synthetic biology, where the engineering of microorganisms can optimize the production of biopharmaceuticals, green plastics and biofuels. All of this requires a thorough knowledge of biochemical pathways and their regulation, which is being addressed in part through projects like UGM, where the wide variety of enzymatic and biological functions in nature will become more available to the scientific community.
We deeply thank the World Community Grid volunteers who are contributing to this massive effort.
“World Community Grid (WCG) brings people together from across the globe to create the largest non-profit computing grid benefiting humanity. It does this by pooling surplus computer processing power. We believe that innovation combined with visionary scientific research and large-scale volunteerism can help make the planet smarter. Our success depends on like-minded individuals – like you.”
WCG projects run on BOINC software from UC Berkeley.
BOINC is a leader in the field(s) of Distributed Computing, Grid Computing and Citizen Cyberscience. BOINC is more properly the Berkeley Open Infrastructure for Network Computing.
CAN ONE PERSON MAKE A DIFFERENCE? YOU BETCHA!!
“Download and install secure, free software that captures your computer’s spare power when it is on, but idle. You will then be a World Community Grid volunteer. It’s that simple!” You can download the software at either WCG or BOINC.
World Community Grid is a social initiative of IBM Corporation