From Uncovering Genome Mysteries Project at WCG: “Analysis Underway on 30 Terabytes of Data”

World Community Grid (WCG)

24 Nov 2017 [In social media just now]

The Uncovering Genome Mysteries data (all 30 terabytes) was transferred to the research teams in Brazil and Australia this year. Now the researchers are analyzing this vast amount of data and looking for ways to make it easy for other scientists and the public to understand.


Last year, World Community Grid volunteers completed the calculations for the Uncovering Genome Mysteries project, which examined approximately 200 million genes from a wide variety of life forms to help discover new protein functions. The project’s main goals include:

Discovering new protein functions and augmenting knowledge about biochemical processes in general
Identifying how organisms interact with each other and the environment
Documenting the current baseline microbial diversity, allowing a better understanding of how microorganisms change under environmental stresses, such as climate change
Understanding and modeling complex microbial systems

The data generated by World Community Grid volunteers has been consolidated on the new bioinformatics server at the Oswaldo Cruz Foundation (Fiocruz), under the direction of Dr. Wim Degrave. Additionally, a full copy of all data has been sent to co-investigator Dr. Torsten Thomas and his team from the Centre for Marine Bio-Innovation & the School of Biological, Earth and Environmental Sciences at the University of New South Wales in Sydney, Australia. There, the results from protein comparisons will help to interpret the analyses of marine bacterial ecosystems, where microorganisms, coral reefs, sponges, and many other intriguing creatures interact and form their communities. The dataset, more than 30 terabytes in highly compressed form, took a few months to transfer from Brazil to Australia.

Data Processing and Analysis at Fiocruz

The Fiocruz team has been busy further processing the project's primary output. In the workflow, raw data are expanded and deciphered, associated with the correct inter-genome comparisons, checked for errors, tabulated, and linked with many different data objects to transform them into meaningful information.
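
The workflow steps described above (expand, decipher, check for errors, tabulate) can be pictured with a small sketch. This is only an illustration and not the Fiocruz pipeline: the gzip compression, the tab-separated record layout (query genome, subject genome, score), and all function names are assumptions made for the example.

```python
import gzip
from collections import defaultdict


def expand(blob: bytes) -> str:
    """Expand a compressed result blob into text (gzip is an assumption)."""
    return gzip.decompress(blob).decode("utf-8")


def parse(text: str):
    """Yield (query, subject, score) records, skipping malformed lines."""
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # error check: wrong number of columns
        query, subject, raw_score = fields
        try:
            score = float(raw_score)
        except ValueError:
            continue  # error check: score is not a number
        yield query, subject, score


def tabulate(records):
    """Group scores by genome pair and report (hit count, best score)."""
    scores_by_pair = defaultdict(list)
    for query, subject, score in records:
        scores_by_pair[(query, subject)].append(score)
    return {pair: (len(s), max(s)) for pair, s in scores_by_pair.items()}


def process(blob: bytes):
    """Run the hypothetical expand -> parse/check -> tabulate steps."""
    return tabulate(parse(expand(blob)))
```

Calling process() on a compressed blob of such records returns one row per genome pair, with malformed lines dropped along the way.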

The team is dealing with the rapidly growing size of the database, and has purchased and installed new hardware (600 TB) to help accommodate all the data. They also wish to build a database interface that appeals to members of the general public interested in biodiversity, and not only to scientists who specialize in functional analysis of encoded proteins in genomes of particular life forms.

Some of the data are currently being used in projects such as vaccine and drug design against arboviruses such as Zika, dengue, and yellow fever viruses. They are also being used to understand how bacteria interact with their environment and how this is reflected in their metabolic pathways, by comparing free-living bacteria with close relatives that are human pathogens, such as environmental mycobacteria versus Mycobacterium tuberculosis.

Searching for Partnerships

Fiocruz is looking for partnerships that would add extra data analytics and artificial intelligence to the project. The researchers would like to include visualizations of functional connections between organisms, as well as particularities from a wide variety of organisms, including deep-sea thermal vent archaeal bacteria; bacteria and protists (any one-celled organism that is not an animal, plant, or fungus) from soil, water, land, and sea, or those important for human, animal, or plant health; and highly complex plant, animal, and human genomes.

We thank everyone who participated in the World Community Grid portion of this project, and look forward to sharing more updates as we continue to analyze the data.

Please help promote STEM in your local schools.

STEM Education Coalition

“World Community Grid (WCG) brings people together from across the globe to create the largest non-profit computing grid benefiting humanity. It does this by pooling surplus computer processing power. We believe that innovation combined with visionary scientific research and large-scale volunteerism can help make the planet smarter. Our success depends on like-minded individuals – like you.”
WCG projects run on BOINC software from UC Berkeley.

BOINC, more properly the Berkeley Open Infrastructure for Network Computing, is a leader in the fields of distributed computing, grid computing, and citizen cyberscience.

“Download and install secure, free software that captures your computer’s spare power when it is on, but idle. You will then be a World Community Grid volunteer. It’s that simple!” You can download the software at either WCG or BOINC.

Please visit the project pages:
Smash Childhood Cancer

FightAIDS@home Phase II


Rutgers Open Zika

Help Stop TB

Outsmart Ebola Together

Mapping Cancer Markers

Uncovering Genome Mysteries

Say No to Schistosoma

GO Fight Against Malaria

Drug Search for Leishmaniasis

Computing for Clean Water

The Clean Energy Project

Discovering Dengue Drugs – Together

Help Cure Muscular Dystrophy

Help Fight Childhood Cancer

Help Conquer Cancer

Human Proteome Folding




World Community Grid is a social initiative of IBM Corporation