From California Institute of Technology (US): “A Swiss Army Knife for Genomic Data”

April 02, 2021
No writer credit

Credit: California Institute of Technology

A good way to find out what a cell is doing is to look at its gene expression. Gene expression shows whether the cell is growing out of control, as in cancer, is under the control of an invading virus, or is simply going about the routine business of a healthy cell. Although the vast majority of cells in an organism contain the same genes, how those genes are expressed is what gives rise to different cell types, such as the difference between a muscle cell and a neuron.

In the last decade, technologies to measure gene expression in individual cells have revolutionized biology. No longer do biologists need to average out gene expression over many cells within tissues; now they can detect which genes are active in each cell at any time.

Computational power has struggled to keep up with this explosion of data, however. For example, a single experiment can examine 100,000 cells and measure information from hundreds of thousands of transcripts (RNA molecules produced when a gene is active), resulting in tens of billions of sequenced fragments. Genomic data from single-cell sequencing can occupy terabytes of storage and take hours or days to process on large computing servers.

Now, a new software tool enables the processing of large sets of genomic data in about 30 minutes using the computing power of an average laptop. Like a Swiss Army knife, the tool can be used in myriad ways for different biological needs, and will help ensure the reproducibility of scientific studies.

The tool, which is available online and open for anyone to use, is now being adapted by another research team to study the SARS-CoV-2 virus in samples collected from screening tests.

The research was conducted as a collaboration between the laboratory of Lior Pachter (BS ’94), Bren Professor of Computational Biology and Computing and Mathematical Sciences, and Páll Melsted, professor of computer science at the University of Iceland (IS). Melsted is a co-first author along with graduate student Sina Booeshaghi (MS ’19). A paper describing the research appeared in the journal Nature Biotechnology on April 1.

“There are many examples of different groups using different technologies to study the same tissues, for example, the brain,” says Booeshaghi. “Processing all of this data with the same engine—our technique—facilitates integrating the data. Our tool is fast, efficient, and allows for easy reprocessing, which is very important for consistency and reproducibility in science.”

Developing the software “in-house” helped ensure that it addressed users’ needs, because its potential users were working right there in the lab.

“The interdisciplinarity of our team was crucial to conceiving of and executing this project,” says Pachter. “There are people in the lab who are computer scientists, biologists, engineers. Sina is in the mechanical engineering department and brings the perspective of his design background and engineering; Páll has a strong background in theoretical computer science and software engineering.”

The ease of use, low cost, and modularity of these tools will enable consistent and reproducible preprocessing of genomic data for large consortia such as the Human Cell Atlas and the BRAIN Initiative Cell Census Network.

See the full article here.

The California Institute of Technology (US) is a private research university in Pasadena, California. The university is known for its strength in science and engineering, and it is one of a small group of institutes of technology in the United States devoted primarily to instruction in the pure and applied sciences.

Caltech was founded as a preparatory and vocational school by Amos G. Throop in 1891 and began attracting influential scientists such as George Ellery Hale, Arthur Amos Noyes, and Robert Andrews Millikan in the early 20th century. The vocational and preparatory schools were disbanded and spun off in 1910, and the college assumed its present name in 1920. In 1934, Caltech was elected to the Association of American Universities, and the antecedents of the National Aeronautics and Space Administration (US)’s Jet Propulsion Laboratory, which Caltech continues to manage and operate, were established between 1936 and 1943 under Theodore von Kármán.

Caltech has six academic divisions with strong emphasis on science and engineering. Its 124-acre (50 ha) primary campus is located approximately 11 mi (18 km) northeast of downtown Los Angeles. First-year students are required to live on campus, and 95% of undergraduates remain in the on-campus House System at Caltech. Although Caltech has a strong tradition of practical jokes and pranks, student life is governed by an honor code which allows faculty to assign take-home examinations. The Caltech Beavers compete in 13 intercollegiate sports in the NCAA Division III’s Southern California Intercollegiate Athletic Conference (SCIAC).

As of October 2020, 76 Nobel laureates have been affiliated with Caltech, including 40 alumni and faculty members (41 prizes, with chemist Linus Pauling being the only individual in history to win two unshared prizes). In addition, 4 Fields Medalists and 6 Turing Award winners have been affiliated with Caltech. There are 8 Crafoord Laureates and 56 non-emeritus faculty members (as well as many emeritus faculty members) who have been elected to one of the United States National Academies. Four Chief Scientists of the U.S. Air Force have been affiliated with Caltech, and 71 affiliates have won the United States National Medal of Science or Technology. Numerous faculty members are associated with the Howard Hughes Medical Institute (US) as well as the National Aeronautics and Space Administration (US). According to a 2015 Pomona College (US) study, Caltech ranked number one in the U.S. for the percentage of its graduates who go on to earn a PhD.

Caltech is classified among “R1: Doctoral Universities – Very High Research Activity,” with research concentrated primarily in STEM fields. The largest federal agencies contributing to research are the National Aeronautics and Space Administration (US), the National Science Foundation (US), the Department of Health and Human Services (US), the Department of Defense (US), and the Department of Energy (US).

In 2005, Caltech had 739,000 square feet (68,700 m²) dedicated to research: 330,000 square feet (30,700 m²) to physical sciences, 163,000 square feet (15,100 m²) to engineering, and 160,000 square feet (14,900 m²) to biological sciences.

In addition to managing JPL, Caltech also operates the Caltech Palomar Observatory (US); the Owens Valley Radio Observatory (US); the Caltech Submillimeter Observatory (US); the W. M. Keck Observatory at the Mauna Kea Observatory (US); the Laser Interferometer Gravitational-Wave Observatory at Livingston, Louisiana, and Richland, Washington; and the Kerckhoff Marine Laboratory (US) in Corona del Mar, California. The Institute launched the Kavli Nanoscience Institute at Caltech in 2006 and the Keck Institute for Space Studies in 2008, and it is also the current home of the Einstein Papers Project. The Spitzer Science Center (US), part of the Infrared Processing and Analysis Center (US) located on the Caltech campus, is the data analysis and community support center for NASA’s Spitzer Space Telescope [no longer in service].

Caltech partnered with the University of California at Los Angeles (US) to establish a Joint Center for Translational Medicine (UCLA-Caltech JCTM), which conducts experimental research into clinical applications, including the diagnosis and treatment of diseases such as cancer.

Caltech operates several Total Carbon Column Observing Network (US) stations as part of an international collaborative effort to measure greenhouse gases globally. One of these stations is on campus.