From Columbia University: “GPT-4 AI Outperforms Experts at Identifying Cell Types”



3.26.24
Stephanie Berger
sb2247@cumc.columbia.edu


GPT-4, a large language model created by OpenAI, can accurately identify cell types in single-cell RNA sequencing data, a sequencing technique fundamental to characterizing cell types. It does so with high consistency, matching the performance of human experts who perform time-consuming manual annotation. The results of the study, by researchers at Columbia University Mailman School of Public Health and Duke University School of Medicine, are published in the journal Nature Methods.

When assessed across numerous tissue and cell types, GPT-4 produced cell type annotations that closely align with the manual annotations of human experts and surpass those of existing automatic algorithms. This capability has the potential to significantly reduce the effort and expertise needed for annotating cell types, a process that can take months. Moreover, the researchers have developed GPTCelltype, an R software package, to facilitate the automated annotation of cell types using GPT-4.

“The process of annotating cell types for single cells is often time-consuming, requiring human experts to compare genes across cell clusters,” said Wenpin Hou, PhD, assistant professor of Biostatistics at Columbia Mailman School. “Although automated cell type annotation methods have been developed, manual methods to interpret scientific data remain widely used, and such a process can take weeks to months. We hypothesized that GPT-4 can accurately annotate cell types, transitioning the process from manual to a semi- or even fully automated procedure and be cost-efficient and seamless.”

The researchers assessed GPT-4’s performance across ten datasets covering five species and hundreds of tissue and cell types, including both normal and cancer samples. GPT-4 was queried using GPTCelltype, the software tool developed by the researchers. For comparison, they also evaluated other GPT versions and used manual methods as a reference.

As a first step, the researchers explored the factors that may affect GPT-4’s annotation accuracy. They found that GPT-4 performs best when given the top 10 differential genes and exhibits similar accuracy across prompt strategies, including a basic prompt strategy, a chain-of-thought-inspired prompt strategy that includes reasoning steps, and a repeated prompt strategy. GPT-4 matched manual annotations for over 75 percent of cell types in most studies and tissues, demonstrating its competency in generating expert-comparable cell type annotations. In addition, low agreement between GPT-4 and manual annotations for some cell types does not necessarily imply that GPT-4’s annotation is incorrect. In one example involving stromal, or connective tissue, cells, GPT-4 provided more accurate cell type annotations than the manual labels. GPT-4 was also notably faster.
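To make the idea of a basic prompt built from top-ranked genes concrete, here is a minimal Python sketch. It is not the GPTCelltype package (which is written in R) and does not reproduce the study's exact prompt wording. The function name `build_annotation_prompt`, the prompt text, and the example gene lists are illustrative assumptions, and the sketch only builds the text that would be sent to a model; it does not call any API.

```python
from typing import Dict, List


def build_annotation_prompt(
    markers_by_cluster: Dict[str, List[str]],
    tissue: str,
    top_n: int = 10,
) -> str:
    """Build one plain-text prompt asking for a cell type per cluster.

    Hypothetical helper for illustration only. `markers_by_cluster` maps a
    cluster label to its differential genes, assumed to be sorted from most
    to least significant. Only the first `top_n` genes are kept per cluster.
    """
    # Assumed wording; the study's actual prompt may differ.
    header = (
        f"Identify cell types of {tissue} cells using the following markers. "
        "Give one cell type per line, in the same order as the input."
    )
    rows = [
        f"{cluster}: {', '.join(genes[:top_n])}"
        for cluster, genes in markers_by_cluster.items()
    ]
    return "\n".join([header, *rows])


if __name__ == "__main__":
    # Illustrative marker lists, not data from the study.
    example = {
        "cluster 0": ["CD3D", "CD3E", "IL7R", "CCR7"],
        "cluster 1": ["MS4A1", "CD79A", "CD79B"],
    }
    print(build_annotation_prompt(example, tissue="human blood", top_n=3))
```

Keeping `top_n` as a parameter mirrors the kind of factor the researchers varied: the same marker lists can be truncated to different lengths to see how the number of genes in the prompt affects the answers.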

Hou and her colleague also assessed GPT-4’s robustness in complex real-data scenarios and found that GPT-4 can distinguish between pure and mixed cell types with 93 percent accuracy and between known and unknown cell types with 99 percent accuracy. They also evaluated the reproducibility of GPT-4’s annotations using prior simulation studies. GPT-4 generated identical annotations for the same marker genes in 85 percent of cases. “All of these results demonstrate GPT-4’s robustness in various scenarios,” observed Hou.
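One simple way to quantify this kind of reproducibility is to query the model several times with the same marker genes and count how often all the answers match. The Python sketch below shows that calculation under stated assumptions: the function name `reproducibility_rate` is hypothetical, labels are compared after trimming whitespace and lowercasing, and the study's own definition of reproducibility may differ.

```python
from typing import Dict, List


def reproducibility_rate(runs_by_cluster: Dict[str, List[str]]) -> float:
    """Fraction of clusters whose repeated annotations are all identical.

    Hypothetical helper for illustration only. `runs_by_cluster` maps a
    cluster label to the annotations returned by repeated queries that used
    the same marker genes.
    """
    if not runs_by_cluster:
        raise ValueError("no clusters given")

    def normalize(label: str) -> str:
        # Assumed normalization: ignore case and surrounding whitespace.
        return label.strip().lower()

    identical = sum(
        1
        for labels in runs_by_cluster.values()
        if len({normalize(label) for label in labels}) == 1
    )
    return identical / len(runs_by_cluster)


if __name__ == "__main__":
    # Illustrative annotations, not results from the study.
    runs = {
        "cluster 0": ["T cell", "T cell", "t cell"],
        "cluster 1": ["B cell", "Plasma cell", "B cell"],
    }
    print(f"{reproducibility_rate(runs):.0%} of clusters were reproducible")
```

Exact string matching is a deliberately strict choice: two answers that name the same cell type in different words would count as a mismatch, so a real evaluation would likely need a more careful comparison.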

While GPT-4 surpasses existing methods, Hou said there are limitations to consider, including the difficulty of verifying GPT-4’s quality and reliability because little is disclosed about its training procedures.

“Since our study focuses on the standard version of GPT-4, fine-tuning GPT-4 could further improve cell type annotation performance,” she said.

Zhicheng Ji of Duke University School of Medicine is a co-author.

The study was supported by the National Institutes of Health (grants AG075936, GM150887). The authors declare no competing interests.

See the full article here.

Comments are invited and will be appreciated, especially if the reader finds any errors which I can correct.


Please help promote STEM in your local schools.

STEM Education Coalition


Columbia University was founded in 1754 as King’s College by royal charter of King George II of Great Britain. It is the oldest institution of higher learning in the state of New York and the fifth oldest in the United States.

University Mission Statement

Columbia University is one of the world’s most important centers of research and at the same time a distinctive and distinguished learning environment for undergraduates and graduate students in many scholarly and professional fields. The University recognizes the importance of its location in New York City and seeks to link its research and teaching to the vast resources of a great metropolis. It seeks to attract a diverse and international faculty and student body, to support research and teaching on global issues, and to create academic relationships with many countries and regions. It expects all areas of the University to advance knowledge and learning at the highest level and to convey the products of its efforts to the world.

Columbia University is a private Ivy League research university in New York City. Established in 1754 on the grounds of Trinity Church in Manhattan, Columbia is the oldest institution of higher education in New York and the fifth-oldest institution of higher learning in the United States. It is one of nine colonial colleges founded prior to the Declaration of Independence, seven of which belong to the Ivy League. Columbia is ranked among the top universities in the world by major education publications.

Columbia was established as King’s College by royal charter from King George II of Great Britain in reaction to the founding of Princeton College. It was renamed Columbia College in 1784 following the American Revolution, and in 1787 was placed under a private board of trustees headed by former students Alexander Hamilton and John Jay. In 1896, the campus was moved to its current location in Morningside Heights and renamed Columbia University.

Columbia scientists and scholars have played an important role in scientific breakthroughs including the brain-computer interface; the laser and maser; nuclear magnetic resonance; the first nuclear pile; the first nuclear fission reaction in the Americas; the first evidence for plate tectonics and continental drift; and much of the initial research and planning for the Manhattan Project during World War II. Columbia is organized into 20 schools, including four undergraduate schools and 15 graduate schools. The university’s research efforts include the Lamont–Doherty Earth Observatory, the Goddard Institute for Space Studies, and accelerator laboratories with major technology firms such as IBM. Columbia is a founding member of the Association of American Universities and was the first school in the United States to grant the M.D. degree. With over 14 million volumes, Columbia University Library is the third largest private research library in the United States.

The university’s endowment stands among the largest of any academic institution. Columbia’s alumni, faculty, and staff have included: Founding Fathers of the United States, among them a co-author of the United States Constitution and a co-author of the Declaration of Independence; U.S. presidents; foreign heads of state; justices of the United States Supreme Court; Nobel laureates; Fields Medalists; many members of the National Academy of Sciences; living billionaires; Olympic medalists; Academy Award winners; and Pulitzer Prize recipients.
