Tagged: Genomics

  • richardmitnick 6:45 pm on June 2, 2016
    Tags: A massive approach to finding what’s “real” in genome-wide association data, Genomics

    From Broad Institute: “A massive approach to finding what’s ‘real’ in genome-wide association data”

    Broad Institute

    June 2nd, 2016
    Tom Ulrich

    What could we learn if we probed the subtle effects of thousands of DNA variations on gene expression, all at once? Two recent Cell papers hint at how an assay called MPRA could help us get there.

    MPRA could help reveal DNA variants’ subtle influences on diseases and traits. Image: Sigrid Knemeyer.

    Genome-wide association studies (GWAS) have been a boon for geneticists by revealing thousands of genetic variants associated with human disease. At the same time, GWAS are the bane of geneticists because they reveal thousands of genetic variants associated with human disease. Which variants are the drivers, the ones that truly cause or contribute to disease development and progression?

    “With GWAS, you get a set of signals, which can tell you which regions of the genome are associated with a particular disease or trait,” said Vijay Sankaran, a Broad associate member and a pediatric hematologist/oncologist at Dana-Farber/Boston Children’s Cancer and Blood Disorders Center who studies blood cell disorders. “But it’s hard to know which hits are causal hits, and which are just going along for the ride.”

    The picture gets particularly complicated when talking about variants in non-coding DNA, including the vast stretches of DNA containing sequences that control gene expression. By some estimates, between 85 and 90 percent of the variants picked up by GWAS lie in such regions.

    Many scientists are trying to figure out how to connect the dots between non-coding GWAS variants and human biology, health, and ultimately, disease. Three Broad teams, led by Sankaran, Pardis Sabeti, and Broad alum Tarjei Mikkelsen (now with the biotechnology company 10X Genomics), respectively, have focused their efforts on scaling up a staple of the genomics toolkit — the reporter assay — to create a massively parallel reporter assay (MPRA).

    “We want to move from understanding the component pieces of the genome to understanding what changes in those components do,” said Sabeti, an institute member and Harvard computational geneticist and evolutionary biologist, whose lab probes the role genetic variation writ large plays in human and microbial evolution. “We need very sensitive technology to be able to identify these functional changes, particularly if they’re subtle.”

    Going massive

    The reporter assay helps scientists sift through GWAS data to find variants that truly affect gene expression or function. A researcher takes a DNA fragment from what may be an enhancer, couples it within a plasmid to a “reporter” gene that provides a readout (e.g., the luciferase gene), and inserts the plasmid into cells. If the readout materializes (e.g., if the cells glow), the enhancer sequence drove expression of the reporter. By running the assay with different variations of the same fragment, researchers can discern whether certain variants affect expression.

    Such classic reporter assays, however, have one major disadvantage: They don’t scale to the level needed to investigate the thousands to tens of thousands of variants that might turn up in a GWAS.

    Mikkelsen and Broad research scientist Alexandre Melnikov worked out the principles of one flavor of MPRA while working in the lab of Broad founding director and president Eric Lander. In a 2012 Nature Biotechnology paper*, they noted that tagging each plasmid with a short, unique DNA barcode provided a second readout. By sequencing and counting the mRNAs produced from each plasmid, they could easily identify the variant(s) with the greatest influence on gene expression and quantify the magnitude of that influence.

    And because each barcode was unique to each plasmid, Mikkelsen and Melnikov’s team could pool and assay thousands of variants simultaneously.
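
    The barcode counting described above lends itself to a simple ratio analysis. As a rough sketch in Python (not the published pipeline; the variant names, counts, and function here are invented for illustration), each variant's regulatory activity can be estimated as the log-ratio of mRNA barcode counts to plasmid DNA barcode counts:

```python
from collections import defaultdict
import math

def mpra_activity(rna_counts, dna_counts):
    """Estimate per-variant regulatory activity as log2(RNA/DNA),
    pooling all barcodes assigned to the same variant.

    rna_counts / dna_counts: dicts mapping (variant, barcode) -> read count.
    Returns a dict mapping variant -> log2 activity estimate.
    """
    rna_by_variant = defaultdict(int)
    dna_by_variant = defaultdict(int)
    for (variant, _barcode), n in rna_counts.items():
        rna_by_variant[variant] += n
    for (variant, _barcode), n in dna_counts.items():
        dna_by_variant[variant] += n

    activity = {}
    for variant, dna in dna_by_variant.items():
        if dna == 0:
            continue  # variant dropped out of the plasmid pool
        rna = rna_by_variant.get(variant, 0)
        # pseudocount of 1 avoids log(0) for silent constructs
        activity[variant] = math.log2((rna + 1) / (dna + 1))
    return activity

# Toy example: the "alt" allele drives roughly twice the expression of "ref"
rna = {("ref", "BC1"): 100, ("ref", "BC2"): 110,
       ("alt", "BC3"): 210, ("alt", "BC4"): 190}
dna = {("ref", "BC1"): 100, ("ref", "BC2"): 100,
       ("alt", "BC3"): 100, ("alt", "BC4"): 100}
act = mpra_activity(rna, dna)
print(act["alt"] - act["ref"])  # positive -> alt allele increases expression
```

    The published analyses model barcode-level variability statistically, but the core readout is this RNA-to-DNA comparison.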

    Homing in on blood cell traits

    Sankaran’s lab is the latest to make use of Mikkelsen and Melnikov’s MPRA system, harnessing it to scrutinize more than 2,750 non-coding variants in 75 GWAS hits linked to red blood cell traits. And as he, Mikkelsen, and co-first authors Jacob Ulirsch and Satish Nandakumar reported** in Cell, MPRA data pointed to 32 hits that actually had some impact on gene expression. They then used additional computational and functional assays to further probe the effects of a subset of these variants on red blood cell traits, ultimately revealing that several known genes may have previously unrecognized roles in blood cell development.

    “One of the unexpected lessons we learned was that many of the variants tweaked a master blood development regulator, GATA1,” said Ulirsch, a staff scientist in Sankaran’s lab. “There was a common pattern. Going one by one, variant by variant, we would never have been able to see this.”

    Clockwise from top left: Pardis Sabeti, Vijay Sankaran, Satish
    Nandakumar, Ryan Tewhey, Jacob Ulirsch. Photo: Megan Purdum

    Building MPRA 2.0

    While Mikkelsen and Melnikov’s original method is quite powerful, Sabeti’s lab wanted to see if they could make it even more robust.

    “The original version of MPRA is limited in how many variants you can test,” said Ryan Tewhey, a postdoctoral fellow in Sabeti’s lab. “We wanted to know, can you expand this technology out? Can you test tens of thousands of variants at once? And can you make it more sensitive?”

    Tewhey, Sabeti, and their team doubled the length of each DNA barcode and upped the number of barcodes to as many as 350 per variant. They then used their enhanced assay to study more than 32,000 possible B cell regulatory variants identified by the 1000 Genomes Project, deeply characterizing one associated with risk of ankylosing spondylitis (an autoimmune disease). They also highlighted another 842 candidate variants, including 53 particularly promising ones associated with human traits and diseases.

    As they discussed in their own Cell paper***, the added barcodes reduced the noise in their data and increased the assay’s overall sensitivity.
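
    Why extra barcodes buy sensitivity follows from basic statistics: averaging many noisy per-barcode measurements shrinks the spread of the final estimate roughly as one over the square root of the barcode count. A small simulation sketch (illustrative only; the plain Gaussian noise model here is an assumption, not the paper's error model):

```python
import random
import statistics

random.seed(0)

def simulated_activity_estimate(true_activity, n_barcodes, noise_sd=1.0):
    """Average noisy per-barcode measurements of one variant's activity."""
    reads = [random.gauss(true_activity, noise_sd) for _ in range(n_barcodes)]
    return statistics.mean(reads)

def estimate_spread(n_barcodes, trials=2000):
    """Empirical standard deviation of the estimate across repeated assays."""
    return statistics.stdev(
        simulated_activity_estimate(0.0, n_barcodes) for _ in range(trials)
    )

few = estimate_spread(n_barcodes=10)
many = estimate_spread(n_barcodes=350)
print(few, many)  # the 350-barcode estimate is far tighter
```

    With 35 times as many barcodes, the spread drops by about a factor of six, which is what lets subtler allelic differences rise above the noise.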

    “With more barcodes you can start to detect more subtle changes in expression, including changes that might arise from differences between alleles,” Tewhey added.

    Another view into regulation

    MPRA isn’t the only approach for pulling causal needles out of GWAS haystacks, and Tewhey is realistic that it won’t be a panacea for studying all of the cell’s mechanisms for regulating expression.

    “For promoters and enhancers, we know it works well,” he said. “For things related to long-distance connectivity or the genome’s shape, we’re not as confident.”

    Sankaran points out that MPRA really shines in its ability to find themes in genetic variation that researchers can marry to other genetic, structural, or functional data.

    “When you start to get all these independent pieces together, you get a real fine view of what’s important,” he said.

    Papers cited:

    Melnikov A, Murugan A, et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nature Biotechnology. February 26, 2012. DOI: 10.1038/nbt.2137

    Ulirsch JC, Nandakumar SK, et al. Systematic functional dissection of common genetic variation affecting red blood cell traits. Cell. June 2, 2016. DOI: 10.1016/j.cell.2016.04.048

    Tewhey R, Kotliar D, et al. Direct identification of hundreds of expression-modulating variants using a multiplexed reporter assay. Cell. June 2, 2016. DOI: 10.1016/j.cell.2016.04.027

    See the full article here .

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    Broad Institute Campus

    The Eli and Edythe L. Broad Institute of Harvard and MIT is founded on two core beliefs:

    This generation has a historic opportunity and responsibility to transform medicine by using systematic approaches in the biological sciences to dramatically accelerate the understanding and treatment of disease.
    To fulfill this mission, we need new kinds of research institutions, with a deeply collaborative spirit across disciplines and organizations and the capacity to tackle ambitious challenges.

    The Broad Institute is essentially an “experiment” in a new way of doing science, empowering this generation of researchers to:

    Act nimbly. Encouraging creativity often means moving quickly, and taking risks on new approaches and structures that often defy conventional wisdom.
    Work boldly. Meeting the biomedical challenges of this generation requires the capacity to mount projects at any scale — from a single individual to teams of hundreds of scientists.
    Share openly. Seizing scientific opportunities requires creating methods, tools and massive data sets — and making them available to the entire scientific community to rapidly accelerate biomedical advancement.
    Reach globally. Biomedicine should address the medical challenges of the entire world, not just advanced economies, and include scientists in developing countries as equal partners whose knowledge and experience are critical to driving progress.

    Harvard University

  • richardmitnick 11:41 am on December 22, 2015
    Tags: Genomics

    From Harvard: “Researchers help cells forget who they are” 

    Harvard University

    December 21, 2015
    Hannah Robbins, Harvard Stem Cell Institute Communications

    Erasing a cell’s memory makes it easier to manipulate it into becoming another type of cell

    Induced pluripotent stem cell colonies generated after researchers at Harvard Stem Cell Institute suppressed the CAF1 gene. Photo by Sihem Cheloufi

    They say we can’t escape our past — no matter how much we change, we still have the memory of what came before. The same can be said of our cells.

    Mature cells, such as skin or blood cells, have a cellular “memory,” or record of how the cell changed as it developed from an uncommitted embryonic cell into a specialized adult cell. Now, Harvard Stem Cell Institute researchers at Massachusetts General Hospital (MGH), in collaboration with scientists from the Institutes of Molecular Biotechnology (IMBA) and Molecular Pathology (IMP) in Vienna, have identified genes that, when suppressed, effectively erase a cell’s memory, making it more susceptible to reprogramming and, consequently, making the process of reprogramming quicker and more efficient.

    The study was recently published in Nature.

    “We began this work because we wanted to know why a skin cell is a skin cell, and why does it not change its identity the next day, or the next month, or a year later?” said co-senior author Konrad Hochedlinger, an HSCI principal faculty member at MGH and Harvard’s Department of Stem Cell and Regenerative Biology, and a world expert in cellular reprogramming.

    Every cell in the human body has the same genome, or DNA blueprint, explained Hochedlinger, and it is how those genes are turned on and off during development that determines what kind of adult cell each becomes. By manipulating those genes and introducing new factors, scientists can unlock dormant parts of an adult cell’s genome and reprogram it into another cell type.

    However, “a skin cell knows it is a skin cell,” said IMBA’s Josef Penninger, even after scientists reprogram those skin cells into induced pluripotent stem cells (iPS cells) — a process that would ideally require a cell to “forget” its identity before assuming a new one.

    Cellular memory is often conserved, acting as a roadblock to reprogramming. “We wanted to find out which factors stabilize this memory and what mechanism prevents iPS cells from forming,” Penninger said.

    To identify potential factors, the team established a genetic library targeting known chromatin regulators — genes that control the packaging and bookmarking of DNA, and are involved in creating cellular memory.

    Hochedlinger and Sihem Cheloufi, co-first author and a postdoc in Hochedlinger’s lab, designed a screening approach that tested each of these factors.

    Of the 615 factors screened, the researchers identified four chromatin regulators, three of which had not yet been described, as potential roadblocks to reprogramming. In comparison to the three- to fourfold increase seen by suppressing previously known roadblock factors, inhibiting the newly described chromatin assembly factor 1 (CAF1) made the process 50- to 200-fold more efficient. Moreover, in the absence of CAF1, reprogramming turned out to be much faster: While the process normally takes nine days, the researchers could detect the first iPS cell after four days.
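
    The screen's hit-calling logic boils down to comparing iPS colony counts against a control knockdown. A hedged sketch (the factor names and colony counts below are invented; only CAF1 and the reported 50- to 200-fold range come from the article):

```python
def call_hits(colony_counts, control_count, min_fold=3.0):
    """Flag chromatin regulators whose suppression boosts iPS colony
    formation at least `min_fold` over a neutral control knockdown.

    colony_counts: dict mapping factor -> colonies observed when that
    factor is suppressed. Returns {factor: fold_change} for hits.
    """
    hits = {}
    for factor, n in colony_counts.items():
        fold = n / control_count
        if fold >= min_fold:
            hits[factor] = fold
    return hits

# Invented counts for illustration (the real screen covered 615 factors)
counts = {"CAF1": 2000, "FactorX": 70, "FactorY": 25, "FactorZ": 18}
print(call_hits(counts, control_count=20))
# CAF1 at 100x sits inside the reported 50- to 200-fold range
```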

    “The CAF1 complex ensures that during DNA replication and cell division, daughter cells keep their memory, which is encoded on the histones that the DNA is wrapped around,” said Ulrich Elling, a co-first author from IMBA. “When we block CAF1, daughter cells fail to wrap their DNA the same way, lose this information, and convert into blank sheets of paper. In this state, they respond more sensitively to signals from the outside, meaning we can manipulate them much more easily.”

    By suppressing CAF1 the researchers were also able to facilitate the conversion of one type of adult cell directly into another, skipping the intermediary step of forming iPS cells, via a process called direct reprogramming, or transdifferentiation. Thus, CAF1 appears to act as a general guardian of cell identity whose depletion facilitates both the interconversion of one adult cell type to another as well as the conversion of specialized cells into iPS cells.

    In finding CAF1, the researchers identified a complex that allows cell memory to be erased and rewritten. “The cells forget who they are, making it easier to trick them into becoming another type of cell,” said Cheloufi.

    CAF1 may provide a general key to facilitate the “reprogramming” of cells to model disease and test therapeutic agents, IMP’s Johannes Zuber explained. “The best-case scenario,” he said, “is that with this insight, we hold a universal key in our hands that will allow us to model cells at will.”

    See the full article here .

    Harvard University campus

    Harvard is the oldest institution of higher education in the United States, established in 1636 by vote of the Great and General Court of the Massachusetts Bay Colony. It was named after the College’s first benefactor, the young minister John Harvard of Charlestown, who upon his death in 1638 left his library and half his estate to the institution. A statue of John Harvard stands today in front of University Hall in Harvard Yard, and is perhaps the University’s best known landmark.

    Harvard University has 12 degree-granting Schools in addition to the Radcliffe Institute for Advanced Study. The University has grown from nine students with a single master to an enrollment of more than 20,000 degree candidates including undergraduate, graduate, and professional students. There are more than 360,000 living alumni in the U.S. and over 190 other countries.

  • richardmitnick 5:49 pm on December 7, 2015
    Tags: Genomics

    From Caltech: “Unlocking the Chemistry of Life” 

    Caltech Logo

    Jessica Stoller-Conrad

    In just the span of an average lifetime, science has made leaps and bounds in our understanding of the human genome and its role in heredity and health—from the first insights about DNA structure in the 1950s to the rapid, inexpensive sequencing technologies of today. However, the 20,000 genes of the human genome are more than DNA; they also encode proteins to carry out the countless functions that are key to our existence. And we know much less about how this collection of proteins supports the essential functions of life.

    In order to understand the role each of these proteins plays in human health—and what goes wrong when disease occurs—biologists need to figure out what these proteins are and how they function. Several decades ago, biologists realized that to answer these questions on the scale of the thousands of proteins in the human body, they would have to leave the comfort of their own discipline to get some help from a standard analytical-chemistry technique: mass spectrometry. Since 2006, Caltech’s Proteome Exploration Laboratory (PEL) has been building on this approach to bridge the gap between biology and chemistry, in the process unlocking important insights about how the human body works.

    Scientists can easily sequence an entire genome in just a day or two, but sequencing a proteome—all of the proteins encoded by a genome—is a much greater challenge, says Ray Deshaies, protein biologist and founder of the PEL.

    “One challenge is the amount of protein. If you want to sequence a person’s DNA from a few of their cheek cells, you first amplify—or make copies of—the DNA so that you’ll have a lot of it to analyze. However, there is no such thing as protein amplification,” Deshaies says. “The number of protein molecules in the cells that you have is the number that you have, so you must use a very sensitive technique to identify those very few molecules.”

    The best means available for doing this today is called shotgun mass spectrometry, Deshaies says. In general, mass spectrometry allows researchers to identify the amount and types of molecules present in a biological sample by separating and analyzing the molecules as gas ions, based on mass and charge; shotgun mass spectrometry—a combination of several techniques—applies this separation process to digested, broken-down proteins, allowing researchers to identify the types and amounts of proteins present in a heterogeneous mixture.

    The first step of shotgun mass spectrometry entails digesting a mixture of proteins into smaller fragments called peptides. The peptides are separated based on their physical properties, sprayed into a mass spectrometer, and blasted apart via collisions with gas molecules such as helium or nitrogen—a process that creates a unique fragmentation pattern for each peptide. This pattern, or “fingerprint,” can then be searched against a database and used to identify the protein the peptide came from.
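
    The final database-search step can be sketched as a toy spectrum-matching problem. Real search engines score matches far more rigorously, and the peptide sequences and fragment masses below are made up, but the core idea of comparing observed fragment masses against theoretical spectra looks like this:

```python
def match_score(observed, theoretical, tolerance=0.5):
    """Count observed fragment masses (in daltons) that fall within
    `tolerance` of any predicted fragment mass."""
    return sum(
        any(abs(o - t) <= tolerance for t in theoretical) for o in observed
    )

def identify_peptide(observed_spectrum, database):
    """Return the database peptide whose theoretical spectrum best
    explains the observed fragmentation pattern."""
    return max(database, key=lambda pep: match_score(observed_spectrum, database[pep]))

# Hypothetical database of theoretical fragment masses per peptide
database = {
    "PEPTIDER": [98.1, 227.2, 324.3, 421.3, 522.4],
    "SAMPLEK":  [88.0, 175.1, 262.1, 375.2, 488.3],
}
observed = [98.2, 324.1, 421.5, 522.3]
print(identify_peptide(observed, database))  # -> PEPTIDER
```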

    “Up until this technique was invented, people had to take a mixture of proteins, run a current through a polyacrylamide gel to separate the proteins by size, stain the proteins, and then physically cut the stained bands out of the gel to have each individual protein species sequenced,” says Deshaies. “But mass spectrometry technology has gotten so good that we can now cast a broader net by sequencing everything, then use data analysis to figure out what specific information is of interest after the dust settles down.”

    Deshaies began using shotgun mass spectrometry in the late 1990s, but because the technology was still very new, all of the protein analysis had to be done at the outside laboratories that were inventing the methodology.

    In 2001, after realizing the potential of this field-changing technology, he and colleague Barbara Wold, the Bren Professor of Molecular Biology, applied for and received a Department of Energy grant for their very own mass spectrometer. When the instrument arrived on campus, demand began to surge. “Barbara and I were first just doing experiments for our own labs, but then other people on campus wanted us to help them apply this technology to their research problems,” Deshaies says.

    So he and Wold began campaigning for a larger, ongoing center where anyone could begin using mass spectrometry resources for protein research. In 2006, Deshaies and then chair of the Division of Biology (now the Division of Biology and Biological Engineering) Elliot Meyerowitz petitioned the Gordon and Betty Moore Foundation to secure funding for a formal Proteome Exploration Laboratory, as part of the foundation’s commitment to Caltech.

    The influx of cash dramatically expanded the capabilities and resources that were available to the PEL, allowing it to purchase the best and fastest mass spectrometry instruments available. But just as importantly, it also meant that the PEL could expand its human resources, Deshaies adds. Mostly students were running the instruments in the Deshaies lab, he says, so when they graduated or moved on, gaps were left in expertise. Sonja Hess came to Caltech in 2007 to fill that gap as director of the PEL.

    Hess, who came from a proteomics lab at the National Institutes of Health, knew the challenges of running an interdisciplinary center such as the PEL. Although the field of proteomics holds great promise for understanding big questions in many fields, including biology and medicine, mass spectrometry is still a highly technical method involving analytical chemistry and data science—and it’s a technique that many biologists were never trained in. Conversely, many chemists and mass spectrometry technicians don’t necessarily understand how to apply the technique to biological processes.

    By encouraging dialogue between these two sides, Hess says that the PEL crosses that barrier, helping apply mass spectrometry techniques to diverse research questions from more than 20 laboratories on campus. Creating this interdisciplinary and resource-rich environment has enabled a wide breadth of discoveries, says Hess. One major user of the PEL, chemist David Tirrell, has used the center for many collaborations involving a technique he developed with former colleagues Erin Schuman and Daniela Dieterich called BONCAT (for “bioorthogonal noncanonical amino-acid tagging”). BONCAT uses synthetic molecules that are not normally found in proteins in nature and that carry particular chemical tags. When these artificial amino acids are incubated with certain cells, they are taken up by the cells and incorporated into all newly formed proteins in those cells.

    The tags then allow researchers to identify and pull out proteins from the cells, thus enabling them to wash away all of the other untagged proteins from other cells that aren’t of interest. When this method is combined with mass spectrometry techniques, it enables researchers to achieve specificity in their results and determine which proteins are produced in a particular subset of cells during a particular time. “In my own laboratory, we work at making sure the method is adapted appropriately to the specifics of a biological problem. But we rely on collaborations with other laboratories to help us understand what the demands on the method are and what kinds of questions would be interesting to people in those fields,” Tirrell says.

    For example, Tirrell collaborated with biologist Paul Sternberg and the PEL, using BONCAT and mass spectrometry to analyze specific proteins from a few cells within a whole organism, a feat that had never been accomplished before. Using the nematode C. elegans, Sternberg and his team applied the BONCAT technique to tag proteins in the 20 cells of the worm’s pharynx, and then used the PEL resources to analyze proteome-wide information from just those 20 cells. The results, including identification of proteins that were not previously associated with the pharynx, were published in PNAS in 2014.

    The team is now trying to target the experiment to a single pair of neurons that help the worm to sense and avoid harmful chemicals—a first step in learning which proteins are essential to producing this responsive behavior. But analyzing protein information from just two cells is a difficult experiment, says Tirrell. “The challenge comes in separating out the proteins that are made in those two cells from the proteins in the rest of the hundreds of cells in the worm’s body. You’re only interested in two cells, but to get the proteins from those two cells, you’re essentially trying to wash away everything else— about 500 times as much ‘junk’ protein as the protein that you’re really interested in,” he says. “We’re working on these separation methods now because the ultimate experiment would be to find a way to use BONCAT and mass spec to pull out proteomic information from a single cell in an animal.”

    This next step is a big one, but Tirrell says that an advantage of the PEL is that the laboratory’s staff can focus on optimizing the very technical mass spectrometry aspects of an experiment, while researchers using the PEL can focus more holistically on the question they’re trying to answer. This was also true for biologist Mitch Guttman, who asked the laboratory to help him develop a mass spectrometry–based technique for identifying the proteins that hitchhike on a class of RNA genes called lncRNAs. Long noncoding RNAs—or lncRNAs (pronounced “link RNAs”) for short—are abundant in the human genome, but scientists know very little about how they work or what they do.

    Although it’s known that protein-coding genes start out as DNA, which is transcribed into RNA, which is then translated into the gene product, a protein, lncRNAs are never translated into proteins. Instead, they’re thought to act as scaffolds, corralling important proteins and bringing them to where they’re needed in the cell. In a study published in April 2015 in Nature, Guttman used a specific example of a lncRNA, a gene called Xist, to learn more about these hitchhiking proteins.

    “The big challenge to doing this was technical; we’ve never had a way to identify what proteins are actually interacting with a lncRNA molecule. By working with the PEL, we were able to develop a method based on mass spectrometry to actually purify and identify this complex of proteins interacting with a lncRNA in living cells,” Guttman says. “Once we had that information, we could really start to ask ourselves questions about these proteins and how they are working.”

    Using this new method, called RNA antisense purification with mass spectrometry (RAP-MS), Guttman’s lab determined that 10 proteins associate with the lncRNA Xist, and that three of those 10 are essential to the gene’s function—inactivating the second X chromosome in women, a necessary process that, if interrupted, results in the death of female embryos early in development. Guttman’s findings marked the first time that anyone had uncovered the detailed mechanism of action for an lncRNA gene. For decades, other research groups had been trying to solve this problem; however, the collaborative development of RAP-MS in the PEL provided the missing piece.

    Even Deshaies, who began doing shotgun mass spectrometry experiments in his own laboratory, now exclusively uses the PEL’s resources and says that the laboratory has played an essential support role in his work. He studies the normal balance of proteins in a cell and how this balance changes during disease. In a 2013 study published in Cell, his laboratory focused on a dynamic network of protein complexes called SCF complexes, which go through cycles of assembly and dissociation in a cell, depending on when they are needed.

    Because there was no insight into how these complexes form and disassemble, Deshaies and his colleagues used the PEL to quantitatively monitor how this protein network’s dynamics were changing within cells. They determined that SCF complexes are normally very stable, but in the presence of a protein called Cand1 they become very dynamic and rapidly exchange subunits. Because some components of the SCF complex have been implicated in the development of human diseases such as cancers, work is now being done to see if Cand1 holds promise as a target for a cancer therapeutic.

    Although Deshaies says that the PEL resources have become invaluable to his work, he adds that what makes the laboratory unique is how it benefits the entire institute—a factor that he hopes will encourage further support for its mission. “The value of the PEL is not just about what it contributes to my lab or to Dave Tirrell’s lab or to anyone else’s,” he says. “It’s about the breadth of PEL’s impact—the 20 or so labs that are bringing in samples and using this operation every year to do important work, like solving the mechanism of X-chromosome inactivation in females.”

    See the full article here .

    The California Institute of Technology (commonly referred to as Caltech) is a private research university located in Pasadena, California, United States. Caltech has six academic divisions with strong emphases on science and engineering. Its 124-acre (50 ha) primary campus is located approximately 11 mi (18 km) northeast of downtown Los Angeles. “The mission of the California Institute of Technology is to expand human knowledge and benefit society through research integrated with education. We investigate the most challenging, fundamental problems in science and technology in a singularly collegial, interdisciplinary atmosphere, while educating outstanding students to become creative members of society.”
    Caltech buildings

  • richardmitnick 11:12 am on October 19, 2015
    Tags: 4D, Genomics

    From U Washington: “Researchers win $12-million to study the human genome in 4-D” 

    University of Washington

    Michael McCarthy

    A computer-generated three-dimensional model of the yeast genome, which UW researchers described in a paper in the journal Nature in 2010.

    In order to fit within the nucleus of a cell, the human genome must bend, fold and coil into an unimaginably compact shape – and still function. This is no mean feat: The human genome is about 6.5 feet long, and the average cell nucleus is only 6 to 10 micrometers (millionths of a meter) in diameter.

    How this happens and the genome’s three-dimensional shape within the nucleus are unknown. Nor is it known how the shape changes over time – the fourth dimension – as a cell develops, grows and goes about its specialized functions.
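
    The mismatch between these two scales is easy to quantify with the article's own figures:

```python
# Back-of-the-envelope compaction calculation using the numbers above
genome_length_m = 6.5 * 0.3048   # ~6.5 feet of DNA, in meters (~1.98 m)
nucleus_diameter_m = 8e-6        # midpoint of the 6-10 micrometer range

# Linear compaction: how many times longer the DNA is than the nucleus is wide
compaction = genome_length_m / nucleus_diameter_m
print(f"{compaction:,.0f}x")
```

    Packing a roughly two-meter molecule into a compartment some quarter-million times smaller is what makes the genome's folding, and its effect on gene regulation, so consequential.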

    “There’s a tendency to talk about the genome as a linear sequence and to forget about the fact that it’s folded,” said Dr. Jay Shendure, University of Washington associate professor of genome sciences and investigator with the Howard Hughes Medical Institute.

    William Noble, left, and Jay Shendure will co-direct the UW Center for Nuclear Organization and Function.

    “To understand how the different parts of the genome talk to each other to control gene expression, we need to understand how the different elements are arranged in relation to each other in three-dimensional space.”

    To puzzle out this information and its effect on cell function in health and disease, UW researchers will join peers at five other academic institutions to create the Nuclear Organization and Function Interdisciplinary Consortium.

    Underwriting the consortium is the National Institutes of Health’s 4D Nucleome program. The UW was awarded $12 million over five years to conduct research in its new Center for Nuclear Organization and Function. Shendure and William Stafford Noble, a professor of genome sciences and computer science, will co-lead.

    UW researchers will first develop tools to work out the three- and four-dimensional architecture of the nucleome and to create computer models that predict changes in the architecture as cells grow, divide and differentiate into different types.

    The results of this work will then be tested in mouse and human cell lines and, if confirmed, be used to understand how changes in nuclear architecture affect development of normal and abnormal heart muscle.
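The modeling step described above can be sketched in miniature. One common approach, similar in spirit to the 2010 yeast model pictured above, converts chromosome-contact frequencies into target distances via an assumed power law and then embeds the loci in three dimensions. The contact matrix, exponent, and embedding method below are illustrative assumptions, not the center's actual pipeline:

```python
import numpy as np

def contacts_to_coords(contacts, alpha=1.0):
    """Embed loci in 3-D from a contact matrix: assume distance scales
    as contact_frequency ** (-1/alpha), then apply classical
    multidimensional scaling (MDS) to the resulting distance matrix."""
    c = np.asarray(contacts, dtype=float)
    pos_min = c[c > 0].min()
    safe = np.where(c > 0, c, 1.0)
    # Unobserved pairs are placed farther apart than any observed pair
    d = np.where(c > 0, safe ** (-1.0 / alpha), 2 * pos_min ** (-1.0 / alpha))
    np.fill_diagonal(d, 0.0)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    b = -0.5 * j @ (d ** 2) @ j               # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:3]          # three largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Toy 4-locus matrix: adjacent loci contact often, distal ones rarely
toy = np.array([[0, 9, 3, 1],
                [9, 0, 9, 3],
                [3, 9, 0, 9],
                [1, 3, 9, 0]])
coords = contacts_to_coords(toy)              # one 3-D point per locus
```

With real Hi-C-style data the same idea runs over thousands of bins per chromosome, and the exponent is itself estimated from the data rather than assumed.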

    All tools and data developed by the project will be shared with researchers in and outside of the 4D Nucleome network of researchers and with the public.

    Other investigators who will be working on the project include: Cole Trapnell, assistant professor of genome sciences; Christine Disteche, professor of pathology; Zhijun Duan, research assistant professor of medicine (hematology); and Dr. Charles Murry, professor of pathology and interim director of the UW Institute for Stem Cell and Regenerative Medicine.

    See the full article here.

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    The University of Washington is one of the world’s preeminent public universities. Our impact on individuals, on our region, and on the world is profound — whether we are launching young people into a boundless future or confronting the grand challenges of our time through undaunted research and scholarship. Ranked number 10 in the world in Shanghai Jiao Tong University rankings and educating more than 54,000 students annually, our students and faculty work together to turn ideas into impact and in the process transform lives and our world.

    So what defines us — the students, faculty and community members at the University of Washington? Above all, it’s our belief in possibility and our unshakable optimism. It’s a connection to others, both near and far. It’s a hunger that pushes us to tackle challenges and pursue progress. It’s the conviction that together we can create a world of good. Join us on the journey.

  • richardmitnick 12:28 pm on October 16, 2015 Permalink | Reply
    Tags: , , Genomics, MIT Whitehead Institute   

    From Broad Institute: “Screen of human genome reveals set of genes essential for cellular viability” 

    Broad Institute

    October 15th, 2015

    Whitehead Institute MIT

    Whitehead Institute Communications

    Using two complementary analytical approaches, scientists at the Whitehead Institute and the Broad Institute of MIT and Harvard have for the first time identified the universe of genes in the human genome essential for the survival and proliferation of human cell lines, or cultured human cells.

    Their findings and the materials they developed in conducting the research will not only serve as invaluable resources for the global research community but should also have application in the discovery of drug-targetable genetic vulnerabilities in a variety of human cancers.

    Scientists have long known the essential genes in microorganisms, such as yeast, whose genomes are smaller and more easily manipulated. Most common yeast strains, for example, are haploid, meaning that genes exist in single copies, making it fairly simple for researchers to eliminate or “knock out” individual genes and assess the impact on the organism. However, owing to their greater complexity, diploid mammalian genomes, including the human genome, have been resistant to such knockout techniques—including RNA interference, which is hampered by off-target effects and incomplete gene silencing.

    Diploid cells have two homologous copies of each chromosome.

    Now, however, through use of the breakthrough CRISPR (for clustered regularly interspaced short palindromic repeats) genome editing system, researchers in the labs of Whitehead Member David Sabatini and Broad Institute Director Eric Lander have been able to generate a genome-wide library of single-guide RNAs (sgRNAs) to screen for and identify the genes required for cellular viability.

    Diagram of the CRISPR prokaryotic viral defense mechanism

    The sgRNA library targeted slightly more than 18,000 genes, of which approximately 10% proved to be essential. These findings are reported online this week in the journal Science.
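The screening logic is easy to sketch. In a pooled screen, sgRNAs targeting essential genes drop out of the population over time, so a gene can be scored by the average log2 fold change in its sgRNAs' abundance. The read counts below are made-up illustrations, not the paper's data:

```python
import math

def crispr_score(initial_counts, final_counts, pseudocount=1):
    """Average log2 fold change in sgRNA abundance for one gene.

    Strongly negative scores mean the gene's sgRNAs dropped out of
    the population, suggesting the gene is required for viability."""
    fold_changes = [math.log2((f + pseudocount) / (i + pseudocount))
                    for i, f in zip(initial_counts, final_counts)]
    return sum(fold_changes) / len(fold_changes)

# Hypothetical read counts per sgRNA, before and after the screen
essential_like = crispr_score([500, 480, 510, 495], [20, 15, 30, 25])
neutral_like   = crispr_score([500, 480, 510, 495], [505, 470, 520, 500])
```

The essential-like gene scores strongly negative while the neutral gene hovers near zero; calling the bottom tail of such scores "essential" is what yields the roughly 10% figure quoted above.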

    “This is the first report of human cell-essential genes,” says Tim Wang, a graduate student in the Sabatini and Lander labs and first author of the Science paper. “This answers a question people have been asking for quite a long time.”

    As might have been expected, Wang says that many of the essential genes are involved in fundamental biological processes, including DNA replication, RNA transcription, and translation of messenger RNA. But, as Wang also notes, approximately 300 of these essential genes are of a class not previously characterized, are largely located in the cellular compartment known as the nucleolus, and are associated with RNA processing. Wang says the precise function of these genes is the subject of future investigation.

    Nucleus. The nucleolus is contained within the cell nucleus.

    To validate the results of the CRISPR screens, the group took the added step of screening for essential genes in a unique line of haploid human cells. Using an approach known as gene-trap mutagenesis (a method pioneered in part by former Whitehead Fellow Thijn Brummelkamp) in the haploid cells and comparing it to the CRISPR results, the researchers found significant, consistent overlap in the gene sets found to be essential. In a final step, the group tested their approaches in cell lines derived from two cancers, chronic myelogenous leukemia (CML) and Burkitt’s lymphoma, both of which have been extensively studied. The novel method not only identified the essentiality of the known genes—in the case of CML, it hit on the BCR and ABL1 genes, whose translocation is the target of the successful drug Gleevec—but also highlighted additional genes that may be therapeutic targets in these cancers.
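How significant is an overlap like that? A standard check is the hypergeometric test: given the universe of screened genes, how likely is it that two independently drawn essential-gene sets share at least the observed number of members? The numbers below are hypothetical, chosen only to mirror the scale of the screen:

```python
from math import comb

def overlap_pvalue(n_universe, n_a, n_b, n_shared):
    """P(at least n_shared genes in common) when a set of n_b genes
    is drawn at random from a universe containing n_a marked genes."""
    total = comb(n_universe, n_b)
    tail = sum(comb(n_a, k) * comb(n_universe - n_a, n_b - k)
               for k in range(n_shared, min(n_a, n_b) + 1))
    return tail / total

# Sanity check: requiring zero shared genes is always satisfied
assert overlap_pvalue(100, 10, 10, 0) == 1.0

# Hypothetical: 18,000 screened genes, ~1,800 called essential by each
# method, 1,500 shared — far beyond the ~180 expected by chance
p = overlap_pvalue(18000, 1800, 1800, 1500)
```

An overlap of that size gives a vanishingly small p-value, which is the quantitative sense in which the CRISPR and gene-trap calls agree.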

    “The ability to zero in on the essential genes in the highly complex human system will give us new insight into how diseases, such as cancer, continue to resist efforts to defeat them,” Lander says.

    Wang, Lander, and Sabatini are enthusiastic about the potential applications of their work, as it should accelerate the identification of cancer drug targets while enhancing our understanding of the evolution of drug resistance, a major contributor to therapeutic failure. The researchers attribute this vast potential to the rigor that CRISPR brings to human genetics.

    “This is really the first time we can reliably, accurately, and systematically study genetics in mammalian cells,” Sabatini says. “It’s remarkable how well it’s working.”

    This work was supported by the National Institutes of Health (grant CA103866), the National Human Genome Research Institute (grant 2U54HG003067-10), the National Science Foundation, the MIT Whitaker Health Sciences Fund, and the Howard Hughes Medical Institute.

    About Whitehead Institute

    The Whitehead Institute is a world-renowned non-profit research institution dedicated to improving human health through basic biomedical research. Wholly independent in its governance, finances, and research programs, Whitehead shares a close affiliation with Massachusetts Institute of Technology through its faculty, who hold joint MIT appointments. For more information about the Whitehead Institute, go to wi.mit.edu.

    See the full article here.

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    Broad Institute Campus

    The Eli and Edythe L. Broad Institute of Harvard and MIT is founded on two core beliefs:

    This generation has a historic opportunity and responsibility to transform medicine by using systematic approaches in the biological sciences to dramatically accelerate the understanding and treatment of disease.
    To fulfill this mission, we need new kinds of research institutions, with a deeply collaborative spirit across disciplines and organizations, and having the capacity to tackle ambitious challenges.

    The Broad Institute is essentially an “experiment” in a new way of doing science, empowering this generation of researchers to:

    Act nimbly. Encouraging creativity often means moving quickly, and taking risks on new approaches and structures that often defy conventional wisdom.
    Work boldly. Meeting the biomedical challenges of this generation requires the capacity to mount projects at any scale — from a single individual to teams of hundreds of scientists.
    Share openly. Seizing scientific opportunities requires creating methods, tools and massive data sets — and making them available to the entire scientific community to rapidly accelerate biomedical advancement.
    Reach globally. Biomedicine should address the medical challenges of the entire world, not just advanced economies, and include scientists in developing countries as equal partners whose knowledge and experience are critical to driving progress.

    Harvard University

    MIT Widget

  • richardmitnick 10:16 am on October 15, 2015 Permalink | Reply
    Tags: , Genomics   

    From The Uncovering Genome Mysteries project at WCG: “Analyzing a wealth of data about the natural world” 

    New WCG Logo

    14 Oct 2015
    Wim Degrave, Ph.D.
    Laboratório de Genômica Funcional e Bioinformática Instituto Oswaldo Cruz – Fiocruz

    The Uncovering Genome Mysteries project has already amassed data on over 200 million proteins, with the goal of understanding the common features of life everywhere on earth. There are tens of millions of calculations still to run, but the team is also making preparations for analysis and eventual publication of the data.


    For almost a year now, Uncovering Genome Mysteries has been comparing protein sequences derived from the genomes of nearly all living organisms analyzed to date. Thanks to the volunteers that contribute computer time to World Community Grid, more than 34 million results have been returned with data on functional identification and protein similarities. Along with our collaborators in Australia, we’ve paid particular attention to microorganisms from different ecosystems, with special emphasis on marine organisms. More than 200 million proteins have been compared thus far, during the equivalent of 15,000 years of computation. The resulting data are sent to our computer servers at the Fiocruz Foundation in Rio de Janeiro, Brazil and now also to the University of New South Wales, Sydney, Australia. A last set of around 20 million protein sequences, determined over the last year, is now being added to the dataset and will be run on World Community Grid in the coming months.
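At this scale, even the comparison step has to be cheap. As a toy illustration of alignment-free sequence comparison (the real pipeline uses rigorous alignment-based scoring, not this shortcut), two proteins can be compared by the overlap of their length-k subsequences:

```python
def kmer_set(seq, k=3):
    """All overlapping k-mers of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(seq_a, seq_b, k=3):
    """Jaccard similarity of k-mer sets: a fast, alignment-free proxy
    for pairwise protein comparison."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    return len(a & b) / len(a | b) if a | b else 0.0

# Two hypothetical fragments differing by one substitution:
# 7 shared 3-mers out of 9 total, so similarity is about 0.78
score = similarity("MKTAYIAKQR", "MKTAYIAKQW")
```

Sketches like this show why the full job takes millennia of donated CPU time: the all-against-all comparison of 200 million proteins involves on the order of 10^16 pairs, each needing far more careful scoring than a k-mer overlap.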

    However, the task of functional mapping and comparison between proteins from all these organisms does not end there. Our team of scientists is, in the meantime, investing more efforts to optimize the algorithms for further analysis and representation of the data generated by World Community Grid volunteers, and preparing for the database systems that will make the results available to the scientific community. Once our data is public, we expect that the scientific community’s understanding of the intricate network of life will gain a completely new perspective, and that results will also contribute to the development of many new applications in health, agriculture and life sciences in general.

    This project is a cooperation between World Community Grid, the laboratory of Dr. Torsten Thomas and his team from the School of Biotechnology and Biomolecular Sciences & Centre for Marine Bio-Innovation at the University of New South Wales, Sydney, Australia, and our team at the Laboratory for Functional Genomics and Bioinformatics, at the Oswaldo Cruz Foundation – Fiocruz, in Brazil.

    See the full article here.

    Please help promote STEM in your local schools.
    STEM Icon

    Stem Education Coalition

    World Community Grid (WCG) brings people together from across the globe to create the largest non-profit computing grid benefiting humanity. It does this by pooling surplus computer processing power. We believe that innovation combined with visionary scientific research and large-scale volunteerism can help make the planet smarter. Our success depends on like-minded individuals – like you.

    WCG projects run on BOINC software from UC Berkeley.

    BOINC is a leader in the fields of distributed computing, grid computing, and citizen cyberscience. BOINC stands for the Berkeley Open Infrastructure for Network Computing.

    BOINC WallPaper


    “Download and install secure, free software that captures your computer’s spare power when it is on, but idle. You will then be a World Community Grid volunteer. It’s that simple!” You can download the software at either WCG or BOINC.

    Please visit the project pages:

    Outsmart Ebola Together

    Mapping Cancer Markers

    Uncovering Genome Mysteries

    Say No to Schistosoma

    GO Fight Against Malaria

    Drug Search for Leishmaniasis

    Computing for Clean Water

    The Clean Energy Project

    Discovering Dengue Drugs – Together

    Help Cure Muscular Dystrophy

    Help Fight Childhood Cancer

    Help Conquer Cancer

    Human Proteome Folding


    World Community Grid is a social initiative of IBM Corporation.

    IBM – Smarter Planet

  • richardmitnick 10:52 am on August 21, 2015 Permalink | Reply
    Tags: , Genomics,   

    From UCSC: “Seagate gift supports UC Santa Cruz research on genomic data storage” 

    UC Santa Cruz

    August 20, 2015
    Tim Stephens

    Researchers in the Baskin School of Engineering at UC Santa Cruz are working with industry partner Seagate Technology on new ways to structure and store massive amounts of genomic data. Seagate has donated data storage devices with a total capacity of 2.5 petabytes to support this effort.

    “This gift provides the basis for a major research program on storage of genomic data,” said Andy Hospodor, executive director of the Storage Systems Research Center (SSRC) at UC Santa Cruz.

    “Seagate is pleased to be a part of this important research effort. The storage requirements for genomics are staggering and the potential for medical breakthroughs even larger,” said Mark Re, senior vice president and CTO at Seagate.

    The gift, valued at $250,000, includes 1 petabyte of Seagate’s new Kinetic disk drives for object-based storage, plus an additional 1.5 petabytes of traditional Seagate SATA disk drives for use in existing clusters within the UC Santa Cruz Genomics Institute.

    Ethan Miller, professor of computer science, directs the Center for Research in Storage Systems (CRSS). (Photo by Elena Zhukova)

    Large-scale test bed

    “This gives us a large-scale test bed that we can use to explore the organization of data for large-scale disk-based storage systems. We need to develop better ways to store and organize the vast quantities of data we’re generating,” said Ethan Miller, professor of computer science and director of the Center for Research in Storage Systems (CRSS) at UCSC.

    Miller and other storage systems researchers at UC Santa Cruz work closely with industry partners such as Seagate, and several of the center’s alumni and graduate students have been working at Seagate on the company’s latest disk technology. The Seagate storage donation will support research on new ways to structure and store genomic data using object stores and newly proposed open-source standards (APIs) for genomic data that are being developed by the Global Alliance for Genomics and Health.

    “Genomic data storage is one of several areas of emerging interest where we’ll be looking at using Seagate’s new intelligent disks to build large-scale storage systems,” Miller said.

    Genomics Institute

    The donation also adds over a petabyte of storage capacity to the genomics data storage cluster maintained by the UC Santa Cruz Genomics Institute at the San Diego Supercomputer Center. For Benedict Paten, a research scientist at the Genomics Institute, it’s all about speeding up the processing of genomic data.

    “We in genomics know that we have a big data problem,” Paten said. “We need to be able to compute on much larger volumes of data than we have before. The amount of genomic data is growing exponentially, and we haven’t been keeping up.”

    Part of the solution, he said, is distributed processing of large data sets in which the processing is done where the data are stored, instead of downloading the data over a network for processing. “Now we can put a lot of disks on the compute nodes for efficient distributed computation over large amounts of data. This donation is really important for our big data genomics efforts at UC Santa Cruz,” Paten said.
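The "compute where the data live" idea depends on placement being predictable: any scheduler must be able to work out, without consulting a central service, which node holds a given slice of the genome. Below is a minimal sketch assuming a simple hash-based placement policy; real object stores use more elaborate schemes, and the node names are invented:

```python
import hashlib

def node_for_block(chrom, start, nodes):
    """Deterministically map a genomic block to a storage node.

    sha256 (unlike Python's built-in hash) gives the same answer in
    every process, so any scheduler can locate a block independently."""
    key = f"{chrom}:{start}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return nodes[digest % len(nodes)]

nodes = ["node-a", "node-b", "node-c"]
# Place 1-megabase blocks of chr1; every scheduler computes the same map
placement = {f"chr1:{s}": node_for_block("chr1", s, nodes)
             for s in range(0, 5_000_000, 1_000_000)}
```

With placement fixed this way, the job scheduler can simply ship each analysis task to the node its block hashes to, rather than shipping the data over the network.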

    See the full article here.

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition
    The University of California, Santa Cruz, opened in 1965 and grew, one college at a time, to its current (2008-09) enrollment of more than 16,000 students. Undergraduates pursue more than 60 majors supervised by divisional deans of humanities, physical & biological sciences, social sciences, and arts. Graduate students work toward graduate certificates, master’s degrees, or doctoral degrees in more than 30 academic fields under the supervision of the divisional and graduate deans. The dean of the Jack Baskin School of Engineering oversees the campus’s undergraduate and graduate engineering programs.

  • richardmitnick 11:02 am on July 23, 2015 Permalink | Reply
    Tags: , Genomics,   

    From UCSC: “Keck Foundation awards UC Santa Cruz $2 million for human genome variation project” 

    UC Santa Cruz

    July 22, 2015
    Tim Stephens

    The UC Santa Cruz Genomics Institute has received a $2 million grant from the W. M. Keck Foundation for ongoing research to develop a comprehensive map of human genetic variation. The Human Genome Variation Map will be a valuable new resource for medical researchers, as well as for basic research on human evolution and diversity.

    Human Genome Variation Map

    The Keck grant provides funding over two years for UC Santa Cruz researchers to create a full-scale map, building on the results of a one-year pilot project funded by the Simons Foundation.

    “We’ve been experimenting with pilot regions of the genome and evaluating a variety of methods. The next steps will be to take it from a prototype to a full-scale genome reference that we can release to the community,” said Benedict Paten, a research scientist at the Genomics Institute and co-principal investigator of the project.

    Benedict Paten (Photo by Summer Stiegman)

    The Human Genome Variation Map is needed to overcome the limitations of using a single reference sequence for the human genome. Currently, new data from sequencing human genomes is analyzed by mapping the new sequences to one reference set of 24 human chromosomes to identify variants. But this approach leads to biases and mapping ambiguities, and some variants simply cannot be described with respect to the reference genome, according to David Haussler, distinguished professor of biomolecular engineering and scientific director of the Genomics Institute at UC Santa Cruz.

    Global Alliance

    Haussler and Paten are coordinating their work on the new map with the Global Alliance for Genomics and Health (GA4GH), which involves more than 300 collaborating institutions that have agreed to work together to enable secure sharing of genomic and clinical data. The overall vision of the global alliance includes a genomics platform based on something akin to the planned Human Genome Variation Map, along with open-source software tools to enable researchers to mine the data for new scientific and medical breakthroughs. In the long run, the map will be used to identify genomic variants encountered in precision medical care as well, Haussler said.

    The UCSC team has been collaborating with leading genomics researchers at other institutions to develop the map, which Paten began working on in 2014 as co-chair of the GA4GH Reference Variation Task Team. The new Human Genome Variation Map will replace the current assortment of isolated, incompatible databases of human genetic variation with a single, fundamental representation formalized as a very large mathematical graph. The clean mathematical formulation is a major strength of this new approach, Paten said.

    The primary reference genome is a linear sequence of DNA bases (represented by the letters A, C, T, and G). To build the Human Genome Variation Map, each new genome will be merged into the reference genome at the points where it matches the primary sequence, with variations appearing as additional alternate paths in the map.

    Mathematical structure

    This mathematical graph-based structure will augment the existing human reference genome with all common human variations, providing a means to name, identify, and analyze variations precisely and reproducibly. “The original human reference genome project gave us a detailed picture of one human genome. This map will give us a detailed picture of the world’s variety of human genomes,” Paten said.
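The alternate-path idea is concrete enough to sketch. Below is a toy sequence graph, not the GA4GH schema: nodes carry DNA segments, edges connect them, and each haplotype, reference or variant, is a path whose node sequences concatenate into a genome:

```python
class VariationGraph:
    """Minimal sequence graph: nodes hold DNA segments, edges connect
    them, and a haplotype is a path through connected nodes."""

    def __init__(self):
        self.nodes = {}      # node id -> DNA segment
        self.edges = set()   # (from_id, to_id)
        self._next = 0

    def add_node(self, seq):
        self._next += 1
        self.nodes[self._next] = seq
        return self._next

    def add_edge(self, a, b):
        self.edges.add((a, b))

    def path_seq(self, path):
        """Spell out the genome along a path of connected nodes."""
        assert all((a, b) in self.edges for a, b in zip(path, path[1:]))
        return "".join(self.nodes[n] for n in path)

g = VariationGraph()
left  = g.add_node("GATT")   # shared flanking sequence
ref   = g.add_node("A")      # reference allele
alt   = g.add_node("G")      # variant allele: an alternate path
right = g.add_node("CA")     # shared flanking sequence
for a, b in [(left, ref), (left, alt), (ref, right), (alt, right)]:
    g.add_edge(a, b)

reference_haplotype = g.path_seq([left, ref, right])  # "GATTACA"
variant_haplotype   = g.path_seq([left, alt, right])  # "GATTGCA"
```

A new genome is merged in exactly as the article describes: its matching stretches reuse existing nodes, and anywhere it differs, a new node and pair of edges add an alternate path, giving every variant a stable, nameable place in the map.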

    In the spirit of the original human genome project, the Human Genome Variation Map will be publicly and freely available to all. Haussler’s team at UC Santa Cruz made the first human genome sequence publicly available on the Internet 15 years ago. This new project has many parallels with that earlier work, in which UCSC genomics researchers assembled and posted the first human genome sequence and went on to create the widely used UCSC Genome Browser.

    “This is an infrastructure project for genomics that everyone agrees is important,” Paten said. “It is ambitious, and it requires a fundamental shift from thinking of the reference as one sequence to thinking of it as this structure that incorporates all variation. But now is the time to do it. We need to build a model that works, and make it easy enough to use to get community acceptance.”

    The UC Santa Cruz Genomics Institute is a fundraising priority of the $300-million Campaign for UC Santa Cruz.

    W. M. Keck Foundation

    Based in Los Angeles, the W. M. Keck Foundation was established in 1954 by the late W. M. Keck, founder of the Superior Oil Company. The Foundation’s grant making is focused primarily on pioneering efforts in the areas of medical research, science and engineering. The Foundation also maintains an undergraduate education program that promotes distinctive learning and research experiences for students in the sciences and in the liberal arts, and a Southern California Grant Program that provides support for the Los Angeles community, with a special emphasis on children and youth from low-income families, special needs populations and safety-net services. For more information, please visit www.wmkeck.org.

    See the full article here.

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

  • richardmitnick 7:50 am on May 2, 2015 Permalink | Reply
    Tags: , Genomics,   

    From Princeton: “Digging for Meaning in the Big Data of Human Biology” 

    Princeton University

    April 28, 2015
    No Writer Credit


    Since the Human Genome Project drafted the human body’s genetic blueprint more than a decade ago, researchers around the world have generated a deluge of information related to genes and the role they play in diseases like hypertension, diabetes, and various cancers.

    Although thousands of studies have made discoveries that promise a healthier future, crucial questions remain. An especially vexing challenge has been to identify the function of genes in specific cells, tissues, and organs. Because tissues cannot be studied by direct experimentation in living people, and many disease-relevant cell types cannot be isolated for analysis, the data have emerged in bits and pieces through studies that produced mountains of disparate signals.

    A multi-year effort by researchers from Princeton and other universities and medical schools has taken a big step toward extracting knowledge from these big data collections and opening the door to new understanding of human illnesses. Their paper, published online by the prestigious biology journal Nature Genetics, demonstrates how computer science and statistical methods can comb broad expanses of diverse data to identify how genetic circuits function and change in different tissues relevant to disease.

    Led by Olga Troyanskaya, professor in the Department of Computer Science and the Lewis-Sigler Institute of Integrative Genomics and deputy director for genomics at the Simons Center for Data Analysis in New York, the team used integrative computational analysis to dig out interconnections and relationships buried in the data pile. The study collected and integrated about 38,000 genome-wide experiments from an estimated 14,000 publications. Their findings produced molecular-level functional maps for 144 different human tissues and cell types, including many that are difficult or impossible to uncover experimentally.

    “A key challenge in human biology is that genetic circuits in human tissues and cell types are very difficult to study experimentally,” Troyanskaya said. “For example, the podocyte cells in the kidneys, which are the cells that perform the filtering that the kidneys are responsible for, cannot be isolated and studied experimentally. Yet we must understand how proteins interact in these cells if we want to understand and treat chronic kidney disease. Our approach mines big data collections to build a map of how genetic circuits function in the podocyte cells, as well as in many other disease-relevant tissues and cell types.”

    These networks allow biomedical researchers to understand the function and interactions of genes in specific cellular contexts and can illuminate the molecular basis of many complex human diseases. The researchers developed an algorithm, which they call a network-guided association study, or NetWAS, that combines these tissue-specific functional maps with standard genome-wide association studies (GWAS) in order to identify genes that are causal drivers of human disease. Because the technique is completely data-driven, NetWAS avoids biases toward well-studied genes and diseases — enabling discovery of completely new disease-associated genes, processes, and pathways.
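A stripped-down version of that idea can be sketched in a few lines. NetWAS itself trains a classifier (an SVM) on the tissue-specific network; the guilt-by-association scoring below, with a made-up network and p-values, is only a simplified stand-in that captures the reprioritization step:

```python
def netwas_like_rank(network, gwas_pvalues, threshold=0.01):
    """Re-rank genes by how strongly they connect to nominally
    significant GWAS hits in a tissue network.

    Score = fraction of a gene's network neighbors that pass the
    GWAS threshold; higher scores suggest involvement even when the
    gene's own p-value falls short of genome-wide significance."""
    seeds = {g for g, p in gwas_pvalues.items() if p < threshold}
    scores = {}
    for gene, neighbors in network.items():
        scores[gene] = (sum(n in seeds for n in neighbors) / len(neighbors)
                        if neighbors else 0.0)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical tissue network (adjacency lists) and GWAS p-values
network = {
    "GENE1": ["GENE2", "GENE3"],
    "GENE2": ["GENE1", "GENE3", "GENE4"],
    "GENE3": ["GENE1", "GENE2"],
    "GENE4": ["GENE2"],
}
pvals = {"GENE1": 0.001, "GENE2": 0.3, "GENE3": 0.004, "GENE4": 0.2}
ranking = netwas_like_rank(network, pvals)
```

Note how GENE2 rises to the top despite its unimpressive p-value, purely because it sits between two significant hits; that data-driven lift for under-studied genes is the behavior the authors highlight.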

    To put NetWAS and the tissue-specific networks in the hands of biomedical researchers around the world, the team created an interactive server called GIANT (for Genome-scale Integrated Analysis of Networks in Tissues). GIANT allows users to explore these networks, compare how genetic circuits change across tissues, and analyze data from genetic studies to find genes that cause disease.

    Aaron K. Wong, a data scientist at the Simons Center for Data Analysis and formerly a graduate student in the computer science department at Princeton, played the lead role in creating GIANT. “Our goal was to develop a resource that was accessible to biomedical researchers,” he said. “For example, with GIANT, researchers studying Parkinson’s disease can search the substantia nigra network, which represents the brain region affected by Parkinson’s, to identify new genes and pathways involved in the disease.” Wong is one of three co-first authors of the paper.

    The paper’s other two co-first authors are Arjun Krishnan, a postdoctoral fellow at the Lewis-Sigler Institute; and Casey Greene, an assistant professor of genetics at Dartmouth College, who was a postdoctoral fellow at Lewis-Sigler from 2009 to 2012. The team also included Ran Zhang, a graduate student in Princeton’s Department of Molecular Biology, and Kara Dolinski, assistant director of the Lewis-Sigler Institute.

    Looking to the future, Troyanskaya sees practical therapeutic uses for the group’s findings about the interrelatedness of genetic actions. “Biomedical researchers can use these networks and the pathways that they uncover to understand drug action and side effects, and to repurpose drugs,” she said. “They can also be useful for understanding how various therapies work and how to develop new ones.”

    Other contributors to the study were Emanuela Ricciotti, Garret A. FitzGerald, and Tilo Grosser of the Department of Pharmacology and the Institute for Translational Medicine and Therapeutics at the Perelman School of Medicine, University of Pennsylvania; Rene A. Zelaya, of Dartmouth; Daniel S. Himmelstein, of the University of California, San Francisco; Boris M. Hartmann, Elena Zaslavsky, and Stuart C. Sealfon, of the Department of Neurology at the Icahn School of Medicine at Mount Sinai, in New York; and Daniel I. Chasman, of Brigham and Women’s Hospital and Harvard Medical School in Boston.

    The Simons Center for Data Analysis was formed in 2013 by the Simons Foundation, a private organization dedicated to advancing research in mathematics and the basic sciences.

    See the full article here.

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition
    Princeton University Campus

    About Princeton: Overview

    Princeton University is a vibrant community of scholarship and learning that stands in the nation’s service and in the service of all nations. Chartered in 1746, Princeton is the fourth-oldest college in the United States. Princeton is an independent, coeducational, nondenominational institution that provides undergraduate and graduate instruction in the humanities, social sciences, natural sciences and engineering.

    As a world-renowned research university, Princeton seeks to achieve the highest levels of distinction in the discovery and transmission of knowledge and understanding. At the same time, Princeton is distinctive among research universities in its commitment to undergraduate teaching.

    Today, more than 1,100 faculty members instruct approximately 5,200 undergraduate students and 2,600 graduate students. The University’s generous financial aid program ensures that talented students from all economic backgrounds can afford a Princeton education.

    Princeton Shield

  • richardmitnick 4:04 pm on April 16, 2015 Permalink | Reply
    Tags: , , , Genomics,   

    From Quanta: “How Structure Arose in the Primordial Soup” 

    Quanta Magazine

    Life’s first epoch saw incredible advances — cells, metabolism and DNA, to name a few. Researchers are resurrecting ancient proteins to illuminate the biological dark ages.

    April 16, 2015
    Emily Singer

    Olena Shmahalo/Quanta Magazine

    About 4 billion years ago, molecules began to make copies of themselves, an event that marked the beginning of life on Earth. A few hundred million years later, primitive organisms began to split into the different branches that make up the tree of life. In between those two seminal events, some of the greatest innovations in existence emerged: the cell, the genetic code and an energy system to fuel it all. All three of these are essential to life as we know it, yet scientists know disappointingly little about how any of these remarkable biological innovations came about.

    “It’s very hard to infer even the relative ordering of evolutionary events before the last common ancestor,” said Greg Fournier, a geobiologist at the Massachusetts Institute of Technology. Cells may have appeared before energy metabolism, or perhaps it was the other way around. Without fossils or DNA preserved from organisms living during this period, scientists have had little data to work from.

    Fournier is leading an attempt to reconstruct the history of life in those evolutionary dark ages — the hundreds of millions of years between the time when life first emerged and when it split into what would become the endless tangle of existence.

    He is using genomic data from living organisms to infer the DNA sequence of ancient genes as part of a growing field known as paleogenomics. In research published online in March in the Journal of Molecular Evolution, Fournier showed that the last chemical letter added to the code was a molecule called tryptophan — an amino acid most famous for its presence in turkey dinners. The work supports the idea that the genetic code evolved gradually.

    Using similar methods, he hopes to decipher the temporal order of more of the code — determining when each letter was added to the genetic alphabet — and to date key events in the origins of life, such as the emergence of cells.

    Dark Origins

    Life emerged so long ago that even the rock formations covering the planet at that time have been destroyed — and with them, most chemical and geological clues to early evolution. “There’s a huge chasm between the origins of life and the last common ancestor,” said Eric Gaucher, a biologist at the Georgia Institute of Technology in Atlanta.

    The stretch of time between the origins of life and the last universal common ancestor saw a series of remarkable innovations — the origins of cells, metabolism and the genetic code. But scientists know little about when they happened or the order in which they occurred. Olena Shmahalo/Quanta Magazine

    Scientists do know that at some point in that time span, living creatures began using a genetic code, a blueprint for making complex proteins. It is those proteins that carry out the vital functions of the cell. (The structure of DNA and RNA also enables genetic information to be replicated and passed on from generation to generation, but that’s a separate process from the creation of proteins.) The components of the code and the molecular machinery that assembles them “are some of the oldest and most universal aspects of cells, and biologists are very interested in understanding the mechanisms by which they evolved,” said Paul Higgs, a biophysicist at McMaster University in Hamilton, Ontario.

    How the code came into being presents a chicken-and-egg problem. The key players in the code — DNA, RNA, amino acids, and proteins — are chemically complicated structures that work together to make proteins. But in modern cells, proteins are used to make the components of the code. So how did a highly structured code emerge?

    Most researchers believe that the code began simply with basic proteins made from a limited alphabet of amino acids. It then grew in complexity over time, as these proteins learned to make more sophisticated molecules. Eventually, it developed into a code capable of creating all the diversity we see today. “It’s long been hypothesized that life’s ‘standard alphabet’ of 20 amino acids evolved from a simpler, earlier alphabet, much as the English alphabet has accumulated extra letters over its history,” said Stephen Freeland, a biologist at the University of Maryland, Baltimore County.

    The earliest amino acid letters in the code were likely the simplest in structure, those that can be made from purely chemical means, without the assistance of a protein helper. (For example, the amino acids glycine, alanine and glutamic acid have been found on meteorites, suggesting they can form spontaneously in a variety of environments.) These are like the letters A, E and S — primordial units that served as the foundation for what came later.

    Tryptophan, in comparison, has a complex structure and is comparatively rare in the protein code, like a Y or Z, leading scientists to theorize that it was one of the latest additions to the code.

    That chemical evidence is compelling, but circumstantial. Enter Fournier. He suspected that by extending his work on paleogenomics, he would be able to prove tryptophan’s status as the last letter added to the code.

    The Last Letter

    Scientists have been reconstructing ancient proteins for more than a decade, primarily to figure out how ancient proteins differed from modern ones — what they looked like and how they functioned. But these efforts have focused on the period of evolution after the last universal common ancestor (or LUCA, as researchers call it). Fournier’s work delves further back than any previous effort. To do so, he had to move beyond the standard application of comparative genomics, which analyzes the differences between branches on the tree of life. “By definition, anything pre-LUCA lies beyond the deepest split in the tree,” he said.

    Fournier started with two related proteins, TrpRS (tryptophanyl tRNA synthetase) and TyrRS (tyrosyl tRNA synthetase), which help decode RNA letters into the amino acids tryptophan and tyrosine. TrpRS and TyrRS are more closely related to each other than to any other protein, indicating that they evolved from the same ancestor protein. Sometime before LUCA, that parent protein mutated slightly to produce these two new proteins with distinct functions. Fournier used computational techniques to decipher what that ancestral protein must have looked like.
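    The bottom-up step of this kind of ancestral reconstruction can be sketched with Fitch parsimony on a single toy alignment column. The tree, sequence names, and states below are invented for illustration; Fournier's actual analysis uses probabilistic models over full alignments of real sequences, not this simplified parsimony rule.

    ```python
    # Minimal sketch of ancestral-state inference by Fitch parsimony.
    # Hypothetical toy data: four modern homologs and a made-up rooted tree.

    def fitch_sets(node, leaf_states):
        """Bottom-up pass: return the candidate ancestral-state set at `node`.
        `node` is either a leaf name (str) or a (left, right) tuple."""
        if isinstance(node, str):                  # leaf: state is observed
            return {leaf_states[node]}
        left, right = node
        a, b = fitch_sets(left, leaf_states), fitch_sets(right, leaf_states)
        # Intersection if the children agree; union (one implied mutation) if not.
        return a & b if a & b else a | b

    # One alignment column (one-letter amino-acid codes) for four sequences.
    column = {"seq1": "Y", "seq2": "Y", "seq3": "W", "seq4": "Y"}
    tree = (("seq1", "seq2"), ("seq3", "seq4"))    # toy rooted topology

    print(fitch_sets(tree, column))   # {'Y'}: parsimony favors tyrosine at the root
    ```

    Repeating this over every column of an alignment yields a full candidate ancestral sequence, which is the object Fournier then inspects for tryptophan.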

    Greg Fournier, a geobiologist at MIT, is searching for the origins of the genetic code. Helen Hill

    He found that the ancestral protein had all the amino acids except tryptophan, suggesting that its addition was the finishing touch to the genetic code. “It shows convincingly that tryptophan was the last amino acid added, as has been speculated before but not really nailed as has been done here,” said Nigel Goldenfeld, a physicist at the University of Illinois, Urbana-Champaign, who was not involved in the study.

    Fournier now plans to use tryptophan as a marker to date other major pre-LUCA events such as the evolution of metabolism, cells and cell division, and the mechanisms of inheritance. These three processes form a sort of biological triumvirate that laid the foundation for life as we know it today. But we know little about how they came into existence. “If we understand the order of those basic steps, it creates an arrow pointing to possible scenarios for the origins of life,” Fournier said.

    For example, if the ancestral proteins involved in metabolism lack tryptophan, some form of metabolism probably evolved early. If proteins that direct cell division are studded with tryptophan, it suggests those proteins evolved comparatively late.

    Different models for the origins of life make different predictions for which of these three processes came first. Fournier hopes his approach will provide a way to rule out some of these models. However, he cautions that it won’t definitively sort out the timing of these events.
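    The "tryptophan as a marker" test described above boils down to asking whether a reconstructed ancestral sequence contains the residue at all. A minimal sketch, using invented sequences rather than any real reconstruction:

    ```python
    # Hypothetical sketch of the tryptophan dating test: an ancestral
    # sequence lacking W (tryptophan) plausibly predates tryptophan's
    # addition to the code. Both sequences below are made up.

    def uses_tryptophan(seq: str) -> bool:
        """Return True if the sequence contains tryptophan (one-letter code W)."""
        return "W" in seq.upper()

    ancestral_metabolic = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"   # no W
    ancestral_division  = "MWSLEKQRWISFVKAHFSRQLEERLGLIEV"      # contains W

    print(uses_tryptophan(ancestral_metabolic))  # False: plausibly pre-tryptophan
    print(uses_tryptophan(ancestral_division))   # True: plausibly post-tryptophan
    ```

    In practice the inference would rest on many reconstructed proteins per process, not a single presence/absence call, which is part of why Fournier cautions that the timing won't be sorted out definitively.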

    Fournier plans to use the same techniques to figure out the order in which other amino acids were added to the code. “It really reinforces the idea that evolution of the code itself was a progressive process,” said Paul Schimmel, a professor of molecular and cell biology at the Scripps Research Institute, who was not involved in the study. “It speaks to the refinement and subtlety that nature was using to perfect these proteins and the diversity it needed to form this vast tree of life.”

    See the full article here.

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    Formerly known as Simons Science News, Quanta Magazine is an editorially independent online publication launched by the Simons Foundation to enhance public understanding of science. Why Quanta? Albert Einstein called photons “quanta of light.” Our goal is to “illuminate science.” At Quanta Magazine, scientific accuracy is every bit as important as telling a good story. All of our articles are meticulously researched, reported, edited, copy-edited and fact-checked.
