Tagged: Supercomputing

  • richardmitnick 2:51 pm on August 27, 2015 Permalink | Reply
    Tags: , , , Supercomputing   

    From NERSC: “NERSC, Cray Move Forward With Next-Generation Scientific Computing” 

    NERSC Logo

    April 22, 2015
    Jon Bashor, jbashor@lbl.gov, 510-486-5849

    The Cori Phase 1 system will be the first supercomputer installed in the new Computational Research and Theory Facility now in the final stages of construction at Lawrence Berkeley National Laboratory.

    The U.S. Department of Energy’s (DOE) National Energy Research Scientific Computing (NERSC) Center and Cray Inc. announced today that they have finalized a new contract for a Cray XC40 supercomputer that will be the first NERSC system installed in the newly built Computational Research and Theory facility at Lawrence Berkeley National Laboratory.


    This supercomputer will be used as Phase 1 of NERSC’s next-generation system named “Cori” in honor of biochemist and Nobel Laureate Gerty Cori. Expected to be delivered this summer, the Cray XC40 supercomputer will feature the Intel Haswell processor. The second phase, the previously announced Cori system, will be delivered in mid-2016 and will feature the next-generation Intel Xeon Phi™ processor “Knights Landing,” a self-hosted, manycore processor with on-package high bandwidth memory that offers more than 3 teraflop/s of double-precision peak performance per single socket node.

    NERSC serves as the primary high performance computing facility for the Department of Energy’s Office of Science, supporting some 6,000 scientists annually on more than 700 projects. This latest contract represents the Office of Science’s ongoing commitment to supporting computing to address challenges such as developing new energy sources, improving energy efficiency, understanding climate change and analyzing massive data sets from observations and experimental facilities around the world.

    “This is an exciting year for NERSC and for NERSC users,” said Sudip Dosanjh, director of NERSC. “We are unveiling a brand new, state-of-the-art computing center and our next-generation supercomputer, designed to help our users begin the transition to exascale computing. Cori will allow our users to take their science to a level beyond what our current systems can do.”

    “NERSC and Cray share a common vision around the convergence of supercomputing and big data, and Cori will embody that overarching technical direction with a number of unique, new technologies,” said Peter Ungaro, president and CEO of Cray. “We are honored that the first supercomputer in NERSC’s new center will be our flagship Cray XC40 system, and we are also proud to be continuing and expanding our longstanding partnership with NERSC and the U.S. Department of Energy as we chart our course to exascale computing.”
    Support for Data-Intensive Science

    A key goal of the Cori Phase 1 system is to support the increasingly data-intensive computing needs of NERSC users. Toward this end, Phase 1 of Cori will feature more than 1,400 Intel Haswell compute nodes, each with 128 gigabytes of memory. The system will provide about the same sustained application performance as NERSC’s Hopper system, which will be retired later this year. The Cori interconnect will have a dragonfly topology based on the Aries interconnect, identical to NERSC’s Edison system.

    However, Cori Phase 1 will have twice as much memory per node as NERSC’s current Edison supercomputer (a Cray XC30 system) and will include a number of advanced features designed to accelerate data-intensive applications (a back-of-the-envelope sketch of the aggregate numbers follows the list):

    Large number of login/interactive nodes to support applications with advanced workflows
    Immediate access queues for jobs requiring real-time data ingestion or analysis
    High-throughput and serial queues that can handle a large number of jobs for screening, uncertainty quantification, genomic data processing, image processing and similar parallel analysis
    Network connectivity that allows compute nodes to interact with external databases and workflow controllers
    The first half of an approximately 1.5 terabytes/sec NVRAM-based Burst Buffer for high bandwidth low-latency I/O
    A Cray Lustre-based file system with over 28 petabytes of capacity and 700 gigabytes/second I/O bandwidth
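
    Taken together, the published Phase 1 figures give a rough sense of scale. Below is a back-of-the-envelope sketch in Python using only the numbers quoted above; it is an illustration, not NERSC code.

        nodes = 1400                  # Haswell compute nodes (approximate, from the text)
        mem_per_node_gb = 128         # gigabytes of memory per node
        lustre_bw_gb_s = 700          # Lustre file system bandwidth, gigabytes/second

        total_mem_tb = nodes * mem_per_node_gb / 1024
        drain_time_s = nodes * mem_per_node_gb / lustre_bw_gb_s   # write every byte of memory to disk once

        print(f"Aggregate memory: ~{total_mem_tb:.0f} TB")
        print(f"Time to stream all of it to Lustre: ~{drain_time_s / 60:.1f} minutes")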

    In addition, NERSC is collaborating with Cray on two ongoing R&D efforts to maximize Cori’s data potential by enabling higher bandwidth transfers in and out of the compute node, high-transaction rate database access, and Linux container virtualization functionality on Cray compute nodes to allow custom software stack deployment.

    “The goal is to give users as familiar a system as possible, while also allowing them the flexibility to explore new workflows and paths to computation,” said Jay Srinivasan, the Computational Systems Group lead. “The Phase 1 system is designed to enable users to start running their workload on Cori immediately, while giving data-intensive workloads from other NERSC systems the ability to run on a Cray platform.”
    Burst Buffer Enhances I/O

    A key element of Cori Phase 1 is Cray’s new DataWarp technology, which accelerates application I/O and addresses the growing performance gap between compute resources and disk-based storage. This capability, often referred to as a “Burst Buffer,” is a layer of NVRAM designed to move data more quickly between processor and disk and allow users to make the most efficient use of the system. Cori Phase 1 will feature approximately 750 terabytes of capacity and approximately 750 gigabytes/second of I/O bandwidth. NERSC, Sandia and Los Alamos national laboratories and Cray are collaborating to define use cases and test early software that will provide the following capabilities:

    Improve application reliability (checkpoint-restart)
    Accelerate application I/O performance for small blocksize I/O and analysis files
    Enhance quality of service by providing dedicated I/O acceleration resources
    Provide fast temporary storage for out-of-core applications
    Serve as a staging area for jobs requiring large input files or persistent fast storage between coupled simulations
    Support post-processing analysis of large simulation data as well as in situ and in transit visualization and analysis using the Burst Buffer nodes
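
    The checkpoint-restart case gives a feel for why this NVRAM layer matters: the application only has to wait for the fast Burst Buffer write, while the slower drain to disk can happen in the background. A minimal sketch follows; the bandwidth figures are the approximate numbers quoted above, while the checkpoint size and the share of file system bandwidth a single job sees are assumptions for illustration.

        checkpoint_tb = 100        # hypothetical application checkpoint size, terabytes
        bb_bw_gb_s = 750           # ~750 GB/s aggregate Burst Buffer bandwidth (quoted above)
        pfs_bw_gb_s = 70           # assumed share of Lustre bandwidth one job actually gets

        # Compute time lost per checkpoint: the job blocks only until its data is safely written.
        blocked_with_bb_s = checkpoint_tb * 1024 / bb_bw_gb_s
        blocked_direct_s = checkpoint_tb * 1024 / pfs_bw_gb_s

        print(f"Blocked while checkpointing via Burst Buffer: ~{blocked_with_bb_s:.0f} s")
        print(f"Blocked while checkpointing straight to disk: ~{blocked_direct_s:.0f} s")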

    Combining Extreme Scale Data Analysis and HPC on the Road to Exascale

    As previously announced, Phase 2 of Cori will be delivered in mid-2016 and will be combined with Phase 1 on the same high speed network, providing a unique resource. When fully deployed, Cori will contain more than 9,300 Knights Landing compute nodes and more than 1,900 Haswell nodes, along with the file system and a 2X increase in the applications I/O acceleration.

    “In the scientific computing community, the line between large scale data analysis and simulation and modeling is really very blurred,” said Katie Antypas, head of NERSC’s Scientific Computing and Data Services Department. “The combined Cori system is the first system to be specifically designed to handle the full spectrum of computational needs of DOE researchers, as well as emerging needs in which data- and compute-intensive work are part of a single workflow. For example, a scientist will be able to run a simulation on the highly parallel Knights Landing nodes while simultaneously performing data analysis using the Burst Buffer on the Haswell nodes. This is a model that we expect to be important on exascale-era machines.”

    NERSC is funded by the Office of Advanced Scientific Computing Research in the DOE’s Office of Science.

    See the full article here.

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    The National Energy Research Scientific Computing Center (NERSC) is the primary scientific computing facility for the Office of Science in the U.S. Department of Energy. As one of the largest facilities in the world devoted to providing computational resources and expertise for basic scientific research, NERSC is a world leader in accelerating scientific discovery through computation. NERSC is a division of the Lawrence Berkeley National Laboratory, located in Berkeley, California. NERSC itself is located at the UC Oakland Scientific Facility in Oakland, California.

    More than 5,000 scientists use NERSC to perform basic scientific research across a wide range of disciplines, including climate modeling, research into new materials, simulations of the early universe, analysis of data from high energy physics experiments, investigations of protein structure, and a host of other scientific endeavors.

    The NERSC Hopper system, a Cray XE6 with a peak theoretical performance of 1.29 Petaflop/s. To highlight its mission, powering scientific discovery, NERSC names its systems for distinguished scientists. Grace Hopper was a pioneer in the field of software development and programming languages and the creator of the first compiler. Throughout her career she was a champion for increasing the usability of computers, understanding that their power and reach would be limited unless they were made more user-friendly.

    (Historical photo of Grace Hopper courtesy of the Hagley Museum & Library, PC20100423_201. Design: Caitlin Youngquist/LBNL Photo: Roy Kaltschmidt/LBNL)

    NERSC is known as one of the best-run scientific computing facilities in the world. It provides some of the largest computing and storage systems available anywhere, but what distinguishes the center is its success in creating an environment that makes these resources effective for scientific research. NERSC systems are reliable and secure, and provide a state-of-the-art scientific development environment with the tools needed by the diverse community of NERSC users. NERSC offers scientists intellectual services that empower them to be more effective researchers. For example, many of our consultants are themselves domain scientists in areas such as material sciences, physics, chemistry and astronomy, well-equipped to help researchers apply computational resources to specialized science problems.

  • richardmitnick 4:13 pm on August 17, 2015 Permalink | Reply
    Tags: , , , Supercomputing   

    From isgtw: “Simplifying and accelerating genome assembly” 

    international science grid this week

    August 12, 2015
    Linda Vu

    To extract meaning from a genome, scientists must reconstruct portions — a time-consuming process akin to rebuilding the sentences and paragraphs of a book from snippets of text. But by applying novel algorithms and high-performance computational techniques to the cutting-edge de novo genome assembly tool Meraculous, a team of scientists has simplified and accelerated genome assembly — reducing a months-long process to mere minutes.

    “The new parallelized version of Meraculous shows unprecedented performance and efficient scaling up to 15,360 processor cores for the human and wheat genomes on NERSC’s Edison supercomputer,” says Evangelos Georganas. “This performance improvement sped up the assembly workflow from days to seconds.” Courtesy NERSC.

    Researchers from the Lawrence Berkeley National Laboratory (Berkeley Lab) and UC Berkeley have made this gain by ‘parallelizing’ the DNA code — sometimes billions of bases long — to harness the processing power of supercomputers, such as the US Department of Energy’s National Energy Research Scientific Computing Center’s (NERSC’s) Edison system. (Parallelizing means splitting up tasks to run on the many nodes of a supercomputer at once.)

    “Using the parallelized version of Meraculous, we can now assemble the entire human genome in about eight minutes,” says Evangelos Georganas, a UC Berkeley graduate student. “With this tool, we estimate that the output from the world’s biomedical sequencing capacity could be assembled using just a portion of the Berkeley-managed NERSC’s Edison supercomputer.”

    Supercomputers: A game changer for assembly

    High-throughput next-generation DNA sequencers allow researchers to look for biological solutions — and for the most part, these machines are very accurate at recording the sequence of DNA bases. Sometimes errors do occur, however. These errors complicate analysis by making it harder to assemble genomes and identify genetic mutations. They can also lead researchers to misinterpret the function of a gene.

    Researchers use a technique called shotgun sequencing to identify these errors. This involves taking numerous copies of a DNA strand, breaking it up into random smaller pieces and then sequencing each piece separately. For a particularly complex genome, this process can generate several terabytes of data.

    To identify data errors quickly and effectively, the Berkeley Lab and UC Berkeley team use ‘Bloom filters’ and massively parallel supercomputers. “Applying Bloom filters has been done before, but what we have done differently is to get Bloom filters to work with distributed memory systems,” says Aydin Buluç, a research scientist in Berkeley Lab’s Computational Research Division (CRD). “This task was not trivial; it required some computing expertise to accomplish.”
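
    The article does not reproduce the team’s code, but the general idea — using a Bloom filter to weed out k-mers that appear only once and are therefore likely sequencing errors — can be sketched in a few lines of single-node Python. The hash scheme, sizes and threshold below are illustrative, not Meraculous internals.

        import hashlib

        class BloomFilter:
            """Tiny illustrative Bloom filter: a probabilistic 'have I seen this k-mer before?'."""
            def __init__(self, size_bits=1_000_000, num_hashes=3):
                self.size = size_bits
                self.num_hashes = num_hashes
                self.bits = bytearray(size_bits // 8 + 1)

            def _positions(self, item):
                for i in range(self.num_hashes):
                    digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
                    yield int.from_bytes(digest[:8], "big") % self.size

            def add(self, item):
                for p in self._positions(item):
                    self.bits[p // 8] |= 1 << (p % 8)

            def __contains__(self, item):
                return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

        def solid_kmers(reads, k=21):
            """Keep only k-mers seen at least twice; k-mers seen once are treated as likely errors."""
            seen_once, solid = BloomFilter(), set()
            for read in reads:
                for i in range(len(read) - k + 1):
                    kmer = read[i:i + k]
                    if kmer in seen_once:      # second (or later) sighting: trust it
                        solid.add(kmer)
                    else:
                        seen_once.add(kmer)
            return solid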

    The team also developed solutions for parallelizing data input and output (I/O). “When you have several terabytes of data, just getting the computer to read your data and output results can be a huge bottleneck,” says Steven Hofmeyr, a research scientist in CRD who developed these solutions. “By allowing the computer to download the data in multiple threads, we were able to speed up the I/O process from hours to minutes.”
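
    A toy version of that multi-stream I/O idea, in plain Python with worker processes standing in for the real machinery (the production implementation targets a parallel file system and is not shown in the article):

        import os
        from multiprocessing import Pool

        def read_chunk(args):
            """Each worker opens the file on its own and reads only its assigned byte range."""
            path, start, length = args
            with open(path, "rb") as f:
                f.seek(start)
                return len(f.read(length))       # stand-in for real per-chunk processing

        def parallel_read(path, num_workers=8):
            size = os.path.getsize(path)
            chunk = size // num_workers + 1
            tasks = [(path, i * chunk, chunk) for i in range(num_workers)]
            with Pool(num_workers) as pool:
                return sum(pool.map(read_chunk, tasks))   # total bytes read across all workers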

    The assembly process

    Once errors are removed, researchers can begin the genome assembly. This process relies on computer programs to join k-mers — short DNA sequences consisting of a fixed number (K) of bases — at overlapping regions, so they form a continuous sequence, or contig. If the genome has previously been sequenced, scientists can use reference genomes and recorded gene annotations to align the reads. If not, they need to create a whole new catalog of contigs through de novo assembly.
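
    As a small illustration of the contig idea (not the Meraculous algorithm itself), the sketch below greedily extends a seed k-mer for as long as exactly one k-mer in the set overlaps it by K-1 bases:

        def build_contig(kmers, seed):
            """Greedily extend a seed k-mer while the (k-1)-base overlap extension is unambiguous."""
            remaining = set(kmers)
            remaining.discard(seed)
            contig = seed
            while True:
                suffix = contig[-(len(seed) - 1):]                    # last k-1 bases of the contig
                nexts = [suffix + b for b in "ACGT" if suffix + b in remaining]
                if len(nexts) != 1:                                   # dead end or a branch: stop
                    break
                contig += nexts[0][-1]
                remaining.discard(nexts[0])
            return contig

        # Toy example with k = 4: k-mers taken from the sequence ACGTACGGT
        print(build_contig(["ACGT", "CGTA", "GTAC", "TACG", "ACGG", "CGGT"], "ACGT"))  # ACGTACGGT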

    “If assembling a single genome is like piecing together one novel, then assembling metagenomic data is like rebuilding the Library of Congress,” says Jarrod Chapman. Pictured: Human Chromosomes. Courtesy Jane Ades, National Human Genome Research Institute.

    De novo assembly is memory-intensive, and until recently was resistant to parallelization in distributed memory. Many researchers turned to specialized large memory nodes, several terabytes in size, to do this work, but even the largest commercially available memory nodes are not big enough to assemble massive genomes. Even with supercomputers, it still took several hours, days or even months to assemble a single genome.

    To make efficient use of massively parallel systems, Georganas created a novel algorithm for de novo assembly that takes advantage of the one-sided communication and Partitioned Global Address Space (PGAS) capabilities of the UPC (Unified Parallel C) programming language. PGAS lets researchers treat the physically separate memories of each supercomputer node as one address space, reducing the time and energy spent swapping information between nodes.
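
    UPC itself is a dialect of C, but the central idea — any process can directly read or write data that physically lives in another node’s memory — can be caricatured in Python by hashing each k-mer to an “owner” rank. This is purely a conceptual sketch of the partitioning, not the UPC code used in the assembler, and the rank count is an assumption.

        import zlib

        NUM_RANKS = 64   # assumed number of processes, for illustration only

        def owner_rank(kmer, num_ranks=NUM_RANKS):
            """Deterministic hash, so every process agrees on which rank's memory holds a k-mer."""
            return zlib.crc32(kmer.encode()) % num_ranks

        # One logical global hash table, physically split into per-rank shards.
        local_shards = [dict() for _ in range(NUM_RANKS)]

        def global_put(kmer, value):
            # In UPC this would be a one-sided remote write into the owner's partition of the
            # global address space; the owning process does not have to participate.
            local_shards[owner_rank(kmer)][kmer] = value

        def global_get(kmer):
            # Likewise a one-sided remote read in the real implementation.
            return local_shards[owner_rank(kmer)].get(kmer)

        global_put("ACGTACGTACGTACGTACGTA", "contig_17")
        print(global_get("ACGTACGTACGTACGTACGTA"))   # -> contig_17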

    Tackling the metagenome

    Now that computation is no longer a bottleneck, scientists can try a number of different parameters and run as many analyses as necessary to produce very accurate results. This breakthrough means that Meraculous could also be used to analyze metagenomes — microbial communities recovered directly from environmental samples. This work is important because many microbes exist only in nature and cannot be grown in a laboratory. These organisms may be the key to finding new medicines or viable energy sources.

    “Analyzing metagenomes is a tremendous effort,” says Jarrod Chapman, who developed Meraculous at the US Department of Energy’s Joint Genome Institute (managed by the Berkeley Lab). “If assembling a single genome is like piecing together one novel, then assembling metagenomic data is like rebuilding the Library of Congress. Using Meraculous to effectively do this analysis would be a game changer.”

    –iSGTW is becoming the Science Node. Watch for our new branding and website this September.

    See the full article here.

    Please help promote STEM in your local schools.
    STEM Icon

    Stem Education Coalition

    iSGTW is an international weekly online publication that covers distributed computing and the research it enables.

    “We report on all aspects of distributed computing technology, such as grids and clouds. We also regularly feature articles on distributed computing-enabled research in a large variety of disciplines, including physics, biology, sociology, earth sciences, archaeology, medicine, disaster management, crime, and art. (Note that we do not cover stories that are purely about commercial technology.)

    In its current incarnation, iSGTW is also an online destination where you can host a profile and blog, and find and disseminate announcements and information about events, deadlines, and jobs. In the near future it will also be a place where you can network with colleagues.

    You can read iSGTW via our homepage, RSS, or email. For the complete iSGTW experience, sign up for an account or log in with OpenID and manage your email subscription from your account preferences. If you do not wish to access the website’s features, you can just subscribe to the weekly email.”

  • richardmitnick 11:24 am on August 11, 2015 Permalink | Reply
    Tags: , IBM Watson, , Supercomputing   

    From MIT Tech Review: “Why IBM Just Bought Billions of Medical Images for Watson to Look At” 

    MIT Technology Review
    M.I.T. Technology Review

    August 11, 2015
    Mike Orcutt

    IBM seeks to transform image-based diagnostics by combining its cognitive computing technology with a massive collection of medical images.

    IBM says that Watson, its artificial-intelligence technology, can use advanced computer vision to process huge volumes of medical images. Now Watson has its sights set on using this ability to help doctors diagnose diseases faster and more accurately.


    Last week IBM announced it would buy Merge Healthcare for a billion dollars. If the deal is finalized, this would be the third health-care data company IBM has bought this year (see “Meet the Health-Care Company IBM Needed to Make Watson More Insightful”). Merge specializes in handling all kinds of medical images, and its service is used by more than 7,500 hospitals and clinics in the United States, as well as clinical research organizations and pharmaceutical companies. Shahram Ebadollahi, vice president of innovation and chief science officer for IBM’s Watson Health Group, says the acquisition is part of an effort to draw on many different data sources, including anonymized, text-based medical records, to help physicians make treatment decisions.

    Merge’s data set contains some 30 billion images, which is crucial to IBM because its plans for Watson rely on a technology, called deep learning, that trains a computer by feeding it large amounts of data.

    Watson won Jeopardy! by using advanced natural-language processing and statistical analysis to interpret questions and provide the correct answers. Deep learning was added to Watson’s skill set more recently (see “IBM Pushes Deep Learning with a Watson Upgrade”). This new approach to artificial intelligence involves teaching computers to spot patterns in data by processing it in ways inspired by networks of neurons in the brain (see “Breakthrough Technologies 2013: Deep Learning”). The technology has already produced very impressive results in speech recognition (see “Microsoft Brings Star Trek’s Voice Translator to Life”) and image recognition (see “Facebook Creates Software That Matches Faces Almost as Well as You Do”).

    IBM’s researchers think medical image processing could be next. Images are estimated to make up as much as 90 percent of all medical data today, but it can be difficult for physicians to glean important information from them, says John Smith, senior manager for intelligent information systems at IBM Research.

    One of the most promising near-term applications of automated image processing, says Smith, is in detecting melanoma, a type of skin cancer. Diagnosing melanoma can be difficult, in part because there is so much variation in the way it appears in individual patients. By feeding a computer many images of melanoma, it is possible to teach the system to recognize very subtle but important features associated with the disease. The technology IBM envisions might be able to compare a new image from a patient with many others in a database and then rapidly give the doctor important information, gleaned from the images as well as from text-based records, about the diagnosis and potential treatments.
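
    IBM has not published the models Watson will use, but the recipe described here — train a convolutional network on many labeled lesion images, then score new ones — can be sketched in a few lines of PyTorch. The architecture, input size and two-class output below are placeholders, not Watson internals.

        import torch
        import torch.nn as nn

        class SkinLesionNet(nn.Module):
            """Toy convolutional classifier: lesion image in, benign/malignant scores out."""
            def __init__(self, num_classes=2):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                )
                self.classifier = nn.Linear(32 * 56 * 56, num_classes)   # assumes 224x224 input

            def forward(self, x):
                x = self.features(x)
                return self.classifier(x.flatten(start_dim=1))

        model = SkinLesionNet()
        scores = model(torch.randn(1, 3, 224, 224))   # one random stand-in RGB image
        print(scores.shape)                           # torch.Size([1, 2])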

    Finding cancer in lung CT scans is another good example of how such technology could help diagnosis, says Jeremy Howard, CEO of Enlitic, a one-year-old startup that is also using deep learning for medical image processing (see “A Startup Hopes to Teach Computers to Spot Tumors in Medical Scans”). “You have to scroll through hundreds and hundreds of slices looking for a few little glowing pixels that appear and disappear, and that takes a long time, and it is very easy to make a mistake,” he says. Howard says his company has already created an algorithm capable of identifying relevant characteristics of lung tumors more accurately than radiologists can.

    Howard says the biggest barrier to using deep learning in medical diagnostics is that so much of the data necessary for training the systems remains isolated in individual institutions, and government regulations can make it difficult to share that information. IBM’s acquisition of Merge, with its billions of medical images, could help address that problem.


    See the full article here.

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    The mission of MIT Technology Review is to equip its audiences with the intelligence to understand a world shaped by technology.

  • richardmitnick 2:23 pm on July 31, 2015 Permalink | Reply
    Tags: , , Supercomputing   

    From NSF: “Super news for supercomputers” 

    National Science Foundation

    July 30, 2015
    No Writer Credit


    This week, President Obama issued an executive order establishing the National Strategic Computing Initiative (NSCI) to ensure that the United States continues its leadership in high-performance computing over the coming decades.

    The National Science Foundation is proud to serve as one of the three lead agencies for the NSCI, working alongside the Department of Energy (DOE) and the Department of Defense (DOD) to maximize the benefits of high-performance computing research, development, and deployment across the federal government and in collaboration with academia and industry.

    NSF has been a leader in high-performance computing, and advanced cyberinfrastructure more generally, for nearly four decades.

    That was then… A Cray supercomputer in the mid-1980s at the National Center for Supercomputing Applications, located at the University of Illinois, Urbana-Champaign. Credit: NCSA, University of Illinois at Urbana-Champaign

    This is now… Blue Waters, launched in 2013, is one of the most powerful supercomputers in the world, and the fastest supercomputer on a university campus. Credit: NCSA, University of Illinois at Urbana-Champaign

    (The term “high-performance computing” refers to systems that, through a combination of processing capability and storage capacity, can solve computational problems that are beyond the capability of small- to medium-scale systems.)

    Over the last four decades, the benefits of advanced computing to our nation have been great.

    Whether helping to solve fundamental mysteries of the Universe…
    Simulations show the evolution of a white dwarf star as it is being disrupted by a massive black hole. Credit: Tamara Bogdanovic, Georgia Tech

    determining the underlying mechanisms of disease and prevention…
    Simulations of the human immunodeficiency virus (HIV) help researchers develop new antiretroviral drugs that suppress the HIV virus. Credit: Theoretical and Computational Biophysics Group, University of Illinois at Urbana-Champaign

    or improving the prediction of natural disasters and saving lives…
    3-D supercomputer simulations of earthquake data have found hidden rock structures deep under East Asia. Credit: Min Song, Rice University

    high-performance computing has been a necessary tool in the toolkit of scientists and engineers.

    NSF has the unique ability to ensure that our nation’s computing infrastructure is guided by the problems that scientists face working at the frontiers of science and engineering, and that our investments are informed by advances in state-of-the-art technologies and groundbreaking computer science research.

    By providing researchers and educators throughout the U.S. with access to cyberinfrastructure – the hardware, software, networks and people that make massive computing possible – NSF has accelerated the pace of discovery and innovation in all fields of inquiry. This holistic and collaborative high-performance computing ecosystem has transformed all areas of science and engineering and society at-large.

    In the new Strategic Initiative, NSF will continue to play a central role in computationally-enabled scientific advances, the development of the broader HPC ecosystem for making those scientific discoveries, and the development of a high-skill, high-tech workforce who can use high-performance computing for the good of the nation.

    We at NSF recognize that advancing discoveries and innovations demands a bold, sustainable, and comprehensive national strategy that is responsive to increasing computing demands, emerging technological challenges, and growing international competition.

    The National Strategic Computing Initiative paves the way toward a concerted, collective effort to examine the opportunities and challenges for the future of HPC.

    We look forward to working with other federal agencies, academic institutions, industry, and the scientific community to realize a vibrant future for HPC over the next 15 years, and to continue to power our nation’s ability to be the discovery and innovation engine of the world!

    See the full article here.

    Please help promote STEM in your local schools.
    STEM Icon

    Stem Education Coalition

    The National Science Foundation (NSF) is an independent federal agency created by Congress in 1950 “to promote the progress of science; to advance the national health, prosperity, and welfare; to secure the national defense…we are the funding source for approximately 24 percent of all federally supported basic research conducted by America’s colleges and universities. In many fields such as mathematics, computer science and the social sciences, NSF is the major source of federal backing.”


  • richardmitnick 10:27 am on July 29, 2015 Permalink | Reply
    Tags: , , , , Supercomputing   

    From isgtw: “Supercomputers listen for extraterrestrial life” 

    international science grid this week

    July 29, 2015
    Lance Farrell

    Last week, NASA’s New Horizons spacecraft thrilled us with images from its close encounter with Pluto.

    NASA New Horizons spacecraft II
    NASA/New Horizons

    New Horizons now heads into the Kuiper belt and to points spaceward. Will it find life?

    Known objects in the Kuiper belt beyond the orbit of Neptune (scale in AU; epoch as of January 2015).

    That’s the question motivating Aline Vidotto, scientific collaborator at the Observatoire de Genève in Switzerland. Her recent study harnesses supercomputers to find out how to tune our radio dials to listen in on other planets.

    Model of an interplanetary medium. Stellar winds stream from the star and interact with the magnetosphere of the hot-Jupiters. Courtesy Vidotto

    Vidotto has been studying interstellar environments for a while now, focusing on the interplanetary atmosphere surrounding so-called hot-Jupiter exoplanets since 2009. Similar in size to our Jupiter, these exoplanets orbit their star up to 20 times as closely as Earth orbits the sun, and are considered ‘hot’ due to the extra irradiation they receive.

    Every star generates a stellar wind, and the characteristics of this wind depend on the star from which it originates. The speed of its rotation, its magnetism, its gravity, or how active it is are among the factors affecting this wind. These variables also modify the effect this wind will have on planets in its path.

    Since the winds of different star systems are likely to be very different from our own, we need computers to help us boldly go where no one has ever gone before. “Observationally, we know very little about the winds and the interplanetary space of other stars,” Vidotto says. “This is why we need models and numerical simulations.”

    Vidotto’s research focuses on planets four to nine times closer to their host star than Mercury is to the sun. She takes observations of the magnetic fields around five stars from astronomers at the Canada-France-Hawaii Telescope (CFHT) in Hawaii and the Bernard-Lyot Telescope in France and feeds them into 3D simulations. For her most recent study, she divided the computational load between the Darwin cluster (part of the DiRAC network) at the University of Cambridge (UK) and the Piz Daint at the Swiss National Supercomputing Center.

    Canada-France-Hawaii Telescope
    CFHT interior

    Bernard Lyot telescope
    Bernard Lyot telescope interior
    Bernard Lyot

    The Darwin cluster consists of 9,728 cores, with a theoretical peak in excess of 202 teraFLOPS. Piz Daint consists of 5,272 compute nodes with 32 GB of RAM per node, and is capable of 7.8 petaFLOPS — that’s more computation in a day than a typical laptop could manage in a millennium.
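
    That last comparison checks out with a couple of lines of arithmetic; the laptop figure is an assumption of roughly 20 gigaFLOPS of sustained performance, not a number from the article.

        piz_daint_flops = 7.8e15              # peak performance quoted above
        laptop_flops = 2e10                   # assumed ~20 gigaFLOPS for a typical laptop

        seconds_per_day = 24 * 3600
        seconds_per_millennium = 1000 * 365.25 * seconds_per_day

        print(f"Piz Daint, one day:     {piz_daint_flops * seconds_per_day:.1e} floating-point operations")
        print(f"Laptop, one millennium: {laptop_flops * seconds_per_millennium:.1e} floating-point operations")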

    Vidotto’s analysis of the DiRAC simulations reveals a much different interplanetary medium than in our home solar system, with an overall interplanetary magnetic field 100 times larger than ours, and stellar wind pressures at the point of orbit in excess of 10,000 times ours.

    This immense pressure means these planets must have a very strong magnetic shield (magnetosphere) or their atmospheres would be blown away by the stellar wind, as we suspect happened on Mars. A planet’s atmosphere is thought to be intimately related to its habitability.

    A planet’s magnetism can also tell us something about the interior properties of the planet such as its thermal state, composition, and dynamics. But since the actual magnetic fields of these exoplanets have not been observed, Vidotto is pursuing a simple hypothesis: What if they were similar to our own Jupiter?

    A model of an exoplanet magnetosphere interacting with an interstellar wind. Knowing the characteristics of the interplanetary medium and the flux of the exoplanet radio emissions in this medium can help us tune our best telescopes to listen for distant signs of life. Courtesy Vidotto.

    If this were the case, then the magnetosphere around these planets would extend five times the radius of the planet (Earth’s magnetosphere extends 10-15 times). Where it mingles with the onrushing stellar winds, it creates the effect familiar to us as an aurora display. Indeed, Vidotto’s research reveals the auroral power in these exoplanets is more impressive than Jupiter’s. “If we were ever to live on one of these planets, the aurorae would be a fantastic show to watch!” she says.

    Knowing this auroral power enables astronomers to realistically characterize the interplanetary medium around the exoplanets, as well as the auroral ovals through which cosmic and stellar particles can penetrate the exoplanet atmosphere. This helps astronomers correctly estimate the flux of exoplanet radio emissions and how sensitive equipment on Earth would have to be to detect them. In short, knowing how to listen is a big step toward hearing.

    Radio emissions from these hot-Jupiters would present a challenge to our current class of radio telescopes, such as the Low Frequency Array for radio astronomy (LOFAR). However, “there is one radio array that is currently being designed where these radio fluxes could be detected — the Square Kilometre Array (SKA),” Vidotto says. The SKA is set for completion in 2023, and in the DiRAC clusters Vidotto finds some of the few supercomputers in the world capable of testing correlation software solutions.

    Lofar radio telescope

    While there’s much more work ahead of us, Vidotto’s research presents a significant advance in radio astronomy and is helping refine our ability to detect signals from beyond. With her 3D exoplanet simulations, the DiRAC computation power, and the ears of SKA, it may not be long before we’re able to hear radio signals from distant worlds.

    Stay tuned!

    See the full article here.

    Please help promote STEM in your local schools.
    STEM Icon

    Stem Education Coalition

    iSGTW is an international weekly online publication that covers distributed computing and the research it enables.

    “We report on all aspects of distributed computing technology, such as grids and clouds. We also regularly feature articles on distributed computing-enabled research in a large variety of disciplines, including physics, biology, sociology, earth sciences, archaeology, medicine, disaster management, crime, and art. (Note that we do not cover stories that are purely about commercial technology.)

    In its current incarnation, iSGTW is also an online destination where you can host a profile and blog, and find and disseminate announcements and information about events, deadlines, and jobs. In the near future it will also be a place where you can network with colleagues.

    You can read iSGTW via our homepage, RSS, or email. For the complete iSGTW experience, sign up for an account or log in with OpenID and manage your email subscription from your account preferences. If you do not wish to access the website’s features, you can just subscribe to the weekly email.”

  • richardmitnick 1:47 pm on July 21, 2015 Permalink | Reply
    Tags: , , , Supercomputing   

    From isgtw: “Simulations reveal a less crowded universe” 

    international science grid this week

    July 15, 2015
    Jan Zverina

    Blue Waters supercomputer

    Simulations conducted on the Blue Waters supercomputer at the National Center for Supercomputing Applications (NCSA) suggest there may be far fewer galaxies in the universe than expected.

    The study, published this week in Astrophysical Journal Letters, shows the first results from the Renaissance Simulations, a suite of extremely high-resolution adaptive mesh refinement calculations of high redshift galaxy formation. Taking advantage of data transferred to SDSC Cloud at the San Diego Supercomputer Center (SDSC), these simulations show hundreds of well-resolved galaxies.

    “Most critically, we show that the ultraviolet luminosity function of our simulated galaxies is consistent with observations of high redshift galaxy populations at the bright end of the luminosity function, but at lower luminosities is essentially flat rather than rising steeply,” says principal investigator and lead author Brian W. O’Shea, an associate professor at Michigan State University.

    This discovery allows researchers to make several novel and verifiable predictions ahead of the October 2018 launch of the James Webb Space Telescope, a new space observatory succeeding the Hubble Space Telescope.

    NASA Webb Telescope

    NASA Hubble Telescope
    NASA/ESA Hubble

    “The Hubble Space Telescope can only see what we might call the tip of the iceberg when it comes to taking inventory of the most distant galaxies,” said SDSC director Michael Norman. “A key question is how many galaxies are too faint to see. By analyzing these new, ultra-detailed simulations, we find that there are 10 to 100 times fewer galaxies than a simple extrapolation would predict.”

    The simulations ran on the National Science Foundation (NSF) funded Blue Waters supercomputer, one of the largest and most powerful academic supercomputers in the world. “These simulations are physically complex and very large — we simulate thousands of galaxies at a time, including their interactions through gravity and radiation, and that poses a tremendous computational challenge,” says O’Shea.

    Blue Waters, based at the University of Illinois, is used to tackle a wide range of challenging problems, from predicting the behavior of complex biological systems to simulating the evolution of the cosmos. The supercomputer has more than 1.5 petabytes of memory — enough to store 300 million images from a digital camera — and can achieve a peak performance level of more than 13 quadrillion calculations per second.

    “The flattening at lower luminosities is a key finding and significant to researchers’ understanding of the reionization of the universe, when the gas in the universe changed from being mostly neutral to mostly ionized,” says John H. Wise, Dunn Family assistant professor of physics at the Georgia Institute of Technology.

    Matter overdensity (top row) and ionized fraction (bottom row) for the regions simulated in the Renaissance Simulations. The red triangles represent locations of galaxies detectable with the Hubble Space Telescope. The James Webb Space Telescope will detect many more distant galaxies, shown by the blue squares and green circles. These first galaxies reionized the universe, shown in the image as blue bubbles around the galaxies. Courtesy Brian W. O’Shea (Michigan State University), John H. Wise (Georgia Tech); Michael Norman and Hao Xu (UC San Diego).

    The term ‘reionized’ is used because the universe was ionized immediately after the fiery big bang. During that time, ordinary matter consisted mostly of hydrogen atoms with positively charged protons stripped of their negatively charged electrons. Eventually, the universe cooled enough for electrons and protons to combine and form neutral hydrogen. They didn’t give off any optical or UV light — and without it, conventional telescopes are of no use in finding traces of how the cosmos evolved during these Dark Ages. The light returned when reionization began.

    Previous simulations, described in an earlier paper, concluded that the universe was 20 percent ionized about 300 million years after the Big Bang; 50 percent ionized at 550 million years after; and fully ionized at 860 million years after its creation.

    “Our work suggests that there are far fewer faint galaxies than one could previously infer,” says O’Shea. “Observations of high redshift galaxies provide poor constraints on the low-luminosity end of the galaxy luminosity function, and thus make it challenging to accurately account for the full budget of ionizing photons during that epoch.”

    See the full article here.

    Please help promote STEM in your local schools.
    STEM Icon

    Stem Education Coalition

    iSGTW is an international weekly online publication that covers distributed computing and the research it enables.

    “We report on all aspects of distributed computing technology, such as grids and clouds. We also regularly feature articles on distributed computing-enabled research in a large variety of disciplines, including physics, biology, sociology, earth sciences, archaeology, medicine, disaster management, crime, and art. (Note that we do not cover stories that are purely about commercial technology.)

    In its current incarnation, iSGTW is also an online destination where you can host a profile and blog, and find and disseminate announcements and information about events, deadlines, and jobs. In the near future it will also be a place where you can network with colleagues.

    You can read iSGTW via our homepage, RSS, or email. For the complete iSGTW experience, sign up for an account or log in with OpenID and manage your email subscription from your account preferences. If you do not wish to access the website’s features, you can just subscribe to the weekly email.”

  • richardmitnick 3:49 pm on July 10, 2015 Permalink | Reply
    Tags: , , , , Supercomputing   

    From BNL: “Big PanDA and Titan Merge to Tackle Torrent of LHC’s Full-Energy Collision Data” 

    Brookhaven Lab

    July 7, 2015
    Karen McNulty Walsh

    Workload handling software has broad potential to maximize use of available supercomputing resources

    The PanDA workload management system developed at Brookhaven Lab and the University of Texas, Arlington, has been integrated on the Titan supercomputer at the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory.

    With the successful restart of the Large Hadron Collider (LHC), now operating at nearly twice its former collision energy, comes an enormous increase in the volume of data physicists must sift through to search for new discoveries.

    CERN LHC Map
    CERN LHC Grand Tunnel
    CERN LHC particles
    LHC at CERN

    Thanks to planning and a pilot project funded by the offices of Advanced Scientific Computing Research and High-Energy Physics within the Department of Energy’s Office of Science, a remarkable data-management tool developed by physicists at DOE’s Brookhaven National Laboratory and the University of Texas at Arlington is evolving to meet the big-data challenge.

    The workload management system, known as PanDA (for Production and Distributed Analysis), was designed by high-energy physicists to handle data analysis jobs for the LHC’s ATLAS collaboration.


    During the LHC’s first run, from 2010 to 2013, PanDA made ATLAS data available for analysis by 3000 scientists around the world using the LHC’s global grid of networked computing resources. The latest rendition, known as Big PanDA, schedules jobs opportunistically on Titan—the world’s most powerful supercomputer for open scientific research, located at the Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science User Facility at Oak Ridge National Laboratory—in a manner that does not conflict with Titan’s ability to schedule its traditional, very large, leadership-class computing jobs.

    This integration of the workload management system on Titan—the first large-scale use of leadership class supercomputing facilities fully integrated with PanDA to assist in the analysis of experimental high-energy physics data—will have immediate benefits for ATLAS.

    “Titan is ready to help with new discoveries at the LHC,” said Brookhaven physicist Alexei Klimentov, a leader on the development of Big PanDA.

    The workload management system will likely also help meet big data challenges in many areas of science by maximizing the use of limited supercomputing resources.

    “As a DOE leadership computing facility, OLCF was designed to tackle large complex computing problems that cannot be readily performed using smaller facilities—things like modeling climate and nuclear fusion,” said Jack Wells, Director of Science for the National Center for Computational Science at ORNL. OLCF prioritizes the scheduling of these leadership jobs, which can take up 20, 60, or even greater than 90 percent of Titan’s computational resources. One goal is to make the most of the available running time and get as close to 100 percent utilization of the system as possible.

    “But even when Titan is fully loaded and large jobs are standing in the queue to run, we are typically using about 90 percent of the machine averaging over long periods of time,” Wells said. “That means, on average, there’s 10 percent of the machine that we are unable to use that could be made available to handle a mix of smaller jobs, essentially ‘filling in the cracks’ between the very large jobs.”

    As Klimentov explained, “Applications from high-energy physics don’t require a huge allocation of resources on a supercomputer. If you imagine a glass filled with stones to represent the supercomputing capacity and how much ‘space’ is taken up by the big computing jobs, we use the small spaces between the stones.”

    A workload-management system like PanDA could help fill those spaces with other types of jobs as well.
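
    The article describes the approach only in prose, but the scheduling idea — pack small jobs into whatever nodes the leadership jobs leave idle — reduces to something like the sketch below. The node counts and job sizes are invented for illustration, and this is not the Big PanDA code.

        def backfill(idle_nodes, small_jobs):
            """Greedily pack (nodes_needed, job_id) pairs into currently idle nodes, smallest first."""
            scheduled = []
            for nodes_needed, job_id in sorted(small_jobs):
                if nodes_needed <= idle_nodes:
                    idle_nodes -= nodes_needed
                    scheduled.append(job_id)
            return scheduled, idle_nodes

        # Example: roughly 10 percent of an 18,688-node machine is idle between leadership jobs.
        idle = 1800
        atlas_jobs = [(64, "atlas_sim_1"), (128, "atlas_sim_2"), (512, "atlas_sim_3"),
                      (1024, "atlas_sim_4"), (2048, "atlas_sim_5")]
        running, still_idle = backfill(idle, atlas_jobs)
        print(running, still_idle)   # the 2,048-node request does not fit and simply waits its turn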

    Brookhaven physicists Alexei Klimentov and Torre Wenaus have helped to design computational strategies for handling a torrent of data from the ATLAS experiment at the LHC.

    New territory for experimental physicists

    While supercomputers have been absolutely essential for the complex calculations of theoretical physics, distributed grid resources have been the workhorses for analyzing experimental high-energy physics data. PanDA, as designed by Kaushik De, a professor of physics at UT, Arlington, and Torre Wenaus of Brookhaven Lab, helped to integrate these worldwide computing centers by introducing common workflow protocols and access to the entire ATLAS data set.

    But as the volume of data increases with the LHC collision energy, so does the need for running simulations that help scientists interpret their experimental results, Klimentov said. These simulations are perfectly suited for running on supercomputers, and Big PanDA makes it possible to do so without eating up valuable computing time.

    The cutting-edge prototype Big PanDA software, which has been significantly modified from its original design, “backfills” simulations of the collisions taking place at the LHC into spaces between typically large supercomputing jobs.

    “We can insert jobs at just the right time and in just the right size chunks so they can run without competing in any way with the mission leadership jobs, making use of computing power that would otherwise sit idle,” Wells said.

    In early June, as the LHC ramped up to a collision energy of 13 trillion electron volts, Titan ramped up to 10,000 central processing unit (CPU) cores simultaneously calculating LHC collisions, and has tested scalability successfully up to 90,000 concurrent cores.

    “These simulations provide a clear path to understanding the complex physical phenomena recorded by the ATLAS detector,” Klimentov said.

    He noted that during one 10-day period just after the LHC restart, the group ran ATLAS simulations on Titan for 60,000 Titan core-hours in backfill mode. (30 Titan cores used over a period of one hour consume 30 Titan core-hours of computing resource.)

    “This is a great achievement of the pilot program,” said De of UT Arlington, co-leader of the Big PanDA project.

    “We’ll be able to reach far greater heights when the pilot matures into daily operations at Titan in the next phase of this project,” he added.

    The Big PanDA team is now ready to bring its expertise to advancing the use of supercomputers for fields beyond high-energy physics. Already they have plans to use Big PanDA to help tackle the data challenges presented by the LHC’s nuclear physics research using the ALICE detector—a program that complements the exploration of quark-gluon plasma and the building blocks of visible matter at Brookhaven’s Relativistic Heavy Ion Collider (RHIC).

    ALICE - EMCal supermodel

    Brookhaven RHIC

    But they see widespread applicability in other data-intensive fields, including molecular dynamics simulations and studies of genes and proteins in biology, the development of new energy technologies and materials design, and understanding global climate change.

    “Our goal is to work with Jack and our other colleagues at OLCF to develop Big PanDA as a general workload tool available to all users of Titan and other supercomputers to advance fundamental discovery and understanding in a broad range of scientific and engineering disciplines,” Klimentov said. Supercomputing groups in the Czech Republic, UK, and Switzerland have already been making inquiries.

    Brookhaven’s role in this work was supported by the DOE Office of Science. The Oak Ridge Leadership Computing Facility is supported by the DOE Office of Science.

    Brookhaven National Laboratory and Oak Ridge National Laboratory are supported by the Office of Science of the U.S. Department of Energy. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.

    See the full article here.

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition
    BNL Campus

    One of ten national laboratories overseen and primarily funded by the Office of Science of the U.S. Department of Energy (DOE), Brookhaven National Laboratory conducts research in the physical, biomedical, and environmental sciences, as well as in energy technologies and national security. Brookhaven Lab also builds and operates major scientific facilities available to university, industry and government researchers. The Laboratory’s almost 3,000 scientists, engineers, and support staff are joined each year by more than 5,000 visiting researchers from around the world. Brookhaven is operated and managed for DOE’s Office of Science by Brookhaven Science Associates, a limited-liability company founded by Stony Brook University, the largest academic user of Laboratory facilities, and Battelle, a nonprofit, applied science and technology organization.

  • richardmitnick 1:56 pm on July 9, 2015 Permalink | Reply
    Tags: , , , Network Computing, Supercomputing,   

    From Symmetry: “More data, no problem” 


    July 09, 2015
    Katie Elyce Jones

    Scientists are ready to handle the increased data of the current run of the Large Hadron Collider.

    Photo by Reidar Hahn, Fermilab

    Physicist Alexx Perloff, a graduate student at Texas A&M University on the CMS experiment, is using data from the first run of the Large Hadron Collider for his thesis, which he plans to complete this year.

    CERN LHC Map
    CERN LHC Grand Tunnel
    CERN LHC particles

    CERN CMS Detector

    When all is said and done, it will have taken Perloff a year and a half to conduct the computing necessary to analyze all the information he needs—not unusual for a thesis.

    But had he used the computing tools LHC scientists are using now, he estimates he could have finished his particular kind of analysis in about three weeks. Although Perloff represents only one scientist working on the LHC, his experience shows the great leaps scientists have made in LHC computing by democratizing their data, becoming more responsive to popular demand and improving their analysis software.

    A deluge of data

    Scientists estimate the current run of the LHC could create up to 10 times more data than the first one. CERN already routinely stores 6 gigabytes (6 billion bytes) of data per second, up from 1 gigabyte per second in the first run.
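
    For a sense of scale, the quoted storage rate works out as follows; the assumption of continuous, year-round running is of course unrealistic and only used to bound the volume.

        rate_gb_s = 6                                # CERN's routine storage rate in Run II
        seconds_per_day = 24 * 3600

        per_day_tb = rate_gb_s * seconds_per_day / 1000
        per_year_pb = per_day_tb * 365 / 1000        # if the LHC stored data at this rate all year

        print(f"~{per_day_tb:.0f} TB per day, ~{per_year_pb:.0f} PB per year at that rate")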

    The second run of the LHC is more data-intensive because the accelerator itself is more intense: The collision energy is 60 percent greater, resulting in “pile-up” or more collisions per proton bunch. Proton bunches are also injected into the ring closer together, resulting in more collisions per second.

    On top of that, the experiments have upgraded their triggers, which automatically choose which of the millions of particle events per second to record. The CMS trigger will now record more than twice as much data per second as it did in the previous run.

    Had CMS and ATLAS scientists relied only on adding more computers to make up for the data hike, they would likely have needed about four to six times more computing power in CPUs and storage than they used in the first run of the LHC.


    To avoid such a costly expansion, they found smarter ways to share and analyze the data.

    Flattening the hierarchy

    Over a decade ago, network connections were less reliable than they are today, so the Worldwide LHC Computing Grid was designed to have different levels, or tiers, that controlled data flow.

    All data recorded by the detectors goes through the CERN Data Centre, known as Tier-0, where it is initially processed, then to a handful of Tier-1 centers in different regions across the globe.

    CERN DATA Center
    One view of the Cern Data Centre

    During the last run, the Tier-1 centers served Tier-2 centers, which were mostly the smaller university computing centers where the bulk of physicists do their analyses.

    “The experience for a user on Run I was more restrictive,” says Oliver Gutsche, assistant head of the Scientific Computing Division for Science Workflows and Operations at Fermilab, the US Tier-1 center for CMS*. “You had to plan well ahead.”

    Now that the network has proved reliable, a new model “flattens” the hierarchy, enabling a user at any ATLAS or CMS Tier-2 center to access data from any of their centers in the world. This was initiated in Run I and is now fully in place for Run II.

    Through a separate upgrade known as data federation, users can also open a file from another computing center through the network, enabling them to view the file without going through the process of transferring it from center to center.

    Another significant upgrade affects the network stateside. Through its Energy Sciences Network, or ESnet, the US Department of Energy increased the bandwidth of the transatlantic network that connects the US CMS and ATLAS Tier-1 centers to Europe. A high-speed network, ESnet transfers data 15,000 times faster than the average home network provider.

    Dealing with the rush

    One of the thrilling things about being a scientist on the LHC is that when something exciting shows up in the detector, everyone wants to talk about it. The downside is everyone also wants to look at it.

    “When data is more interesting, it creates high demand and a bottleneck,” says David Lange, CMS software and computing co-coordinator and a scientist at Lawrence Livermore National Laboratory. “By making better use of our resources, we can make more data available to more people at any time.”

    To avoid bottlenecks, ATLAS and CMS are now making data accessible by popularity.

    “For CMS, this is an automated system that makes more copies when popularity rises and reduces copies when popularity declines,” Gutsche says.
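
    A minimal sketch of what a popularity-driven replica manager does (the thresholds and dataset names are invented; the real CMS and ATLAS systems are considerably more sophisticated):

        def target_replicas(accesses_last_week, min_copies=1, max_copies=10, accesses_per_copy=100):
            """More accesses -> more copies, within fixed bounds."""
            return max(min_copies, min(max_copies, accesses_last_week // accesses_per_copy + 1))

        def rebalance(catalog):
            """catalog maps dataset -> (current_copies, accesses_last_week); returns adjustments."""
            actions = {}
            for dataset, (copies, accesses) in catalog.items():
                want = target_replicas(accesses)
                if want != copies:
                    actions[dataset] = want - copies   # positive: add replicas, negative: delete copies
            return actions

        catalog = {"/Higgs/Run2015B/AOD": (2, 950), "/MinBias/Run2012A/AOD": (4, 12)}
        print(rebalance(catalog))   # {'/Higgs/Run2015B/AOD': 8, '/MinBias/Run2012A/AOD': -3}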

    Improving the algorithms

    One of the greatest recent gains in computing efficiency for the LHC relied on the physicists who dig into the data. By working closely with physicists, software engineers edited the algorithms that describe the physics playing out in the LHC, thereby significantly improving processing time for reconstruction and simulation jobs.

    “A huge amount of effort was put in, primarily by physicists, to understand how the physics could be analyzed while making the computing more efficient,” says Richard Mount, senior research scientist at SLAC National Accelerator Laboratory who was ATLAS computing coordinator during the recent LHC upgrades.

    CMS tripled the speed of event reconstruction and halved simulation time. Similarly, ATLAS quadrupled reconstruction speed.

    Algorithms that determine data acquisition on the upgraded triggers were also improved to better capture rare physics events and filter out the background noise of routine (and therefore uninteresting) events.

    “More data” has been the drumbeat of physicists since the end of the first run, and now that it’s finally here, LHC scientists and students like Perloff can pick up where they left off in the search for new physics—anytime, anywhere.

    *While not noted in the article, I believe that Brookhaven National Laboratory is the Tier-1 site for ATLAS in the United States.

    See the full article here.

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    Symmetry is a joint Fermilab/SLAC publication.

  • richardmitnick 7:33 pm on April 15, 2015 Permalink | Reply
    Tags: , , , Supercomputing   

    From isgtw: “Supercomputing enables researchers in Norway to tackle cancer” 

    international science grid this week

    April 15, 2015
    Yngve Vogt

    Cancer researchers are using the Abel supercomputer at the University of Oslo in Norway to detect which versions of genes are only found in cancer cells. Every form of cancer, even every tumour, has its own distinct variants.

    “This charting may help tailor the treatment to each patient,” says Rolf Skotheim, who is affiliated with the Centre for Cancer Biomedicine and the research group for biomedical informatics at the University of Oslo, as well as the Department of Molecular Oncology at Oslo University Hospital.

    “Charting the versions of the genes that are only found in cancer cells may help tailor the treatment offered to each patient,” says Skotheim. Image courtesy Yngve Vogt.

    His research group is working to identify the genes that cause bowel and prostate cancer, which are both common diseases. There are 4,000 new cases of bowel cancer in Norway every year. Only six out of ten patients survive the first five years. Prostate cancer affects 5,000 Norwegians every year. Nine out of ten survive.

    Comparisons between healthy and diseased cells

    In order to identify the genes that lead to cancer, Skotheim and his research group are comparing genetic material in tumours with genetic material in healthy cells. In order to understand this process, a brief introduction to our genetic material is needed:

    Our genetic material consists of just over 20,000 genes. Each gene consists of thousands of base pairs, represented by a specific sequence of the four building blocks, adenine, thymine, guanine, and cytosine, popularly abbreviated to A, T, G, and C. The sequence of these building blocks is the very recipe for the gene. Our whole DNA consists of some six billion base pairs.

    The DNA strand carries the molecular instructions for activity in the cells. In other words, DNA contains the recipe for proteins, which perform the tasks in the cells. DNA, nevertheless, does not actually produce proteins. First, a copy of DNA is made: this transcript is called RNA and it is this molecule that is read when proteins are produced.

    RNA is transcribed from only a small part of the DNA, its active constituents. Most of the DNA is inactive; only 1–2% of the DNA strand is active.

    In cancer cells, something goes wrong with the RNA transcription. There is either too much RNA, which means that far too many proteins of a specific type are formed, or the composition of base pairs in the RNA is wrong. The latter is precisely the area being studied by the University of Oslo researchers.

    Wrong combinations

    All genes can be divided into active and inactive parts. A single gene may consist of tens of active stretches of nucleotides (exons). “RNA is a copy of a specific combination of the exons from a specific gene in DNA,” explains Skotheim. There are many possible combinations, and it is precisely this search for all of the possible combinations that is new in cancer research.

    Different cells can combine the exons of a single gene in different ways. A cancer cell can create a combination that should not exist in healthy cells. And as if that didn’t make things complicated enough, sometimes RNA can be made up of stretches of nucleotides from different genes in DNA. These special, complex genes are called fusion genes.
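
    To make the combinatorics concrete, the sketch below enumerates the possible exon combinations of a small made-up gene and builds one made-up fusion transcript from two genes. All sequences and gene labels are invented; real transcripts are vastly longer.

        # Illustrative only: enumerate exon combinations for a toy gene and
        # build a toy fusion transcript. Sequences are invented.
        from itertools import combinations

        gene_a_exons = ["ATGGC", "TTACG", "GGCAT"]   # toy exons of "gene A"
        gene_b_exons = ["CCGTA", "AATGG"]            # toy exons of "gene B"

        # All non-empty, order-preserving selections of gene A's exons,
        # i.e. the transcripts that splicing could in principle produce.
        transcripts = [
            "".join(subset)
            for r in range(1, len(gene_a_exons) + 1)
            for subset in combinations(gene_a_exons, r)
        ]
        print(len(transcripts))   # 7 possible transcripts from 3 exons

        # A toy "fusion transcript": exons from two different genes joined together.
        fusion = gene_a_exons[0] + gene_b_exons[1]
        print(fusion)   # ATGGCAATGG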

    “We need powerful computers to crunch the enormous amounts of raw data,” says Skotheim. “Even if you spent your whole life on this task, you would not be able to find the location of a single nucleotide.”

    In other words, researchers must look for errors both inside genes and between the different genes. “Fusion genes are usually found in cancer cells, but some of them are also found in healthy cells,” says Skotheim. In patients with prostate cancer, researchers have found some fusion genes that are only created in diseased cells. These fusion genes may then be used as a starting-point in the detection of and fight against cancer.

    The researchers have also found fusion genes in bowel cells, but they were not cancer-specific. “For some reason, these fusion genes can also be found in healthy cells,” adds Skotheim. “This discovery was a let-down.”

    Improving treatment

    There are different RNA errors in the various cancer diseases. The researchers must therefore analyze the RNA errors of each disease.

    Among other things, the researchers are comparing RNA in diseased and healthy tissue from 550 patients with prostate cancer. The patients that make up the study do not receive any direct benefits from the results themselves. However, the research is important in order to be able to help future patients.

    “We want to find the typical defects associated with prostate cancer,” says Skotheim. “This will make it easier to understand what goes wrong with healthy cells, and to understand the mechanisms that develop cancer. Once we have found the cancer-specific molecules, they can be used as biomarkers.” In some cases, the biomarkers can be used to find cancer, determine the level of severity of the cancer and the risk of spreading, and whether the patient should be given a more aggressive treatment.

    Even though the researchers find deviations in the RNA, there is no guarantee that there is appropriate, targeted medicine available. “The point of our research is to figure out more of the big picture,” says Skotheim. “If we identify a fusion gene that is only found in cancer cells, the discovery will be so important in itself that other research groups around the world will want to begin working on this straight away. If a cure is found that counteracts the fusion genes, this may have enormous consequences for the cancer treatment.”

    Laborious work

    Recreating RNA is laborious work. The set of RNA molecules consists of about 100 million bases, divided into a few thousand bases from each gene.

    The sequencing machine reads millions of short fragments, each only about 100 base pairs long. To place them in the right location, the researchers must run large statistical analyses. The RNA analysis of a single patient can take a few days.

    All of these fragments must be matched against the DNA strand. Unfortunately, the researchers do not have the DNA sequence of each patient. To learn where in the DNA the base pairs come from, they must therefore use the reference genome of the human species. “This is not ideal, because there are individual differences,” explains Skotheim. The future potentially lies in fully sequencing each patient’s DNA in medical studies.
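
    Conceptually, this placement step is sequence alignment: every short read is located on the reference genome. Real RNA-seq pipelines use dedicated spliced aligners that tolerate mismatches; the exact-match toy below, with tiny invented sequences, only shows the basic problem of placing reads on a reference.

        # Naive "alignment": find where each short read matches the reference exactly.
        # Reference and reads are tiny invented stand-ins for real sequences.

        reference = "ACGTACGTTAGCCGATAGGCTTACGATCGATCGGCTA"
        reads = ["TAGCCGATA", "CGATCGGCT", "TTTTTTTTT"]

        def align(read, ref):
            """Return the 0-based position of an exact match, or None if unmapped."""
            pos = ref.find(read)
            return pos if pos >= 0 else None

        for r in reads:
            print(r, "->", align(r, reference))
        # TAGCCGATA -> 8, CGATCGGCT -> 27, TTTTTTTTT -> None (unmapped)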

    There is no way this research could be carried out using pen and paper. “We need powerful computers to crunch the enormous amounts of raw data. Even if you spent your whole life on this task, you would not be able to find the location of a single nucleotide. This is a matter of millions of nucleotides that must be mapped correctly in the system of coordinates of the genetic material. Once we have managed to find the RNA versions that are only found in cancer cells, we will have made significant progress. However, the work to get that far requires advanced statistical analyses and supercomputing,” says Skotheim.

    The analyses are so demanding that the researchers must use the University of Oslo’s Abel supercomputer, which has a theoretical peak performance of over 250 teraFLOPS. “With the ability to run heavy analyses on such large amounts of data, we have an enormous advantage not available to other cancer researchers,” explains Skotheim. “Many medical researchers would definitely benefit from this possibility. This is why they should spend more time with biostatisticians and informaticians. RNA samples are taken from the patients only once. The types of analyses that can be run are only limited by the imagination.”

    “We need to be smart in order to analyze the raw data,” he continues. “There are enormous amounts of data here that can be interpreted in many different ways. We have only just got started; there is lots of useful information that we have not seen yet. Asking the right questions is the key. Most cancer researchers are not used to working with enormous amounts of data, or to deciding how best to analyze vast data sets. Once researchers have found a possible answer, they must determine whether it is due to chance or is a real finding. The solution is to check whether independent data sets from other parts of the world give the same answers.”

    See the full article here.

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    iSGTW is an international weekly online publication that covers distributed computing and the research it enables.

    “We report on all aspects of distributed computing technology, such as grids and clouds. We also regularly feature articles on distributed computing-enabled research in a large variety of disciplines, including physics, biology, sociology, earth sciences, archaeology, medicine, disaster management, crime, and art. (Note that we do not cover stories that are purely about commercial technology.)

    In its current incarnation, iSGTW is also an online destination where you can host a profile and blog, and find and disseminate announcements and information about events, deadlines, and jobs. In the near future it will also be a place where you can network with colleagues.

    You can read iSGTW via our homepage, RSS, or email. For the complete iSGTW experience, sign up for an account or log in with OpenID and manage your email subscription from your account preferences. If you do not wish to access the website’s features, you can just subscribe to the weekly email.”

  • richardmitnick 9:35 am on March 17, 2015 Permalink | Reply
    Tags: , , , Supercomputing   

    From CBS: “Scientists mapping Earth in 3D, from the inside out” 

    CBS News

    March 16, 2015
    Michael Casey

    Using a technique that is similar to a medical CT (“CAT”) scan, researchers at Princeton are using seismic waves from earthquakes to create images of the Earth’s subterranean structures — such as tectonic plates, magma reservoirs and mineral deposits — which will help scientists better understand how earthquakes and volcanoes occur.
    Ebru Bozdağ, University of Nice Sophia Antipolis, and David Pugmire, Oak Ridge National Laboratory

    The wacky adventures of scientists traveling to the Earth’s core have been a favorite plot line in Hollywood over the decades, but actually getting there is mostly science fiction.

    Now, a group of scientists is using some of the world’s most powerful supercomputers to do what could be the next best thing.

    Princeton’s Jeroen Tromp and colleagues are eavesdropping on the seismic vibrations produced by earthquakes, and using the data to create a map of the Earth’s mantle, the semisolid rock that stretches to a depth of 1,800 miles, about halfway down to the planet’s center and about 300 times deeper than humans have drilled. The research could help understand and predict future earthquakes and volcanic eruptions.

    “We need to scour the maps for interesting and unexpected features,” Tromp told CBS News. “But it’s really a 3D mapping expedition.”

    To do this, Tromp and his colleagues will exploit an interesting phenomenon related to seismic activity below the surface of the Earth. As seismic waves travel, they change speed depending on the density, temperature and type of rock they’re moving through, for instance slowing down when traveling through an underground aquifer or magma.
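
    As a very simplified illustration, the sketch below adds up how long a wave takes to cross a stack of rock layers travelling straight down: the same path takes longer wherever the wave speed drops. The layer thicknesses and speeds are made-up numbers.

        # Toy travel-time calculation for a wave crossing a layered model vertically.
        # Thicknesses (km) and wave speeds (km/s) are invented for illustration.

        layers = [
            {"name": "crust",        "thickness_km": 35.0,  "speed_km_s": 6.0},
            {"name": "upper mantle", "thickness_km": 600.0, "speed_km_s": 8.5},
            {"name": "magma pocket", "thickness_km": 20.0,  "speed_km_s": 4.0},  # slow zone
        ]

        def travel_time(layers):
            """Total time is the sum of thickness / speed over the layers."""
            return sum(layer["thickness_km"] / layer["speed_km_s"] for layer in layers)

        print(round(travel_time(layers), 1), "seconds")   # ~81.4 s; a slow zone adds delay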

    This three-dimensional image displays contours of locations where seismic wave speeds are faster than average.
    Ebru Bozdağ, University of Nice Sophia Antipolis, and David Pugmire, Oak Ridge National Laboratory

    Thousands of seismographic stations worldwide make recordings, or seismograms, that detail the ground movement produced by seismic waves, which typically travel at speeds of several miles per second and last several minutes. By combining seismographic readings from roughly 3,000 quakes of magnitude 5.5 and greater, the geologists can produce a three-dimensional model of the structures under the Earth’s surface.
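
    In very stripped-down form, tomography then inverts many such travel-time measurements for the wave speeds of the blocks the rays passed through. The least-squares toy below recovers the slowness (1 / speed) of three blocks from four ray measurements; the geometry and numbers are invented, and real global tomography solves for millions of unknowns with far more sophisticated methods.

        # Toy seismic tomography: recover block slownesses (1/speed) from travel times.
        # Each row of A holds the distance a ray travelled through each block, so
        # t = A @ s; a least-squares fit to the measured times estimates s.
        # All geometry and numbers are invented for illustration.
        import numpy as np

        A = np.array([           # ray path lengths (km) through blocks 1..3
            [100.0,   0.0,   0.0],
            [  0.0, 100.0,   0.0],
            [ 60.0,  60.0,  60.0],
            [  0.0,  80.0,  80.0],
        ])
        true_s = np.array([1 / 6.0, 1 / 8.0, 1 / 4.5])   # true slownesses (s per km)
        t = A @ true_s                                    # synthetic "measured" times (s)

        est_s, *_ = np.linalg.lstsq(A, t, rcond=None)     # least-squares inversion
        print(np.round(1.0 / est_s, 2))                   # recovered speeds: ~[6.0, 8.0, 4.5]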

    For the task, Tromp’s team will use the supercomputer called Titan, which can perform more than 20 quadrillion calculations per second and is located at the Department of Energy’s Oak Ridge National Laboratory in Tennessee.

    ORNL Titan Supercomputer

    The technique, called seismic tomography, has been compared to the computerized tomography used in medical CAT scans, in which a scanner captures a series of X-ray images from different viewpoints, creating cross-sectional images that can be combined into 3D images.

    Tromp acknowledged he doesn’t think his research could one day lead to a scientist actually reaching the mantle. But he said it could help seismologists do a better job of predicting the damage from future earthquakes and the possibility of volcanic activity.

    For example, they might find a fragment of a tectonic plate that broke off and sank into the mantle. The resulting map could tell seismologists more about the precise locations of underlying tectonic plates, which can trigger earthquakes when they shift or slide against each other. The maps could also reveal the locations of magma that, if it comes to the surface, causes volcanic activity.

    See the full article here.

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition
