Tagged: ASCRDiscovery

  • richardmitnick 10:30 am on May 16, 2018 Permalink | Reply
    Tags: ASCRDiscovery, The incredible shrinking data

    From ASCRDiscovery: “The incredible shrinking data” 

    From ASCRDiscovery
    ASCR – Advancing Science Through Computing

    May 2018

    An Argonne National Laboratory computer scientist finds efficiencies for extracting knowledge from a data explosion.

    Volume rendering of a large-eddy simulation for the turbulent mixing and thermal striping that occurs in the upper plenum of liquid sodium fast reactors. Original input data (left) and reconstructed data (right) from data-shrinking multivariate functional approximation model. Data generated by the Nek5000 solver are courtesy of Aleksandr Obabko and Paul Fischer of Argonne National Laboratory (ANL). Image courtesy of Tom Peterka, ANL.

    Tom Peterka submitted his Early Career Research Program proposal to the Department of Energy (DOE) last year with a sensible title: “A Continuous Model of Discrete Scientific Data.” But a Hollywood producer might have preferred, “User, I Want to Shrink the Data.”

    Downsizing massive scientific data streams seems a less fantastic voyage than science fiction’s occasional obsession with shrinking human beings, but it’s still quite a challenge. The $2.5 million, five-year early-career award will help Peterka accomplish that goal.

    Researchers find more to do with each generation of massive and improved supercomputers. “We find bigger problems to run, and every time we do that, the data become larger,” says Peterka, a computer scientist at the DOE’s Argonne National Laboratory.

    His project is addressing these problems by transforming data into a different form that is both smaller and more user-friendly for scientists who need to analyze that information.

    “I see a large gap between the data that are computed and the knowledge that we get from them,” Peterka says. “We tend to be data-rich but information-poor. If science is going to advance, then this information that we extract from data must somehow keep up with the data being collected or produced. That, to me, is the fundamental challenge.”

    Tom Peterka. Image by Wes Agresta courtesy of Argonne National Laboratory.

    Computers have interested Peterka since he was a teenager in the 1980s, at the dawn of the personal computer era. “I’ve never really left the field. A background in math and science, an interest in technology – these are crosscutting areas that carry through all of my work.”

    The problems when Peterka got into the field dealt with gigabytes of data, one gigabyte exceeding the size of a single compact disc. The hurdle now is measured in petabytes – about 1.5 million CDs of data.
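
    To put those units in perspective, a quick back-of-the-envelope check (assuming a standard compact disc holds roughly 700 megabytes) reproduces the article's figure:

        # Rough check of the scale-up described above.
        CD_BYTES = 700 * 10**6   # a standard compact disc holds ~700 MB
        GIGABYTE = 10**9
        PETABYTE = 10**15

        print(GIGABYTE / CD_BYTES)   # ~1.4: one gigabyte slightly exceeds one CD
        print(PETABYTE / CD_BYTES)   # ~1.43 million: "about 1.5 million CDs" per petabyte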

    Since completing his doctorate in computer science at the University of Illinois at Chicago in 2007, Peterka has focused on scientific data and their processing and analysis. He works with some of DOE’s leading-edge supercomputers, including Mira and Theta at Argonne and Cori and Titan at, respectively, Lawrence Berkeley and Oak Ridge national laboratories.

    MIRA IBM Blue Gene Q supercomputer at the Argonne Leadership Computing Facility

    ANL ALCF Theta Cray XC40 supercomputer

    NERSC Cray Cori II supercomputer at Lawrence Berkeley National Laboratory (LBNL), named after Gerty Cori, the first American woman to win a Nobel Prize in science

    ORNL Cray XK7 Titan Supercomputer

    The early-career award is helping Peterka develop a multivariate functional approximation tool that reduces a mass of data at the expense of just a bit of accuracy. He’s designing his new method with the flexibility to operate on a variety of supercomputer architectures, including the next-generation exascale machines whose development DOE is leading.

    “We want this method to be available on all of them,” Peterka says, “because computational scientists often will run their projects on more than one machine.”

    His new, ultra-efficient way of representing data eliminates the need to revert to the original data points. He compares the process to the compression algorithms used to stream video or open a JPEG, but with an important difference. Those compress data to store the information or transport it to another computer. But the data must be decompressed to their original form and size for viewing. With Peterka’s method, the data need not be decompressed before reuse.
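
    Peterka’s multivariate functional approximation library isn’t shown in the article; as a rough one-dimensional illustration of the general idea, the sketch below (using SciPy’s smoothing B-splines as a stand-in) fits a compact functional model to sampled data and then evaluates it directly at arbitrary points, with no decompression back to the original grid. The smoothing parameter plays the role of the accuracy the user is willing to trade away.

        import numpy as np
        from scipy.interpolate import UnivariateSpline

        # Pretend "simulation output": 100,000 noisy samples of a smooth field.
        x = np.linspace(0.0, 10.0, 100_000)
        y = np.sin(x) + 0.01 * np.random.default_rng(0).normal(size=x.size)

        # Fit a smoothing B-spline; the smoothing factor s sets how much error we tolerate.
        model = UnivariateSpline(x, y, k=3, s=len(x) * 0.01**2)

        # The functional model is far smaller than the raw samples...
        n_params = len(model.get_coeffs()) + len(model.get_knots())
        print(f"raw values stored: {x.size}, model parameters: {n_params}")

        # ...and can be queried anywhere without reconstructing the original array.
        print(model(3.14159), np.sin(3.14159))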

    “We have to decide how much error we can tolerate,” he says. “Can we throw away a percent of accuracy? Maybe, maybe not. It all depends on the problem.”

    Peterka’s Argonne Mathematics and Computer Science Division collaborators are Youssef Nashed, assistant computer scientist; Iulian Grindeanu, software engineer; and Vijay Mahadevan, assistant computational scientist. They have already produced some promising early results and submitted them for publication.

    The problems – from computational fluid dynamics and astrophysics to climate modeling and weather prediction – are “of global magnitude, or they’re some of the largest problems that we face in our world, and they require the largest resources,” Peterka says. “I’m sure that we can find similarly difficult problems in other domains. We just haven’t worked with them yet.”

    The Large Hadron Collider, the Dark Energy Survey and other major experiments and expansive observations generate and accumulate enormous amounts of data.

    LHC

    CERN/LHC Map

    CERN LHC Tunnel

    CERN LHC particles

    Dark Energy Survey


    Dark Energy Camera [DECam], built at FNAL


    NOAO/CTIO Victor M. Blanco 4m Telescope, which houses DECam at Cerro Tololo, Chile, at an altitude of 7,200 feet

    Processing the data has become vital to the discovery process, Peterka says – becoming the fourth pillar of scientific inquiry, alongside theory, experiment and computation. “This is what we face today. In many ways, it’s no different from what industry and enterprise face in the big-data world today as well.”

    Peterka and his team work on half a dozen or more projects at a given time. Some sport memorable monikers, such as CANGA (Coupling Approaches for Next-Generation Architectures), MAUI (Modeling, Analysis and Ultrafast Imaging) and RAPIDS (Resource and Application Productivity through computation, Information and Data Science). Another project, called Decaf (for decoupled data flows), allows “users to allocate resources and execute custom code – creating a much better product,” Peterka says.

    The projects cover a range of topics, but they all fit into three categories: software or middleware solutions; algorithms built on top of that middleware; or applications developed with domain scientists – all approaches necessary for solving the big-data science problem.

    Says Peterka, “The takeaway message is that when you build some software component – and the multivariate functional analysis is no different – you want to build something that can work with other tools in the DOE software stack.”

    Argonne is managed by UChicago Argonne LLC for the DOE Office of Science. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time.

    Now in its eighth year, the DOE Office of Science’s Early Career Research Program for researchers in universities and DOE national laboratories supports the development of individual research programs of outstanding scientists early in their careers and stimulates research careers in the disciplines supported by the Office of Science. For more information, please visit science.energy.gov.

    See the full article here.

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    ASCRDiscovery is a publication of The U.S. Department of Energy

     
  • richardmitnick 1:55 pm on December 20, 2017 Permalink | Reply
    Tags: A dynamic multi-objective problem – finding the best possible route for data, one that is fast and less congested, ASCRDiscovery, early-career research award from DOE’s Office of Science to develop methods combining machine-learning algorithms with parallel computing to optimize such networks, Mariam Kiran, This type of science and the problems it can address can make a real impact

    From ASCRDiscovery: Women in STEM – “Thinking networks” Mariam Kiran

    ASCRDiscovery
    Advancing Science Through Computing


    ESnet map

    December 2017

    ESnet’s DOE early-career awardee works to overcome roadblocks in computational networks.

    ESnet’s Mariam Kiran. Image courtesy of ESnet.

    The Atlas detector at CERN, in Switzerland. Users of it and other linked research facilities stand to benefit from ESnet’s efforts to reduce bottlenecks in intercontinental scientific data flow. Image courtesy of ESnet.

    CERN/ATLAS detector

    Like other complex systems, computer networks can break down and suffer bottlenecks. Keeping such systems running requires algorithms that can identify problems and find solutions on the fly so information moves quickly and on time.

    Mariam Kiran – a network engineer for the Energy Sciences Network (ESnet), a DOE Office of Science user facility managed by Lawrence Berkeley National Laboratory – is using an early-career research award from DOE’s Office of Science to develop methods combining machine-learning algorithms with parallel computing to optimize such networks.

    Kiran’s interest in science and mathematics was fueled by Doctor Who and other popular television shows she watched in her native United Kingdom. At 15 she got her first taste of computer programming, through a school project, using the BASIC programming language to create an airline database system. “I added a lot of graphics so that if you entered a wrong password, two airplanes would come across (the screen) and crash,” she says. It felt great to use a computer to create something out of nothing.

    Kiran’s economist father and botanist mother encouraged her interests and before long she was studying computer science at the University of Sheffield. Pop culture also influenced her interests there, at a time when many students dressed in long black coats like those seen in the blockbuster movie The Matrix. The core computer science concept from that film – using computer simulations to test complex theories – was appealing.

    She started coding such simulations, but along the way discovered another interest: developing ways around computer science roadblocks in those experiments. With simulations “you have potentially too much data to be processed, so you need a very fast and good system on the back end to make sure that the simulation goes as fast as it can,” she says. That challenge got her interested in computing and network infrastructure such as high-performance computing systems and cloud computing. She wanted to understand the problems and find strategies that help software run correctly and smoothly.

    Kiran’s interest led her to join the software engineering and testing group at the University of Sheffield, where she also completed her master’s degree and Ph.D. She was part of a team that assembled a simulation platform for coding interacting components of a complex system – or agent-based modeling, used widely in Europe to calculate problems in economics or biology. Each agent could represent a government, a person, an individual organism, or a cell. “You code everything up as an agent and then let them interact with other agents, randomly or by following certain rules, and see how the system reacts overall.”
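
    As a toy illustration of that idea (not the Sheffield platform itself), the sketch below codes a population of bare-bones agents that exchange a unit of “wealth” with randomly chosen partners, then inspects how the system looks overall after many interactions.

        import random
        from collections import Counter

        random.seed(42)

        # Each agent is just a dict; a real platform would give agents richer state and rules.
        agents = [{"id": i, "wealth": 10} for i in range(100)]

        # Let agents interact at random: one unit of wealth moves between two random agents.
        for _ in range(50_000):
            giver, taker = random.sample(agents, 2)
            if giver["wealth"] > 0:
                giver["wealth"] -= 1
                taker["wealth"] += 1

        # See how the system reacts overall: the wealth distribution after the interactions.
        distribution = Counter(a["wealth"] for a in agents)
        print(sorted(distribution.items()))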

    In 2014, she joined the UK’s University of Bradford as an associate professor and taught software engineering and machine learning. However, her research interests in performance optimization of computing and networks led her to investigate new projects that examined similar problems in applications that run over distributed compute and network resources. As a result, in 2016 she joined ESnet, which supports international science research computing networks and has produced a variety of networking innovations and high-speed connections.

    With her early career grant, Kiran has five years of support to pursue software innovations that can manage the efficiency of today’s computer networks and take them to the next level. Machine-learning algorithms – such as deep neural networks used for image recognition and analysis – can be exploited to understand user behavior and data-movement patterns across the network. A computer network is a complex distributed system. How one heals itself or performs corrective measures at the edge while operating optimally overall is an interesting challenge to understand and solve, Kiran says.

    Managing information across networks is like transporting cargo on a highway system, she says. “You’re moving data from one building to the next building, and you have to find the shortest possible route.” The fastest path might depend on the time of day and traffic patterns.

    Some science applications, however, are deadline-driven and require data to arrive by certain times to succeed. Short routes might become overly congested, whereas slightly longer paths may be under-used.

    In the end, it’s a dynamic, multi-objective problem – finding the best possible route for data, one that is fast and less congested.
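
    A minimal sketch of that trade-off, using the networkx graph library and made-up link numbers: each link carries a latency and a congestion estimate, and the route is chosen by a combined cost rather than by latency or hop count alone.

        import networkx as nx

        # Hypothetical network: latency in milliseconds, congestion as a 0-1 utilization estimate.
        links = [
            ("A", "B", 10, 0.9),   # short but heavily congested
            ("B", "D", 10, 0.9),
            ("A", "C", 18, 0.1),   # longer but nearly idle
            ("C", "D", 18, 0.1),
        ]

        G = nx.Graph()
        for u, v, latency, congestion in links:
            # One simple way to fold both objectives into a single edge cost.
            G.add_edge(u, v, cost=latency * (1.0 + 4.0 * congestion))

        print(nx.shortest_path(G, "A", "D", weight="cost"))   # A-C-D: slightly longer, far less congested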

    “Throughout the day, the state of the network changes depending on the users and applications interacting on it,” Kiran notes. “Understanding these complex relationships is a challenge. I’m interested in seeing whether machine learning can help us understand these more and allow networks to automate corrective measures in near-real time to prevent outages and application failures.”
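
    Kiran’s production methods aren’t spelled out in the article, but the kind of pattern learning she describes can be hinted at with an off-the-shelf anomaly detector: train it on ordinary link telemetry, then flag readings that look like trouble. The data and thresholds below are entirely synthetic.

        import numpy as np
        from sklearn.ensemble import IsolationForest

        rng = np.random.default_rng(1)

        # Synthetic training data: (utilization, packet-loss rate) under normal operation.
        normal = np.column_stack([
            rng.uniform(0.2, 0.7, 5_000),      # typical utilization
            rng.uniform(0.0, 0.001, 5_000),    # near-zero loss
        ])

        detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)

        # New readings: the second one (saturated link, high loss) should be flagged as -1.
        readings = np.array([[0.45, 0.0002],
                             [0.99, 0.05]])
        print(detector.predict(readings))   # e.g. [ 1 -1 ]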

    She’s now identifying main problems along autonomous networks and applying those lessons to analogous computational and network problems. For example, she’s examining how engineers deal with outage-triggering bottlenecks and how bandwidth is controlled across links. Being at ESnet, which has led networking research for years, provides immense experience and capabilities to learn and apply solutions to a high-speed network that is built to think, she says.

    Better-functioning networks could speed computational research on a range of topics, including climate, weather and nuclear energy. High performance computing boosts these calculations by rapidly distributing them across multiple computers and processors, sometimes across the world. It also allows international scientists to collaborate quickly. Researchers at diverse locations from Berkeley Lab to Switzerland’s CERN to labs in South America can interact with data quickly and seamlessly and develop new theories and findings.

    This type of science and the problems it can address can make a real impact, Kiran says. “That’s what excites me about research – that we can improve or provide solutions to real-world problems.”

    Now in its eighth year, the DOE Office of Science’s Early Career Research Program for researchers in universities and DOE national laboratories supports the development of individual research programs of outstanding scientists early in their careers and stimulates research careers in the disciplines supported by the Office of Science. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, please visit http://science.energy.gov.

    See the full article here.

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    ASCRDiscovery is a publication of The U.S. Department of Energy

     
  • richardmitnick 12:10 pm on November 29, 2017 Permalink | Reply
    Tags: ASCRDiscovery, Bridging gaps in high-performance computing languages, Generative programming, Programming languages, Tiark Rompf

    From ASCRDiscovery: “Language barrier” 

    ASCRDiscovery
    Advancing Science Through Computing

    November 2017
    No writer credit

    A Purdue University professor is using a DOE early-career award to bridge gaps in high-performance computing languages.

    Detail from an artwork made through generative programming. Purdue University’s Tiark Rompf is investigating programs that create new ones to bring legacy software up to speed in the era of exascale computing. Painting courtesy of Freddbomba via Wikimedia Commons.

    A Purdue University assistant professor of computer science leads a group effort to find new and better ways to generate high-performance computing codes that run efficiently on as many different kinds of supercomputer architectures as possible.

    That’s the challenging goal Tiark Rompf has set for himself with his recent Department of Energy Early Career Research Program award – to develop what he calls “program generators” for exascale architectures and beyond.

    “Programming supercomputers is hard,” Rompf says. Coders typically write software in so-called general-purpose languages. The languages are low-level, meaning “specialized to a given machine architecture. So when a machine is upgraded or replaced, one has to rewrite most of the software.”

    As an alternative to this rewriting, which involves tediously translating low-level code from one supercomputer platform into another, programmers would prefer to use high-level languages “written in a way that feels natural” to them, Rompf says, and “closer to the way a programmer thinks about the computation.”

    But high-level and low-level languages are far apart, with a steel wall of differences between the ways the two types of languages are written, interpreted and executed. In particular, high-level languages rarely perform as well as desired. Executing them requires special so-called smart compilers that must use highly specialized analysis to figure out what the program “really means and how to match it to machine instructions.”

    Tiark Rompf. Photo courtesy of Purdue University.

    Rompf and his group propose avoiding that with something called generative programming, which he has worked on since before he received his 2012 Ph.D. from Ecole Polytechnique Federale de Lausanne (EPFL) in Switzerland. The idea is to create special programs structured so they’re able to make additional programs where needed.

    In a 2015 paper, Rompf and research colleagues at EPFL, Stanford University and ETH Zurich also called for a radical reassessment of high-level languages. “We really need to think about how to design programming languages and (software) libraries that embrace this generative programming idea,” he adds.

    Program generators “are attractive because they can automate the process of producing very efficient code,” he says. But building them “has also been very hard, and therefore only a few exist today. We’re planning to build the necessary infrastructure to make it an order of magnitude easier.”
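
    Rompf’s LMS and related tools are built in Scala; purely to make the generative idea concrete, the Python sketch below acts as a tiny program generator. Given a fixed vector length, it emits the source of a fully unrolled dot-product function and compiles it at runtime, so the specialized code contains no loop at all. (The function names and the dot-product example are illustrative, not taken from Rompf’s work.)

        def generate_dot(n):
            """Generate and compile a dot-product function specialized to vectors of length n."""
            # Build the source of the specialized function: the loop is unrolled at generation time.
            terms = " + ".join(f"a[{i}]*b[{i}]" for i in range(n))
            source = f"def dot_{n}(a, b):\n    return {terms}\n"
            namespace = {}
            exec(compile(source, f"<generated dot_{n}>", "exec"), namespace)
            return namespace[f"dot_{n}"]

        dot4 = generate_dot(4)
        print(dot4([1, 2, 3, 4], [10, 20, 30, 40]))   # 300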

    As he noted in his early-career award proposal, progress building program generators is extremely difficult for more reasons than just programmer-computer disharmony. Other obstacles include compiler limitations, differing capabilities of supercomputer processors, the changing ways data are stored and the ways software libraries are accessed. Rompf plans to use his five-year, $750,000 award to evaluate generative programming as a way around some of those roadblocks.

    One idea, for instance, is to identify and create an extensible stack of intermediate languages that could serve as transitional steps when high-level codes must be translated into machine code. These also are described as “domain-specific languages” or DSLs, as they encode more knowledge about the application subject than general-purpose languages.

    Eventually, programmers hope to entirely phase out legacy languages such as C and Fortran, substituting only high-level languages and DSLs. Rompf points out that legacy codes can be decades older than the processors they run on, and some have been heavily adapted to run on new generations of machines, an investment that can make legacy codes difficult to jettison.

    __________________________________________________
    Rompf started Project Lancet to integrate generative approaches into a virtual machine for high-level languages.
    __________________________________________________

    Generative programming was the basis for Rompf’s doctoral research. It was described as an approach called Lightweight Modular Staging, or LMS, in a 2010 paper he wrote with his EPFL Ph.D. advisor, Martin Odersky. That’s “a software platform that provides capabilities for other programmers to develop software in a generative style,” Rompf says.

    LMS also underpins Delite, a software framework Rompf later developed in collaboration with a Stanford University group to build DSLs targeting parallel processing in supercomputer architectures – “very important for the work I’m planning to do,” he says.

    While working at Oracle Labs between 2012 and 2014, Rompf started Project Lancet to integrate generative approaches into a virtual machine for high-level languages. Virtual machines are code that can induce real computers to run selected programs. In the case of Lancet, software executes high-level languages and then performs selective compilations in machine code.

    Born and raised in Germany, Rompf joined Purdue in the fall of 2014. It’s “a great environment for doing this kind of research,” he says. “We have lots of good students in compilers, high-performance and databases. We’ve been hiring many new assistant professors. There are lots of young people who all want to accomplish things.”

    He calls his DOE Early Career award a great honor. “I think there are many opportunities for future work in getting more of the DOE community in the interaction.” Although he is the project’s only principal investigator, he is collaborating with other groups at Purdue, ETH Zurich and Stanford and has received recent and related National Science Foundation research grants.

    As a busy assistant professor, he has six graduate students on track to get their doctorates, plus a varying number of undergraduate assistants. Rompf also is a member of the Purdue Research on Programming Languages group (PurPL), with 10 faculty members and their students.

    “It’s a very vibrant group, which like the Purdue computer science department has been growing a lot in recent years,” he says.

    Now in its eighth year, the DOE Office of Science’s Early Career Research Program for researchers in universities and DOE national laboratories supports the development of individual research programs of outstanding scientists early in their careers and stimulates research careers in the disciplines supported by the Office of Science. The Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.

    See the full article here.

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    ASCRDiscovery is a publication of The U.S. Department of Energy

     
  • richardmitnick 12:48 pm on November 2, 2017 Permalink | Reply
    Tags: Apache Spark open-source software, ASCRDiscovery

    From ASCRDiscovery: “A Spark in the dark” 

    ASCRDiscovery
    Advancing Science Through Computing

    October 2017

    The cosmological search in the dark is no walk in the park. With help from Berkeley Lab’s NERSC, Fermilab [FNAL] aims open-source software at data from high-energy physics.

    NERSC Cray Cori II supercomputer

    LBL NERSC Cray XC30 Edison supercomputer


    The Genepool system is a cluster dedicated to the DOE Joint Genome Institute’s computing needs. Denovo is a smaller test system for Genepool that is primarily used by NERSC staff to test new system configurations and software.

    NERSC PDSF


    PDSF is a networked distributed computing cluster designed primarily to meet the detector simulation and data analysis requirements of physics, astrophysics and nuclear science collaborations.

    Proposed filaments of dark matter surrounding Jupiter could be part of the mysterious 95 percent of the universe’s mass-energy. Image courtesy of NASA/JPL-Caltech.

    Most of the universe is dark, with dark matter and dark energy comprising more than 95 percent of its mass-energy. Yet we know little about dark matter and energy. To find answers, scientists run huge high-energy physics experiments. Analyzing the results demands high-performance computing – sometimes balanced with industrial trends.

    After four years of running computing for the Large Hadron Collider CMS experiment at CERN near Geneva, Switzerland – part of the work that revealed the Higgs boson – Oliver Gutsche, a scientist at the Department of Energy’s (DOE) Fermi National Accelerator Laboratory, turned to the search for dark matter.

    CERN CMS Higgs Event


    CERN/CMS Detector


    “The Higgs boson had been predicted, and we knew approximately where to look,” he says. “With dark matter, we don’t know what we’re looking for.”

    To learn about dark matter, Gutsche needs more data. Once that information is available, physicists must mine it. They are exploring computational tools for the job, including Apache Spark open-source software.

    In searching for dark matter, physicists study results from colliding particles. “This is trivial to parallelize,” breaking the job into pieces to get answers faster, Gutsche explains. “Two PCs can each process a collision,” meaning researchers can employ a computer grid to analyze data.
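
    A minimal sketch of that embarrassingly parallel pattern, with a made-up per-collision analysis function and Python’s standard process pool standing in for a grid:

        from multiprocessing import Pool

        def analyze_collision(event):
            """Placeholder per-collision analysis: each event is processed independently."""
            return sum(p["energy"] for p in event["particles"])

        if __name__ == "__main__":
            # Fake dataset: collisions are independent, so workers never need to communicate.
            events = [{"particles": [{"energy": e} for e in range(i, i + 5)]}
                      for i in range(1000)]
            with Pool(processes=4) as pool:
                totals = pool.map(analyze_collision, events)
            print(len(totals), max(totals))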

    Much of the work in high-energy physics, though, depends on software the scientists develop. “If our graduate students and postdocs only know our proprietary tools, then they’ll have trouble if they go to industry,” where such software is unavailable, Gutsche notes. “So I started to look into Spark.”

    To search for dark matter, scientists collect and analyze results from colliding particles, an extremely computationally intense process. Image courtesy of CMS CERN.

    Spark is a data-reduction tool made for unstructured text files. That creates a challenge – accessing the high-energy physics data, which are in an object-oriented format. Fermilab computer science researchers Saba Sehrish and Jim Kowalkowski are tackling the task.

    Spark offered promise from the beginning, with some particularly interesting features, Sehrish says. “One was in-memory, large-scale distributed processing” through high-level interfaces, which makes it easy to use. “You don’t want scientists to worry about how to distribute data and write parallel code,” she says. Spark takes care of that.

    Another attractive feature: Spark is a supported research platform at the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science user facility at the DOE’s Lawrence Berkeley National Laboratory.

    “This gives us a support team that can tune it,” Kowalkowski says. Computer scientists like Sehrish and Kowalkowski can add capabilities, but making the underlying code work as efficiently as possible requires Spark specialists, some of whom work at NERSC.

    Kowalkowski summarizes Spark’s desirable features as “automated scaling, automated parallelism and a reasonable programming model.”

    In short, he and Sehrish want to build a system allowing researchers to run an analysis that performs extremely well on large-scale machines without complications and through an easy user interface.

    Just being easy to use, though, is not enough when dealing with data from high-energy physics. Spark appears to satisfy both ease-of-use and performance goals to some degree. Researchers are still investigating some aspects of its performance for high-energy physics applications, but computer scientists can’t have everything. “There is a compromise,” Sehrish states. “When you’re looking for more performance, you don’t get ease of use.”

    The Fermilab scientists selected Spark as an initial choice for exploring big-data science, and dark matter is just the first application under testing. “We need several real-use cases to understand the feasibility of using Spark for an analysis task,” Sehrish says. With scientists like Gutsche at Fermilab, dark matter was a good place to start. Sehrish and Kowalkowski want to simplify the lives of scientists running the analysis. “We work with scientists to understand their data and work with their analysis,” Sehrish says. “Then we can help them better organize data sets, better organize analysis tasks.”

    As a first step in that process, Sehrish and Kowalkowski must get data from high-energy physics experiments into Spark. Notes Kowalkowski, “You have petabytes of data in specific experimental formats that you have to turn into something useful for another platform.”

    The starting data for the dark-matter implementation are formatted for high-throughput computing platforms, but Spark doesn’t handle that configuration. So software must read the original data format and convert it to something that works well with Spark.

    In doing this, Sehrish explains, “you have to consider every decision at every step, because how you structure the data, how you read it into memory and design and implement operations for high performance is all linked.”
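
    None of the team’s conversion code appears in the article; as a hedged sketch of the end state, suppose the experimental files have already been translated into a columnar format such as Parquet, with a hypothetical schema of one row per collision and columns like missing_et and n_jets. The analysis then becomes a few lines of high-level Spark code, with the distribution handled behind the scenes:

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("dark-matter-sketch").getOrCreate()

        # Hypothetical input: collision events previously converted from the experiment's
        # object-oriented format into Parquet, one row per event.
        events = spark.read.parquet("converted_events.parquet")

        # A toy selection and aggregation; Spark parallelizes it across the cluster.
        candidates = events.filter(events.missing_et > 100.0)
        candidates.groupBy("n_jets").count().orderBy("n_jets").show()

        spark.stop()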

    Each of those data-handling steps affects Spark’s performance. Although it’s too early to tell how much performance can be pulled from Spark when analyzing dark-matter data, Sehrish and Kowalkowski see that Spark can provide user-friendly code that allows high-energy physics researchers to launch a job on hundreds of thousands of cores. “Spark is good in that respect,” Sehrish says. “We’ve also seen good scaling – not wasting computing resources as we increase the dataset size and the number of nodes.”

    No one knows if this will be a viable approach until determining Spark’s peak performance for these applications. “The main key,” Kowalkowski says, “is that we are not convinced yet that this is the technology to go forward.”

    In fact, Spark itself changes. Its extensive open-source use creates a constant and rapid development cycle. So Sehrish and Kowalkowski must keep their code up with Spark’s new capabilities.

    “The constant cycle of growth with Spark is the cost of working with high-end technology and something with a lot of development interests,” Sehrish says.

    It could be a few years before Sehrish and Kowalkowski make a decision on Spark. Converting software created for high-throughput computing into good high-performance computing tools that are easy to use requires fine-tuning and teamwork between experimental and computational scientists. Or, you might say, it takes more than a shot in the dark.

    A DOE Office of Science laboratory, Fermilab [FNAL] is located near Chicago, Illinois, and operated under contract by the Fermi Research Alliance LLC. The DOE Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit http://science.energy.gov.

    See the full article here.

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    ASCRDiscovery is a publication of The U.S. Department of Energy

     
  • richardmitnick 1:37 pm on January 12, 2017 Permalink | Reply
    Tags: Argo project, ASCRDiscovery, Hobbes project, XPRESS project

    From ASCRDiscovery via D.O.E. “Upscale computing” 

    DOE Main

    Department of Energy

    ASCRDiscovery

    January 2017
    No writer credit

    National labs lead the push for operating systems that let applications run at exascale.

    Image courtesy of Sandia National Laboratories.

    For high-performance computing (HPC) systems to reach exascale – a billion billion calculations per second – hardware and software must cooperate, with orchestration by the operating system (OS).

    But getting from today’s computing to exascale requires an adaptable OS – maybe more than one. Computer applications “will be composed of different components,” says Ron Brightwell, R&D manager for scalable systems software at Sandia National Laboratories.

    “There may be a large simulation consuming lots of resources, and some may integrate visualization or multi-physics.” That is, applications might not use all of an exascale machine’s resources in the same way. Plus, an OS aimed at exascale also must deal with changing hardware. HPC “architecture is always evolving,” often mixing different kinds of processors and memory components in heterogeneous designs.

    As computer scientists consider scaling up hardware and software, there’s no easy answer for when an OS must change. “It depends on the application and what needs to be solved,” Brightwell explains. On top of that variability, he notes, “scaling down is much easier than scaling up.” So rather than try to grow an OS from a laptop to an exascale platform, Brightwell thinks the other way. “We should try to provide an exascale OS and runtime environment on a smaller scale – starting with something that works at a higher scale and then scale down.”

    To explore the needs of an OS and conditions to run software for exascale, Brightwell and his colleagues conducted a project called Hobbes, which involved scientists at four national labs – Oak Ridge (ORNL), Lawrence Berkeley, Los Alamos and Sandia – plus seven universities. To perform the research, Brightwell – with Terry Jones, an ORNL computer scientist, and Patrick Bridges, a University of New Mexico associate professor of computer science – earned an ASCR Leadership Computing Challenge allocation of 30 million processor hours on Titan, ORNL’s Cray XK7 supercomputer.

    ORNL Cray XK7 Titan Supercomputer

    The Hobbes OS supports multiple software stacks working together, as indicated in this diagram of the Hobbes co-kernel software stack. Image courtesy of Ron Brightwell, Sandia National Laboratories.

    Brightwell made a point of including the academic community in developing Hobbes. “If we want people in the future to do OS research from an HPC perspective, we need to engage the academic community to prepare the students and give them an idea of what we’re doing,” he explains. “Generally, OS research is focused on commercial things, so it’s a struggle to get a pipeline of students focusing on OS research in HPC systems.”

    The Hobbes project involved a variety of components, but for the OS side, Brightwell describes it as trying to understand applications as they become more sophisticated. They may have more than one simulation running in a single OS environment. “We need to be flexible about what the system environment looks like,” he adds, so with Hobbes, the team explored using multiple OSs in applications running at extreme scale.

    As an example, Brightwell notes that the Hobbes OS envisions multiple software stacks working together. The OS, he says, “embraces the diversity of the different stacks.” An exascale system might let data analytics run on multiple software stacks, but still provide the efficiency needed in HPC at extreme scales. This requires a computer infrastructure that supports simultaneous use of multiple, different stacks and provides extreme-scale mechanisms, such as reducing data movement.

    Part of Hobbes also studied virtualization, which uses a subset of a larger machine to simulate a different computer and operating system. “Virtualization has not been used much at extreme scale,” Brightwell says, “but we wanted to explore it and the flexibility that it could provide.” Results from the Hobbes project indicate that virtualization for extreme scale can provide performance benefits at little cost.

    Other HPC researchers besides Brightwell and his colleagues are exploring OS options for extreme-scale computing. For example, Pete Beckman, co-director of the Northwestern-Argonne Institute of Science and Engineering at Argonne National Laboratory, runs the Argo project.

    A team of 25 collaborators from Argonne, Lawrence Livermore National Laboratory and Pacific Northwest National Laboratory, plus four universities, created Argo, an OS that starts with a single Linux-based OS and adapts it to extreme scale.

    When comparing the Hobbes OS to Argo, Brightwell says, “we think that without getting in that Linux box, we have more freedom in what we do, other than design choices already made in Linux. Both of these OSs are likely trying to get to the same place but using different research vehicles to get there.” One distinction: The Hobbes project uses virtualization to explore the use of multiple OSs working on the same simulation at extreme scale.

    As the scale of computation increases, an OS must also support new ways of managing a system’s resources. To explore some of those needs, Thomas Sterling, director of Indiana University’s Center for Research in Extreme Scale Technologies, developed ParalleX, an advanced execution model for computations. Brightwell leads a separate project called XPRESS to support the ParalleX execution model. Rather than computing’s traditional static methods, ParalleX implementations use dynamic adaptive techniques.

    More work is always necessary as computation works toward extreme scales. “The important thing in going forward from a runtime and OS perspective is the ability to evaluate technologies that are developing in terms of applications,” Brightwell explains. “For high-end applications to pursue functionality at extreme scales, we need to build that capability.” That’s just what Hobbes and XPRESS – and the ongoing research that follows them – aim to do.

    See the full article here .

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    The mission of the Energy Department is to ensure America’s security and prosperity by addressing its energy, environmental and nuclear challenges through transformative science and technology solutions.

     
  • richardmitnick 12:28 pm on September 4, 2016 Permalink | Reply
    Tags: ASCRDiscovery

    From DOE: “Packaging a wallop” 

    DOE Main

    Department of Energy

    ASCRDiscovery

    August 2016
    No writer credit found

    Lawrence Livermore National Laboratory’s time-saving HPC tool eases the way for next era of scientific simulations.

    Technicians prepare the first row of cabinets for the pre-exascale Trinity supercomputer at Los Alamos National Laboratory, where a team from Lawrence Livermore National Laboratory deployed its new Spack software packaging tool. Photo courtesy of Los Alamos National Laboratory.

    From climate-change predictions to models of the expanding universe, simulations help scientists understand complex physical phenomena. But simulations aren’t easy to deploy. Computational models comprise millions of lines of code and rely on many separate software packages. For the largest codes, configuring and linking these packages can require weeks of full-time effort.

    Recently, a Lawrence Livermore National Laboratory (LLNL) team deployed a multiphysics code with 47 libraries – software packages that today’s HPC programs rely on – on Trinity, the Cray XC40 supercomputer being assembled at Los Alamos National Laboratory. A code that would have taken six weeks to deploy on a new machine required just a day and a half during an early-access period on part of Trinity, thanks to a new tool that automates the hardest parts of the process.

    LANL Cray XC40 Trinity supercomputer

    This leap in efficiency was achieved using the Spack package manager. Package management tools are used frequently to deploy web applications and desktop software, but they haven’t been widely used to deploy high-performance computing (HPC) applications. Few package managers handle the complexities of an HPC environment, and application developers frequently resort to building by hand. But as HPC systems and software become ever more complex, automation will be critical to keep things running smoothly on future exascale machines, capable of one million trillion calculations per second. These systems are expected to have an even more complicated software ecosystem.

    “Spack is like an app store for HPC,” says Todd Gamblin, its creator and lead developer. “It’s a bit more complicated than that, but it simplifies life for users in a similar way. Spack allows users to easily find the packages they want, it automates the installation process, and it allows contributors to easily share their own build recipes with others.” Gamblin is a computer scientist in LLNL’s Center for Applied Scientific Computing and works with the Development Environment Group at Livermore Computing. Spack was developed with support from LLNL’s Advanced Simulation and Computing program.
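
    Spack’s build recipes are short Python classes. The sketch below shows roughly what one looks like for a hypothetical library called “mylib” – the package name, URLs, checksums and build steps are all illustrative, following the pattern in Spack’s documentation rather than an actual package.

        # Hypothetical package.py for a library called "mylib" (illustrative only).
        from spack import *

        class Mylib(Package):
            """Example numerical library packaged for Spack."""

            homepage = "https://example.org/mylib"
            url = "https://example.org/downloads/mylib-2.0.tar.gz"

            version("2.0", sha256="placeholder")   # a genuine recipe carries real checksums
            version("1.0", sha256="placeholder")

            depends_on("mpi")   # Spack picks a concrete MPI implementation at install time

            def install(self, spec, prefix):
                configure("--prefix=" + prefix)
                make()
                make("install")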

    Spack’s success relies on contributions from its burgeoning open-source community. To date, 71 scientists at more than 20 organizations are helping expand Spack’s growing repository of software packages, which number more than 500 so far. Besides LLNL, participating organizations include seven national laboratories – Argonne, Brookhaven, Fermilab, Lawrence Berkeley (through the National Energy Research Scientific Computing Center), Los Alamos, Oak Ridge and Sandia – plus NASA, CERN and many other institutions worldwide.

    Spack is more than a repository for sharing applications. In the iPhone and Android app stores, users download pre-built programs that work out of the box. HPC applications often must be built directly on the supercomputer, letting programmers customize them for maximum speed. “You get better performance when you can optimize for both the host operating system and the specific machine you’re running on,” Gamblin says. Spack automates the process of fine-tuning an application and its libraries over many iterations, allowing users to quickly build many custom versions of codes and rapidly converge on a fast one.

    Applications can share libraries when the applications are compatible with the same versions of their libraries (top). But if one application is updated and another is not, the first application won’t work with the second. Spack (bottom) allows multiple versions to coexist on the same system; here, for example, it simply builds a new version of the physics library and installs it alongside the old one. Schematic courtesy of Lawrence Livermore National Laboratory.

    Each new version of a large code may require rebuilding 70 or more libraries, also called dependencies. Traditional package managers typically allow installation of only one version of a package, to be shared by all installed software. This can be overly restrictive for HPC, where codes are constantly changed but must continue to work together. Picture two applications that share two dependencies: one for math and another for physics. They can share because the applications are compatible with the same versions of their dependencies. Suppose that application 2 is updated, and now requires version 2.0 of the physics library, but application 1 still only works with version 1.0. In a typical package manager, this would cause a conflict, because the two versions of the physics package cannot be installed at once. Spack allows multiple versions to coexist on the same system and simply builds a new version of the physics library and installs it alongside the old one.
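
    That four-package scenario can be sketched in a few lines: if every concrete configuration gets its own installation prefix – Spack derives these from a hash of the full spec – nothing stops physics 1.0 and physics 2.0 from living side by side. The directory layout below is a toy stand-in, not Spack’s actual store format.

        import hashlib

        def install_prefix(name, version):
            """Toy hashed install prefixes: one directory per concrete package configuration."""
            digest = hashlib.sha1(f"{name}@{version}".encode()).hexdigest()[:8]
            return f"/opt/store/{name}-{version}-{digest}"

        installed = {(n, v): install_prefix(n, v)
                     for n, v in [("math", "1.0"), ("physics", "1.0"), ("physics", "2.0")]}

        app1_deps = [installed[("math", "1.0")], installed[("physics", "1.0")]]   # still on physics 1.0
        app2_deps = [installed[("math", "1.0")], installed[("physics", "2.0")]]   # updated to physics 2.0

        print(app1_deps)
        print(app2_deps)   # the physics prefix differs: both versions coexist on the same system
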

    This four-package example is simple, Gamblin notes, but imagine a similar scenario with 70 packages, each with conflicting requirements. Most application users are concerned with generating scientific results, not with configuring software. With Spack, they needn’t have detailed knowledge of all packages and their versions, let alone where to find the optimal version of each, to begin the build. Instead, Spack handles the details behind the scenes and ensures that dependencies are built and linked with their proper relationships. It’s like selecting a CD player and finding it’s already connected to a compatible amplifier, speakers and headphones.

    Gamblin and his colleagues call Spack’s dependency configuration process concretization – filling in “the details to make an abstract specification concrete,” Gamblin explains. “Most people, when they say they want to build something, they have a very abstract idea of what they want to build. The main complexity of building software is all the details that arise when you try to hook different packages together.”

    During concretization, the package manager runs many checks, flagging inconsistencies among packages, such as conflicting versions. Spack also compares the user’s expectations against the properties of the actual codes and their versions and calls out and helps to resolve any mismatches. These automated checks save untold hours of frustration, avoiding cases in which a package wouldn’t have run properly.

    The complexity of building modern HPC software leads some scientists to avoid using libraries in their codes. They opt instead to write complex algorithms themselves, Gamblin says. This is time consuming and can lead to sub-optimal performance or incorrect implementations. Package management simplifies the process of sharing code, reducing redundant effort and increasing software reuse.

    Most important, Spack enables users to focus on the science they set out to do. “Users really want to be able to install an application and get it working quickly,” Gamblin says. “They’re trying to do science, and Spack frees them from the meta-problem of building and configuring the code.”

    See the full article here .

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    The mission of the Energy Department is to ensure America’s security and prosperity by addressing its energy, environmental and nuclear challenges through transformative science and technology solutions.

     