Tagged: Exascale computing Toggle Comment Threads | Keyboard Shortcuts

  • richardmitnick 3:21 pm on July 11, 2018 Permalink | Reply
    Tags: , Exascale computing   

    From M I T Technology Review: “The US may have just pulled even with China in the race to build supercomputing’s next big thing” 

    MIT Technology Review
    From M.I.T Technology Review

    July 11, 2018
    Martin Giles

    1
    Ms. Tech

    The US may have just pulled even with China in the race to build supercomputing’s next big thing.

    The two countries are vying to create an exascale computer that could lead to significant advances in many scientific fields.

    There was much celebrating in America last month when the US Department of Energy unveiled Summit, the world’s fastest supercomputer.

    ORNL IBM AC922 SUMMIT supercomputer. Credit: Carlos Jones, Oak Ridge National Laboratory/U.S. Dept. of Energy

    Now the race is on to achieve the next significant milestone in processing power: exascale computing.

    This involves building a machine within the next few years that’s capable of a billion billion calculations per second, or one exaflop, which would make it five times faster than Summit (see chart). Every person on Earth would have to do a calculation every second of every day for just over four years to match what an exascale machine will be able to do in a flash.

    2
    Top500 / MIT Technology Review

    This phenomenal power will enable researchers to run massively complex simulations that spark advances in many fields, from climate science to genomics, renewable energy, and artificial intelligence. “Exascale computers are powerful scientific instruments, much like [particle] colliders or giant telescopes,” says Jack Dongarra, a supercomputing expert at the University of Tennessee.

    The machines will also be useful in industry, where they will be used for things like speeding up product design and identifying new materials. The military and intelligence agencies will be keen to get their hands on the computers, which will be used for national security applications, too.

    The race to hit the exascale milestone is part of a burgeoning competition for technological leadership between China and the US. (Japan and Europe are also working on their own computers; the Japanese hope to have a machine running in 2021 and the Europeans in 2023.)

    In 2015, China unveiled a plan to produce an exascale machine by the end of 2020, and multiple reports over the past year or so have suggested it’s on track to achieve its ambitious goal. But in an interview with MIT Technology Review, Depei Qian, a professor at Beihang University in Beijing who helps manage the country’s exascale effort, explained it could fall behind schedule. “I don’t know if we can still make it by the end of 2020,” he said. “There may be a year or half a year’s delay.”

    Teams in China have been working on three prototype exascale machines, two of which use homegrown chips derived from work on existing supercomputers the country has developed. The third uses licensed processor technology. Qian says that the pros and cons of each approach are still being evaluated, and that a call for proposals to build a fully functioning exascale computer has been pushed back.

    Given the huge challenges involved in creating such a powerful computer, timetables can easily slip, which could make an opening for the US. China’s initial goal forced the American government to accelerate its own road map and commit to delivering its first exascale computer in 2021, two years ahead of its original target. The American machine, called Aurora, is being developed for the Department of Energy’s Argonne National Laboratory in Illinois. Supercomputing company Cray is building the system for Argonne, and Intel is making chips for the machine.

    Depiction of ANL ALCF Cray Shasta Aurora supercomputer

    To boost supercomputers’ performance, engineers working on exascale systems around the world are using parallelism, which involves packing many thousands of chips into millions of processing units known as cores. Finding the best way to get all these to work in harmony requires time-consuming experimentation.

    Moving data between processors, and into and out of storage, also soaks up a lot of energy, which means the cost of operating a machine over its lifetime can exceed the cost of building it. The DoE has set an upper limit of 40 megawatts of power for an exascale computer, which would roughly translate into an electricity budget of $40 million a year.

    To lower power consumption, engineers are placing three-dimensional stacks of memory chips as close as possible to compute cores to reduce the distance data has to travel, explains Steve Scott, the chief technology officer of Cray. And they’re increasingly using flash memory, which uses less power than alternative systems such as disk storage. Reducing these power needs makes it cheaper to store data at various points during a calculation, and that saved data can help an exascale machine recover quickly if a glitch occurs.

    Such advances have helped the team behind Aurora. “We’re confident of [our] ability to deliver it in 2021,” says Scott.

    More US machines will follow. In April the DoE announced a request for proposals worth up to $1.8 billion for two more exascale computers to come online between 2021 and 2023. These are expected to cost $400 million to $600 million each, with the remaining money being used to upgrade Aurora or even create a follow-on machine.

    Both China and America are also funding work on software for exascale machines. China reportedly has teams working on some 15 application areas, while in the US, teams are working on 25, including applications in fields such as astrophysics and materials science. “Our goal is to deliver as many breakthroughs as possible,” says Katherine Yelick, the associate director for computing sciences at Lawrence Berkeley National Laboratory, who is part of the leadership team coordinating the US initiative.

    While there’s plenty of national pride wrapped up in the race to get to exascale first, the work Yelick and other researchers are doing is a reminder that raw exascale computing power isn’t the true test of success here; what really matters is how well it’s harnessed to solve some of the world’s toughest problems.

    See the full article here .


    five-ways-keep-your-child-safe-school-shootings

    Please help promote STEM in your local schools.

    Stem Education Coalition

    The mission of MIT Technology Review is to equip its audiences with the intelligence to understand a world shaped by technology.

    Advertisements
     
  • richardmitnick 1:57 pm on July 7, 2018 Permalink | Reply
    Tags: , , , Exascale computing, ,   

    From MIT News: “Project to elucidate the structure of atomic nuclei at the femtoscale” 

    MIT News
    MIT Widget

    From MIT News

    July 6, 2018
    Scott Morley | Laboratory for Nuclear Science

    1
    The image is an artist’s visualization of a nucleus as studied in numerical simulations, created using DeepArt neural network visualization software. Image courtesy of the Laboratory for Nuclear Science.

    Laboratory for Nuclear Science project selected to explore machine learning for lattice quantum chromodynamics.

    The Argonne Leadership Computing Facility (ALCF), a U.S. Department of Energy (DOE) Office of Science User Facility, has selected 10 data science and machine learning projects for its Aurora Early Science Program (ESP). Set to be the nation’s first exascale system upon its expected 2021 arrival, Aurora will be capable of performing a quintillion calculations per second, making it 10 times more powerful than the fastest computer that currently exists.

    Depiction of ANL ALCF Cray Shasta Aurora supercomputer

    The Aurora ESP, which commenced with 10 simulation-based projects in 2017, is designed to prepare key applications, libraries, and infrastructure for the architecture and scale of the exascale supercomputer. Researchers in the Laboratory for Nuclear Science’s Center for Theoretical Physics have been awarded funding for one of the projects under the ESP. Associate professor of physics William Detmold, assistant professor of physics Phiala Shanahan, and principal research scientist Andrew Pochinsky will use new techniques developed by the group, coupling novel machine learning approaches and state-of-the-art nuclear physics tools, to study the structure of nuclei.

    Shanahan, who began as an assistant professor at MIT this month, says that the support and early access to frontier computing that the award provides will allow the group to study the possible interactions of dark matter particles with nuclei from our fundamental understanding of particle physics for the first time, providing critical input for experimental searches aiming to unravel the mysteries of dark matter while simultaneously giving insight into fundamental particle physics.

    “Machine learning coupled with the exascale computational power of Aurora will enable spectacular advances in many areas of science,” Detmold adds. “Combining machine learning to lattice quantum chromodynamics calculations of the strong interactions between the fundamental particles that make up protons and nuclei, our project will enable a new level of understanding of the femtoscale world.”

    See the full article here .


    five-ways-keep-your-child-safe-school-shootings
    Please help promote STEM in your local schools.


    Stem Education Coalition

    MIT Seal

    The mission of MIT is to advance knowledge and educate students in science, technology, and other areas of scholarship that will best serve the nation and the world in the twenty-first century. We seek to develop in each member of the MIT community the ability and passion to work wisely, creatively, and effectively for the betterment of humankind.

    MIT Campus

     
  • richardmitnick 11:18 am on June 3, 2018 Permalink | Reply
    Tags: , , Exascale computing, ,   

    From Science Node: “Full speed ahead” 

    Science Node bloc
    From Science Node

    23 May, 2018
    Kevin Jackson

    US Department of Energy recommits to the exascale race.

    1

    The US was once a leader in supercomputing, having created the first high-performance computer (HPC) in 1964. But as of November 2017, TOP500 ranked Titan, the fastest American-made supercomputer, only fifth on its list of the most powerful machines in the world. In contrast, China holds the first and second spots by a whopping margin.

    ORNL Cray Titan XK7 Supercomputer

    Sunway TaihuLight, China

    Tianhe-2 supercomputer China

    But it now looks like the US Department of Energy (DoE) is ready to commit to taking back those top spots. In a CNN opinion article, Secretary of Energy Rick Perry proclaims that “the future is in supercomputers,” and we at Science Node couldn’t agree more. To get a better understanding of the DoE’s plans, we sat down for a chat with Under Secretary for Science Paul Dabbar.

    Why is it important for the federal government to support HPC rather than leaving it to the private sector?

    A significant amount of the Office of Science and the rest of the DoE has had and will continue to have supercomputing needs. The Office of Science produces tremendous amounts of data like at Argonne, and all of our national labs produce data of increasing volume. Supercomputing is also needed in our National Nuclear Security Administration (NNSA) mission, which fulfills very important modeling needs for Department of Defense (DoD) applications.

    But to Secretary Perry’s point, we’re increasingly seeing a number of private sector organizations building their own supercomputers based on what we had developed and built a few generations ago that are now used for a broad range of commercial purposes.

    At the end of the day, we know that a secondary benefit of this push is that we’re providing the impetus for innovation within supercomputing.

    We assist the broader American economy by helping to support science and technology innovation within supercomputing.

    How are supercomputers used for national security?

    The NNSA arm, which is one of the three major arms of the three Under Secretaries here at the department, is our primary area of support for the nation’s defense. And as various testing treaties came into play over time, having the computing capacity to conduct proper testing and security of our stockpiled weapons was key. And that’s why if you look at our three exascale computers that we’re in the process of executing, two of them are on behalf of the Office of Science and one of them is on behalf of the NNSA.

    One of these three supercomputers is the Aurora exascale machine currently being built at Argonne National Laboratory, which Secretary Perry believes will be finished in 2021. Where did this timeline come from, and why Argonne?

    Argonne National Laboratory ALCF

    ANL ALCF Cetus IBM supercomputer

    ANL ALCF Theta Cray supercomputer

    ANL ALCF MIRA IBM Blue Gene Q supercomputer at the Argonne Leadership Computing Facility

    Depiction of ANL ALCF Cray Shasta Aurora supercomputer

    There was a group put together across different areas of DoE, primarily the Office of Science and NNSA. When we decided to execute on building the next wave of top global supercomputers, an internal consortium named the Collaboration of Oak Ridge, Argonne, and Livermore (CORAL) was formed.

    That consortium developed exactly how to fund the technologies, how to issue requests, and what the target capabilities for the machines should be. The 2021 timeline was based on the CORAL group, the labs, and the consortium in conjunction with the Department of Energy headquarters here, the Office of Advanced Computing, and ultimately talking with the suppliers.

    The reason Argonne was selected for the first machine was that they already have a leadership computing facility there. They have a long history of other machines of previous generations, and they were already in the process of building out an exascale machine. So they were already looking at architecture issues, talking with Intel and others on what could be accomplished, and taking a look at how they can build on what they already had in terms of their capabilities and physical plant and user facilities.

    Why now? What’s motivating the push for HPC excellence at this precise moment?

    A lot of this is driven by where the technology is and where the capabilities are for suppliers and the broader HPC market. We’re part of a constant dialogue with the Nvidias, Intels, IBMs, and Crays of the world in what we think is possible in terms of the next step in supercomputing.

    Why now? The technology is available now, and the need is there for us considering the large user facilities coming online across the whole of the national lab complex and the need for stronger computing power.

    The history of science, going back to the late 1800s and early 1900s, was about competition along strings of types of research, whether it was chemistry or physics. If you take any of the areas of science, including high-performance computing, anything that’s being done by anyone out there along any of these strings causes us all to move us along. However, we at the DoE believe America must and should be in the lead of scientific advances across all different areas, and certainly in the area of computing.

    See the full article here .


    five-ways-keep-your-child-safe-school-shootings

    Please help promote STEM in your local schools.

    Stem Education Coalition

    Science Node is an international weekly online publication that covers distributed computing and the research it enables.

    “We report on all aspects of distributed computing technology, such as grids and clouds. We also regularly feature articles on distributed computing-enabled research in a large variety of disciplines, including physics, biology, sociology, earth sciences, archaeology, medicine, disaster management, crime, and art. (Note that we do not cover stories that are purely about commercial technology.)

    In its current incarnation, Science Node is also an online destination where you can host a profile and blog, and find and disseminate announcements and information about events, deadlines, and jobs. In the near future it will also be a place where you can network with colleagues.

    You can read Science Node via our homepage, RSS, or email. For the complete iSGTW experience, sign up for an account or log in with OpenID and manage your email subscription from your account preferences. If you do not wish to access the website’s features, you can just subscribe to the weekly email.”

     
  • richardmitnick 9:26 am on January 10, 2018 Permalink | Reply
    Tags: , , Exascale computing,   

    From HPC Wire: “Momentum Builds for US Exascale” 

    HPC Wire

    January 9, 2018
    Alex R. Larzelere

    1

    2018 looks to be a great year for the U.S. exascale program. The last several months of 2017 revealed a number of important developments that help put the U.S. quest for exascale on a solid foundation. In my last article, I provided a description of the elements of the High Performance Computing (HPC) ecosystem and its importance for advancing and sustaining this strategically important technology. It is good to report that the U.S. exascale program seems to be hitting the full range of ecosystem elements.

    As a reminder, the National Strategic Computing Initiative (NSCI) assigned the U.S. Department of Energy (DOE) Office of Science (SC) and the National Nuclear Security Administration (NNSA) to execute a joint program to deliver capable exascale computing that emphasizes sustained performance on relevant applications and analytic computing to support their missions. The overall DOE program is known as the Exascale Computing Initiative (ECI) and is funded by the SC Advanced Scientific Computing Research (ASCR) program and the NNSA Advanced Simulation and Computing (ASC) program.

    Elements of the ECI include the procurement of exascale class systems and the facility investments in site preparations and non-recurring engineering. Also, ECI includes the Exascale Computing Project (ECP) that will conduct the Research and Development (R&D) in the areas of middleware (software stack), applications, and hardware to ensure that exascale systems will be productively usable to address Office of Science and NNSA missions.

    In the area of hardware – the last part of 2017 revealed a number of important developments. First and most visible, is the initial installation of the SC Summit system at Oak Ridge National Laboratory (ORNL) and the NNSA Sierra system at Lawrence Livermore National Laboratory (LLNL).

    ORNL IBM Summit Supercomputer

    LLNL IBM Sierra ATS2 supercomputer

    Both systems are being built by IBM using Power9 processors with Nvidia GPU co-processors. The machines will have two Power9 CPUs per system board and will use a Mellenox InfinBand interconnection network.

    Beyond that, the architecture of each machine is slightly different. The ORNL Summit machine will use six Nvidia Volta GPUs per two Power9 CPUs on a system board and will use NVLink to connect to 512 GB of memory. The Summit machine will use a combination of air and water cooling. The LLNL Sierra machine will use four Nvidia Voltas and 256 GB of memory connected with the two Power9 CPUs per board. The Sierra machine will use only air cooling. As was reported by HPCwire in November 2017, the peak performance of the Summit machine will be about 200 petaflops and the Sierra machine is expected to be about 125 petaflops.

    Installation of both the Summit and Sierra systems is currently underway with about 279 racks (without system boards) and the interconnection network already installed at each lab. Now that IBM has formally released the Power9 processors, the racks will soon start being populated with the boards that contain the CPUs, GPUs and memory. Once that is completed, the labs will start their acceptance testing, which is expected to be finished later in 2018.

    Another important piece of news about the DOE exascale program is the clarification of the status of the Argonne National Laboratory (ANL) Aurora machine.

    Depiction of ANL ALCF Cray Shasta Aurora supercomputer

    This system was part of the collaborative CORAL procurement that also selected the Sierra and Summit machines. The Aurora system is being manufactured by Intel with Cray Inc. acting as the system integrator. The machine was originally scheduled to be an approximately 180 peak petaflops system using the Knights Hill third generation Phi processors. However, during SC17, we learned that Intel is removing the Knights Hill chip from its roadmap. This explains the reason why during the September ASCR Advisory Committee (ASCAC) meeting, Barb Helland, the Associate Director of the ASCR office, announced that the Aurora system would be delayed to 2021 and upgraded to 1,000 petaflops (aka 1 exaflops).

    The full details of the revised Aurora system are still under wraps. We have learned that it is going to use “novel” processor technologies, but exactly what that means is unclear. The ASCR program subjected the new Aurora design to an independent outside review. It found, “The hardware choices/design within the node is extremely well thought through. Early projections suggest that the system will support a broad workload.” The review committee even suggested that, “The system as presented is exciting with many novel technology choices that can change the way computing is done.” The Aurora system is in the process of being “re-baselined” by the DOE. Hopefully, once that is complete, we will get a better understanding of the meaning of “novel” technologies. If things go as expected, the changes to Aurora will allow the U.S. to achieve exascale by 2021.

    An important, but sometimes overlooked, aspect of the U.S. exascale program is the number of computing systems that are being procured, tested and optimized by the ASCR and ASC programs as part of the buildup to exascale. Other computing systems involved with “pre-exascale” systems include the 8.6 petaflops Mira computer at ANL and the 14 petaflops Cori system at Lawrence Berkeley National Lab (LBNL).

    ANL ALCF MIRA IBM Blue Gene Q supercomputer at the Argonne Leadership Computing Facility

    NERSC Cray Cori II supercomputer at NERSC at LBNL

    The NNSA also has the 14.1 petaflops Trinity system at Los Alamos National Lab (LANL). Up to 20 percent of these precursor machines will serve as testbeds to enable computing science R&D needed to ensure that the U.S. exascale systems will be able to productively address important national security and discovery science objectives.

    The last, but certainly not least, bit of hardware news is that the ASCR and ASC programs are expected to start their next computer system procurement processes in early 2018. During her presentation to the U.S. Consortium for the Advancement of Supercomputing (USCAS), Barb Helland told the group that she expects that the Request for Proposals (RFP) will soon be released for the follow-ons to the Summit and Sierra systems. These systems, to be delivered in the 2021-2023 timeframe, are expected to be provide in excess of exaFLOP/s performance. The procurement process to be used will be similar to the CORAL procurement and will be a collaboration between the DOE-SC ASCR and NNSA ASC programs. The ORNL exascale system will be called Frontier and the LLNL system will be known as El Capitan.

    2017 also saw significant developments for the people element of the U.S HPC ecosystem. As was previously reported, at last September’s ASCAC meeting, Paul Messina announced that he would be stepping down as the ECP Director on October 1st. Doug Kothe, who was previously the applications development lead, was announced as the new ECP Director. Upon taking the Director job, Kothe with his deputy, Stephen Lee of LANL, instituted a process to review the organization and management of the ECP. At the December ASCAC conference call, Doug reported that the review had been completed and resulted in a number of changes. This included paring down ECP from five to four components (applications development, software technology, hardware and integration, and project management). He also reported that ECP has implemented a more structured management approach that includes a revised work breakdown structure (WBS) and additional milestones, new key performance parameters and risk management approaches. Finally, the new ECP Director reported that they had established an Extended Leadership Team with a number of new faces.

    Another important, element of the HPC ecosystem are the people doing the R&D and other work need to keep the ecosystem going. The DOE ECI involves a huge number of people. Last year, there were about 500 researchers who attended the ECP Principle Investigator meeting and there are many more involved in other DOE/NNSA programs and from industry. The ASCR and ASC programs are involved with a number of programs to educate and train future members of the HPC ecosystem. Such programs are the ASCR and ASC co-funded Computational Science Graduate Fellowship (CSGF) and the Early Career Research Program. The NNSA offers similar opportunities. Both the ASCR and ASC programs continue to coordinate with National Science Foundation educational programs to ensure that America’s top computational science talent continues to flow into the ecosystem.

    Finally, in addition to people and hardware, the U.S. program continues to develop the software stack (aka middleware) to develop end users’ applications to ensure that exascale will be used productively. Doug Kothe reported that ECP has adopted standard Software Development Kits. These SDKs are designed to support the goal of building a comprehensive, coherent software stack that enables application developers to productively write highly parallel applications that effectively target diverse exascale architectures. Kothe also reported that ECP is making good progress in developing applications software. This includes the implementation of innovative approaches that include Machine Learning to utilize the GPUs that are part of the future exascale computers.

    All in all – the last several months of 2017 have set the stage for a very exciting 2018 for the U.S. exascale program. It has been about 5 years since the ORNL Titan supercomputer came onto the stage at #1 on the TOP500 list.

    ORNL Cray XK7 Titan Supercomputer

    Over that time, other more powerful DOE computers have come online (Trinity, Cori, etc.) but they were overshadowed by Chinese and European systems.

    LANL Cray XC30 Trinity supercomputer

    It remains unclear whether or not the upcoming exascale systems will put the U.S. back on the top of the supercomputing world. However, the recent developments help to reassure the country is not going to give up its computing leadership position without a fight. That is great news because for more than 60 years, the U.S. has sought leadership in high performance computing for the strategic value it provides in the areas of national security, discovery science, energy security, and economic competitiveness.

    See the full article here .

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    HPCwire is the #1 news and information resource covering the fastest computers in the world and the people who run them. With a legacy dating back to 1987, HPC has enjoyed a legacy of world-class editorial and topnotch journalism, making it the portal of choice selected by science, technology and business professionals interested in high performance and data-intensive computing. For topics ranging from late-breaking news and emerging technologies in HPC, to new trends, expert analysis, and exclusive features, HPCwire delivers it all and remains the HPC communities’ most reliable and trusted resource. Don’t miss a thing – subscribe now to HPCwire’s weekly newsletter recapping the previous week’s HPC news, analysis and information at: http://www.hpcwire.com.

     
  • richardmitnick 11:23 am on October 9, 2017 Permalink | Reply
    Tags: , , , , Exascale computing, , ,   

    From Science Node: “US Coalesces Plans for First Exascale Supercomputer: Aurora in 2021” 

    Science Node bloc
    Science Node

    September 27, 2017
    Tiffany Trader

    ANL ALCF Cray Aurora supercomputer

    At the Advanced Scientific Computing Advisory Committee (ASCAC) meeting, in Arlington, Va., yesterday (Sept. 26), it was revealed that the “Aurora” supercomputer is on track to be the United States’ first exascale system. Aurora, originally named as the third pillar of the CORAL “pre-exascale” project, will still be built by Intel and Cray for Argonne National Laboratory, but the delivery date has shifted from 2018 to 2021 and target capability has been expanded from 180 petaflops to 1,000 petaflops (1 exaflop).

    2

    The fate of the Argonne Aurora “CORAL” supercomputer has been in limbo since the system failed to make it into the U.S. DOE budget request, while the same budget proposal called for an exascale machine “of novel architecture” to be deployed at Argonne in 2021.

    Until now, the only official word from the U.S. Exascale Computing Project was that Aurora was being “reviewed for changes and would go forward under a different timeline.”

    Officially, the contract has been “extended,” and not cancelled, but the fact remains that the goal of the Collaboration of Oak Ridge, Argonne, and Lawrence Livermore (CORAL) initiative to stand up two distinct pre-exascale architectures was not met.

    According to sources we spoke with, a number of people at the DOE are not pleased with the Intel/Cray (Intel is the prime contractor, Cray is the subcontractor) partnership. It’s understood that the two companies could not deliver on the 180-200 petaflops system by next year, as the original contract called for. Now Intel/Cray will push forward with an exascale system that is some 50x larger than any they have stood up.

    It’s our understanding that the cancellation of Aurora is not a DOE budgetary measure as has been speculated, and that the DOE and Argonne wanted Aurora. Although it was referred to as an “interim,” or “pre-exascale” machine, the scientific and research community was counting on that system, was eager to begin using it, and they regarded it as a valuable system in its own right. The non-delivery is regarded as disruptive to the scientific/research communities.

    Another question we have is that since Intel/Cray failed to deliver Aurora, and have moved on to a larger exascale system contract, why hasn’t their original CORAL contract been cancelled and put out again to bid?

    With increased global competitiveness, it seems that the DOE stakeholders did not want to further delay the non-IBM/Nvidia side of the exascale track. Conceivably, they could have done a rebid for the Aurora system, but that would leave them with an even bigger gap if they had to spin up a new vendor/system supplier to replace Intel and Cray.

    Starting the bidding process over again would delay progress toward exascale – and it might even have been the death knell for exascale by 2021, but Intel and Cray now have a giant performance leap to make and three years to do it. There is an open question on the processor front as the retooled Aurora will not be powered by Phi/Knights Hill as originally proposed.

    These events beg the question regarding the IBM-led effort and whether IBM/Nvidia/Mellanox are looking very good by comparison. The other CORAL thrusts — Summit at Oak Ridge and Sierra at Lawrence Livermore — are on track, with Summit several weeks ahead of Sierra, although it is looking like neither will make the cut-off for entry onto the November Top500 list as many had speculated.

    ORNL IBM Summit supercomputer depiction

    LLNL IBM Sierra supercomputer

    We reached out to representatives from Cray, Intel and the Exascale Computing Project (ECP) seeking official comment on the revised Aurora contract. Cray and Intel declined to comment and we did not hear back from ECP by press time. We will update the story as we learn more.

    See the full article here .

    Please help promote STEM in your local schools.
    STEM Icon

    Stem Education Coalition

    Science Node is an international weekly online publication that covers distributed computing and the research it enables.

    “We report on all aspects of distributed computing technology, such as grids and clouds. We also regularly feature articles on distributed computing-enabled research in a large variety of disciplines, including physics, biology, sociology, earth sciences, archaeology, medicine, disaster management, crime, and art. (Note that we do not cover stories that are purely about commercial technology.)

    In its current incarnation, Science Node is also an online destination where you can host a profile and blog, and find and disseminate announcements and information about events, deadlines, and jobs. In the near future it will also be a place where you can network with colleagues.

    You can read Science Node via our homepage, RSS, or email. For the complete iSGTW experience, sign up for an account or log in with OpenID and manage your email subscription from your account preferences. If you do not wish to access the website’s features, you can just subscribe to the weekly email.”

     
  • richardmitnick 5:54 am on July 1, 2017 Permalink | Reply
    Tags: , Exascale computing, , ,   

    From LLNL: “National labs, industry partners prepare for new era of computing through Centers of Excellence” 


    Lawrence Livermore National Laboratory

    June 30, 2017
    Jeremy Thomas
    thomas244@llnl.gov
    925-422-5539

    1
    IBM employees and Lab code and application developers held a “Hackathon” event in June to work on coding challenges for a predecessor system to the Sierra supercomputer. Through the ongoing Centers of Excellence (CoE) program, employees from IBM and NVIDIA have been on-site to help LLNL developers transition applications to the Sierra system, which will have a completely different architecture than the Lab has had before. Photo by Jeremy Thomas/LLNL

    The Department of Energy’s (link is external) drive toward the next generation of supercomputers, “exascale” machines capable of more than a quintillion (1018) calculations per second, isn’t simply to boast about having the fastest processing machines on the planet. At Lawrence Livermore National Laboratory (LLNL) and other DOE national labs, these systems will play a vital role in the National Nuclear Security Administration’s (NNSA) core mission of ensuring the nation’s nuclear stockpile in the absence of underground testing.

    The driving force behind faster, more robust computing power is the need for simulation and codes that are higher resolution, increasingly predictive and incorporate more complex physics. It’s an evolution that is changing the way the national labs’ application and code developers are approaching design. To aid in the transition and prepare researchers for pre-exascale and exascale systems, LLNL has brought experts from IBM (link is external) and NVIDIA together with Lab computer scientists in a Center of Excellence (CoE), a co-design strategy born out of the need for vendors and government to work together to optimize emerging supercomputing systems.

    “There are disruptive machines coming down the pike that are changing things out from under us,” said Rob Neely, an LLNL computer scientist and Weapon Simulation & Computing Program coordinator for Computing Environments. “We need a lot of time to prepare; these applications need insight, and who better to help us with that than the companies who will build the machines? The idea is that when a machine gets here, we’re not caught flat-footed. We want to hit the ground running right away.”

    While LLNL’s exascale system isn’t scheduled for delivery until 2023, Sierra, the Laboratory’s pre-exascale system, is on track to begin installation this fall and will begin running science applications at full machine scale by early next spring.

    LLNL IBM Sierra supercomputer

    Built by IBM and NVIDIA, Sierra will have about six times more computing power than LLNL’s current behemoth, Sequoia.

    2
    Sequoia at LLNL

    The Sierra system is unique to the Lab in that it’s made up of two kinds of hardware — IBM CPUs and NVIDIA GPUs — that have different memory locations associated with each type of computing device and a programming model more complex than LLNL scientists have programmed to in the past. In the meantime, Lab scientists are receiving guidance from experts from the two companies, utilizing a small predecessor system that is already running some components and has some of the technological features that Sierra will have.

    LLNL’s Center of Excellence, which began in 2014, involves about a half dozen IBM and NVIDIA personnel on-site, and a number of remote collaborators who work with Lab developers. The team is on hand to answer any questions Lab computer scientists have, educate LLNL personnel to use best practices in coding hybrid systems, develop strategies for optimizations, debug and advise on global code restructuring that often is needed to obtain performance. The CoE is a symbiotic relationship — LLNL scientists get a feel for how Sierra will operate, and IBM and NVIDIA gain better insight into what the Lab’s needs are and what the machines they build are capable of.

    “We see how the systems we design and develop are being used and how effective they can be,” said IBM research staff member Leopold Grinberg, who works on the LLNL site. “You really need to get into the mind of the developer to understand how they use the tools. To sit next to the developers’ seats and let them drive, to observe them, gives us a good idea of what we are doing right and what needs to be improved. Our experts have an intimate knowledge of how the system works, and having them side-by-side with Lab employees is very useful.”

    Sierra, Grinberg explained, will use a completely different system architecture than what has been used before at LLNL. It’s not only faster than any machine the Lab has had, it also has different tools built into the compilers and programming models. In some cases, the changes developers need to make are substantial, requiring restructuring hundreds or thousands of lines of code. Through the CoE, Grinberg said he’s learning more about how the system will be used for production scientific applications.

    “It’s a constant process of learning for everybody,” Grinberg said. “It’s fun, it’s challenging. We gather the knowledge and it’s also our job to distribute it. There’s always some knowledge to be shared. We need to bring the experience we have with heterogenous systems and emerging programming models to the lab, and help people generate updated codes or find out what can be kept as is to optimize the system we build. It’s been very fruitful for both parties.”

    The CoE strategy is additionally being implemented at Oak Ridge National Laboratory, which is bringing in a heterogenous system of its own called Summit. Other CoE programs are in place at Los Alamos and Lawrence Berkeley national laboratories. Each CoE has a similar goal of preparing computational scientists with the tools they will need to utilize pre-exascale and exascale systems. Since Livermore is new to using GPUs for the bulk of computing power, the Sierra architecture places a heavy emphasis on figuring out which sections of a multi-physics application are the most performance-critical, and the code restructuring that must take place to most effectively use the system.

    “Livermore and Oak Ridge scientists are really pushing the boundaries of the scale of these GPU-based systems,” said Max Katz, a solutions architect at NVIDIA who spends four days a week at LLNL as a technical adviser. “Part of our motivation is to understand machine learning and how to make it possible to merge high-performance computing with the applications demanded by industry. The CoE is essential because it’s difficult for any one party to predict how these CPU/GPU systems will behave together. Each one of us brings in expertise and by sharing information, it makes us all more well-rounded. It’s a great opportunity.”

    In fact, the opportunity was so compelling that in 2016 the CoE was augmented with a three-year institutional component (dubbed the Institutional Center of Excellence, or iCE) to ensure that other mission critical efforts at the Laboratory also could participate. This has added nine applications development efforts, including one in data science, and expanded the number of IBM and NVIDIA personnel. By working together cooperatively, many more types of applications can be explored, performance solutions developed and shared among all the greater CoE code teams.

    “At the end of the iCOE project, the real value will be not only that some important institutional applications run well, but that every directorate at LLNL will have trained staff with expertise in using Sierra, and we’ll have documented lessons learned to help train others,” said Bert Still, leader for Application Strategy (Livermore Computing).

    Steve Rennich, a senior HPC developer-technology engineer with NVIDIA, visits the Lab once a week to help LLNL scientists port mission-critical applications optimized for CPUs over to NVIDIA GPUs, which have an order of magnitude greater computing power than CPUs. Besides writing bug-free code, Rennich said, the goal is to improve performance enough to meet the Lab’s considerable computing requirements.

    “The challenge is they’re fairly complex codes so to do it correctly takes a fair amount of attention to detail,” Rennich said. “It’s about making sure the new system can handle as large a model as the Lab needs. These are colossal machines, so when you create applications at this scale, it’s like building a race car. To take advantage of this increase in performance, you need all the pieces to fit and work together.”

    Current plans are to continue the existing Center of Excellence at LLNL at least into 2019, when Sierra is fully operational. Until then, having experts working shoulder-to-shoulder with Lab developers to write code will be a huge benefit to all parties, said LLNL’s Neely, who wants the collaboration to publish their discoveries to share it with the broader computing community.

    “We’re focused on the issue at hand, and moving things toward getting ready for these machines is hugely beneficial,” Neely said. “These are very large applications developed over decades, so ultimately it’s the code teams that need to be ready to take this over. We’ve got to make this work because we need to ensure the safety and performance of the U.S. stockpile in the absence of nuclear testing. We’ve got the right teams and people to pull this off.”

    See the full article here .

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition
    LLNL Campus

    Operated by Lawrence Livermore National Security, LLC, for the Department of Energy’s National Nuclear Security
    Administration
    DOE Seal
    NNSA

     
  • richardmitnick 10:08 am on April 26, 2017 Permalink | Reply
    Tags: , , Building the Bridge to Exascale, , Exascale computing, , ,   

    From OLCF at ORNL: “Building the Bridge to Exascale” 

    i1

    Oak Ridge National Laboratory

    OLCF

    April 18, 2017 [Where was this hiding?]
    Katie Elyce Jones

    Building an exascale computer—a machine that could solve complex science problems at least 50 times faster than today’s leading supercomputers—is a national effort.

    To oversee the rapid research and development (R&D) of an exascale system by 2023, the US Department of Energy (DOE) created the Exascale Computing Project (ECP) last year. The project brings together experts in high-performance computing from six DOE laboratories with the nation’s most powerful supercomputers—including Oak Ridge, Argonne, Lawrence Berkeley, Lawrence Livermore, Los Alamos, and Sandia—and project members work closely with computing facility staff from the member laboratories.

    ORNL IBM Summit supercomputer depiction.

    At the Exascale Computing Project’s (ECP’s) annual meeting in February 2017, Oak Ridge Leadership Computing Facility (OLCF) staff discussed OLCF resources that could be leveraged for ECP research and development, including the facility’s next flagship supercomputer, Summit, expected to go online in 2018.

    At the first ECP annual meeting, held January 29–February 3 in Knoxville, Tennessee, about 450 project members convened to discuss collaboration in breakout sessions focused on project organization and upcoming R&D milestones for applications, software, hardware, and exascale systems focus areas. During facility-focused sessions, senior staff from the Oak Ridge Leadership Computing Facility (OLCF) met with ECP members to discuss opportunities for the project to use current petascale supercomputers, test beds, prototypes, and other facility resources for exascale R&D. The OLCF is a DOE Office of Science User Facility located at DOE’s Oak Ridge National Laboratory (ORNL).

    “The ECP’s fundamental responsibilities are to provide R&D to build exascale machines more efficiently and to prepare the applications and software that will run on them,” said OLCF Deputy Project Director Justin Whitt. “The facilities’ responsibilities are to acquire, deploy, and operate the machines. We are currently putting advanced test beds and prototypes in place to evaluate technologies and enable R&D efforts like those in the ECP.”

    ORNL has a unique connection to the ECP. The Tennessee-based laboratory is the location of the project office that manages collaboration within the ECP and among its facility partners. ORNL’s Laboratory Director Thom Mason delivered the opening talk at the conference, highlighting the need for coordination in a project of this scope.

    On behalf of facility staff, Mark Fahey, director of operations at the Argonne Leadership Computing Facility, presented the latest delivery and deployment plans for upcoming computing resources during a plenary session. From the OLCF, Project Director Buddy Bland and Director of Science Jack Wells provided a timeline for the availability of Summit, OLCF’s next petascale supercomputer, which is expected to go online in 2018; it will be at least 5 times more powerful than the OLCF’s 27-petaflop Titan supercomputer.

    ORNL Cray XK7 Titan Supercomputer.

    “Exascale hardware won’t be around for several more years,” Wells said. “The ECP will need access to Titan, Summit, and other leadership computers to do the work that gets us to exascale.”

    Wells said he was able to highlight the spring 2017 call for Innovative and Novel Computational Impact on Theory and Experiment, or INCITE, proposals, which will give 2-year projects the first opportunity for computing time on Summit. OLCF staff also introduced a handful of computing architecture test beds—including the developmental environment for Summit known as Summitdev, NVIDIA’s deep learning and accelerated analytics system DGX-1, an experimental cluster of ARM 64-bit compute nodes, and a Cray XC40 cluster of 168 nodes known as Percival—that are now available for OLCF users.

    In addition to leveraging facility resources for R&D, the ECP must understand the future needs of facilities to design an exascale system that is ready for rigorous computational science simulations. Facilities staff can offer insight about the level of performance researchers will expect from science applications on exascale systems and estimate the amount of space and electrical power that will be available in the 2023 timeframe.

    “Getting to capable exascale systems will require careful coordination between the ECP and the user facilities,” Whitt said.

    One important collaboration so far was the development of a request for information, or RFI, for exascale R&D that the ECP released in February to industry vendors. The RFI enables the ECP to evaluate potential software and hardware technologies for exascale systems—a step in the R&D process that facilities often undertake. Facilities will later release requests for proposals when they are ready to begin building exascale systems

    See the full article here .

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    ORNL is managed by UT-Battelle for the Department of Energy’s Office of Science. DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time.

    i2

    The Oak Ridge Leadership Computing Facility (OLCF) was established at Oak Ridge National Laboratory in 2004 with the mission of accelerating scientific discovery and engineering progress by providing outstanding computing and data management resources to high-priority research and development projects.

    ORNL’s supercomputing program has grown from humble beginnings to deliver some of the most powerful systems in the world. On the way, it has helped researchers deliver practical breakthroughs and new scientific knowledge in climate, materials, nuclear science, and a wide range of other disciplines.

    The OLCF delivered on that original promise in 2008, when its Cray XT “Jaguar” system ran the first scientific applications to exceed 1,000 trillion calculations a second (1 petaflop). Since then, the OLCF has continued to expand the limits of computing power, unveiling Titan in 2013, which is capable of 27 petaflops.


    ORNL Cray XK7 Titan Supercomputer

    Titan is one of the first hybrid architecture systems—a combination of graphics processing units (GPUs), and the more conventional central processing units (CPUs) that have served as number crunchers in computers for decades. The parallel structure of GPUs makes them uniquely suited to process an enormous number of simple computations quickly, while CPUs are capable of tackling more sophisticated computational algorithms. The complimentary combination of CPUs and GPUs allow Titan to reach its peak performance.

    The OLCF gives the world’s most advanced computational researchers an opportunity to tackle problems that would be unthinkable on other systems. The facility welcomes investigators from universities, government agencies, and industry who are prepared to perform breakthrough research in climate, materials, alternative energy sources and energy storage, chemistry, nuclear physics, astrophysics, quantum mechanics, and the gamut of scientific inquiry. Because it is a unique resource, the OLCF focuses on the most ambitious research projects—projects that provide important new knowledge or enable important new technologies.

     
  • richardmitnick 1:17 pm on April 4, 2017 Permalink | Reply
    Tags: Exascale computing, ,   

    From OLCF via TheNextPlatform: “Machine Learning, Analytics Play Growing Role in US Exascale Efforts” 

    i1

    Oak Ridge National Laboratory

    OLCF

    1

    TheNextPlatform

    April 4, 2017
    Jeffrey Burt

    Exascale computing promises to bring significant changes to both the high-performance computing space and eventually enterprise datacenter infrastructures.

    The systems, which are being developed in multiple countries around the globe, promise 50 times the performance of current 20 petaflop-capable systems that are now among the fastest in the world, and that bring corresponding improvements in such areas as energy efficiency and physical footprint. The systems need to be powerful run the increasingly complex applications being used by engineers and scientists, but they can’t be so expensive to acquire or run that only a handful of organizations can use them.

    At the same time, the emergence of high-level data analytics and machine learning is forcing some changes in the exascale efforts in the United States, changes that play a role in everything from the software stacks that are being developed for the systems to the competition with Chinese companies that also are aggressively pursuing exascale computing. During a talk last week at the OpenFabrics Workshop in Austin, Texas, Al Geist, from the Oak Ridge National Laboratory and CTO of the Exascale Computing Project (ECP), outlined the work the ECP is doing to develop exascale-capable systems within the next few years. Throughout his talk, Geist also mentioned that over the past 18 months, the emergence of data analytics and machine learning in the mainstream has expanded the scientists’ thoughts on what exascale computing will entail, both for HPC as well as enterprises.

    “In the future, there will be more and more drive to be able to have a machine that can solve a wider breadth of problems … that would require machine learning to be able to do the analysis on the fly inside the computer rather than having it be writer out to disk and analyzed later,” Geist said.

    The amount of data being generated continues to increase on a massive scale, driven by everything from the proliferation of mobile devices to the rapid adoption of cloud computing. HPC organizations and enterprises are looking for ways to collect, store and analyze this data in near real-time to be able to make immediate research and businesses decisions. Machine learning and artificial intelligence (AI) are increasingly being used to help accelerate the collection and analysis of the data. In addition, AI and machine learning are at the heart of many of emerging fields of technology, from new cybersecurity techniques to such systems as self-driving cars.

    The ECP is taking the growing role of data analytics and machine learning into consideration in the work it is doing, Geist said. That growing role can be seen in the applications being developed for exascale computing. For example, the application development effort touches on a range of areas, from climate and chemistry to genomics, seismology and cosmology. There also is a project underway to develop applications for cancer research and prevention, and increasingly that work includes the use of data sciences and machine learning.

    2

    In addition, ECP – which was launched in 2016 – originally had created four application develop co-design centers with relatively narrow focuses, such as efficient exascale discretizations and particle applications. One regarding online data analysis and reduction touched on the rising data sciences effort in the tech industry. In recent months, the program added a fifth co-design center, this one targeting graph and combinatorial methods for enabling exascale applications. The new center was created more specifically to deal with the challenges presented by data analytics and machine learning, he said.

    In a more general sense, both of those emerging technologies also have an impact on the types of systems that the ECP is seeking from vendors like IBM, Intel and Nvidia, and on the growing competition with China in the exascale race. The ECP is looking for systems that not only are capable of exascale-level computing, but also that can be used by a wide range of organizations, with the idea that the innovations developed for and used in HPC environments will cascade down into the enterprise and commodity machines. Geist said they should be usable for a broad array of users, not just a small number of “hero programmers.” Given that, the ECP is looking for vendors to develop systems that can meet the various requirements laid out in the program, such as enabling extreme parallelism, creating new memory and storage technologies that can handle the scaling, high reliability and energy consumption of 20 to 30 megawatts.

    However, that doesn’t necessarily mean creating radical designs or highly advanced or novel architectures. If the vendors can create exascale-capable systems without resorting to radical solutions, “that would be fine,” Geist said. “In fact, we’d probably prefer that.” At the same time, ECP officials understand that to develop the first exascale-capable computer by 2021, as planned, it will take some novel approaches to design and architecture. The hope is that by the time the next such systems follow in 2023, the need for such radical approaches won’t be necessary.

    “We’re not trying to build a stunt machine,” Geist said, adding that the goal for these systems is not just how many FLOPs they can produce, but how much science they can produce. “We want to build something that’s going to be useful for the nation and for science.”

    Such demands are a key differentiator in how the United States and China are approaching exascale computing. China already has three exascale projects underway and a prototype called the Tianhe-3 that is scheduled to be ready by 2018. Much of the talk about China’s efforts has been about the amount of money the Chinese government is investing in the projects. At the same time, China is not as limited by legacy technologies as is the United States, Geist said.

    “They can build a chip that is very much geared toward very low energy and high performance without having to worry about legacy apps, or does it run in a smartphone or does it run in a server market,” he said. “They can build a one-off chip.”

    For U.S. vendors, they have to build systems that can run a broad range of new and legacy applications, and components that can be used in other systems. They have to be able to handle myriad workloads, which is why the ECP expanded its efforts in the areas of data analytics and machine learning. These are increasingly important technologies that will have wide application in HPC and enterprise computing as well as consumer devices, he said. Vendors know that, and they have to take it into account when thinking about innovations around exascale computing.

    The application of emerging technologies will be critical in a wide range of technology areas, so innovations in architectures need to be able to address the needs of both the supercomputing and enterprise worlds. For example, in the HPC space, organizations are turning to machine learning to accelerate the simulation workloads they’re running for such tasks as quality assurance, Geist said.

    “In the United States, it’s really important to the health of the ecosystem,” Geist said. “If you’re only going to sell two machines a year or three machines a year, you can’t make a business out of that and you’ll go out of business. Therefore, those chips and those technologies need to be expanded into new markets. This is what the U.S. has to struggle with, to make sure we can meet the requirements of this high-performance level that we want to get to while still meeting the needs of all the other areas. This is why we see this expansion into data analytics and machine learning, which seems to have a much bigger market outside of just the HPC world.”

    See the full article here .

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    ORNL is managed by UT-Battelle for the Department of Energy’s Office of Science. DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time.

    i2

    The Oak Ridge Leadership Computing Facility (OLCF) was established at Oak Ridge National Laboratory in 2004 with the mission of accelerating scientific discovery and engineering progress by providing outstanding computing and data management resources to high-priority research and development projects.

    ORNL’s supercomputing program has grown from humble beginnings to deliver some of the most powerful systems in the world. On the way, it has helped researchers deliver practical breakthroughs and new scientific knowledge in climate, materials, nuclear science, and a wide range of other disciplines.

    The OLCF delivered on that original promise in 2008, when its Cray XT “Jaguar” system ran the first scientific applications to exceed 1,000 trillion calculations a second (1 petaflop). Since then, the OLCF has continued to expand the limits of computing power, unveiling Titan in 2013, which is capable of 27 petaflops.


    ORNL Cray XK7 Titan Supercomputer

    Titan is one of the first hybrid architecture systems—a combination of graphics processing units (GPUs), and the more conventional central processing units (CPUs) that have served as number crunchers in computers for decades. The parallel structure of GPUs makes them uniquely suited to process an enormous number of simple computations quickly, while CPUs are capable of tackling more sophisticated computational algorithms. The complimentary combination of CPUs and GPUs allow Titan to reach its peak performance.

    The OLCF gives the world’s most advanced computational researchers an opportunity to tackle problems that would be unthinkable on other systems. The facility welcomes investigators from universities, government agencies, and industry who are prepared to perform breakthrough research in climate, materials, alternative energy sources and energy storage, chemistry, nuclear physics, astrophysics, quantum mechanics, and the gamut of scientific inquiry. Because it is a unique resource, the OLCF focuses on the most ambitious research projects—projects that provide important new knowledge or enable important new technologies.

     
  • richardmitnick 1:37 pm on January 12, 2017 Permalink | Reply
    Tags: Argo project, , , Exascale computing, Hobbes project, , , XPRESS project   

    From ASCRDiscovery via D.O.E. “Upscale computing” 

    DOE Main

    Department of Energy

    ASCRDiscovery

    ASCRDiscovery

    January 2017
    No writer credit

    National labs lead the push for operating systems that let applications run at exascale.

    2
    Image courtesy of Sandia National Laboratories.

    For high-performance computing (HPC) systems to reach exascale – a billion billion calculations per second – hardware and software must cooperate, with orchestration by the operating system (OS).

    But getting from today’s computing to exascale requires an adaptable OS – maybe more than one. Computer applications “will be composed of different components,” says Ron Brightwell, R&D manager for scalable systems software at Sandia National Laboratories.

    “There may be a large simulation consuming lots of resources, and some may integrate visualization or multi-physics.” That is, applications might not use all of an exascale machine’s resources in the same way. Plus, an OS aimed at exascale also must deal with changing hardware. HPC “architecture is always evolving,” often mixing different kinds of processors and memory components in heterogeneous designs.

    As computer scientists consider scaling up hardware and software, there’s no easy answer for when an OS must change. “It depends on the application and what needs to be solved,” Brightwell explains. On top of that variability, he notes, “scaling down is much easier than scaling up.” So rather than try to grow an OS from a laptop to an exascale platform, Brightwell thinks the other way. “We should try to provide an exascale OS and runtime environment on a smaller scale – starting with something that works at a higher scale and then scale down.”

    To explore the needs of an OS and conditions to run software for exascale, Brightwell and his colleagues conducted a project called Hobbes, which involved scientists at four national labs – Oak Ridge (ORNL), Lawrence Berkeley, Los Alamos and Sandia – plus seven universities. To perform the research, Brightwell – with Terry Jones, an ORNL computer scientist, and Patrick Bridges, a University of New Mexico associate professor of computer science – earned an ASCR Leadership Computing Challenge allocation of 30 million processor hours on Titan, ORNL’s Cray XK7 supercomputer.

    ORNL Cray Titan Supercomputer
    ORNL Cray XK7 Titan Supercomputer

    2
    The Hobbes OS supports multiple software stacks working together, as indicated in this diagram of the Hobbes co-kernel software stack. Image courtesy of Ron Brightwell, Sandia National Laboratories.

    Brightwell made a point of including the academic community in developing Hobbes. “If we want people in the future to do OS research from an HPC perspective, we need to engage the academic community to prepare the students and give them an idea of what we’re doing,” he explains. “Generally, OS research is focused on commercial things, so it’s a struggle to get a pipeline of students focusing on OS research in HPC systems.”

    The Hobbes project involved a variety of components, but for the OS side, Brightwell describes it as trying to understand applications as they become more sophisticated. They may have more than one simulation running in a single OS environment. “We need to be flexible about what the system environment looks like,” he adds, so with Hobbes, the team explored using multiple OSs in applications running at extreme scale.

    As an example, Brightwell notes that the Hobbes OS envisions multiple software stacks working together. The OS, he says, “embraces the diversity of the different stacks.” An exascale system might let data analytics run on multiple software stacks, but still provide the efficiency needed in HPC at extreme scales. This requires a computer infrastructure that supports simultaneous use of multiple, different stacks and provides extreme-scale mechanisms, such as reducing data movement.

    Part of Hobbes also studied virtualization, which uses a subset of a larger machine to simulate a different computer and operating system. “Virtualization has not been used much at extreme scale,” Brightwell says, “but we wanted to explore it and the flexibility that it could provide.” Results from the Hobbes project indicate that virtualization for extreme scale can provide performance benefits at little cost.

    Other HPC researchers besides Brightwell and his colleagues are exploring OS options for extreme-scale computing. For example, Pete Beckman, co-director of the Northwestern-Argonne Institute of Science and Engineering at Argonne National Laboratory, runs the Argo project.

    A team of 25 collaborators from Argonne, Lawrence Livermore National Laboratory and Pacific Northwest National Laboratory, plus four universities created Argo, an OS that starts with a single Linux-based OS and adapts it to extreme scale.

    When comparing the Hobbes OS to Argo, Brightwell says, “we think that without getting in that Linux box, we have more freedom in what we do, other than design choices already made in Linux. Both of these OSs are likely trying to get to the same place but using different research vehicles to get there.” One distinction: The Hobbes project uses virtualization to explore the use of multiple OSs working on the same simulation at extreme scale.

    As the scale of computation increases, an OS must also support new ways of managing a systems’ resources. To explore some of those needs, Thomas Sterling, director of Indiana University’s Center for Research in Extreme Scale Technologies, developed ParalleX, an advanced execution model for computations. Brightwell leads a separate project called XPRESS to support the ParalleX execution model. Rather than computing’s traditional static methods, ParalleX implementations use dynamic adaptive techniques.

    More work is always necessary as computation works toward extreme scales. “The important thing in going forward from a runtime and OS perspective is the ability to evaluate technologies that are developing in terms of applications,” Brightwell explains. “For high-end applications to pursue functionality at extreme scales, we need to build that capability.” That’s just what Hobbes and XPRESS – and the ongoing research that follows them – aim to do.

    See the full article here .

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    The mission of the Energy Department is to ensure America’s security and prosperity by addressing its energy, environmental and nuclear challenges through transformative science and technology solutions.

     
  • richardmitnick 10:09 am on December 12, 2016 Permalink | Reply
    Tags: , Exascale computing, , , ,   

    From PPPL: “PPPL physicists win funding to lead a DOE exascale computing project” 


    PPPL

    October 27, 2016 [Just now out on social media.]
    Raphael Rosen

    1
    PPPL physicist Amitava Bhattacharjee. (Photo by Elle Starkman/PPPL Office of Communications)

    A proposal from scientists at the U.S. Department of Energy’s (DOE) Princeton Plasma Physics Laboratory (PPPL) has been chosen as part of a national initiative to develop the next generation of supercomputers. Known as the Exascale Computing Project (ECP), the initiative will include a focus on exascale-related software, applications, and workforce training.

    Once developed, exascale computers will perform a billion billion operations per second, a rate 50 to 100 times faster than the most powerful U.S. computers now in use. The fastest computers today operate at the petascale and can perform a million billion operations per second. Exascale machines in the United States are expected to be ready in 2023.

    The PPPL-led multi-institutional project, titled High-Fidelity Whole Device Modeling of Magnetically Confined Fusion Plasmas, was selected during the ECP’s first round of application development funding, which distributed $39.8 million. The overall project will receive $2.5 million a year for four years to be distributed among all the partner institutions, including Argonne, Lawrence Livermore, and Oak Ridge national laboratories, together with Rutgers University, the University of California, Los Angeles, and the University of Colorado, Boulder. PPPL itself will receive $800,000 per year; the project it leads was one of 15 selected for full funding, and the only one dedicated to fusion energy. Seven additional projects received seed funding.

    The application efforts will help guide DOE’s development of a U.S. exascale ecosystem as part of President Obama’s National Strategic Computing Initiative (NSCI). DOE, the Department of Defense and the National Science Foundation have been designated as NSCI lead agencies, and ECP is the primary DOE contribution to the initiative.

    The ECP’s multi-year mission is to maximize the benefits of high performance computing (HPC) for U.S. economic competitiveness, national security and scientific discovery. In addition to applications, the DOE project addresses hardware, software, platforms and workforce development needs critical to the effective development and deployment of future exascale systems. The ECP is supported jointly by DOE’s Office of Science and the National Nuclear Security Administration within DOE.

    PPPL has been involved with high-performance computing for years. PPPL scientists created the XGC code, which models the behavior of plasma in the boundary region where the plasma’s ions and electrons interact with each other and with neutral particles produced by the tokamak’s inner wall. The high-performance code is maintained and updated by PPPL scientist C.S. Chang and his team.

    3
    PPPL scientist C.S. Chang

    XGC runs on Titan, the fastest computer in the United States, at the Oak Ridge Leadership Computing Facility, a DOE Office of Science User Facility at Oak Ridge National Laboratory.

    ORNL Cray Titan Supercomputer
    ORNL Cray Titan Supercomputer

    The calculations needed to model the behavior of the plasma edge are so complex that the code uses 90 percent of the computer’s processing capabilities. Titan performs at the petascale, completing a million billion calculations each second, and the DOE was primarily interested in proposals by institutions that possess petascale-ready codes that can be upgraded for exascale computers.

    The PPPL proposal lays out a four-year plan to combine XGC with GENE, a computer code that simulates the behavior of the plasma core. GENE is maintained by Frank Jenko, a professor at the University of California, Los Angeles. Combining the codes would give physicists a far better sense of how the core plasma interacts with the edge plasma at a fundamental kinetic level, giving a comprehensive view of the entire plasma volume.

    Leading the overall PPPL proposal is Amitava Bhattacharjee, head of the Theory Department at PPPL. Co-principal investigators are PPPL’s Chang and Andrew Siegel, a computational scientist at the University of Chicago.

    The multi-institutional effort will develop a full-scale computer simulation of fusion plasma. Unlike current simulations, which model only part of the hot, charged gas, the proposed simulations will display the physics of an entire plasma all at once. The completed model will integrate the XGC and GENE codes and will be designed to run on exascale computers.

    The modeling will enable physicists to understand plasmas more fully, allowing them to predict its behavior within doughnut-shaped fusion facilities known as tokamaks. The exascale computing fusion proposal focuses primarily on ITER, the international tokamak being built in France to demonstrate the feasibility of fusion power.

    Iter experimental tokamak nuclear fusion reactor that is being built next to the Cadarache facility in Saint Paul les-Durance south of France
    Iter experimental tokamak nuclear fusion reactor that is being built next to the Cadarache facility in Saint Paul les-Durance south of France

    But the proposal will be developed with other applications in mind, including stellarators, another variety of fusion facility.

    Wendelstgein 7-X stellarator
    Wendelstgein 7-X stellarator,built in Greifswald, Germany

    Better predictions can lead to better engineered facilities and more efficient fusion reactors. Currently, support for this work comes from the DOE’s Advanced Science Computing Research program.

    “This will be a team effort involving multiple institutions,” said Bhattacharjee. He noted that PPPL will be involved in every aspect of the project, including working with applied mathematicians and computer scientists on the team to develop the simulation framework that will couple GENE with XGC on exascale computers.

    “You need a very-large-scale computer to calculate the multiscale interactions in fusion plasmas,” said Chang. “Whole-device modeling is about simulating the whole thing: all the systems together.”

    Because plasma behavior is immensely complicated, developing an exascale computer is crucial for future research. “Taking into account all the physics in a fusion plasma requires enormous computational resources,” said Bhattacharjee. “With the computer codes we have now, we are already pushing on the edge of the petascale. The exascale is very much needed in order for us to have greater realism and truly predictive capability.”

    See the full article here .

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    Princeton Plasma Physics Laboratory is a U.S. Department of Energy national laboratory managed by Princeton University. PPPL, on Princeton University’s Forrestal Campus in Plainsboro, N.J., is devoted to creating new knowledge about the physics of plasmas — ultra-hot, charged gases — and to developing practical solutions for the creation of fusion energy. Results of PPPL research have ranged from a portable nuclear materials detector for anti-terrorist use to universally employed computer codes for analyzing and predicting the outcome of fusion experiments. The Laboratory is managed by the University for the U.S. Department of Energy’s Office of Science, which is the largest single supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.

     
c
Compose new post
j
Next post/Next comment
k
Previous post/Previous comment
r
Reply
e
Edit
o
Show/Hide comments
t
Go to top
l
Go to login
h
Show/Hide help
shift + esc
Cancel
%d bloggers like this: