April 18, 2017 [Where was this hiding?]
Katie Elyce Jones
Building an exascale computer—a machine that could solve complex science problems at least 50 times faster than today’s leading supercomputers—is a national effort.
To oversee the rapid research and development (R&D) of an exascale system by 2023, the US Department of Energy (DOE) created the Exascale Computing Project (ECP) last year. The project brings together experts in high-performance computing from six DOE laboratories with the nation’s most powerful supercomputers—including Oak Ridge, Argonne, Lawrence Berkeley, Lawrence Livermore, Los Alamos, and Sandia—and project members work closely with computing facility staff from the member laboratories.
At the Exascale Computing Project’s (ECP’s) annual meeting in February 2017, Oak Ridge Leadership Computing Facility (OLCF) staff discussed OLCF resources that could be leveraged for ECP research and development, including the facility’s next flagship supercomputer, Summit, expected to go online in 2018.
At the first ECP annual meeting, held January 29–February 3 in Knoxville, Tennessee, about 450 project members convened to discuss collaboration in breakout sessions focused on project organization and upcoming R&D milestones for applications, software, hardware, and exascale systems focus areas. During facility-focused sessions, senior staff from the Oak Ridge Leadership Computing Facility (OLCF) met with ECP members to discuss opportunities for the project to use current petascale supercomputers, test beds, prototypes, and other facility resources for exascale R&D. The OLCF is a DOE Office of Science User Facility located at DOE’s Oak Ridge National Laboratory (ORNL).
“The ECP’s fundamental responsibilities are to provide R&D to build exascale machines more efficiently and to prepare the applications and software that will run on them,” said OLCF Deputy Project Director Justin Whitt. “The facilities’ responsibilities are to acquire, deploy, and operate the machines. We are currently putting advanced test beds and prototypes in place to evaluate technologies and enable R&D efforts like those in the ECP.”
ORNL has a unique connection to the ECP. The Tennessee-based laboratory is the location of the project office that manages collaboration within the ECP and among its facility partners. ORNL’s Laboratory Director Thom Mason delivered the opening talk at the conference, highlighting the need for coordination in a project of this scope.
On behalf of facility staff, Mark Fahey, director of operations at the Argonne Leadership Computing Facility, presented the latest delivery and deployment plans for upcoming computing resources during a plenary session. From the OLCF, Project Director Buddy Bland and Director of Science Jack Wells provided a timeline for the availability of Summit, OLCF’s next petascale supercomputer, which is expected to go online in 2018; it will be at least 5 times more powerful than the OLCF’s 27-petaflop Titan supercomputer.
“Exascale hardware won’t be around for several more years,” Wells said. “The ECP will need access to Titan, Summit, and other leadership computers to do the work that gets us to exascale.”
Wells said he was able to highlight the spring 2017 call for Innovative and Novel Computational Impact on Theory and Experiment, or INCITE, proposals, which will give 2-year projects the first opportunity for computing time on Summit. OLCF staff also introduced a handful of computing architecture test beds—including the developmental environment for Summit known as Summitdev, NVIDIA’s deep learning and accelerated analytics system DGX-1, an experimental cluster of ARM 64-bit compute nodes, and a Cray XC40 cluster of 168 nodes known as Percival—that are now available for OLCF users.
In addition to leveraging facility resources for R&D, the ECP must understand the future needs of facilities to design an exascale system that is ready for rigorous computational science simulations. Facilities staff can offer insight about the level of performance researchers will expect from science applications on exascale systems and estimate the amount of space and electrical power that will be available in the 2023 timeframe.
“Getting to capable exascale systems will require careful coordination between the ECP and the user facilities,” Whitt said.
One important collaboration so far was the development of a request for information, or RFI, for exascale R&D that the ECP released in February to industry vendors. The RFI enables the ECP to evaluate potential software and hardware technologies for exascale systems, a step in the R&D process that facilities often undertake. Facilities will later release requests for proposals when they are ready to begin building exascale systems.
ORNL is managed by UT-Battelle for the Department of Energy’s Office of Science. DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time.
The Oak Ridge Leadership Computing Facility (OLCF) was established at Oak Ridge National Laboratory in 2004 with the mission of accelerating scientific discovery and engineering progress by providing outstanding computing and data management resources to high-priority research and development projects.
ORNL’s supercomputing program has grown from humble beginnings to deliver some of the most powerful systems in the world. On the way, it has helped researchers deliver practical breakthroughs and new scientific knowledge in climate, materials, nuclear science, and a wide range of other disciplines.
The OLCF delivered on that original promise in 2008, when its Cray XT “Jaguar” system ran the first scientific applications to exceed 1,000 trillion calculations a second (1 petaflop). Since then, the OLCF has continued to expand the limits of computing power, unveiling the 27-petaflop Titan in 2013.
Titan is one of the first hybrid architecture systems, combining graphics processing units (GPUs) with the more conventional central processing units (CPUs) that have served as number crunchers in computers for decades. The parallel structure of GPUs makes them uniquely suited to process an enormous number of simple computations quickly, while CPUs are capable of tackling more sophisticated computational algorithms. The complementary combination of CPUs and GPUs allows Titan to reach its peak performance.
The OLCF gives the world’s most advanced computational researchers an opportunity to tackle problems that would be unthinkable on other systems. The facility welcomes investigators from universities, government agencies, and industry who are prepared to perform breakthrough research in climate, materials, alternative energy sources and energy storage, chemistry, nuclear physics, astrophysics, quantum mechanics, and the gamut of scientific inquiry. Because it is a unique resource, the OLCF focuses on the most ambitious research projects—projects that provide important new knowledge or enable important new technologies.