From Lawrence Livermore National Laboratory: “Lawrence Livermore unveils NNSA’s Sierra, world’s third fastest supercomputer”

Oct. 26, 2018

Jeremy Thomas
thomas244@llnl.gov
925-422-5539

LLNL IBM ATS-2 NVIDIA Mellanox Sierra Supercomputer

The Department of Energy’s National Nuclear Security Administration (NNSA), Lawrence Livermore National Laboratory (LLNL) and its industry partners today officially unveiled Sierra, one of the world’s fastest supercomputers, at a dedication ceremony to celebrate the system’s completion.

Sierra will serve the NNSA’s three nuclear security laboratories, LLNL, Sandia National Laboratories and Los Alamos National Laboratory, providing high-fidelity simulations in support of NNSA’s core mission of ensuring the safety, security and effectiveness of the nation’s nuclear stockpile. Its arrival represents years of procurement, design, code development and installation, requiring the efforts of hundreds of computer scientists, developers and operations personnel working in close partnership with IBM, NVIDIA and Mellanox.

“Today we mark our latest milestone toward computing on a truly exascale level,” Department of Energy Secretary Rick Perry said in a video message prepared for the dedication. “With its dramatic unveiling of Sierra, Lawrence Livermore National Laboratory has taken a pivotal step forward on behalf of America’s national security.”

“With the advent of Sierra, Livermore has delivered a powerful new tool for NNSA and stockpile stewardship. This machine represents a new approach to high performance computing that will enable us to address and answer scientific questions previously beyond our reach,” said LLNL Director Bill Goldstein. “I thank everyone involved in getting us to this point: our sponsors at NNSA, our industry and national lab partners and our own dedicated staff. This is a signal moment in Livermore’s history, and a new milestone in our leadership in high performance computing and simulation.”

Sierra, ranked as the third-fastest supercomputer in the world on the latest TOP500 list, is NNSA’s first large-scale production heterogeneous system, meaning each node incorporates both IBM central processing units (CPUs) and NVIDIA graphics processing units (GPUs). It is specifically designed for modeling and simulations essential for NNSA’s Stockpile Stewardship Program, ongoing life extension programs, weapons science and nuclear deterrence. It is expected to go into use for classified production in early 2019.

“NNSA and its predecessors have been at the forefront of scientific computing since World War II,” said Mark Anderson, director for the Office of Advanced Simulation and Computing and Institutional Research & Development at NNSA. “The supercomputers provided by NNSA are an essential element of stockpile stewardship without nuclear testing. Sierra is the most capable computer we have ever fielded. It also is a harbinger of future computing technology and a critical step along the path to exascale.”

Sierra boasts a peak performance of 125 petaFLOPS, or 125 quadrillion floating-point operations per second. Early indications using existing codes and benchmark tests are promising, demonstrating as predicted that Sierra can perform most required calculations far more efficiently in terms of cost and power consumption than systems consisting of CPUs alone. Depending on the application, Sierra is expected to be six to 10 times more capable than LLNL’s 20-petaFLOP Sequoia, currently the world’s eighth-fastest supercomputer.

“The continued aging of the stockpile requires much more capable computing systems,” said Mike Dunning, acting principal associate director for LLNL’s weapons program. “Sierra represents a continuation of NNSA’s leadership in high performance computing. It’s even more important today as we face increased global complexities, so it is essential that our tools are able to operate at the leading edge.”

With a footprint of 7,000 square feet, Sierra comprises 240 computing racks and 4,320 nodes, with each node consisting of two IBM POWER9 CPUs, four NVIDIA V100 GPUs and a Mellanox EDR InfiniBand interconnect. To prepare for this architecture, LLNL has partnered with IBM and NVIDIA to rapidly develop codes and prepare applications to run effectively on the CPU/GPU nodes.
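The per-node figures above imply system-wide totals. The short sketch below multiplies them out; the function and constant names are illustrative only, and the result is consistent with the "more than 17,000" GPUs quoted by NVIDIA later in this article.

```python
# Figures quoted in the article; names here are illustrative, not official.
NODES = 4_320
CPUS_PER_NODE = 2   # IBM POWER9
GPUS_PER_NODE = 4   # NVIDIA V100


def sierra_totals(nodes=NODES):
    """Return system-wide processor counts derived from the per-node figures."""
    return {
        "cpus": nodes * CPUS_PER_NODE,
        "gpus": nodes * GPUS_PER_NODE,
    }


totals = sierra_totals()
print(totals)  # {'cpus': 8640, 'gpus': 17280}
```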

IBM and NVIDIA personnel worked closely with LLNL, both on-site and remotely, on code development and restructuring to achieve maximum performance, while LLNL personnel provided feedback on system design and the software stack to the vendor. This “center of excellence” co-design strategy is necessary to ensure that codes and platforms are well-matched, and applications are optimized for GPU-accelerated architecture. LLNL’s partnership with Oak Ridge National Laboratory, which is siting the Summit system from IBM, also has been extremely helpful throughout the project, from procurement to operation.

LLNL selected the IBM/NVIDIA system due to its energy and cost efficiency, as well as its potential to effectively run NNSA applications. Sierra’s IBM POWER9 processors feature CPU-to-GPU connection via NVIDIA NVLink interconnect, enabling greater memory bandwidth between each node so Sierra can move data throughout the system for maximum performance and efficiency. Backing Sierra is 154 petabytes of IBM Spectrum Scale, a software-defined parallel file system, deployed across 24 racks of Elastic Storage Servers (ESS). To meet the scaling demands of the heterogeneous systems, the solution delivers 1.54 terabytes per second in both read and write bandwidth and can manage 100 billion files per file system.
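To put the storage figures in perspective, the sketch below estimates how long one full pass over the file system would take at the quoted bandwidth. It assumes decimal units (1 petabyte = 1,000 terabytes) and sustained peak throughput, neither of which the article states; the names are illustrative.

```python
# Figures quoted in the article.
CAPACITY_PB = 154
BANDWIDTH_TB_PER_S = 1.54

# Assumption: decimal units, 1 PB = 1,000 TB (the article does not specify).
TB_PER_PB = 1000


def full_sweep_seconds(capacity_pb=CAPACITY_PB, bandwidth_tb_s=BANDWIDTH_TB_PER_S):
    """Seconds to read or write the whole capacity once at sustained peak bandwidth."""
    return capacity_pb * TB_PER_PB / bandwidth_tb_s


seconds = full_sweep_seconds()
print(f"{seconds:,.0f} seconds, about {seconds / 3600:.1f} hours")
```

Under those assumptions a full sweep takes roughly 100,000 seconds, a little over a day. Real workloads would rarely sustain peak bandwidth, so this is a lower bound.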

“The next frontier of supercomputing lies in artificial intelligence,” said John Kelly, senior vice president, Cognitive Solutions and IBM Research. “IBM’s decades-long partnership with LLNL has allowed us to build Sierra from the ground up with the unique design and architecture needed for applying AI to massive data sets. The tremendous insights researchers are seeing will only accelerate high performance computing for research and business.”

As the first NNSA production supercomputer backed by GPU-accelerated architecture, Sierra’s acquisition required a fundamental shift in how scientists at the three NNSA laboratories program their codes to take advantage of the GPUs. The system’s NVIDIA GPUs also present scientists with an opportunity to investigate the use of machine learning and deep learning to accelerate time-to-solution of physics codes. It is expected that simulation, leveraged by acceleration coming from the use of artificial intelligence technology, will be increasingly employed over the coming decade.

“Sierra is a world-class, pre-exascale supercomputer that allows researchers to run large complex scientific simulations at scale, at speeds never before thought possible,” said Ian Buck, vice president and general manager of Accelerated Computing at NVIDIA. “Equipped with more than 17,000 of our Tesla Tensor Core V100 GPUs, Sierra is a powerful, universal platform for compute-intensive scientific simulations, machine learning, deep learning and visualization applications all in one — paving the path forward for the future of high performance computing.”

Sierra also leverages Mellanox EDR 100 Gigabit InfiniBand In-Network Computing acceleration engines to achieve higher applications performance and scalability.

“We are very proud to provide essential technology for one of the fastest supercomputers in the world at Lawrence Livermore National Laboratory,” said Gilad Shainer, vice president of marketing at Mellanox Technologies. “Our InfiniBand smart interconnect delivers the necessary performance, efficiency and scalability to support the needs of the Laboratory’s next-generation high performance and artificial intelligence applications, and the path to exascale computing.”

In addition to Sierra, which will serve critical national security applications, a companion unclassified system, called Lassen, has been installed in the Livermore Computing Center. This institutionally focused system will play a role in projects aimed at speeding cancer drug discovery, precision medicine, research on traumatic brain injury, seismology, climate, astrophysics, materials science and other basic science benefiting society.

Sierra continues the long lineage of world-class LLNL supercomputers and represents the penultimate step on NNSA’s road to exascale computing, which is expected to be achieved by 2023 with an LLNL system called “El Capitan.” Funded by the NNSA’s Advanced Simulation and Computing (ASC) program, El Capitan will be NNSA’s first exascale supercomputer, capable of more than a quintillion calculations per second, about 10 times greater performance than Sierra. Such computing power will be easily absorbed by NNSA for its mission, which requires the most advanced computing capabilities and deep partnerships with American industry.

“In just a few short years, we expect to see exascale systems deployed at Lawrence Livermore, Argonne and Oak Ridge (national laboratories), ensuring our global superiority in this arena for years and decades to come,” Perry said. “Starting with Sierra, this new generation of supercomputers will be an absolute game-changer for the world.”


LLNL Campus

Operated by Lawrence Livermore National Security, LLC, for the Department of Energy’s National Nuclear Security Administration
Lawrence Livermore National Laboratory (LLNL) is an American federal research facility in Livermore, California, United States, founded by the University of California, Berkeley in 1952. A Federally Funded Research and Development Center (FFRDC), it is primarily funded by the U.S. Department of Energy (DOE) and managed and operated by Lawrence Livermore National Security, LLC (LLNS), a partnership of the University of California, Bechtel, BWX Technologies, AECOM, and Battelle Memorial Institute in affiliation with the Texas A&M University System. In 2012, the laboratory had the synthetic chemical element livermorium named after it.

LLNL is self-described as “a premier research and development institution for science and technology applied to national security.” Its principal responsibility is ensuring the safety, security and reliability of the nation’s nuclear weapons through the application of advanced science, engineering and technology. The Laboratory also applies its special expertise and multidisciplinary capabilities to preventing the proliferation and use of weapons of mass destruction, bolstering homeland security and solving other nationally important problems, including energy and environmental security, basic science and economic competitiveness.

The Laboratory is located on a one-square-mile (2.6 km²) site at the eastern edge of Livermore. It also operates a 7,000-acre (28 km²) remote experimental test site, called Site 300, situated about 15 miles (24 km) southeast of the main lab site. LLNL has an annual budget of about $1.5 billion and a staff of roughly 5,800 employees.

LLNL was established in 1952 as the University of California Radiation Laboratory at Livermore, an offshoot of the existing UC Radiation Laboratory at Berkeley. It was intended to spur innovation and provide competition to the nuclear weapon design laboratory at Los Alamos in New Mexico, home of the Manhattan Project that developed the first atomic weapons. Edward Teller and Ernest Lawrence,[2] director of the Radiation Laboratory at Berkeley, are regarded as the co-founders of the Livermore facility.

The new laboratory was sited at a former naval air station of World War II. It was already home to several UC Radiation Laboratory projects that were too large for its location in the Berkeley Hills above the UC campus, including one of the first experiments in the magnetic approach to confined thermonuclear reactions (i.e. fusion). About half an hour southeast of Berkeley, the Livermore site provided much greater security for classified projects than an urban university campus.

Lawrence tapped 32-year-old Herbert York, a former graduate student of his, to run Livermore. Under York, the Lab had four main programs: Project Sherwood (the magnetic-fusion program), Project Whitney (the weapons-design program), diagnostic weapon experiments (both for the Los Alamos and Livermore laboratories), and a basic physics program. York and the new lab embraced the Lawrence “big science” approach, tackling challenging projects with physicists, chemists, engineers, and computational scientists working together in multidisciplinary teams. Lawrence died in August 1958 and shortly after, the university’s board of regents named both laboratories for him, as the Lawrence Radiation Laboratory.

Historically, the Berkeley and Livermore laboratories have had very close relationships on research projects, business operations, and staff. The Livermore Lab was established initially as a branch of the Berkeley laboratory. The Livermore lab was not officially severed administratively from the Berkeley lab until 1971. To this day, in official planning documents and records, Lawrence Berkeley National Laboratory is designated as Site 100, Lawrence Livermore National Lab as Site 200, and LLNL’s remote test location as Site 300.[3]

The laboratory was renamed Lawrence Livermore Laboratory (LLL) in 1971. On October 1, 2007, LLNS assumed management of LLNL from the University of California, which had exclusively managed and operated the Laboratory since its inception 55 years before. The laboratory was honored in 2012 by having the synthetic chemical element livermorium named after it. The LLNS takeover of the laboratory has been controversial. In May 2013, an Alameda County jury awarded over $2.7 million to five former laboratory employees who were among 430 employees LLNS laid off during 2008.[4] The jury found that LLNS breached a contractual obligation to terminate the employees only for “reasonable cause.”[5] The five plaintiffs also have pending age discrimination claims against LLNS, which will be heard by a different jury in a separate trial.[6] There are 125 co-plaintiffs awaiting trial on similar claims against LLNS.[7] The May 2008 layoff was the first layoff at the laboratory in nearly 40 years.[6]

On March 14, 2011, the City of Livermore officially expanded the city’s boundaries to annex LLNL and move it within the city limits. The unanimous vote by the Livermore city council expanded Livermore’s southeastern boundaries to cover 15 land parcels covering 1,057 acres (4.28 km²) that comprise the LLNL site. The site was formerly an unincorporated area of Alameda County. The LLNL campus continues to be owned by the federal government.
