From Lawrence Livermore National Laboratory: “New computing cluster coming to Livermore”


Nov. 8, 2018
Jeremy Thomas

LLNL Penguin Computing Corona AMD Mellanox high-performance computing cluster

Lawrence Livermore National Laboratory, in partnership with Penguin Computing, AMD and Mellanox Technologies, will accept delivery of Corona, a new unclassified high-performance computing (HPC) cluster that will provide unique capabilities for Lab researchers and industry partners to explore data science, machine learning and big data analytics.

The system will be provided by Penguin Computing and will be composed of AMD EPYC™ processors and AMD Radeon™ Instinct™ GPU (graphics processing unit) accelerators connected via a Mellanox HDR 200 Gigabit InfiniBand network. The system lends itself to applying machine learning and data analysis techniques to challenging problems in HPC and big data, and it will be used to support the National Nuclear Security Administration’s (NNSA) Advanced Simulation and Computing (ASC) program. The system will be housed by Livermore Computing (LC) in an unclassified site adjacent to the High Performance Computing Innovation Center (HPCIC), which is dedicated to partnerships with American industry.

Procured through the Commodity Technology Systems (CTS-1) contract, Corona will help NNSA assess future architectures, fill institutional and ASC needs to develop leadership in data science and machine learning capabilities at scale, provide access to HPCIC partners and extend the ongoing collaboration among AMD, Penguin, Mellanox and LLNL.

“Corona will provide an excellent platform for our research into cognitive computing algorithms and developing predictive simulations for both inertial confinement fusion applications as well as molecular dynamics simulations targeting precision medicine for oncology,” said Brian Van Essen, LLNL Informatics group leader and computer scientist. “The unique computational resources and interconnect will allow us to continue to develop leading edge algorithms for scalable distributed deep learning. As deep learning becomes an integral part of many applications at the Laboratory, computational resources like Corona are vital to our ability to develop the next generation of scientific applications.”

Funded by the LLNL Multi-Programmatic and Institutional Computing (M&IC) program and the NNSA’s ASC program, the 383-teraFLOPS (trillion floating-point operations per second) Corona cluster will be delivered in late November and is expected to be available for limited use by December. The cluster consists of 170 two-socket nodes, each incorporating two 24-core AMD EPYC™ 7401 processors and a 1.6-terabyte (TB) PCIe nonvolatile (solid-state) memory device. Every Corona compute node is GPU-ready; half of the nodes are equipped with four AMD Radeon Instinct™ MI25 GPUs each, together delivering 4.2 petaFLOPS of peak FP32 (single-precision) performance. The remaining compute nodes may be upgraded with future GPUs.
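The 4.2 petaFLOPS figure can be roughly reproduced with simple arithmetic. The sketch below assumes the MI25's published single-precision peak of about 12.3 teraFLOPS (4,096 stream processors at a peak engine clock of roughly 1.5 GHz, counting a fused multiply-add as two operations); those per-GPU numbers come from AMD's spec sheet, not from the article, so treat them as assumptions.

```python
# Back-of-the-envelope check of Corona's quoted 4.2 petaFLOPS FP32 GPU peak.
# ASSUMPTION: the MI25 figures below come from AMD's public spec sheet,
# not from the LLNL article.

TOTAL_NODES = 170                 # from the article
GPU_NODES = TOTAL_NODES // 2      # "half of those nodes" carry GPUs
GPUS_PER_NODE = 4                 # four MI25s per GPU node

# Assumed MI25 characteristics (not stated in the article)
MI25_STREAM_PROCESSORS = 4096
MI25_PEAK_CLOCK_GHZ = 1.5
FLOPS_PER_FMA = 2                 # a fused multiply-add counts as two operations


def mi25_peak_fp32_tflops() -> float:
    """Peak FP32 throughput of one MI25, in teraFLOPS."""
    gflops = MI25_STREAM_PROCESSORS * MI25_PEAK_CLOCK_GHZ * FLOPS_PER_FMA
    return gflops / 1000.0


def corona_gpu_peak_pflops() -> float:
    """Aggregate FP32 peak of all installed GPUs, in petaFLOPS."""
    total_gpus = GPU_NODES * GPUS_PER_NODE
    return total_gpus * mi25_peak_fp32_tflops() / 1000.0


if __name__ == "__main__":
    print(f"Per-GPU peak: {mi25_peak_fp32_tflops():.3f} TFLOPS")
    print(f"Cluster GPU peak: {corona_gpu_peak_pflops():.2f} PFLOPS")
```

Under these assumptions, 85 GPU nodes × 4 GPUs gives 340 GPUs at 12.288 teraFLOPS each, or about 4.18 petaFLOPS, which rounds to the article's 4.2 petaFLOPS.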

Corona is likely to supplant the LLNL Catalyst cluster, a 150-teraFLOPS unclassified HPC cluster.

Corona will run the NNSA-funded Tri-Lab Operating System Stack (TOSS), which provides a common user environment for Los Alamos, Sandia and Lawrence Livermore national laboratories.

“We’re in a unique position working with this heterogeneous architecture,” said Matt Leininger, deputy of Advanced Technology Projects for LLNL. “Corona is the next logical step in applying leading-edge technologies to the scientific discovery mission of the Laboratory. This system will be capable of generating big data from HPC simulations, while also being capable of translating that data into knowledge through the use of machine learning and data analysis.”

The HPC Innovation Center at LLNL will offer access to Corona, and to the machine learning innovations it is expected to enable, as a new option for its ongoing collaborations with American companies and research institutions.

“Penguin Computing has been working with America’s national energy and defense labs on projects focused on open systems for almost 20 years,” said Sid Mair, senior vice president, federal systems at Penguin Computing. “During this long collaboration, we’ve been able to help them take advantage of the value, both in terms of return on investment and flexibility, that open systems provide compared to proprietary systems. Helping them deploy AI using open systems in the Corona system is an exciting new chapter in this relationship that we hope will help them execute their mission even more effectively.”

“AMD welcomes the delivery of the Corona system to the HPCIC and the selection of high-performance AMD EPYC processors and AMD Radeon Instinct accelerators for the cluster,” said Mark Papermaster, AMD’s senior vice president and chief technology officer. “The collaboration between AMD, Penguin, Mellanox and Lawrence Livermore National Lab has built a world-class HPC system that will enable researchers to push the boundaries of science and innovation.”

The system is interconnected via the new-generation high-performance Mellanox HDR 200G InfiniBand network, enabling the Lab to accelerate applications and increase scaling and efficiencies. The diverse mixture of computing technologies will allow LLNL and Corona partners to explore new approaches to cognitive simulation – blending machine learning and HPC – and intelligence-based data analytics.

“HDR 200G InfiniBand brings a new level of performance and scalability needed to build the next generation of high-performance computing and artificial intelligence system,” said Gilad Shainer, vice president of marketing at Mellanox Technologies. “The collaboration between Penguin, AMD and LLNL results in a technology-leading platform that will progress science and discovery at the Lab.”




Operated by Lawrence Livermore National Security, LLC, for the Department of Energy’s National Nuclear Security Administration
Lawrence Livermore National Laboratory (LLNL) is an American federal research facility in Livermore, California, United States, founded by the University of California, Berkeley, in 1952. A Federally Funded Research and Development Center (FFRDC), it is primarily funded by the U.S. Department of Energy (DOE) and managed and operated by Lawrence Livermore National Security, LLC (LLNS), a partnership of the University of California, Bechtel, BWX Technologies, AECOM, and Battelle Memorial Institute in affiliation with the Texas A&M University System. In 2012, the laboratory had the synthetic chemical element livermorium named after it.

LLNL is self-described as “a premier research and development institution for science and technology applied to national security.” Its principal responsibility is ensuring the safety, security and reliability of the nation’s nuclear weapons through the application of advanced science, engineering and technology. The Laboratory also applies its special expertise and multidisciplinary capabilities to preventing the proliferation and use of weapons of mass destruction, bolstering homeland security and solving other nationally important problems, including energy and environmental security, basic science and economic competitiveness.

The Laboratory is located on a one-square-mile (2.6 km²) site at the eastern edge of Livermore. It also operates a 7,000-acre (28 km²) remote experimental test site, called Site 300, situated about 15 miles (24 km) southeast of the main lab site. LLNL has an annual budget of about $1.5 billion and a staff of roughly 5,800 employees.

LLNL was established in 1952 as the University of California Radiation Laboratory at Livermore, an offshoot of the existing UC Radiation Laboratory at Berkeley. It was intended to spur innovation and provide competition to the nuclear weapon design laboratory at Los Alamos in New Mexico, home of the Manhattan Project that developed the first atomic weapons. Edward Teller and Ernest Lawrence,[2] director of the Radiation Laboratory at Berkeley, are regarded as the co-founders of the Livermore facility.

The new laboratory was sited at a former World War II naval air station. It was already home to several UC Radiation Laboratory projects that were too large for the laboratory’s location in the Berkeley Hills above the UC campus, including one of the first experiments in the magnetic approach to confined thermonuclear reactions (i.e., fusion). About half an hour southeast of Berkeley, the Livermore site provided much greater security for classified projects than an urban university campus.

Lawrence tapped 32-year-old Herbert York, a former graduate student of his, to run Livermore. Under York, the Lab had four main programs: Project Sherwood (the magnetic-fusion program), Project Whitney (the weapons-design program), diagnostic weapon experiments (for both the Los Alamos and Livermore laboratories), and a basic physics program. York and the new lab embraced the Lawrence “big science” approach, tackling challenging projects with physicists, chemists, engineers, and computational scientists working together in multidisciplinary teams. Lawrence died in August 1958, and shortly afterward the university’s board of regents named both laboratories for him, as the Lawrence Radiation Laboratory.

Historically, the Berkeley and Livermore laboratories have had very close relationships on research projects, business operations, and staff. The Livermore lab was initially established as a branch of the Berkeley laboratory and was not formally separated from it administratively until 1971. To this day, in official planning documents and records, Lawrence Berkeley National Laboratory is designated as Site 100, Lawrence Livermore National Lab as Site 200, and LLNL’s remote test location as Site 300.[3]

The laboratory was renamed Lawrence Livermore Laboratory (LLL) in 1971. On October 1, 2007, LLNS assumed management of LLNL from the University of California, which had exclusively managed and operated the Laboratory since its inception 55 years before. The LLNS takeover of the laboratory has been controversial. In May 2013, an Alameda County jury awarded over $2.7 million to five former laboratory employees who were among 430 employees LLNS laid off during 2008.[4] The jury found that LLNS breached a contractual obligation to terminate the employees only for “reasonable cause.”[5] The five plaintiffs also have pending age discrimination claims against LLNS, which will be heard by a different jury in a separate trial.[6] There are 125 co-plaintiffs awaiting trial on similar claims against LLNS.[7] The May 2008 layoff was the first layoff at the laboratory in nearly 40 years.[6]

On March 14, 2011, the City of Livermore officially expanded the city’s boundaries to annex LLNL and move it within the city limits. The unanimous vote by the Livermore city council extended Livermore’s southeastern boundaries to take in the 15 land parcels, totaling 1,057 acres (4.28 km²), that make up the LLNL site. The site was formerly an unincorporated area of Alameda County. The LLNL campus continues to be owned by the federal government.

