From OLCF: “Optimizing Miniapps for Better Portability”


Oak Ridge National Laboratory

OLCF

January 17, 2018
Rachel Harken

When scientists run their scientific applications on massive supercomputers, the last thing they want to worry about is optimizing their codes for new architectures. Computer scientist Sunita Chandrasekaran at the University of Delaware is taking steps to make sure they don’t have a reason to worry.

Chandrasekaran collaborates with a team at the US Department of Energy’s (DOE’s) Oak Ridge National Laboratory (ORNL) to optimize miniapps, smaller pieces of large applications that can be extracted and fine-tuned to run on GPU architectures. Chandrasekaran and her PhD student, Robert Searles, have taken on the task of porting (adapting) one such miniapp, Minisweep, to OpenACC—a directive-based programming model that allows users to run a code on multiple computing platforms without having to change or rewrite it.

Minisweep performs a “sweep” computation across a grid (pictured)—representative of a 3D volume in space—to calculate the positions, energies, and flows of neutrons in a nuclear reactor. The yellow cube marks the beginning location of the sweep. The green cubes are dependent upon information from the yellow cube, the blue cubes are dependent upon information from the green cubes, and so forth. In practice, sweeps are performed from each of the eight corners of the cube simultaneously.

Minisweep is particularly important because it represents approximately 80–99 percent of the computation time of Denovo, a 3D code for radiation transport in nuclear reactors being used in a current DOE Innovative and Novel Computational Impact on Theory and Experiment, or INCITE, project. Minisweep is also being used in benchmarking for the Oak Ridge Leadership Computing Facility’s (OLCF’s) new Summit supercomputer.

ORNL IBM Summit supercomputer depiction

Summit is scheduled to be in full production in 2019 and will be the next leadership-class system at the OLCF, a DOE Office of Science User Facility located at ORNL.

Created from Denovo by OLCF computational scientist Wayne Joubert, Minisweep works by “sweeping” diagonally across grid cells that represent points in space, allowing it to track the positions, flows, and energies of neutrons in a nuclear reactor. Each cube in the grid holds a number of these quantities and depends on information from the cubes swept before it.
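The dependency pattern described here is often called a wavefront. As a rough illustration only, the sketch below groups the cells of a small 3D grid into diagonal planes and runs a toy sweep from one corner. The function names, the `source` and `weight` parameters, and the update rule are assumptions made for this example; they do not reflect Minisweep's actual physics or data layout.

```python
from itertools import product


def wavefronts(nx, ny, nz):
    """Group the cells of an nx x ny x nz grid into diagonal wavefronts.

    Cells with the same i + j + k lie on one diagonal plane and do not
    depend on each other, so each group could be processed in parallel.
    """
    planes = [[] for _ in range(nx + ny + nz - 2)]
    for i, j, k in product(range(nx), range(ny), range(nz)):
        planes[i + j + k].append((i, j, k))
    return planes


def sweep(nx, ny, nz, source=1.0, weight=0.5):
    """Toy sweep from corner (0, 0, 0); NOT the real transport physics.

    Each cell combines a local source with the values of its three
    upstream neighbours, which were computed on the previous wavefront.
    """
    value = {}
    for plane in wavefronts(nx, ny, nz):
        for (i, j, k) in plane:  # independent work within one wavefront
            upstream = [(i - 1, j, k), (i, j - 1, k), (i, j, k - 1)]
            inflow = sum(value.get(cell, 0.0) for cell in upstream)
            value[(i, j, k)] = source + weight * inflow
    return value
```

Calling `wavefronts(3, 3, 3)` yields seven planes, starting with the single corner cell. Every cell inside one plane can be updated independently, which is the kind of parallelism a GPU port would try to exploit.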

“Scientists need to know how neutrons are flowing in a reactor because it can help them figure out how to build the radiation shield around it,” Chandrasekaran said. “Using Denovo, physicists can simulate this flow of neutrons, and with a faster code, they can compute many different configurations quickly and get their work done faster.”

Minisweep has already been ported to multicore platforms using the OpenMP programming interface and to GPU accelerators using the lower-level programming language CUDA. ORNL computer scientists and ORNL Miniapps Port Collaboration organizers Tiffany Mintz and Oscar Hernandez knew that porting these kinds of codes to OpenACC would equip them for use on different high-performance computing architectures.

Chandrasekaran and Searles have been using the Summit early access system, Summitdev, and the Cray XK7 Titan supercomputer at the OLCF to test Minisweep since mid-2017.

ORNL Cray XK7 Titan Supercomputer

Visualization of a nuclear reactor simulation on Titan.

Now, they’ve successfully enabled Minisweep to run on parallel architectures using OpenACC for fast execution on the targeted computer. An option to port to these types of systems without compromising performance didn’t previously exist.

Whereas the code typically sweeps in eight directions from diagonal corners of a cube inward, the team found that, with only one sweep, the OpenACC version performed on par with the CUDA version.

“We saw OpenACC performing as well as CUDA on an NVIDIA Volta GPU, which is a state-of-the-art GPU card,” Searles said. “That’s huge for us to take away, because we are normally lucky to get performance that’s even 85 percent of CUDA. That one sweep consistently showed us about 0.3 or 0.4 seconds faster, which is significant at the problem size we used for measuring performance.”

Chandrasekaran and the team at ORNL will continue optimizing Minisweep to get the application up and “sweeping” from all eight corners of a grid cell. In the future, other radiation transport applications and one for DNA sequencing may be able to take advantage of Minisweep on multi-GPU systems such as Summit, and even on exascale systems.
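To make the eight-corner geometry concrete, here is a minimal sketch that enumerates the eight sweep directions on a box-shaped grid and, for a given cell, lists the neighbours that must be finished first. The names `sweep_octants` and `upstream_cells` are hypothetical and chosen for this example; they are not taken from the Minisweep source.

```python
from itertools import product


def sweep_octants(nx, ny, nz):
    """List (start_corner, direction) for the eight corner-to-corner sweeps."""
    octants = []
    for sx, sy, sz in product((1, -1), repeat=3):
        # A sweep moving in +x starts at x = 0; one moving in -x starts at x = nx - 1.
        start = (
            0 if sx > 0 else nx - 1,
            0 if sy > 0 else ny - 1,
            0 if sz > 0 else nz - 1,
        )
        octants.append((start, (sx, sy, sz)))
    return octants


def upstream_cells(cell, direction, shape):
    """Neighbours that must be finished before `cell` in the given sweep."""
    neighbours = []
    for axis in range(3):
        candidate = list(cell)
        candidate[axis] -= direction[axis]  # step back against the sweep
        if 0 <= candidate[axis] < shape[axis]:
            neighbours.append(tuple(candidate))
    return neighbours
```

Each starting corner has no upstream cells, so all eight sweeps can begin at once. The same cell has different dependencies in each direction, which is part of what makes running all eight sweeps together harder than running one.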

“I’m constantly trying to look at how I can package these kinds of tools from a user’s perspective,” Chandrasekaran said. “I take applications that are essential for these scientists’ research and try to find out how to make them more accessible. I always say: write once, reuse multiple times.”



ORNL is managed by UT-Battelle for the Department of Energy’s Office of Science. DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time.


The Oak Ridge Leadership Computing Facility (OLCF) was established at Oak Ridge National Laboratory in 2004 with the mission of accelerating scientific discovery and engineering progress by providing outstanding computing and data management resources to high-priority research and development projects.

ORNL’s supercomputing program has grown from humble beginnings to deliver some of the most powerful systems in the world. On the way, it has helped researchers deliver practical breakthroughs and new scientific knowledge in climate, materials, nuclear science, and a wide range of other disciplines.

The OLCF delivered on that original promise in 2008, when its Cray XT “Jaguar” system ran the first scientific applications to exceed 1,000 trillion calculations a second (1 petaflop). Since then, the OLCF has continued to expand the limits of computing power, unveiling Titan, which is capable of 27 petaflops, in 2013.



Titan is one of the first hybrid architecture systems, combining graphics processing units (GPUs) with the more conventional central processing units (CPUs) that have served as number crunchers in computers for decades. The parallel structure of GPUs makes them uniquely suited to process an enormous number of simple computations quickly, while CPUs are capable of tackling more sophisticated computational algorithms. The complementary combination of CPUs and GPUs allows Titan to reach its peak performance.

The OLCF gives the world’s most advanced computational researchers an opportunity to tackle problems that would be unthinkable on other systems. The facility welcomes investigators from universities, government agencies, and industry who are prepared to perform breakthrough research in climate, materials, alternative energy sources and energy storage, chemistry, nuclear physics, astrophysics, quantum mechanics, and the gamut of scientific inquiry. Because it is a unique resource, the OLCF focuses on the most ambitious research projects—projects that provide important new knowledge or enable important new technologies.