From isgtw: “Getting value from a trillion electron haystack”

July 11, 2012
Linda Vu

Modern research tools like high-performance computers and particle colliders are generating so much data, so quickly, that many scientists fear they will not be able to keep up with the deluge. Now, for the first time, Berkeley researchers have designed strategies for extracting interesting data from massive scientific datasets, and queried a 32-terabyte, trillion-particle dataset in three seconds.

After querying a dataset of 114,875,956,837 particles for those with energy values less than 1.5, FastQuery identified 57,740,614 particles, which are mapped on this plot. Image courtesy Oliver Rubel, Berkeley Lab.
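The caption above describes a simple range query: select every particle whose energy falls below a threshold. As a conceptual sketch only (using NumPy on synthetic data, not FastQuery's actual bitmap-index API over HDF5 files), such a selection might look like:

```python
import numpy as np

# Conceptual sketch of a range query like the one in the caption:
# find all particles with energy below a threshold.
# FastQuery accelerates this with bitmap indexes over parallel file
# formats; this brute-force NumPy scan only illustrates the idea.

rng = np.random.default_rng(seed=0)
energies = rng.uniform(0.0, 3.0, size=1_000_000)  # synthetic energy values

threshold = 1.5
selected = np.flatnonzero(energies < threshold)  # indices of matching particles

print(f"{selected.size} of {energies.size} particles below {threshold}")
```

At trillion-particle scale, the win comes from indexing: a precomputed bitmap index lets the query engine skip most of the data instead of scanning every value as this sketch does.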

‘These instruments are capable of answering some of our most fundamental scientific questions, but it is all for nothing if we can’t get a handle on the data and make sense of it,’ said Surendra Byna of the Lawrence Berkeley National Laboratory’s (Berkeley Lab’s) Computational Research Division (CRD).

That’s why researchers from Berkeley Lab’s CRD, the University of California, San Diego (UCSD), Los Alamos National Laboratory, Tsinghua University, and Brown University teamed up to develop software strategies for storing, mining, and analyzing massive datasets – specifically, for data generated by a state-of-the-art plasma physics code called VPIC.

When the team ran VPIC on the Department of Energy’s National Energy Research Scientific Computing Center’s (NERSC’s) Cray XE6 ‘Hopper’ high-performance computer, they generated a 3D dataset of a trillion particles to better understand magnetic reconnection. Magnetic reconnection is a physical process in which magnetic topology is rearranged and magnetic energy is converted into kinetic and thermal energy, accelerating particles.

See the full article here.