August 27, 2014
No longer active, the Tevatron was host to the Collider Detector at Fermilab (CDF) and DZero experiments, and is recognized for the discovery of the top quark and for providing evidence for the existence of the Higgs boson, which was confirmed at CERN in 2012. Several years later, there is a continued effort to preserve the data resulting from the Tevatron’s three-decade legacy.
The Run II Data Preservation system is expected to be sustainable through the year 2020. The project is progressing steadily, having successfully tested both the CDF and DZero pilot systems. Tape migration is continuing on schedule, and both the hardware and software infrastructures have been running since February 2012. One of the biggest misconceptions about data preservation is that it only involves storing the data on tape. In fact, the more difficult task is preserving the software and an environment in which it can run.
Willis Sakumoto, a senior scientist at Fermi National Accelerator Laboratory (Fermilab), confirms ongoing efforts to fully integrate CDF data into the Fermilab Intensity Frontier Structure and provide Run II documentation within the scope of the project. These efforts include running compatibility validation tests for the transition from ROOT 4 to ROOT 5, as well as the integration of the CernVM File System (CernVM-FS). “The project is well on its way to accomplishing its goal of handing off CDF analysis and documentation infrastructure to Fermilab Scientific Computing Division (FSCD) operations,” says Sakumoto.
Michelle Brochmann, a student working on the DZero data preservation project, is also optimistic about the progress made thus far. “CernVM-FS facilitates cooperation among scientists by enabling them to access a consistent computational analysis environment,” she says. The file system has other useful features: the software appears local despite being stored remotely, and files are accessed quickly because CernVM-FS uses optimized, existing HTTP infrastructure and only fetches files from the remote server as they are needed. “Fermilab has committed to help maintain the CernVM-FS for the next decade or so,” adds Brochmann.
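The fetch-on-demand behavior described above can be illustrated with a small sketch. This is a toy model, not CernVM-FS code: the `LazyFileCache` class, the `fake_fetch` function, and the file paths are all hypothetical, and an in-memory dictionary stands in for the remote HTTP server. It only shows the general idea that a file is transferred the first time it is read and served from a local cache afterward.

```python
import hashlib
import os
import tempfile


class LazyFileCache:
    """Toy on-demand cache: a file is fetched only the first time it is read."""

    def __init__(self, fetch, cache_dir):
        self.fetch = fetch            # hypothetical callable: path -> bytes
        self.cache_dir = cache_dir    # local directory holding cached copies
        self.remote_fetches = 0       # counts how often the "server" was contacted

    def _local_path(self, path):
        # Key the cached copy by a hash of the remote path (an assumption of this sketch).
        key = hashlib.sha1(path.encode("utf-8")).hexdigest()
        return os.path.join(self.cache_dir, key)

    def read(self, path):
        local = self._local_path(path)
        if not os.path.exists(local):
            # Cache miss: fetch from the remote source and store a local copy.
            data = self.fetch(path)
            self.remote_fetches += 1
            with open(local, "wb") as f:
                f.write(data)
        # Cache hit (or freshly stored copy): read from local disk.
        with open(local, "rb") as f:
            return f.read()


# Stand-in for a remote server; the paths and contents are made up for illustration.
REMOTE = {
    "/sw/analysis/setup.sh": b"export ANALYSIS=1\n",
    "/sw/analysis/config.txt": b"run=2\n",
}


def fake_fetch(path):
    return REMOTE[path]


cache = LazyFileCache(fake_fetch, tempfile.mkdtemp())
first = cache.read("/sw/analysis/setup.sh")   # triggers one remote fetch
second = cache.read("/sw/analysis/setup.sh")  # served from the local cache
```

In this sketch the second read never touches the remote source, and `config.txt` is never transferred at all because nothing asked for it.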
Challenges the Run II Data Preservation team must overcome include lack of new resources and manpower. Fortunately, scientists like Kenneth Herner and Bo Jayatilaka — who have worked on the DZero and CDF experiments respectively — recognize the value of the labor they are putting forth and the overall significance it could have for a scientist who may need to revisit a measurement or make new theoretical calculations. “This data has the potential to make new discoveries,” says Jayatilaka.
The growing spread of digital science means that preserving software, not only data, is of critical importance to the long-term value of research outcomes. As the magnitude of the experiments, in both cost and labor, increases, the need for a common forum of usable data is amplified. In response, projects such as the Data and Software Preservation for Open Science (DASPOS) and the Study Group for Data Preservation in High Energy Physics (DPHEP) are working to expand and improve data preservation technology.
Sakumoto plans to integrate cloud-based technology as a possible analysis solution. Regardless of the methodology chosen, the need for sustainable data preservation will continue to increase as science advances, experiments become harder to replicate, and data sets become increasingly unique.
iSGTW is an international weekly online publication that covers distributed computing and the research it enables.
“We report on all aspects of distributed computing technology, such as grids and clouds. We also regularly feature articles on distributed computing-enabled research in a large variety of disciplines, including physics, biology, sociology, earth sciences, archaeology, medicine, disaster management, crime, and art. (Note that we do not cover stories that are purely about commercial technology.)
In its current incarnation, iSGTW is also an online destination where you can host a profile and blog, and find and disseminate announcements and information about events, deadlines, and jobs. In the near future it will also be a place where you can network with colleagues.
You can read iSGTW via our homepage, RSS, or email. For the complete iSGTW experience, sign up for an account or log in with OpenID and manage your email subscription from your account preferences. If you do not wish to access the website’s features, you can just subscribe to the weekly email.”