Tagged: ECP- Exascale Computing Project

  • richardmitnick 3:03 pm on July 24, 2019
    Tags: Director Doug Kothe sat down with Mike Bernhardt ECP communications manager., ECP- Exascale Computing Project, Red Team Review

    From Exascale Computing Project: Assessing Progress, Considering ECP’s Enduring Legacy, and More 

    From Exascale Computing Project

    Exascale Computing Project (ECP) Director Doug Kothe sat down with Mike Bernhardt, ECP communications manager, this month to talk about a variety of topics. The discussion covered ensuring that a capable exascale computing ecosystem comes to fruition in conjunction with the arrival of the nation’s first exascale systems; objectively assessing whether the project’s efforts are on track; correcting course and instilling confidence through Red Team reviews; addressing the challenges posed by hardware accelerators; consolidating projects for greater technical synergy; acknowledging behind-the-scenes leadership; tracking cost and schedule performance; and reflecting on ECP’s enduring legacy.

    The following is a transcript of the interview.

    Bernhardt: Doug, now that plans for the first exascale systems have been formally and officially announced—Aurora at Argonne and Frontier at Oak Ridge—and we know that the El Capitan system at Lawrence Livermore is right around the corner…

    Depiction of ANL ALCF Cray Intel SC18 Shasta Aurora exascale supercomputer

    ORNL Cray Frontier Shasta based Exascale supercomputer with Slingshot interconnect featuring high-performance AMD EPYC CPU and AMD Radeon Instinct GPU technology

    Kothe: Right. Right.

    Bernhardt: The announcement of the nation’s first exascale systems is such a huge milestone for this country and for the Department of Energy. What do we do to ensure that we will have a capable exascale ecosystem, the software stack, and the exascale-ready applications when these systems are actually standing up?

    Kothe: A good question. As we’ve talked in the past, Mike, we—the ECP team and the staff—knew enough in terms of what we thought was coming to where we weren’t shooting in the dark, so to speak, with regard to building our software stack and our apps. But now with these announcements coming out, we have a more refined, focused target and for the most part, it’s not surprising; it’s a matter of expectations, and I feel like our preparations have us on a good path.

    I do believe that both architectures, as we know of them—Aurora at Argonne and Frontier at Oak Ridge—are very exciting, have tremendous upsides, and are consistent with our overall preparations, meaning the nodes feature what we call accelerators, that is, hardware acceleration of certain floating-point operations, which allows us to exploit those accelerators to our advantage. Overall, we’re really excited. We are three years in and we have very specific targets now. The announced systems really met our expectations, and from what we can tell in terms of the speeds and feeds and the overall design, they really do look very solid, very exciting, and I’m very confident we’re going to be able to deliver on those systems.

    Bernhardt: And as you mentioned, we’re into our third year with the project. If you think back three years ago, where you thought the project would be at this point in time, are we on track? Have there been any surprises that have come across that changed scheduling in your vision?

    Kothe: First of all, we are on track. There are always surprises in R&D and many of them are good. Some are, I would say, setbacks or things you have to really prepare for. So, relative to three years ago, we have teams now that have really figured out how to fit well into our project structure. We have defined very specific metrics that are quantitative but also directly reflect the overall goals and objectives, in particular the science, engineering, and national security objectives.

    What we have seen over the past year is that we have a really good sense of how to track performance. So when we say we are on track, it’s not just a subjective answer; we really have a lot of objective evidence for it. Being a formal project with very specific metrics helps us make collective decisions about what matters and what doesn’t, and it’s all about achieving our objectives, which we’ve mapped into these specific metrics.

    So, yeah, we are definitely on track. That doesn’t mean that it’s not going to be a challenging, tough road ahead. I think we understand what the risks are, and certainly with Aurora and Frontier being announced, many of the unknown unknowns become known unknowns and so I think we’re even better prepared moving forward.

    Bernhardt: You’ve mentioned that it’s not just simply a subjective view of whether or not we’re on track. As I understand it, the team just went through something called a Red Team Review.

    Kothe: Right.

    Bernhardt: Could you explain for our followers what is a Red Team Review and what’s the significance and the impact to the project?

    Kothe: Yeah. And so Red Team was a new term for me in the DOE many years ago, 30-plus years ago. One could Google it and kind of see the historical aspects. I think it’s a term that’s readily used in business, in large organizations, in the military. For us it has a very specific meaning that at a high level is not too dissimilar from other organizations and agencies, and that is, we bring in a team that is there specifically to poke, to prod, to find any sort of flaw or hole in our plan, and the team is there to help us. They’re not there to be punitive, but they’re also not friends and family. They are there to help us and specifically find problems in our plan, in our objectives. So, typically we go through at least one formal review a year by our Department of Energy sponsors, and so what we do with the Red Team is, two to three months before that formal review, we have a Red Team Review. You could view it as an internal review where we essentially mimic the formal DOE review in terms of what we’re going to do, in terms of presentations, and breakout sessions, et cetera, but with an external, independent, separate team with no conflict of interest with ECP, meaning folks working on ECP aren’t a part of this. So it is very formal and independent.

    These reviews at times are painful, because it requires a lot of work and a lot of heads-down focus, but in the end, they always help us in terms of finding areas where we need to do better and make necessary course corrections. In the end, at post review, I think we all sit back and go, boy, that was painful, but we’re glad we went through it because we’re better off now; we have a better plan; we have corrective actions in place.

    We did recently go through a Red Team Review and design review. Our next big review with DOE is this December, and so we intend not to wait until the last minute to really be ready to show DOE that we’re on a good path or on track, as you say.

    Bernhardt: Got it. So, the Red Team Reviews go a long way in helping instill confidence with you, the leadership team, and the program office, your sponsors, that things are, in fact, on track, that you’ve identified the risk, you’re taking all of the mitigation steps, et cetera?

    Kothe: They do. I would say that the outcomes of a Red Team Review are typically, hey, we recommend you consider doing this, this, and this; or, fix this, this, and this. So, we call those formal recommendations. Typically we give ourselves two or three months to respond to those recommendations, to make those fixes. Obviously, if there is a systemic problem found by the Red Team that takes longer to fix, that’s a problem for us. In the end, our Red Team reviews have been very successful, suggesting mostly minor tweaks and relaying to us that we’re, for the most part, on track. And so right now we are actually making a few course corrections and a few changes in our plans as we prepare for our next DOE review; but I do feel like they’re all necessary, and probably the best news is they’re consistent with our expectations as to where we needed to work. So, we’ve not heard things that were orthogonal to our own internal assessment. Having independent assessments is good, and it’s even better when the results are consistent with our own view of where we still need to put some work in.

    Bernhardt: Awesome. So Doug, I’d like to dive into one discussion just quickly here, and it’s in reference to something we’ve heard recently at a number of conferences. Would it be accurate to say that accelerated computing and the implementation of GPUs is going to play a key role in delivering the necessary performance of our DOE exascale systems, and if that is the case, what’s ECP doing to prepare the community for this?

    Kothe: Good question. It is accurate. I think we’re going to see more and more of this, and maybe it’s disingenuous to even call them GPUs, because they’re very purpose-fit hardware accelerators for specific floating-point operations, or specific operations that may not be floating point. The way I like to think about it is, in ECP—and this is an aspect of co-design—we’re working on hardware-driven algorithm design, but we’re also working on algorithm-driven hardware design; and so there is really a give and take there. Based on our experience with Summit and Sierra, Summit at Oak Ridge, Sierra at Lawrence Livermore, and the coming Perlmutter system, and certainly Titan at Oak Ridge, we have seen, and will continue to see, hardware acceleration on a node. That doesn’t mean it’s easy. The point is we’ve been through this. I think we know what to expect. This sort of design has tremendous potential: there’s a lot of local concurrency that we can exploit with an accelerator.

    LLNL IBM NVIDIA Mellanox ATS-2 Sierra Supercomputer, NO.2 on the TOP500

    ORNL IBM AC922 SUMMIT supercomputer, No.1 on the TOP500. Credit: Carlos Jones, Oak Ridge National Laboratory/U.S. Dept. of Energy

    Cray Shasta Perlmutter SC18 AMD Epyc Nvidia pre-exascale supercomputer

    ORNL Cray XK7 Titan Supercomputer, once the fastest in the world, to be decommissioned

    I can now embody my simulation with richer physics, with broader, deeper physical phenomena, with higher-confidence results, because I can now afford to offload some additional physics onto the hardware accelerators. Or, if the current algorithms I have in place don’t adapt well to the accelerator, I’ve got to redesign and rethink my algorithms. And so, we’ve been doing that, and the recent announcements of Aurora and Frontier basically tell us that we’re on a good path.

    I think with regard to acceleration moving into the future, my own opinion is we’ll continue to see this post exascale, and it could be even more purpose-fit, more along the lines of ASICs that are very specific to current algorithms. And again, I think what we’re doing in ECP now is really hardware-driven algorithm design, meaning we know accelerators are here, and we are figuring out how to best exploit them. In many cases it’s a rethinking of our algorithms. I think the hardest part is figuring out how to change my data structures and how to rework my algorithms, and so in some cases it’s a wholesale restructuring of an application or a software technology. In some cases it’s very surgical, limited to the compute-intensive portions.

    In the end, the hard part is rethinking your algorithms; implementing those reworked algorithms is often much easier than the rethinking itself. So whether we’re looking at accelerators from NVIDIA, or AMD, or Intel, the programming models won’t be as dissimilar as one might think.
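    To make that point concrete, here is a small, hypothetical sketch (not ECP code; the array-backend approach shown is just one of several portability strategies, and CuPy is assumed to be installed alongside a CUDA-capable GPU). Once a kernel is expressed as whole-array, data-parallel operations, switching the backend between a CPU library (NumPy) and a GPU library (CuPy) is a one-line change; the real work was restructuring the loop and its data layout in the first place.

      # Toy sketch (not ECP code): one data-parallel kernel, two array backends.
      import numpy as np

      try:
          import cupy as cp      # GPU array library with a NumPy-like interface
          xp = cp                # run the kernel on the accelerator if available
      except ImportError:
          xp = np                # otherwise the same code runs on the CPU

      def kinetic_energy(velocities, masses):
          # A data-parallel reduction: 0.5 * sum_i m_i * |v_i|^2. Because the loop
          # is written as whole-array operations, either backend can execute it.
          speed_sq = (velocities * velocities).sum(axis=1)
          return 0.5 * (masses * speed_sq).sum()

      n = 1_000_000
      rng = np.random.default_rng(0)
      v = xp.asarray(rng.standard_normal((n, 3)))   # particle velocities
      m = xp.asarray(rng.random(n) + 0.5)           # particle masses

      print("kinetic energy:", float(kinetic_energy(v, m)))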

    The real challenge is rethinking your algorithms and we’ve been doing that since the start of ECP. So, not that we’re not going to have some challenges and hurdles, but I do think that these recent announcements have pretty much met with where we thought things were going to go, and so in that sense I do believe we really are on track relative to our objectives.

    Bernhardt: Recent comments from some conferences indicate that folks in the application development community think that this (the wider use of accelerators) is a pretty big, heavy lift. It’s a learning curve that they’re going to have to go through with the growing use of accelerators. Is that the proper way to frame it, do you think?

    Kothe: It certainly isn’t easy, and I don’t want to downplay the fact that this can be difficult and challenging. I think it requires conceptual rethinking of algorithms. Now, in ECP we have a whole spectrum of application and software maturity relative to the accelerators. We have many applications and software technology products that have already reworked their algorithm design and are achieving fantastic performance on, say, Summit.

    And so we would anticipate, I think with fairly low risk, that moving that implementation from Summit to Aurora or Frontier may not be seamless, but won’t be a heavy lift, so to speak.

    We have other applications and software technology products that are not quite there yet in terms of rethinking and redesigning their algorithms, and so these comments certainly do apply to some aspect of ECP.

    In terms of, say, our upcoming DOE review, one key aspect of this review is determining if we are prepared to really help those teams move along more quickly, more with a sense of urgency. Can we take the successful experiences of some applications and apply those lessons learned and best practices to others, and I think we can.

    In our three focus areas—software, applications, and hardware and integration—we have a number of projects that have more or less a direct line of sight to essentially figuring out the techniques for exploiting those hardware accelerators. So I feel that, in terms of the way we’re scoped, we have the efforts in place to help bring along everybody, and, you know, the fact that we’re a large project with lots of teams allows us to cross-fertilize and share experiences and lessons learned, and that helps reduce risk with regard to moving things along.

    So, I think when you’re first exposed to these accelerators, you have to sit back and go, okay, wow; this is a tremendous opportunity, but I’ve also got to rethink how I’ve been doing things. In many cases it’s back to the future: some of the algorithms designed for Cray vector machines in the ’70s and ’80s are now apropos and work well on accelerated systems such as Summit. We have direct evidence that this is not necessarily reinventing or inventing from whole cloth. It might be a matter of revisiting an algorithm that was used successfully in the past and is again useful now.

    Bernhardt: Just another tool in the application developer’s bag of tricks, huh?

    Kothe: That’s right. Indeed it is, and I think the teams realize that they’re not going to succeed on the path that we have in front of us by closing their doors and trying to do all of this on their own. And so, we really are managing and tracking and, frankly, forcing integration of efforts, especially with the software stack: key products that applications need to not just be aware of but actually use. And so in many cases the applications are, to some extent, passing the risk, or the challenge, of exploiting on-node accelerators to the software technology products, and that makes a lot of sense. In other cases they’re not doing that, for good reason. So, this is one of the advantages of having a large project where we can plug pieces together to make the whole greater than the sum of the parts.

    Bernhardt: Got it. Yeah. It makes a lot of sense. So, within ECP, some efforts that I’ve noticed have been expanding and some have been consolidating. Maybe you could give us a few of the current stats to frame where we are today for the listeners, more like ECP by the numbers.

    Kothe: Okay. So, we are always taking a hard look at how we’re organized and trying to see if there is a simpler way to put our organization together in terms of managing. Really, it’s not about the boxology so much, because the challenges are always managing at the interfaces, but we have worked hard to consolidate and simplify where possible and where it makes sense. Right now in ECP we have 81 R&D projects, and that’s come down from about 100. Where we found areas where we could consolidate, we did that, and it wasn’t oil and water; we didn’t force it just for the sake of trying to decrease the number. But in every case where we’ve done this, it has helped. Let me give an example: in software technology, led by Mike Heroux at Sandia and Jonathan Carter at Berkeley, they recognized that there were several smaller projects looking at, say, I/O, and that by putting them together there were synergies we could take advantage of, where they could adopt and use each other’s approaches and we could move toward maybe one API for a particular I/O instance. And so the consolidation wasn’t just, hey, let’s reduce the number of projects because this is too hard to manage. It was really driven by what makes technical sense. So right now I think we’re in really good shape to move into what we call our performance baseline period, which will be this fall and early next year, with our current structure of 81 teams and still over 1,000 researchers across the HPC and computational science community: industry, academia, and DOE labs. I think this restructuring has us in a really good position for the stretch run as we see Aurora and Frontier delivered.

    Bernhardt: You mentioned a few of the folks there, and that leads into what I wanted to get to next. ECP’s success, in fact, the Nation’s success with exascale and bringing it to life depends on a very, very large group of people and it’s more than just the ECP. You know, the collaborating agencies, the collaborating universities, the technical vendor community that ultimately will stand up the systems. I know it’s difficult to single out just a few individuals when there are so many that are making these important contributions, but perhaps you could take a few minutes to acknowledge at least some of the folks, maybe from the leadership team level and so forth that often work behind the scenes a fair amount and don’t get the recognition they deserve.

    Kothe: Yeah. That’s a good point. Let me start first with our Department of Energy sponsors: Barb Helland in the Advanced Scientific Computing Research (ASCR) office of the Office of Science (SC), Thuc Hoang in the Advanced Simulation and Computing (ASC) Program in the National Nuclear Security Administration (NNSA), and Dan Hogue in the Oak Ridge National Lab site office here, who’s our Federal Project Director. They have been fantastic in their support, and that doesn’t always mean it’s a thumbs up, team, you guys are doing great. It could mean you guys need to work on this, and so they give us a good, honest, objective assessment and they’re always there. We speak with them daily, weekly, all of the time. So our sponsors have been fantastic in making sure we’re on the right course and giving us the support that we need.

    Our leadership team, again, consists of about 30 or so—I think 32 by last count—DOE staff across six labs; and we’ve been really fortunate to have leaders in the community with a proven track record and the trust and respect of their colleagues. We’ve been together now as a team for most of the time ECP has been in existence, meaning there hasn’t been a lot of turnover—not that that’s bad—but people are all in; they’re committed; they have the passion and the energy.

    Many people, to quote some of our leaders, feel like this is the culmination of their careers, feeling like their whole career was built for this. And that really helps during tough times, say when you’re trying to prepare for a review, to realize that this is something your whole career was built around. We have many people who feel that way.

    To single out some names from our three focus areas: software technology, led by Mike Heroux at Sandia National Lab and Jonathan Carter at Lawrence Berkeley, really is up and running on all cylinders. Mike and Jonathan have made a lot of very productive and useful changes in how things are run and organized. They work hard to make sure our software products have a line of sight up into software development kits and are released and deployed on the facilities.

    Terry Quinn at Lawrence Livermore and Susan Coghlan at Argonne National Lab run our hardware and integration focus area. Both Terry and Susan—I don’t know if people appreciate this—are dual-hatted in that Terry is really on point for a large part of the El Capitan procurement and deployment at Lawrence Livermore, and Susan for Aurora; and so, we’re really fortunate to have two leaders in the field for procuring, deploying, and operating HPC systems who are also leading our staff in terms of what it takes to make sure that products and applications are production quality and get deployed and used on these systems. So, their feet are sort of on both sides of the fence there.

    And then in the applications area, Andrew Siegel at Argonne and Eric Draeger at Livermore lead the effort, and they’ve really taken our applications from what looked, say three years ago, like interesting may-work sorts of R&D efforts to applications that now have very specific challenge problems.

    We’re assessing them annually. They have very specific metrics and they’re really, for the most part, all on track. So, these folks have been fantastic in leading these efforts. And I said there were over 30 leaders, so Andrew and Eric, for example, have a team of five or six leaders who each oversee over half a dozen of these R&D projects. But the 81 R&D projects all have principal investigators leading them who are, for the most part, senior people with strong track records. I try to, and I think our leadership team does as well, call out these PIs, because that’s really where the work is getting done; and we’re lucky to have these PIs, who are all in, just like the leadership team, to make sure we succeed.

    Bernhardt: And a lot of the behind-the-scenes, heavy lifting that takes place is with the project office, which happens to be housed at Oak Ridge.

    Kothe: Yes, and I’m glad you brought that up. They are. This really isn’t a customer-client relationship between the PhD scientists and the project office. They really are our peers. The PhD scientists responsible for leading the technical areas have learned a lot from the project office about what good project management looks like; what is our responsibility; how do we need to track costs and schedule performance. It’s a tremendous responsibility with the budget we have. And so the project office is in itself a small organization that’s made up of people who care about risk, project controls, budget, procurement. All of these things are day-to-day sort of contact sports, so to speak, with regard to our technical leaders. So, I sit personally at Oak Ridge National Lab, and I think this lab in particular, as many other labs, has a very good track record in project management and leading and executing on large projects. So, we’re fortunate to have a project office staffed almost entirely here at Oak Ridge that has been through the trenches in running and being part of large projects, and knows what to expect. This is a unique gig in ECP, but I think we’ve figured out how to really tailor this to formal project management, sort of in and around doing more exploratory high-risk research.

    Bernhardt: Great. This has been a good update, Doug. I’d like to wrap up with one topic that I know is near and dear to you. Talk a little bit about, if you could, the enduring legacy of the Exascale Computing Project.

    Kothe: Very good point. I wouldn’t be here, and I don’t think the leadership team or the staff would be here if we didn’t think that there was going to be an enduring legacy. The beauty of a seven-year project is it allows you to have a sense of urgency, and a sprint, and you pay attention to metrics, and you really make sure you can dot I’s and cross T’s, but a project would fail if the leave-behind wasn’t useful. So, let me take you through applications, for example.

    Enduring legacy translates to having dozens of application technologies that will be used to tackle some of the toughest problems in DOE and the nation, and so the applications are now going to be positioned to address their challenge problems and in many cases help solve them or be a part of the solution. So, an enduring legacy for us is that the applications are going to be ready at exascale to tackle currently intractable problems. And when I say tackle, I mean that many, many program offices in DOE—by last count there were ten of them—and other federal agencies are going to essentially use these as their science and engineering tools, so that’s an important legacy. In software technology, I think what we’re seeing with the leadership of Mike and Jonathan is the genesis of a probably multi-decade software stack that’s going to be used and deployed on many HPC systems, well beyond Aurora, Frontier, and El Capitan. And I think that by paying attention to what it takes to containerize and package things up, make them production quality, and make them adhere to application and hardware requirements, we’re going to see a software stack that DOE will continue to support, maintain, and require on HPC systems in the future. Time will tell post ECP. But we wouldn’t be involved in ECP if we didn’t expect and, frankly, require our efforts to really have a line of sight well beyond 2023.

    Bernhardt: Great. That’s all I have. Is there anything else that you’d like to throw out there for the community at this point in time?

    Kothe: Just that we appreciate the support, the engagement of the HPC, R&D, and computational science community. I’m not going to claim that we always have all of the answers, so we encourage the community to feel free to touch base with us, myself personally, or the leadership team. There are ways that you can collaborate and work with us. There are certainly ways that you can engage and help us move forward. We’re really lucky to be a part of this big project and always happy to hear about new suggestions and new possibilities from the community at large.

    See the full article here.

    Please help promote STEM in your local schools.

    Stem Education Coalition

    About ECP

    The ECP is a collaborative effort of two DOE organizations – the Office of Science and the National Nuclear Security Administration. As part of the National Strategic Computing Initiative, ECP was established to accelerate delivery of a capable exascale ecosystem, encompassing applications, system software, hardware technologies and architectures, and workforce development to meet the scientific and national security mission needs of DOE in the early-2020s time frame.

    About the Office of Science

    DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, please visit https://science.energy.gov/.

    About NNSA

    Established by Congress in 2000, NNSA is a semi-autonomous agency within the DOE responsible for enhancing national security through the military application of nuclear science. NNSA maintains and enhances the safety, security, and effectiveness of the U.S. nuclear weapons stockpile without nuclear explosive testing; works to reduce the global danger from weapons of mass destruction; provides the U.S. Navy with safe and effective nuclear propulsion; and responds to nuclear and radiological emergencies in the United States and abroad. https://nnsa.energy.gov

    The Goal of ECP’s Application Development focus area is to deliver a broad array of comprehensive science-based computational applications that effectively utilize exascale HPC technology to provide breakthrough simulation and data analytic solutions for scientific discovery, energy assurance, economic competitiveness, health enhancement, and national security.

    Awareness of ECP and its mission is growing and resonating—and for good reason. ECP is an incredible effort focused on advancing areas of key importance to our country: economic competitiveness, breakthrough science and technology, and national security. And, fortunately, ECP has a foundation that bodes extremely well for the prospects of its success, with the demonstrably strong commitment of the US Department of Energy (DOE) and the talent of some of America’s best and brightest researchers.

    ECP is composed of about 100 small teams of domain, computer, and computational scientists, and mathematicians from DOE labs, universities, and industry. We are tasked with building applications that will execute well on exascale systems, enabled by a robust exascale software stack, and supporting necessary vendor R&D to ensure the compute nodes and hardware infrastructure are adept and able to do the science that needs to be done with the first exascale platforms.

     
  • richardmitnick 6:52 am on July 22, 2019
    Tags: ECP- Exascale Computing Project, QMCPACK, The quantum Monte Carlo (QMC) family of these approaches is capable of delivering the most highly accurate calculations of complex materials without biasing the results of a property of interest.

    From insideHPC: “Supercomputing Complex Materials with QMCPACK” 

    From insideHPC

    July 21, 2019

    In this special guest feature, Scott Gibson from the Exascale Computing Project writes that computer simulations based on quantum mechanics are getting a boost through QMCPACK.

    The theory of quantum mechanics underlies explorations of the behavior of matter and energy in the atomic and subatomic realms. Computer simulations based on quantum mechanics are consequently essential in designing, optimizing, and understanding the properties of materials that have, for example, unusual magnetic or electrical properties. Such materials would have potential for use in highly energy-efficient electrical systems and faster, more capable electronic devices that could vastly improve our quality of life.

    Quantum mechanics-based simulation methods render robust data by describing materials in a truly first-principles manner. This means they calculate electronic structure in the most basic terms and thus can allow speculative study of systems of materials without reference to experiment, unless researchers choose to add parameters. The quantum Monte Carlo (QMC) family of these approaches is capable of delivering the most highly accurate calculations of complex materials without biasing the results of a property of interest.

    An effort within the US Department of Energy’s Exascale Computing Project (ECP) is developing QMC software named QMCPACK to find, predict, and control materials and properties at the quantum level. The ultimate aim is to achieve an unprecedented and systematically improvable accuracy by leveraging the memory and power capabilities of the forthcoming exascale computing systems.
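    As a concrete, if drastically simplified, illustration of what a QMC calculation does, here is a toy variational Monte Carlo sketch for a single particle in a one-dimensional harmonic oscillator. It is illustrative only: it is not QMCPACK, and the trial wavefunction, step size, and sample counts are arbitrary choices. The chain samples the square of a trial wavefunction with a Metropolis random walk and averages the local energy; the variational principle guarantees the estimate never falls below the true ground-state energy of 0.5 (in units where hbar = m = omega = 1).

      # Toy variational Monte Carlo: 1D harmonic oscillator, hbar = m = omega = 1.
      # Trial wavefunction psi(x) = exp(-alpha * x^2); the exact ground state has
      # alpha = 0.5 and energy 0.5. The local energy for this trial function is
      # E_L(x) = alpha + x^2 * (0.5 - 2 * alpha^2).
      import numpy as np

      def local_energy(x, alpha):
          return alpha + x * x * (0.5 - 2.0 * alpha * alpha)

      def vmc_energy(alpha, n_steps=100_000, step=1.0, seed=0):
          rng = np.random.default_rng(seed)
          x, energies = 0.0, []
          for i in range(n_steps):
              x_new = x + step * rng.uniform(-1.0, 1.0)
              # Metropolis accept/reject with probability |psi(x_new)|^2 / |psi(x)|^2
              if rng.random() < np.exp(-2.0 * alpha * (x_new * x_new - x * x)):
                  x = x_new
              if i > 5_000:                      # discard equilibration steps
                  energies.append(local_energy(x, alpha))
          return np.mean(energies)

      for alpha in (0.3, 0.5, 0.7):
          print(f"alpha = {alpha:.1f}  ->  E ~ {vmc_energy(alpha):.4f}")
      # Only alpha = 0.5 reaches the exact ground-state energy of 0.5.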

    Greater Accuracy, Versatility, and Performance

    One of the primary objectives of the QMCPACK project is to reduce errors in calculations so that predictions concerning complex materials can be made with greater assurance.

    “We would like to be able to tell our colleagues in experimentation that we have confidence that a certain short list of materials is going to have all the properties that we think they will,” said Paul Kent of Oak Ridge National Laboratory and principal investigator of QMCPACK. “Many ways of cross-checking calculations with experimental data exist today, but we’d like to go further and make predictions where there aren’t experiments yet, such as a new material or where taking a measurement is difficult—for example, in conditions of high pressure or under an intense magnetic field.”

    The methods the QMCPACK team is developing are fully atomistic and material specific. This refers to having the capability to address all of the atoms in the material—whether it be silver, carbon, cerium, or oxygen, for example—compared with more simplified lattice model calculations where the full details of the atoms are not included.

    The team’s current activities are restricted to simpler, bulk-like materials; but exascale computing is expected to greatly widen the range of possibilities.

    “At exascale not only the increase in compute power but also important changes in the memory on the machines will enable us to explore material defects and interfaces, more-complex materials, and many different elements,” Kent said.

    With the software engineering, design, and computational aspects of delivering the science as the main focus, the project plans to improve QMCPACK’s performance by at least 50x. Based on experimentation using a mini-app version of the software, and incorporating new algorithms, the team achieved a 37x improvement on the pre-exascale Summit supercomputer versus the Titan system.

    ORNL IBM AC922 SUMMIT supercomputer, No.1 on the TOP500. Credit: Carlos Jones, Oak Ridge National Laboratory/U.S. Dept. of Energy

    ORNL Cray XK7 Titan Supercomputer, once the fastest in the world, to be decommissioned

    One Robust Code

    “We’re taking the lessons we’ve learned from developing the mini app and this proof of concept, the 37x, to update the design of the main application to support this high efficiency, high performance for a range of problem sizes,” Kent said. “What’s crucial for us is that we can move to a single version of the code with no internal forks, to have one source supporting all architectures. We will use all the lessons we’ve learned with experimentation to create one version where everything will work everywhere—then it’s just a matter of how fast. Moreover, in the future we will be able to optimize. But at least we won’t have a gap in the feature matrix, and the student who is running QMCPACK will always have all features work.”

    As an open-source and openly developed product, QMCPACK is improving via the help of many contributors. The QMCPACK team recently published the master citation paper for the software’s code; the publication has 48 authors with a variety of affiliations.

    “Developing these large science codes is an enormous effort,” Kent said. “QMCPACK has contributors from ECP researchers, but it also has many past developers. For example, a great deal of development was done for the Knights Landing processor on the Theta supercomputer with Intel. This doubled the performance on all CPU-like architectures.”

    ANL ALCF Theta Cray XC40 supercomputer

    A Synergistic Team

    The QMCPACK project’s collaborative team draws talent from Argonne, Lawrence Livermore, Oak Ridge, and Sandia National Laboratories.




    It also benefits from collaborations with Intel and NVIDIA.

    The composition of the staff is nearly equally divided between scientific domain specialists and people centered on the software engineering and computer science aspects.

    “Bringing all of this expertise together through ECP is what has allowed us to perform the design study, reach the 37x, and improve the architecture,” Kent said. “All the materials we work with have to be doped, which means incorporating additional elements in them. We can’t run those simulations on Titan but are beginning to do so on Summit with improvements we have made as part of our ECP project. We are really looking forward to the opportunities that will open up when the exascale systems are available.”

    See the full article here.

    Please help promote STEM in your local schools.

    Stem Education Coalition

    Founded on December 28, 2006, insideHPC is a blog that distills news and events in the world of HPC and presents them in bite-sized nuggets of helpfulness as a resource for supercomputing professionals. As one reader said, we’re sifting through all the news so you don’t have to!

    If you would like to contact me with suggestions, comments, corrections, errors or new company announcements, please send me an email at rich@insidehpc.com. Or you can send me mail at:

    insideHPC
    2825 NW Upshur
    Suite G
    Portland, OR 97239

    Phone: (503) 877-5048

     
  • richardmitnick 1:34 pm on June 7, 2019
    Tags: Among the achievements made possible by lattice QCD is the calculation of the masses of quarks., As another means of exploring the nature of matter researchers collide electrons and protons together at Jefferson Lab to get a more vivid picture of the proton., Deep Underground Neutrino Experiment (DUNE) at Sanford Underground Research Facility (SURF) in South Dakota., ECP- Exascale Computing Project, Exascale computing will be absolutely essential to extending the precision part of what we do., Famous for the study of neutrinos Fermilab shoots beams of neutrinos at detectors located on site and in Minnesota., Fermilab scientist Andreas Kronfeld is principal investigator of ECP’s LatticeQCD project., More than 1000 collaborators are working on the DUNE project, Neutrinos do in fact have masses - albeit tiny., Subtle and elusive particles neutrinos permeate the universe and pass through matter but rarely interact.

    From Exascale Computing Project: “High Precision for Studying the Building Blocks of the Universe” 

    From Exascale Computing Project

    05/28/19
    Scott Gibson

    Fermilab scientist Andreas Kronfeld is principal investigator of ECP’s LatticeQCD project.

    Quantum chromodynamics (QCD) is the quantum field theory of the subatomic particles called quarks and gluons. QCD explains what is known as the strong nuclear force, the interaction that holds protons and neutrons together in atomic nuclei and shapes the structure of nearly all visible matter.

    A project within the US Department of Energy’s (DOE) Exascale Computing Project (ECP) called LatticeQCD is increasing the precision of QCD calculations to understand the properties of quarks and gluons in the Standard Model of particle physics, a theory that clarifies the basic building blocks (or fundamental particles) of the universe and how they are related.

    Precision and Illumination

    Lattice QCD calculations are the scientific instrument to connect observed properties of hadrons (particles that contain quarks) to fundamental laws of quarks and gluons. This instrument serves as a critical complement to experiments such as the ones taking place at Brookhaven National Lab and at CERN to study a phenomenon called quark gluon plasma.

    “To interpret these experiments and many others in particle physics and all of nuclear physics, we need both the precision side and the illumination side,” said Fermilab scientist Andreas Kronfeld, principal investigator of the LatticeQCD project.

    “Exascale computing will be absolutely essential to extending the precision part of what we do to small nuclei and more complicated properties of protons and neutrons that we’ve been able to achieve to date,” he said. “These calculations are not only interesting in their own right because they make clear an emerging general class of fascinating physical phenomena, but they’re also central for interpreting all experiments in particle physics and nuclear physics.”

    Among the achievements made possible by lattice QCD is the calculation of the masses of quarks. “These are fundamental constants of nature comparable to the electron mass, and so they exemplify the use of precision,” Kronfeld said. “We now want to extend a similar level of rigor to the neutrino sector.”

    Subtle and elusive particles, neutrinos permeate the universe and pass through matter but rarely interact. The Standard Model predicted that neutrinos would have no mass, but about twenty years ago experiments revealed that they do in fact have masses, albeit tiny. Moreover, they are the most abundant particle with mass, and by learning more about them, researchers could increase understanding of the most fundamental physics in the universe.

    In experiments, neutrinos are scattered off the nucleus of a carbon, oxygen, or argon atom. “We need to understand not only how the neutrino interacts with a nucleon [a proton or neutron] but also how it interacts with the whole nucleus,” Kronfeld said. “This is why it is so important to extend the precision that we’ve done for similar things to nucleons and nuclei.”

    Famous for the study of neutrinos, Fermilab shoots beams of neutrinos at detectors located on site and in Minnesota.

    FNAL NOvA Near Detector

    FNAL/NOvA experiment map

    In the future, the lab will target detectors even farther away at the Deep Underground Neutrino Experiment (DUNE) under construction at the Sanford Underground Research Facility in South Dakota.

    FNAL LBNF/DUNE from FNAL to SURF, Lead, South Dakota, USA

    More than 1,000 collaborators are working on the DUNE project, which is a leading-edge, international experiment for neutrino science and proton decay studies.

    Meanwhile, as another means of exploring the nature of matter, researchers collide electrons and protons together at Jefferson Lab to get a more vivid picture of the proton. As with the neutrino experiments, theoretical calculations are required to make sense of the results—in addition, the same is true for heavy ion collision work in nuclear physics. “There has been an excellent cross talk between results from such experimentation on the one hand and lattice QCD calculations on the other,” Kronfeld said.

    “What we now think is that there is a critical point, analogous to the point where water vapor and liquid water and ice can coexist, which appears when you have a high enough baryon [composite subatomic particle] density,” he said. “We’ll need exascale computing to understand that point at the same time that the experimentalists are trying to discover it. Again, that’s a case where we learned qualitative and quantitative information. The first is interesting—the second is essential.”

    Pre-exascale Improvements on the Path to Exascale

    Kronfeld explained that the pre-exascale supercomputer Summit at the Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory is allowing the LatticeQCD project team to increase the feasibility of its complicated, difficult calculations and thus make expensive experiments worth the investment.

    ORNL IBM AC922 SUMMIT supercomputer, No.1 on the TOP500. Credit: Carlos Jones, Oak Ridge National Laboratory/U.S. Dept. of Energy

    He said the advances on Summit are manifested in four ways.

    First, a group from the LatticeQCD project team is striving to understand how to do the computation for what is called the Dirac equation, which is central to research in electrodynamics and chromodynamics. The equation is used repeatedly in lattice QCD calculations. On Summit the team is devising and implementing better algorithms to solve the equation.

    “I’m excited by the improvement in the solutions to the Dirac equation the group has made,” he said. “My collaborators have come up with multigrid methods that now finally work after 20 years of dreaming about it.”
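    The multigrid idea itself can be illustrated on a much simpler model problem. The sketch below is a hypothetical stand-in, not lattice-QCD code: the real solvers act on the four-dimensional, gauge-field-dependent Dirac operator and build their coarse levels adaptively. Here a two-grid cycle is applied repeatedly to the 1D Poisson equation -u'' = f: cheap Jacobi sweeps damp the oscillatory part of the error on the fine grid, and a small coarse-grid solve removes the smooth part that the sweeps alone would take very long to eliminate.

      # Two-grid multigrid cycle for -u''(x) = f(x) on [0, 1], u(0) = u(1) = 0.
      import numpy as np

      def smooth(u, f, h, sweeps=3, omega=2.0 / 3.0):
          # Weighted-Jacobi sweeps: effective at damping oscillatory error.
          for _ in range(sweeps):
              interior = 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
              u = u.copy()
              u[1:-1] = (1.0 - omega) * u[1:-1] + omega * interior
          return u

      def residual(u, f, h):
          r = np.zeros_like(u)
          r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
          return r

      def restrict(r):
          # Full weighting: fine-grid residual -> coarse grid (twice the spacing).
          rc = np.zeros(len(r) // 2 + 1)
          rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
          return rc

      def prolong(ec):
          # Linear interpolation: coarse-grid correction -> fine grid.
          ef = np.zeros(2 * (len(ec) - 1) + 1)
          ef[::2] = ec
          ef[1::2] = 0.5 * (ec[:-1] + ec[1:])
          return ef

      def coarse_solve(rc, H):
          # Solve the coarse error equation directly (the coarse system is tiny).
          m = len(rc) - 2
          A = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / (H * H)
          ec = np.zeros_like(rc)
          ec[1:-1] = np.linalg.solve(A, rc[1:-1])
          return ec

      def two_grid_cycle(u, f, h):
          u = smooth(u, f, h)                            # pre-smoothing
          ec = coarse_solve(restrict(residual(u, f, h)), 2.0 * h)
          u = u + prolong(ec)                            # coarse-grid correction
          return smooth(u, f, h)                         # post-smoothing

      n = 64
      h = 1.0 / n
      x = np.linspace(0.0, 1.0, n + 1)
      f = np.pi ** 2 * np.sin(np.pi * x)                 # exact solution: sin(pi x)
      u = np.zeros(n + 1)
      for cycle in range(8):
          u = two_grid_cycle(u, f, h)
          print(f"cycle {cycle + 1}: residual norm = {np.linalg.norm(residual(u, f, h)):.2e}")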

    A second focus is a probe of small nuclei and the associated complicated calculations. Another LatticeQCD project group is studying how to perform the calculation efficiently by mapping the details of the problem onto the architecture of Summit. “We anticipate that the Frontier exascale machine will be similar, and when we learn more about Aurora, the group will be mapping to that system as well,” Kronfeld said.

    Depiction of ANL ALCF Cray Intel SC18 Shasta Aurora exascale supercomputer

    ORNL Cray Frontier Shasta based Exascale supercomputer with Slingshot interconnect featuring high-performance AMD EPYC CPU and AMD Radeon Instinct GPU technology

    The ECP Factor

    “The Exascale Computing Project has been breathtaking to watch,” Kronfeld said. “There’s never been anything like this before. We had support at a smaller scale, but the ambition has led to improvements in algorithms that we used to dream about but didn’t have the resources, the support, and also the access to the machine, to test and verify. We had no idea how essential it would be before we started. Whoever came up with this idea really needs to be commended. I think it is a fantastic investment. These computers are not cheap, and to have people thoughtfully consider how to use them before they come online has just been brilliant.”

    Another task is to evolve what is known as a Markov chain, which Kronfeld described as a way of generating random snapshots of a process, with the rate at which new, independent snapshots appear depending on the details of the algorithm. The LatticeQCD project team has a group that is endeavoring to accelerate the Markov chain.
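    As a cartoon of such a Markov chain, here is a toy sampler for a one-dimensional lattice scalar field. It is not QCD (which evolves four-dimensional SU(3) gauge fields with far more sophisticated updates such as hybrid Monte Carlo), and every parameter below is an arbitrary illustrative choice. Each sweep proposes local changes to the field and accepts them with probability min(1, exp(-deltaS)); each retained configuration is one “snapshot,” and how quickly statistically independent snapshots appear depends on the update algorithm.

      # Toy Markov chain for a free scalar field on a periodic 1D lattice.
      import numpy as np

      L, mass2, delta = 64, 0.25, 1.5      # lattice size, mass^2, proposal width
      rng = np.random.default_rng(1)
      phi = np.zeros(L)

      def delta_action(phi, i, new):
          # Change in S = sum_i [0.5*(phi[i+1]-phi[i])^2 + 0.5*mass2*phi[i]^2]
          # (periodic boundary) when phi[i] is replaced by `new`.
          left, right, old = phi[(i - 1) % L], phi[(i + 1) % L], phi[i]
          s_old = 0.5 * ((old - left) ** 2 + (right - old) ** 2 + mass2 * old ** 2)
          s_new = 0.5 * ((new - left) ** 2 + (right - new) ** 2 + mass2 * new ** 2)
          return s_new - s_old

      def sweep(phi):
          accepted = 0
          for i in range(L):
              proposal = phi[i] + delta * rng.uniform(-1.0, 1.0)
              if rng.random() < np.exp(-delta_action(phi, i, proposal)):
                  phi[i] = proposal
                  accepted += 1
          return accepted / L

      snapshots = []
      for n in range(2_000):
          acceptance = sweep(phi)
          if n >= 500:                     # discard thermalization sweeps
              snapshots.append(np.mean(phi * phi))

      print(f"acceptance ~ {acceptance:.2f}, <phi^2> ~ {np.mean(snapshots):.3f} "
            f"over {len(snapshots)} snapshots")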

    “When I worked on the Markov chain as a graduate student, I wasn’t successful because, frankly, you couldn’t see the difference in speedup using the computers we had then, but it seems to be bearing fruit now—that’s personally satisfying,” he said.

    The fourth area being pursued by the LatticeQCD project team on Summit is the development of software better suited to the aims of the effort. This improved software, Kronfeld explained, will be crucial to analyzing data on an exascale machine. Researchers outside ECP at the University of Edinburgh are collaborating in the work.

    See the full article here.

    Please help promote STEM in your local schools.

    Stem Education Coalition


     
  • richardmitnick 10:52 am on March 25, 2019
    Tags: ECP- Exascale Computing Project, ExaLearn

    From insideHPC: “ExaLearn Project to bring Machine Learning to Exascale” 

    From insideHPC

    March 24, 2019

    As supercomputers become ever more capable in their march toward exascale levels of performance, scientists can run increasingly detailed and accurate simulations to study problems ranging from cleaner combustion to the nature of the universe. Enter ExaLearn, a new machine learning project supported by DOE’s Exascale Computing Project (ECP), which aims to develop new tools to help scientists cope with the computational expense of such simulations by applying machine learning to very large experimental datasets and simulations.

    The first research area for ExaLearn’s surrogate models will be in cosmology to support projects such as the LSST (Large Synoptic Survey Telescope), now under construction in Chile and shown here in an artist’s rendering. (Todd Mason, Mason Productions Inc. / LSST Corporation)

    The challenge is that these powerful simulations require lots of computer time. That is, they are “computationally expensive,” consuming 10 to 50 million CPU hours for a single simulation. For example, running a 50-million-hour simulation on all 658,784 compute cores of the Cori supercomputer at NERSC would take more than three days.

    NERSC Cray Cori II supercomputer at NERSC at LBNL, named after Gerty Cori, the first American woman to win a Nobel Prize in science

    Running thousands of these simulations, which are needed to explore wide ranges in parameter space, would be intractable.
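    A quick back-of-the-envelope check of those figures (the count of one thousand runs below is only a stand-in for the article’s “thousands”):

      # Rough arithmetic behind the statements above.
      cpu_hours_per_sim = 50e6     # upper end of the quoted 10-50 million CPU hours
      cori_cores = 658_784         # all Cori compute cores, as quoted above
      wall_hours = cpu_hours_per_sim / cori_cores
      print(f"one simulation: ~{wall_hours:.0f} hours, about {wall_hours / 24:.1f} days")

      n_sims = 1_000               # a stand-in for "thousands" of simulations
      years = n_sims * wall_hours / 24 / 365
      print(f"{n_sims} simulations back to back: roughly {years:.0f} years of machine time")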

    One of the areas ExaLearn is focusing on is surrogate models. Surrogate models, often known as emulators, are built to provide rapid approximations of more expensive simulations. This allows a scientist to generate additional simulations more cheaply – running much faster on many fewer processors. To do this, the team will need to run thousands of computationally expensive simulations over a wide parameter space to train the computer to recognize patterns in the simulation data. This then allows the computer to create a computationally cheap model, easily interpolating between the parameters it was initially trained on to fill in the blanks between the results of the more expensive models.

    “Training can also take a long time, but then we expect these models to generate new simulations in just seconds,” said Peter Nugent, deputy director for science engagement in the Computational Research Division at LBNL.

    From Cosmology to Combustion

    Nugent is leading the effort to develop the so-called surrogate models as part of ExaLearn. The first research area will be cosmology, followed by combustion. But the team expects the tools to benefit a wide range of disciplines.

    “Many DOE simulation efforts could benefit from having realistic surrogate models in place of computationally expensive simulations,” ExaLearn Principal Investigator Frank Alexander of Brookhaven National Lab said at the recent ECP Annual Meeting.

    “These can be used to quickly flesh out parameter space, help with real-time decision making and experimental design, and determine the best areas to perform additional simulations.”

    The surrogate models and related simulations will aid in cosmological analyses to reduce systematic uncertainties in observations by telescopes and satellites. Such observations generate massive datasets that are currently limited by systematic uncertainties. Since we only have a single universe to observe, the only way to address these uncertainties is through simulations, so creating cheap but realistic and unbiased simulations greatly speeds up the analysis of these observational datasets. A typical cosmology experiment now requires sub-percent level control of statistical and systematic uncertainties. This then requires the generation of thousands to hundreds of thousands of computationally expensive simulations to beat down the uncertainties.

    Such simulation capabilities are critical in light of two upcoming programs:

    The Dark Energy Spectroscopic Instrument, or DESI, is an advanced instrument on a telescope located in Arizona that is expected to begin surveying the universe this year.

    LBNL/DESI Dark Energy Spectroscopic Instrument for the Nicholas U. Mayall 4-meter telescope at Kitt Peak National Observatory near Tucson, Ariz, USA


    NOAO/Mayall 4 m telescope at Kitt Peak, Arizona, USA, Altitude 2,120 m (6,960 ft)

    DESI seeks to map the large-scale structure of the universe over an enormous volume and a wide range of look-back times (based on “redshift,” or the shift in the light of distant objects toward redder wavelengths of light). Targeting about 30 million pre-selected galaxies across one-third of the night sky, scientists will use DESI’s redshift data to construct 3D maps of the universe. There will be about 10 terabytes (TB) of raw data per year transferred from the observatory to NERSC. After running the data through the pipelines at NERSC (using millions of CPU hours), about 100 TB per year of data products will be made available as data releases approximately once a year throughout DESI’s five years of operations.

    The Large Synoptic Survey Telescope, or LSST, is currently being built on a mountaintop in Chile.

    LSST


    LSST Camera, built at SLAC



    LSST telescope, currently under construction on the El Peñón peak at Cerro Pachón Chile, a 2,682-meter-high mountain in Coquimbo Region, in northern Chile, alongside the existing Gemini South and Southern Astrophysical Research Telescopes.


    LSST Data Journey, Illustration by Sandbox Studio, Chicago with Ana Kova

    When completed in 2021, the LSST will take more than 800 panoramic images each night with its 3.2 billion-pixel camera, recording the entire visible sky twice each week. Each patch of sky it images will be visited 1,000 times during the survey, and each of its 30-second observations will be able to detect objects 10 million times fainter than visible with the human eye. A powerful data system will compare new with previous images to detect changes in brightness and position of objects as big as far-distant galaxy clusters and as small as nearby asteroids.

    For these programs, the ExaLearn team will first target large-scale structure simulations of the universe since the field is more developed than others and the scale of the problem size can easily be ramped up to an exascale machine learning challenge.

    As an example of how ExaLearn will advance the field, Nugent said a researcher could run a suite of simulations with the parameters of the universe consisting of 30 percent dark energy and 70 percent dark matter, then a second simulation with 25 percent and 75 percent, respectively. Each of these simulations generates three-dimensional maps of tens of billions of galaxies in the universe and how they cluster and spread apart as time goes by. Using a surrogate model trained on these simulations, the researcher could then quickly generate the output of a simulation in between these values, at 27.5 and 72.5 percent, without needing to run a new, costly simulation — and that, too, would show the evolution of the galaxies in the universe as a function of time. The goal of the ExaLearn software suite is that such results, and their uncertainties and biases, would be a byproduct of the training so that one would know the generated models are consistent with a full simulation.
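    Here is a cartoon of that workflow with entirely made-up numbers standing in for real N-body outputs; ExaLearn’s actual surrogates are learned, high-dimensional models, not the one-dimensional polynomial fit used below for brevity. The sketch trains an emulator on a few “expensive” runs at different dark-energy fractions and then queries it at the in-between value of 27.5 percent.

      # Cartoon surrogate: emulate one summary statistic as a function of the
      # dark-energy fraction. The "expensive simulation" is a made-up stand-in;
      # real training data would come from full simulations.
      import numpy as np

      def expensive_simulation(dark_energy_fraction):
          # Pretend this costs millions of CPU hours and returns one number.
          return np.exp(-2.0 * dark_energy_fraction) + 0.1 * dark_energy_fraction

      # Training set: the coarse grid of dark-energy fractions actually simulated.
      train_x = np.array([0.20, 0.25, 0.30, 0.35, 0.40])
      train_y = np.array([expensive_simulation(x) for x in train_x])

      # The emulator: here just a cubic polynomial fit standing in for a learned model.
      surrogate = np.poly1d(np.polyfit(train_x, train_y, deg=3))

      query = 0.275                # the in-between cosmology from the example above
      print(f"surrogate prediction at {query:.3f}: {surrogate(query):.5f}")
      print(f"full 'simulation' result:         {expensive_simulation(query):.5f}")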

    Toward this end, Nugent’s team will build on two projects already underway at Berkeley Lab: CosmoFlow and CosmoGAN. CosmoFlow is a deep learning 3D convolutional neural network that can predict cosmological parameters with unprecedented accuracy using the Cori supercomputer at NERSC. CosmoGAN is exploring the use of generative adversarial networks to create cosmological weak lensing convergence maps — maps of the matter density of the universe as would be observed from Earth — at lower computational costs.
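    To make the CosmoFlow idea concrete, here is a heavily simplified, hypothetical sketch of a 3D convolutional network that regresses a few cosmological parameters from a voxelized density cube. The layer sizes and the use of PyTorch are assumptions for illustration; CosmoFlow itself is a TensorFlow model with a deeper architecture tuned for Cori.

```python
import torch
import torch.nn as nn

class TinyCosmoNet(nn.Module):
    """Toy 3D CNN: maps a 64^3 density cube to 3 cosmological parameters."""
    def __init__(self, n_params: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                     # 64^3 -> 32^3
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                     # 32^3 -> 16^3
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),             # global average pool
        )
        self.regressor = nn.Linear(64, n_params)

    def forward(self, x):
        x = self.features(x)
        return self.regressor(torch.flatten(x, 1))

# One fake batch: 4 simulated density cubes, each 1 x 64 x 64 x 64.
model = TinyCosmoNet()
cubes = torch.randn(4, 1, 64, 64, 64)
print(model(cubes).shape)   # torch.Size([4, 3])
```

    The regression targets would be the cosmological parameters used to generate each training cube, which is what lets the trained network read those parameters back off new data.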

    See the full article here .


    Please help promote STEM in your local schools.

    Stem Education Coalition

    Founded on December 28, 2006, insideHPC is a blog that distills news and events in the world of HPC and presents them in bite-sized nuggets of helpfulness as a resource for supercomputing professionals. As one reader said, we’re sifting through all the news so you don’t have to!

    If you would like to contact me with suggestions, comments, corrections, errors or new company announcements, please send me an email at rich@insidehpc.com. Or you can send me mail at:

    insideHPC
    2825 NW Upshur
    Suite G
    Portland, OR 97239

    Phone: (503) 877-5048

     
  • richardmitnick 7:08 am on July 21, 2018 Permalink | Reply
    Tags: , , ECP- Exascale Computing Project,   

    From Exascale Computing Project: “ECP Announces New Co-Design Center to Focus on Exascale Machine Learning Technologies” 

    From Exascale Computing Project

    07/20/18

    The Exascale Computing Project has initiated its sixth Co-Design Center, ExaLearn, to be led by Principal Investigator Francis J. Alexander, Deputy Director of the Computational Science Initiative at the U.S. Department of Energy’s (DOE) Brookhaven National Laboratory.

    1
    Francis J. Alexander. BNL


    ExaLearn is a co-design center for Exascale Machine Learning (ML) Technologies and is a collaboration initially consisting of experts from eight multipurpose DOE labs.

    Brookhaven National Laboratory (Francis J. Alexander)
    Argonne National Laboratory (Ian Foster)
    Lawrence Berkeley National Laboratory (Peter Nugent)
    Lawrence Livermore National Laboratory (Brian van Essen)
    Los Alamos National Laboratory (Aric Hagberg)
    Oak Ridge National Laboratory (David Womble)
    Pacific Northwest National Laboratory (James Ang)
    Sandia National Laboratories (Michael Wolf)

    Rapid growth in the amount of data and computational power is driving a revolution in machine learning (ML) and artificial intelligence (AI). Beyond the highly visible successes in machine-based natural language translation, these new ML technologies have profound implications for computational and experimental science and engineering and the exascale computing systems that DOE is deploying to support those disciplines.

    To address these challenges, the ExaLearn co-design center will provide exascale ML software for use by ECP Applications projects, other ECP Co-Design Centers and DOE experimental facilities and leadership class computing facilities. The ExaLearn Co-Design Center will also collaborate with ECP PathForward vendors on the development of exascale ML software.

    The timeliness of ExaLearn’s proposed work ties into the critical national need to enhance economic development through science and technology. It is increasingly clear that advances in learning technologies have profound societal implications and that continued U.S. economic leadership requires a focused effort, both to increase the performance of those technologies and to expand their applications. Linking exascale computing and learning technologies represents a timely opportunity to address those goals.

    The practical end product will be a scalable and sustainable ML software framework that allows application scientists and the applied mathematics and computer science communities to engage in co-design for learning. The new knowledge and services to be provided by ExaLearn are imperative for the nation to remain competitive in computational science and engineering by making effective use of future exascale systems.

    “Our multi-laboratory team is very excited to have the opportunity to tackle some of the most important challenges in machine learning at the exascale,” Alexander said. “There is, of course, already a considerable investment by the private sector in machine learning. However, there is still much more to be done in order to enable advances in very important scientific and national security work we do at the Department of Energy. I am very happy to lead this effort on behalf of our collaborative team.”

    See the full article here.


    Please help promote STEM in your local schools.

    Stem Education Coalition

    About ECP

    The ECP is a collaborative effort of two DOE organizations – the Office of Science and the National Nuclear Security Administration. As part of the National Strategic Computing initiative, ECP was established to accelerate delivery of a capable exascale ecosystem, encompassing applications, system software, hardware technologies and architectures, and workforce development to meet the scientific and national security mission needs of DOE in the early-2020s time frame.

    About the Office of Science

    DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, please visit https://science.energy.gov/.

    About NNSA

    Established by Congress in 2000, NNSA is a semi-autonomous agency within the DOE responsible for enhancing national security through the military application of nuclear science. NNSA maintains and enhances the safety, security, and effectiveness of the U.S. nuclear weapons stockpile without nuclear explosive testing; works to reduce the global danger from weapons of mass destruction; provides the U.S. Navy with safe and effective nuclear propulsion; and responds to nuclear and radiological emergencies in the United States and abroad. https://nnsa.energy.gov

    The Goal of ECP’s Application Development focus area is to deliver a broad array of comprehensive science-based computational applications that effectively utilize exascale HPC technology to provide breakthrough simulation and data analytic solutions for scientific discovery, energy assurance, economic competitiveness, health enhancement, and national security.

    Awareness of ECP and its mission is growing and resonating—and for good reason. ECP is an incredible effort focused on advancing areas of key importance to our country: economic competitiveness, breakthrough science and technology, and national security. And, fortunately, ECP has a foundation that bodes extremely well for the prospects of its success, with the demonstrably strong commitment of the US Department of Energy (DOE) and the talent of some of America’s best and brightest researchers.

    ECP is composed of about 100 small teams of domain, computer, and computational scientists, and mathematicians from DOE labs, universities, and industry. We are tasked with building applications that will execute well on exascale systems, enabled by a robust exascale software stack, and supporting necessary vendor R&D to ensure the compute nodes and hardware infrastructure are adept and able to do the science that needs to be done with the first exascale platforms.

     
  • richardmitnick 4:05 pm on May 18, 2018 Permalink | Reply
    Tags: Director's Update, ECP- Exascale Computing Project   

    From Exascale Computing Project: Director’s Update 

    Exascale Computing Project

    1
    Doug Kothe
    Oak Ridge National Laboratory
    ECP Director

    Dear Colleagues:

    Awareness of ECP and its mission is growing and resonating—and for good reason. ECP is an incredible effort focused on advancing areas of key importance to our country: economic competitiveness, breakthrough science and technology, and national security. And, fortunately, ECP has a foundation that bodes extremely well for the prospects of its success, with the demonstrably strong commitment of the US Department of Energy (DOE) and the talent of some of America’s best and brightest researchers.

    ECP is composed of about 100 small teams of domain, computer, and computational scientists, and mathematicians from DOE labs, universities, and industry. We are tasked with building applications that will execute well on exascale systems, enabled by a robust exascale software stack, and supporting necessary vendor R&D to ensure the compute nodes and hardware infrastructure are adept and able to do the science that needs to be done with the first exascale platforms.

    I recently sat for a video interview with ECP Communications to revisit the big-picture perspective of why, how, and what we’re doing to pursue our mission. We talk, in general terms, about the teams, the projects, and the co-design centers; and the magnitude of what’s required relative to hardware, particularly considering the uptick during the last 4 or 5 years in artificial intelligence, machine learning, and deep learning.

    As a reminder for some—and new information for those whose interest in ECP has only recently been piqued—we clarify what is within ECP’s scope and what isn’t.

    Finally, our video chat will provide you with highlights of ECP’s progress since the first of the year and some of the key areas with which we’ll be concerning ourselves during the rest of 2018.

    With respect to progress, marrying high-risk exploratory and high-return R&D with formal project management is a formidable challenge. In January, through what is called DOE’s Independent Project Review, or IPR, process, we learned that we can indeed meet that challenge in a way that allows us to drive hard with a sense of urgency and still deliver on the essential products and solutions.

    In short, we passed the review with flying colors—and what’s especially encouraging is that the feedback we received tells us what we can do to improve. Moreover, we found that what the reviewers said was very consistent with our own thinking. Undoubtedly, the IPR experience represented a key step forward for us. We’ll be going through an IPR at least once a year, and that’s a good thing because we believe external scrutiny of what we’re doing and how we’re doing it is important and useful.

    I also highlight successes we’ve had in our three research focus areas: Application Development (AD), Software Technology (ST), and Hardware and Integration (HI). But at this point, I also want to note that each of ECP’s focus area directors recently participated in audio interviews to share their up-close perspectives that you can listen to via this newsletter in the Focus Areas Update section; their associated discussion points are also posted for you to read.

    We’ve made significant headway in identifying key AD and ST products. AD has demonstrated effectiveness by releasing a number of applications over the last several months while also developing deep-dive algorithms and models. The ST effort, with relatively new leadership, has been moving from R&D to product development and deployment. ST has a good plan for packaging our various-size components into bite-size chunks of software that the DOE laboratories will consume, integrate, and test.

    The scope of Hardware and Integration (HI) includes support for US vendor research and development focused on innovative architectures for competitive exascale system designs (PathForward), hardware evaluation, an integrated and continuously tested exascale software ecosystem deployed at DOE facilities, accelerated application readiness on targeted exascale architectures, and training on key ECP technologies to accelerate the software development cycle and optimize the productivity of application and software developers. In the HI PathForward activity, which funds US vendor R&D for nodes that are tuned for our applications and system designs, the vendors have been hitting their milestones on schedule. We are feeling very optimistic that the vendor R&D will appear in key products in the exascale systems when they’re procured.

    Developing a Diverse Portfolio of Applications

    ECP supports all of the key program offices in DOE (Office of Science, applied offices, NNSA Defense Programs), and so our incredible teams are engaged in several main categories of applications research. Examples of some of those categories are national security, energy, fundamental materials and chemistry, scientific discovery, and data analytics.

    For national security, we’re developing next-generation applications in support of the NNSA’s stockpile stewardship program, namely reliability testing and maintenance of U.S. nuclear weapons without the use of nuclear testing.

    In energy, our work is centered on fission and fusion reactors, wind plants, combustion for internal engines and land-based gas turbines, advanced particle accelerator design, and chemical looping reactors for the clean combustion of fossil fuels. The chemistry and materials category is looking at everything from strongly correlated quantum materials to atomistic design of materials for extreme environments to advanced additive manufacturing process design. Our researchers in additive manufacturing are endeavoring to understand that process essentially to allow the printing of qualified metal alloys for defense and aerospace. On the chemical side, a great example of what we’re doing is catalyst design. We’re also addressing the very foundations of matter via the study of the strong nuclear force and the associated Standard Model, which is among the most fundamental focus areas of nuclear and high-energy physics.

    Our earth and space science applications include astrophysics and cosmology (e.g., understanding the origin of elements in the universe, and understanding the evolution of the universe and trying to explain dark matter and dark energy). Other key applications include subsurface, or the accurate modeling of the geologic subsurface for fossil fuel extraction, waste disposal, and carbon capture technologies; developing a cloud-resolving Earth system model to enable regional climate change impact assessments; and addressing the risks and response of the nation’s infrastructure to large earthquakes.

    Within the data analytics category, we have artificial intelligence and machine-learning applications focused on the cancer moonshot, which is basically precision medicine for oncology. We’re also investigating metagenomics data sets for new products and life forms. We are also focused on optimization of the US power grid for the efficient use of new technologies in support of new consumer products and on a multiscale, multisector urban simulation framework that supports the design, planning, policies, and optimized operation of cities. Another facet of our data analytics work involves seeing how we can extract more knowledge from the experimental data coming from the DOE Science facilities. Our study is focused on SLAC’s Linac Coherent Light Source (LCLS) facility, but we are committed to helping myriad facilities across the DOE complex in terms of the streaming of data and trying to determine what’s in it and how we can drive experiments or computationally steer them to give us more insight.

    Impacting Industry

    ECP aims to be a thought leader and provide direction, whether the subject is programming next-generation hardware or designing models and algorithms to target certain physical phenomena, for example. We know we must interface with industry—from small businesses to large corporations—to avoid missing functional requirements that are important to them.

    That’s why we stood up ECP’s Industry Council to work with us as an external advisory group. It is really helping to guide us concerning the challenge problems we’re addressing. The council gives us advice on whether the applications we’re tackling can be leveraged in their environments and, if not, how we can move in that direction. We meet with the council every couple of months to discuss the status of progress and where ECP is headed to ensure it will best fit the needs of US industry.

    Understanding and Mitigating Risks

    ECP must adhere to a very aggressive schedule, and I believe we are doing so with the proper sense of urgency. The schedule is not only extremely dynamic but also abounding with risks. We can, however, unassumingly say that we are on track because we rigorously monitor the work at a granular level. To help us perform the tracking, we use tools called the schedule performance index and the cost performance index.
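    Those two tools come from standard earned value management: the schedule performance index (SPI) is earned value divided by planned value, and the cost performance index (CPI) is earned value divided by actual cost, with values at or above 1.0 indicating on-schedule and on-budget performance. A minimal illustration follows; the dollar figures are invented.

```python
def spi(earned_value: float, planned_value: float) -> float:
    """Schedule performance index: EV / PV (>= 1.0 means on or ahead of schedule)."""
    return earned_value / planned_value

def cpi(earned_value: float, actual_cost: float) -> float:
    """Cost performance index: EV / AC (>= 1.0 means at or under cost)."""
    return earned_value / actual_cost

# Hypothetical monthly status: $9.5M of work earned against $10M planned, $9.2M spent.
print(f"SPI = {spi(9.5, 10.0):.2f}, CPI = {cpi(9.5, 9.2):.2f}")
```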

    Some projects have higher risk and more technical challenges than others, and that’s understandable. We rely heavily on our project office and our leadership team to understand what the risks are—both the known unknowns and the unknown unknowns.

    I believe that within the next year or so as we learn more about the first three exascale systems deployed in this country, a lot of our risks will either be retired or moved into the known unknowns category, which we can mitigate with our project’s use of contingency.

    We execute according to a certain funding profile, and so we hope that our DOE sponsors will be able to deliver on what we believe is the funding profile necessary for success.

    Another very important consideration for us is ensuring that the right programming models are available for the hardware, from both the software stack and the applications sides of ECP, so that the heterogeneity of the memory and the CPU hierarchy in the exascale systems can be optimally exploited.

    Workforce development is a risk as well. We have been fortunate to be able to staff our project teams with some of the best and brightest in the world. Ensuring they’re working on problems that are fun and challenging so they’ll stay with us is very important. These scientists and engineers are arguably among the most marketable people anywhere, so they’re really in high demand outside of ECP.

    One other especially notable risk is ensuring that the US vendors deliver with the hardware and low-level software we need for our applications. The PathForward project allows us to inject resources for crucial vendor R&D. Through PathForward partnerships, we can pull in products sooner so we can extract the product quality and efficacy we need more quickly.

    Looking toward the Horizon

    Last year we executed the first of what will be an annual deep dive assessment of our AD and ST efforts, and so we’ll conduct our next one this year.

    We will engage external subject matter experts in that process, and we expect to see applications have fairly well-defined quantitative metrics for the performance parameters. The AD teams have laid out challenge problems of national interest that they plan to address on the exascale systems. Quantifying those challenge problems involves determining exactly what they are in terms of speeds and feeds on the system, and we believe we’ll be able to better clarify those numbers.

    On the ST side, we’ll examine what we call impact goals and metrics, which describe who is using a software component, whether the component is installed at a facility, and, if so, whether a line of sight to the facility or to an application is in place. Having that line of sight is crucial to proper integration.

    Finally, we anticipate dozens more milestones will be completed by the end of the year, milestones that most definitely inform what we believe will be exciting responses to the exascale Request for Proposals. ECP has the job of ensuring the vendors’ R&D is in a good place to propose exciting products for the exascale platform, and we’re working very hard to make that happen.

    See the full article here.

    Please help promote STEM in your local schools.


    Stem Education Coalition

    The Goal of ECP’s Application Development focus area is to deliver a broad array of comprehensive science-based computational applications that effectively utilize exascale HPC technology to provide breakthrough simulation and data analytic solutions for scientific discovery, energy assurance, economic competitiveness, health enhancement, and national security.

    Awareness of ECP and its mission is growing and resonating—and for good reason. ECP is an incredible effort focused on advancing areas of key importance to our country: economic competitiveness, breakthrough science and technology, and national security. And, fortunately, ECP has a foundation that bodes extremely well for the prospects of its success, with the demonstrably strong commitment of the US Department of Energy (DOE) and the talent of some of America’s best and brightest researchers.

    ECP is composed of about 100 small teams of domain, computer, and computational scientists, and mathematicians from DOE labs, universities, and industry. We are tasked with building applications that will execute well on exascale systems, enabled by a robust exascale software stack, and supporting necessary vendor R&D to ensure the compute nodes and hardware infrastructure are adept and able to do the science that needs to be done with the first exascale platforms.

     
  • richardmitnick 12:44 pm on March 9, 2018 Permalink | Reply
    Tags: , ECP LANL Cray XC 40 Trinity supercomputer, ECP- Exascale Computing Project, , ,   

    From ECP: What is Exascale Computing and Why Do We Need It? 


    Exascale Computing Project


    Los Alamos National Lab


    The Trinity supercomputer, with both Xeon Haswell and Xeon Phi Knights Landing processors, is the seventh fastest supercomputer on the TOP500 list and number three on the High Performance Conjugate Gradients Benchmark project.

    Meeting national security science challenges with reliable computing

    As part of the National Strategic Computing Initiative (NSCI), the Exascale Computing Project (ECP) was established to develop a capable exascale ecosystem, encompassing applications, system software, hardware technologies and architectures, and workforce development to meet the scientific and national security mission needs of the U.S. Department of Energy (DOE) in the mid-2020s time frame.

    The goal of ECP is to deliver breakthrough modeling and simulation solutions that analyze more data in less time, providing insights and answers to the most critical U.S. challenges in scientific discovery, energy assurance, economic competitiveness and national security.

    The Trinity Supercomputer at Los Alamos National Laboratory was recently named as a top 10 supercomputer on two lists: it made number three on the High Performance Conjugate Gradients (HPCG) Benchmark project, and is number seven on the TOP500 list.

    “Trinity has already made unique contributions to important national security challenges, and we look forward to Trinity having a long tenure as one of the most powerful supercomputers in the world,” said John Sarrao, associate director for Theory, Simulation and Computation at Los Alamos.

    Trinity, a Cray XC40 supercomputer at the Laboratory, was recently upgraded with Intel “Knights Landing” Xeon Phi processors, which propelled it from 8.10 petaflops six months ago to 14.14 petaflops.

    The Trinity Supercomputer Phase II project was completed during the summer of 2017, and the computer became fully operational during an unclassified “open science” run; it has now transitioned to classified mode. Trinity is designed to provide increased computational capability for the National Nuclear Security Administration in support of increasing geometric and physics fidelities in nuclear weapons simulation codes, while maintaining expectations for total time to solution.

    The capabilities of Trinity are required for supporting the NNSA Stockpile Stewardship program’s certification and assessments to ensure that the nation’s nuclear stockpile is safe, secure and effective.

    The Trinity project is managed and operated by Los Alamos National Laboratory and Sandia National Laboratories under the Alliance for Computing at Extreme Scale (ACES) partnership. The system is located at the Nicholas Metropolis Center for Modeling and Simulation at Los Alamos and covers approximately 5,200 square feet of floor space.

    See the full article here .

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    LANL campus

    What is exascale computing?
    Exascale computing refers to computing systems capable of at least one exaflop, or a billion billion (10^18) calculations per second. That is 50 times faster than the most powerful supercomputers being used today and represents a thousand-fold increase over the first petascale computer that came into operation in 2008. How we use these large-scale simulation resources is the key to solving some of today’s most pressing problems, including clean energy production, nuclear reactor lifetime extension and nuclear stockpile aging.
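    The arithmetic behind those comparisons is straightforward, as the quick check below shows; the 20-petaflop figure for today's leading systems is an assumption chosen to be consistent with the "50 times faster" claim in the text.

```python
EXAFLOP = 1e18          # floating-point operations per second
PETAFLOP = 1e15

first_petascale_2008 = 1 * PETAFLOP       # ~1 petaflop in 2008
todays_leader = 20 * PETAFLOP             # assumed ~20 petaflops for today's top systems

print(EXAFLOP / todays_leader)            # -> 50.0   (about 50x today's leaders)
print(EXAFLOP / first_petascale_2008)     # -> 1000.0 (a thousand-fold over 2008)
```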

    The Los Alamos role

    In the run-up to developing exascale systems, at Los Alamos we will be taking the lead on a co-design center, the Co-Design Center for Particle-Based Methods: From Quantum to Classical, Molecular to Cosmological. The ultimate goal is the creation of scalable open exascale software platforms suitable for use by a variety of particle-based simulations.

    Los Alamos is leading the Exascale Atomistic capability for Accuracy, Length and Time (EXAALT) application development project. EXAALT will develop a molecular dynamics simulation platform that will fully utilize the power of exascale. The platform will allow users to choose the point in accuracy, length or time-space that is most appropriate for the problem at hand, trading the cost of one over another. The EXAALT project will be powerful enough to address a wide range of materials problems. For example, during its development, EXAALT will examine the degradation of UO2 fission fuel and plasma damage in tungsten under fusion first-wall conditions.

    In addition, Los Alamos and partnering organizations will be involved in key software development proposals that cover many components of the software stack for exascale systems, including programming models and runtime libraries, mathematical libraries and frameworks, tools, lower-level system software, data management and I/O, as well as in situ visualization and data analysis.

    A collaboration of partners

    ECP is a collaborative effort of two DOE organizations—the Office of Science and the National Nuclear Security Administration (NNSA). DOE formalized this long-term strategic effort under the guidance of key leaders from six DOE and NNSA National Laboratories: Argonne, Lawrence Berkeley, Lawrence Livermore, Los Alamos, Oak Ridge and Sandia. The ECP leads the formalized project management and integration processes that bridge and align the resources of the DOE and NNSA laboratories, allowing them to work with industry more effectively.

     
  • richardmitnick 12:18 pm on October 18, 2017 Permalink | Reply
    Tags: , , ECP- Exascale Computing Project,   

    From ECP: ” Accelerating Delivery of a Capable Exascale Ecosystem” 

    Exascale Computing Project

    Accelerating Delivery of a Capable Exascale Ecosystem

    October 18, 2017
    Doug Kothe

    The Second Wave

    You may know that the ECP has been cited numerous times by the US Department of Energy (DOE)—by Secretary Perry, in fact—as one of DOE’s highest priorities. This is not only incredibly exciting but also a tremendous responsibility for us. There are high expectations for the ECP, expectations that we should not just meet—I believe we can far exceed them. All of us involved in this project are undoubtedly believers in the value and imperative of computer and computational science and engineering, and more recently of data science—especially within an exascale ecosystem. Meeting and exceeding our goals represents a tremendous return on investment for US taxpayers and potentially for the nation’s science and technology base for decades to come. This is a career opportunity for everyone involved.

    I would be remiss if I were not to thank—on behalf of all of us—Paul Messina, our inaugural ECP director. His experience and expertise have been invaluable in moving ECP through an admittedly difficult startup. The ECP is, after all, an extremely complicated endeavor. His steady hand, mentoring, and leadership, from which I benefitted first hand as the Application Development lead, have been vital to the project’s early successes. We will miss Paul but will not let him “hide”—we’ll maintain a steady line of communication with him for advice, as a sounding board, etc. Thanks again, Paul!

    As we focus our research teams on years 2 and 3 of the ECP, we must collectively and quickly move into a “steady state” mode of execution, i.e., delivering impactful milestones on a regular cadence, settling into a pattern of right-sized project management processes, and moving past exploration of technology integration opportunities and into commitments for new integrated products and deliverables. We are not there yet but will be soon. Some of this challenge has involved working with our DOE sponsors to find the right balance of “projectizing” R&D while delivering tangible products and solutions on a resource-loaded schedule that can accommodate the exploratory high-risk/high-reward nature of R&D activities so important for innovation.

    Changes in the ECP Leadership

    We are currently implementing several changes in the ECP, something that is typical of most large projects transitioning from “startup” to “steady state.” First, some ECP positions need to be filled. ECP is fortunate to have access to some of the best scientists in the world for leadership roles, but these positions take time away from personal research interests and projects, so some ECP leaders periodically may rotate back into full-time research. Fortunately, the six DOE labs responsible for leading the ECP provide plenty of “bench strength” of potential new leaders. Next, our third focus area, Hardware Technology, is being expanded in scope and renamed Hardware and Integration. It now includes an additional focus on engagement with DOE and National Nuclear Security Administration computing facilities and integrated product delivery. More information on both topics will follow.

    Looking toward the horizon, we must refine our resource-loaded schedule to ensure delivery on short-term goals, prepare for our next project review by DOE (an Independent Project Review, or IPR) in January 2018, and work more closely with US PathForward vendors and DOE HPC facilities to better understand architecture requirements and greatly improve overall software and application readiness. ECP leadership is focused on preparing for the IPR, which we must pass with flying colors. Therefore, we must collectively execute on all research milestones with a sense of urgency—in about a year, we will all know the details of the first two exascale systems!

    We’ve recently spent some time refining our strategic goals to ensure a clear message for our advocates, stakeholders, and project team members. ECP’s principal goals are threefold, and they align directly with our focus areas, as follows:

    Applications are the foundational element of the ECP and the vehicle for delivery of results from the exascale systems enabled by the ECP. Each application addresses an exascale challenge problem—a problem of strategic importance and national interest that is intractable without at least 50 times the computational power of today’s systems.
    Software Technologies are the underlying technologies on which applications are built and are essential for application performance, portability, integrity, and resilience. Software technologies span low-level system software to high-level application development environments, including infrastructure for large-scale data science and an expanded and vertically integrated software stack with advanced mathematical libraries and frameworks, extreme-scale programming environments, tools, and visualization libraries.
    Hardware and Integration points to key ECP-enabled partnerships between US vendors and the ECP (and community-wide) application and software developers to develop a new generation of commodity computing components. This partnership must ensure at least two diverse and viable exascale computing technology pathways for the nation to meet identified mission needs.
    The expected ECP outcome is the accelerated delivery of a capable exascale computing ecosystem to provide breakthrough solutions addressing our most critical challenges in scientific discovery, energy assurance, economic competitiveness, and national security. Capable implies a wide range of applications able to effectively use the systems developed through the ECP, thereby ensuring that both science and security needs will be addressed because the system is affordable, usable, and useful. Exascale, of course, refers to the ability to perform more than 10^18 operations per second, and ecosystem implies not just more powerful systems, but rather all methods and tools needed for effective use of ECP-enabled exascale systems to be acquired by DOE labs.

    To close, I’m very excited and honored to be working with the most talented computer and computational scientists in the world as we collectively pursue an incredibly important and compelling national mission. I think taking the journey will be just as fun as arriving at our destination, and to get there we will need everyone’s support, talent, and hard work. Please contact me personally if you ever have any questions, comments, or concerns.

    In the meantime, as former University of Tennessee Lady Vols basketball coach Pat Summitt said, “Keep on keeping on.”

    Doug

    See the full article here.

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    EXASCALE COMPUTING PROJECT

    The Exascale Computing Project (ECP) was established with the goals of maximizing the benefits of high-performance computing (HPC) for the United States and accelerating the development of a capable exascale computing ecosystem.

    Exascale refers to computing systems at least 50 times faster than the nation’s most powerful supercomputers in use today.

    The ECP is a collaborative effort of two U.S. Department of Energy organizations – the Office of Science (DOE-SC) and the National Nuclear Security Administration (NNSA).

    ECP is chartered with accelerating delivery of a capable exascale computing ecosystem to provide breakthrough modeling and simulation solutions to address the most critical challenges in scientific discovery, energy assurance, economic competitiveness, and national security.

    This role goes far beyond the limited scope of a physical computing system. ECP’s work encompasses the development of an entire exascale ecosystem: applications, system software, hardware technologies and architectures, along with critical workforce development.

     
  • richardmitnick 1:24 pm on September 28, 2017 Permalink | Reply
    Tags: , “ExaSky” - “Computing the Sky at Extreme Scales” project or, Cartography of the cosmos, , ECP- Exascale Computing Project, , , Salman Habib, , The computer can generate many universes with different parameters, There are hundreds of billions of stars in our own Milky Way galaxy   

    From ALCF: “Cartography of the cosmos” 

    Argonne Lab
    News from Argonne National Laboratory

    ALCF

    September 27, 2017
    John Spizzirri

    2
    Argonne’s Salman Habib leads the ExaSky project, which takes on the biggest questions, mysteries, and challenges currently confounding cosmologists.

    1
    No image caption or credit

    There are hundreds of billions of stars in our own Milky Way galaxy.

    Milky Way NASA/JPL-Caltech /ESO R. Hurt

    Estimates indicate a similar number of galaxies in the observable universe, each with its own large assemblage of stars, many with their own planetary systems. Beyond and between these stars and galaxies is all manner of matter in various phases, such as gas and dust. Another form of matter, dark matter, exists in a very different and mysterious form, announcing its presence indirectly only through its gravitational effects.

    This is the universe Salman Habib is trying to reconstruct, structure by structure, using precise observations from telescope surveys combined with next-generation data analysis and simulation techniques currently being primed for exascale computing.

    “We’re simulating all the processes in the structure and formation of the universe. It’s like solving a very large physics puzzle,” said Habib, a senior physicist and computational scientist with the High Energy Physics and Mathematics and Computer Science divisions of the U.S. Department of Energy’s (DOE) Argonne National Laboratory.

    Habib leads the “Computing the Sky at Extreme Scales” project or “ExaSky,” one of the first projects funded by the recently established Exascale Computing Project (ECP), a collaborative effort between DOE’s Office of Science and its National Nuclear Security Administration.

    From determining the initial cause of primordial fluctuations to measuring the sum of all neutrino masses, this project’s science objectives represent a laundry list of the biggest questions, mysteries, and challenges currently confounding cosmologists.

    One such question concerns dark energy, the potential cause of the accelerated expansion of the universe; another is the nature and distribution of dark matter in the universe.

    Dark Energy Survey


    Dark Energy Camera [DECam], built at FNAL


    NOAO/CTIO Victor M. Blanco 4m Telescope, which houses DECam, at Cerro Tololo, Chile, at an altitude of 7,200 feet

    Dark Matter Research

    Universe map Sloan Digital Sky Survey (SDSS) 2dF Galaxy Redshift Survey

    Scientists studying the cosmic microwave background hope to learn about more than just how the universe grew—it could also offer insight into dark matter, dark energy and the mass of the neutrino.

    Dark matter cosmic web and the large-scale structure it forms. The Millennium Simulation, V. Springel et al

    Dark Matter Particle Explorer China

    DEAP Dark Matter detector, The DEAP-3600, suspended in the SNOLAB deep in Sudbury’s Creighton Mine

    LUX Dark matter Experiment at SURF, Lead, SD, USA

    ADMX Axion Dark Matter Experiment, U Washington

    These are immense questions that demand equally expansive computational power to answer. The ECP is readying science codes for exascale systems, the new workhorses of computational and big data science.

    Initiated to drive the development of an “exascale ecosystem” of cutting-edge, high-performance architectures, codes and frameworks, the ECP will allow researchers to tackle data and computationally intensive challenges such as the ExaSky simulations of the known universe.

    In addition to the magnitude of their computational demands, ECP projects are selected based on whether they address specific strategic areas, ranging from energy and economic security to scientific discovery and healthcare.

    “Salman’s research certainly looks at important and fundamental scientific questions, but it has societal benefits, too,” said Paul Messina, Argonne Distinguished Fellow. “Human beings tend to wonder where they came from, and that curiosity is very deep.”

    HACC’ing the night sky

    For Habib, the ECP presents a two-fold challenge — how do you conduct cutting-edge science on cutting-edge machines?

    The cross-divisional Argonne team has been working on the science through a multi-year effort at the Argonne Leadership Computing Facility (ALCF), a DOE Office of Science User Facility. The team is running cosmological simulations for large-scale sky surveys on the facility’s 10-petaflop high-performance computer, Mira. The simulations are designed to work with observational data collected from specialized survey telescopes, like the forthcoming Dark Energy Spectroscopic Instrument (DESI) and the Large Synoptic Survey Telescope (LSST).

    LBNL/DESI Dark Energy Spectroscopic Instrument for the Nicholas U. Mayall 4-meter telescope at Kitt Peak National Observatory near Tucson, Ariz, USA

    LSST


    LSST Camera, built at SLAC



    LSST telescope, currently under construction at Cerro Pachón Chile, a 2,682-meter-high mountain in Coquimbo Region, in northern Chile, alongside the existing Gemini South and Southern Astrophysical Research Telescopes.

    Survey telescopes look at much larger areas of the sky — up to half the sky, at any point — than does the Hubble Space Telescope, for instance, which focuses more on individual objects.

    NASA/ESA Hubble Telescope

    One night concentrating on one patch, the next night another, survey instruments systematically examine the sky to develop a cartographic record of the cosmos, as Habib describes it.

    Working in partnership with Los Alamos and Lawrence Berkeley National Laboratories, the Argonne team is readying itself to chart the rest of the course.

    Their primary code, which Habib helped develop, is already among the fastest science production codes in use. Called HACC (Hardware/Hybrid Accelerated Cosmology Code), this particle-based cosmology framework supports a variety of programming models and algorithms.

    Unique among codes used in other exascale computing projects, it can run on all current and prototype architectures, from the basic X86 chip used in most home PCs, to graphics processing units, to the newest Knights Landing chip found in Theta, the ALCF’s latest supercomputing system.

    As robust as the code is already, the HACC team continues to develop it further, adding significant new capabilities, such as hydrodynamics and associated subgrid models.

    “When you run very large simulations of the universe, you can’t possibly do everything, because it’s just too detailed,” Habib explained. “For example, if we’re running a simulation where we literally have tens to hundreds of billions of galaxies, we cannot follow each galaxy in full detail. So we come up with approximate approaches, referred to as subgrid models.”
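    A toy illustration of the subgrid idea, with an entirely made-up analytic relation standing in for the calibrated models a code like HACC actually uses: properties of objects too small to resolve (individual galaxies) are painted onto the resolved structures (dark matter halos) by a cheap formula rather than simulated directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Resolved quantity from the simulation: halo masses in units of solar masses.
halo_masses = rng.lognormal(mean=np.log(1e12), sigma=1.0, size=5)

def galaxies_per_halo(m_halo, m_min=1e11, slope=0.9):
    """Hypothetical subgrid recipe: mean galaxy count as a power law in halo mass
    above a minimum mass, instead of simulating galaxy formation directly."""
    return np.where(m_halo > m_min, (m_halo / m_min) ** slope, 0.0)

for m, n in zip(halo_masses, galaxies_per_halo(halo_masses)):
    print(f"halo mass {m:.2e} Msun -> ~{n:.1f} galaxies (subgrid estimate)")
```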

    Even with these improvements and its successes, the HACC code still will need to increase its performance and memory to be able to work in an exascale framework. In addition to HACC, the ExaSky project employs the adaptive mesh refinement code Nyx, developed at Lawrence Berkeley. HACC and Nyx complement each other with different areas of specialization. The synergy between the two is an important element of the ExaSky team’s approach.

    A cosmological simulation strategy that melds multiple approaches allows the verification of difficult-to-resolve cosmological processes involving gravitational evolution, gas dynamics and astrophysical effects at very high dynamic ranges. New computational methods like machine learning will help scientists quickly and systematically recognize features in both the observational and simulation data that represent unique events.

    A trillion particles of light

    The work produced under the ECP will serve several purposes, benefitting both the future of cosmological modeling and the development of successful exascale platforms.

    On the modeling end, the computer can generate many universes with different parameters, allowing researchers to compare their models with observations to determine which models fit the data most accurately. Alternatively, the models can make predictions for observations yet to be made.
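    In practice that comparison is often a goodness-of-fit sweep over the grid of simulated universes: compute a chi-square between each model's predicted statistic and the observed one, and keep the parameter values that minimize it. A schematic example, in which all the arrays are invented placeholders:

```python
import numpy as np

# Observed summary statistic with uncertainties (placeholder values).
observed = np.array([1.05, 0.98, 1.02])
sigma = np.array([0.05, 0.04, 0.06])

# Predictions from universes generated with different dark-energy fractions.
model_grid = {
    0.25: np.array([1.20, 1.10, 1.15]),
    0.30: np.array([1.04, 0.99, 1.01]),
    0.35: np.array([0.90, 0.85, 0.88]),
}

def chi_square(prediction):
    """Weighted squared mismatch between a model's prediction and the observation."""
    return float(np.sum(((observed - prediction) / sigma) ** 2))

best = min(model_grid, key=lambda p: chi_square(model_grid[p]))
print(f"best-fitting dark-energy fraction: {best}")   # -> 0.3 for these toy numbers
```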

    Models also can produce extremely realistic pictures of the sky, which is essential when planning large observational campaigns, such as those by DESI and LSST.

    “Before you spend the money to build a telescope, it’s important to also produce extremely good simulated data so that people can optimize observational campaigns to meet their data challenges,” said Habib.

    But realism comes at a high cost. Simulations can range in the trillion-particle realm and produce several petabytes — quadrillions of bytes — of data in a single run. As exascale becomes prevalent, these simulations will produce 10 to 100 times as much data.

    The work that the ExaSky team is doing, along with that of the other ECP research teams, will help address these challenges and those faced by computer manufacturers and software developers as they create coherent, functional exascale platforms to meet the needs of large-scale science. By working with their own codes on pre-exascale machines, the ECP research teams can help guide vendors on chip design, I/O bandwidth, memory requirements, and other features.

    “All of these things can help the ECP community optimize their systems,” noted Habib. “That’s the fundamental reason why the ECP science teams were chosen. We will take the lessons we learn in dealing with this architecture back to the rest of the science community and say, ‘We have found a solution.’”

    The Exascale Computing Project is a collaborative effort of two DOE organizations — the Office of Science and the National Nuclear Security Administration. As part of President Obama’s National Strategic Computing initiative, ECP was established to develop a capable exascale ecosystem, encompassing applications, system software, hardware technologies and architectures and workforce development to meet the scientific and national security mission needs of DOE in the mid-2020s timeframe.

    ANL ALCF Cetus IBM supercomputer

    ANL ALCF Theta Cray supercomputer

    ANL ALCF Cray Aurora supercomputer

    ANL ALCF MIRA IBM Blue Gene Q supercomputer at the Argonne Leadership Computing Facility

    See the full article here .

    Please help promote STEM in your local schools.
    STEM Icon
    Stem Education Coalition

    Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America’s scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science. For more visit http://www.anl.gov.

    About ALCF

    The Argonne Leadership Computing Facility’s (ALCF) mission is to accelerate major scientific discoveries and engineering breakthroughs for humanity by designing and providing world-leading computing facilities in partnership with the computational science community.

    We help researchers solve some of the world’s largest and most complex problems with our unique combination of supercomputing resources and expertise.

    ALCF projects cover many scientific disciplines, ranging from chemistry and biology to physics and materials science. Examples include modeling and simulation efforts to:

    Discover new materials for batteries
    Predict the impacts of global climate change
    Unravel the origins of the universe
    Develop renewable energy technologies

    Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science

    Argonne Lab Campus

     
  • richardmitnick 10:08 am on April 26, 2017 Permalink | Reply
    Tags: , , Building the Bridge to Exascale, ECP- Exascale Computing Project, , , ,   

    From OLCF at ORNL: “Building the Bridge to Exascale” 

    i1

    Oak Ridge National Laboratory

    OLCF

    April 18, 2017 [Where was this hiding?]
    Katie Elyce Jones

    Building an exascale computer—a machine that could solve complex science problems at least 50 times faster than today’s leading supercomputers—is a national effort.

    To oversee the rapid research and development (R&D) of an exascale system by 2023, the US Department of Energy (DOE) created the Exascale Computing Project (ECP) last year. The project brings together experts in high-performance computing from six DOE laboratories with the nation’s most powerful supercomputers—including Oak Ridge, Argonne, Lawrence Berkeley, Lawrence Livermore, Los Alamos, and Sandia—and project members work closely with computing facility staff from the member laboratories.

    ORNL IBM Summit supercomputer depiction.

    At the Exascale Computing Project’s (ECP’s) annual meeting in February 2017, Oak Ridge Leadership Computing Facility (OLCF) staff discussed OLCF resources that could be leveraged for ECP research and development, including the facility’s next flagship supercomputer, Summit, expected to go online in 2018.

    At the first ECP annual meeting, held January 29–February 3 in Knoxville, Tennessee, about 450 project members convened to discuss collaboration in breakout sessions focused on project organization and upcoming R&D milestones for applications, software, hardware, and exascale systems focus areas. During facility-focused sessions, senior staff from the Oak Ridge Leadership Computing Facility (OLCF) met with ECP members to discuss opportunities for the project to use current petascale supercomputers, test beds, prototypes, and other facility resources for exascale R&D. The OLCF is a DOE Office of Science User Facility located at DOE’s Oak Ridge National Laboratory (ORNL).

    “The ECP’s fundamental responsibilities are to provide R&D to build exascale machines more efficiently and to prepare the applications and software that will run on them,” said OLCF Deputy Project Director Justin Whitt. “The facilities’ responsibilities are to acquire, deploy, and operate the machines. We are currently putting advanced test beds and prototypes in place to evaluate technologies and enable R&D efforts like those in the ECP.”

    ORNL has a unique connection to the ECP. The Tennessee-based laboratory is the location of the project office that manages collaboration within the ECP and among its facility partners. ORNL’s Laboratory Director Thom Mason delivered the opening talk at the conference, highlighting the need for coordination in a project of this scope.

    On behalf of facility staff, Mark Fahey, director of operations at the Argonne Leadership Computing Facility, presented the latest delivery and deployment plans for upcoming computing resources during a plenary session. From the OLCF, Project Director Buddy Bland and Director of Science Jack Wells provided a timeline for the availability of Summit, OLCF’s next petascale supercomputer, which is expected to go online in 2018; it will be at least 5 times more powerful than the OLCF’s 27-petaflop Titan supercomputer.

    ORNL Cray XK7 Titan Supercomputer.

    “Exascale hardware won’t be around for several more years,” Wells said. “The ECP will need access to Titan, Summit, and other leadership computers to do the work that gets us to exascale.”

    Wells also highlighted the spring 2017 call for Innovative and Novel Computational Impact on Theory and Experiment, or INCITE, proposals, which will give 2-year projects the first opportunity for computing time on Summit. OLCF staff also introduced a handful of computing architecture test beds—including the developmental environment for Summit known as Summitdev, NVIDIA’s deep learning and accelerated analytics system DGX-1, an experimental cluster of ARM 64-bit compute nodes, and a Cray XC40 cluster of 168 nodes known as Percival—that are now available for OLCF users.

    In addition to leveraging facility resources for R&D, the ECP must understand the future needs of facilities to design an exascale system that is ready for rigorous computational science simulations. Facilities staff can offer insight about the level of performance researchers will expect from science applications on exascale systems and estimate the amount of space and electrical power that will be available in the 2023 timeframe.

    “Getting to capable exascale systems will require careful coordination between the ECP and the user facilities,” Whitt said.

    One important collaboration so far was the development of a request for information, or RFI, for exascale R&D that the ECP released in February to industry vendors. The RFI enables the ECP to evaluate potential software and hardware technologies for exascale systems—a step in the R&D process that facilities often undertake. Facilities will later release requests for proposals when they are ready to begin building exascale systems.

    See the full article here .

    Please help promote STEM in your local schools.

    STEM Icon

    Stem Education Coalition

    ORNL is managed by UT-Battelle for the Department of Energy’s Office of Science. DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time.

    i2

    The Oak Ridge Leadership Computing Facility (OLCF) was established at Oak Ridge National Laboratory in 2004 with the mission of accelerating scientific discovery and engineering progress by providing outstanding computing and data management resources to high-priority research and development projects.

    ORNL’s supercomputing program has grown from humble beginnings to deliver some of the most powerful systems in the world. On the way, it has helped researchers deliver practical breakthroughs and new scientific knowledge in climate, materials, nuclear science, and a wide range of other disciplines.

    The OLCF delivered on that original promise in 2008, when its Cray XT “Jaguar” system ran the first scientific applications to exceed 1,000 trillion calculations a second (1 petaflop). Since then, the OLCF has continued to expand the limits of computing power, unveiling Titan in 2013, which is capable of 27 petaflops.


    ORNL Cray XK7 Titan Supercomputer

    Titan is one of the first hybrid architecture systems—a combination of graphics processing units (GPUs) and the more conventional central processing units (CPUs) that have served as number crunchers in computers for decades. The parallel structure of GPUs makes them uniquely suited to processing an enormous number of simple computations quickly, while CPUs are capable of tackling more sophisticated computational algorithms. The complementary combination of CPUs and GPUs allows Titan to reach its peak performance.

    The OLCF gives the world’s most advanced computational researchers an opportunity to tackle problems that would be unthinkable on other systems. The facility welcomes investigators from universities, government agencies, and industry who are prepared to perform breakthrough research in climate, materials, alternative energy sources and energy storage, chemistry, nuclear physics, astrophysics, quantum mechanics, and the gamut of scientific inquiry. Because it is a unique resource, the OLCF focuses on the most ambitious research projects—projects that provide important new knowledge or enable important new technologies.

     