From University of Washington eScience Institute: “UW researchers are changing the way we use satellite observations of Earth”


Anthony Arendt

The University of Washington, along with collaborators from the National Center for Atmospheric Research (NCAR), Anaconda, and Element84, has just been awarded a $1.5 million grant from the National Aeronautics and Space Administration (NASA) to develop new approaches for using satellite observations of Earth. The team will work with the Pangeo Project, a community effort for big data in the geosciences, to develop state-of-the-art open-source tools for cloud-based data analysis.

Schematic of the proposed cloud-computing platform. A) The current paradigm, in which scientific computations are performed locally on duplicated datasets. B) Representation of the proposed toolset, which enables scientists to work entirely in the cloud via an API and scalable interactive computing resources.

This team brings together software developers and research scientists to address emerging challenges in working with increasingly large and complex satellite data products. NASA has a long history of providing these data to the research community so that researchers can explore the complex workings of our planet. Typically, a researcher will download data from a NASA repository and carry out their analysis on a laptop or workstation. As satellite technologies improve, data are being collected at ever finer resolutions, creating larger and larger datasets that take more time and resources to explore. By 2025, NASA estimates that it will be storing upwards of 250 petabytes of data on the commercial cloud.

This project will demonstrate a new approach, one in which researchers can avoid moving data, and focus instead on building tools for data analysis in a shared computing environment. The Pangeo Project provides the technological and social framework for achieving this shift.

On the technical side, the Pangeo community is exploring new software that breaks down the processing and analysis of large datasets into smaller, more manageably sized “chunks.” Distributed computing tools are then used to send those chunks to many different computing “workers” that can be created or destroyed in a short amount of time. At the center of it all is a scheduler that orchestrates the efficient distribution of computing tasks across many workers. The datasets themselves are stored alongside the computing infrastructure, allowing for faster computing and eliminating the need to move data. To date, this computing architecture has been tested on both institutional and commercial cloud computing systems.
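
The chunk, worker and scheduler pattern described above can be illustrated with a minimal sketch. It is not the Pangeo software itself. It assumes Python's standard-library thread pool as a stand-in for a real distributed scheduler, and the function names (`split_into_chunks`, `partial_sum_and_count`, `chunked_mean`) are hypothetical, chosen only for this illustration. The example computes a mean by letting each worker reduce one chunk and then combining the small partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Split a sequence into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_sum_and_count(chunk):
    """Work done by one worker: reduce a single chunk to (sum, count)."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size=4, n_workers=3):
    """Compute the mean of values by farming chunks out to a pool of workers.

    The thread pool is an assumed stand-in for a distributed scheduler:
    it hands each chunk to whichever worker is free and collects results.
    """
    if not values:
        raise ValueError("values must not be empty")
    chunks = split_into_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_and_count, chunks))
    # Only the small per-chunk summaries are combined, not the raw data.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    print(chunked_mean(list(range(10))))
```

In a real deployment the chunks would be pieces of a large dataset stored next to the computing infrastructure, and the workers would be separate machines rather than threads. The structure of the computation stays the same.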

What do these technologies look like to a typical user? Pangeo is building a series of computing environments that have a set of tools for a variety of scientific disciplines such as climatology, oceanography and hydrology. A user can log in to a centralized hub that creates an instance of a “Jupyter Notebook”, a user-friendly, web-based scripting environment. Users can then issue commands to work with the data, and save key results back to their local computers.

On the social side of this work is the creation of an inclusive and welcoming community that is working together to build tools and educate scientists about these new approaches. We plan to host a series of events to offer training and provide a space for hacking on projects together. These efforts will help shift scientific culture toward open and reproducible software practices.

In a sense, this project team represents a microcosm of the larger Pangeo community: industry partners Anaconda (Matthew Rocklin) and Element84 (Dan Pilone) are contributing expertise in software development and in connecting NASA services to the community, while the NCAR (Ethan Gutmann, Joe Hamman) and UW teams (Scott Henderson, Amanda Tan, Rob Fatland and Anthony Arendt) are exploring scientific use cases and developing educational and community building tools.

Further information can be found in the blog posts “Cloud Native geoprocessing of Earth Observation satellite data with Pangeo” by Scott Henderson and “Pangeo applications for NASA Earth Observing Data” by Joe Hamman.


Mission Statement

The eScience Institute empowers researchers and students in all fields to answer fundamental questions through the use of large, complex, and noisy data. As the hub of data-intensive discovery on campus, we lead a community of innovators in the techniques, technologies, and best practices of data science and the fields that depend on them.

Specifically, we:

1. Bring expertise and help researchers at UW to leverage data science tools, methods, and best practices in their research and in their grant proposals.

2. Are data science experts and, in collaboration with faculty at UW, advance the state-of-the-art in data science methods and in domain sciences that benefit from them.

The University of Washington is one of the world’s preeminent public universities. Our impact on individuals, on our region, and on the world is profound, whether we are launching young people into a boundless future or confronting the grand challenges of our time through undaunted research and scholarship. The university is ranked number 10 in the world in the Shanghai Jiao Tong University rankings and educates more than 54,000 students annually. Our students and faculty work together to turn ideas into impact and in the process transform lives and our world.

So what defines us, the students, faculty and community members at the University of Washington? Above all, it’s our belief in possibility and our unshakable optimism. It’s a connection to others, both near and far. It’s a hunger that pushes us to tackle challenges and pursue progress. It’s the conviction that together we can create a world of good. Join us on the journey.