Data Access and the ECCO Ocean and Ice State Estimate
Principal Investigator (PI): Patrick Heimbach, University of Texas at Austin
Co-Investigators (Co-PI): Ian Fenty, Thomas Huang
The Estimating the Circulation and Climate of the Ocean (ECCO) global ocean state estimation system is the premier tool for synthesizing NASA's diverse Earth system observations into a complete physical description of Earth's time-evolving, full-depth ocean and sea ice system. ECCO state estimates are of particular significance to NASA because satellite observations on their own, although global in coverage, remain sparse in space and time relative to the inherent scales of ocean variability and are blind to the ocean's interior. With increasing streams of data and better spatial resolution, the scientific utility of the product is increasingly limited by (1) the inability to automate the ingestion of data from observing networks and update the model in a rapid, robust manner, (2) the lack of tools for embedding the state estimate into NASA's Earth Observing System Data and Information System (EOSDIS) framework, and (3) the lack of capabilities to perform efficient online data analysis.
To overcome these hurdles, we are developing and implementing a production-ready cloud-native storage and data analysis system called ECCO-Cloud to manage the preprocessing and transformation of NASA Earth Science data and data products. The project has the following high-level goals:
- Expand and accelerate the integration of NASA Earth system data into ECCO through automated preprocessing and transformation.
- Radically streamline the integration of updated ECCO products into EOSDIS, specifically NASA's Physical Oceanography Distributed Active Archive Center (PO.DAAC).
- Facilitate and expand the scientific utilization of NASA remote sensing data integrated in ECCO by the growing community of interdisciplinary researchers in the oceanographic, sea-ice, sea level rise, and climate science fields.
This work leverages 1) the latest ECCO global ocean state estimate, 2) new software tools developed to display, analyze, extract, subset, reproject, and download ocean physical parameters from the ECCO state estimate (temperature, salinity, currents, atmosphere-ocean heat fluxes, sea level, etc.), 3) experience in hosting and rapidly accessing the tens of gigabytes of binary output files that comprise the complete ECCO state estimate, and 4) recent developments that allow new simulations based on ECCO's Oceanic General Circulation Model (OGCM) to be run on the Amazon Elastic Compute Cloud (Amazon EC2).
Recently, progress has been made in the development of software tools that allow rapid online access (through pre-caching) and analysis of a subset of ECCO output via interactive web pages. These software tools were developed for the Sea Level Portal. In the proposed project, we intend to leverage and significantly extend these tools so that a much larger set, if not all, of the ECCO ocean parameters is made similarly available. The tools to access and conduct interactive analyses of these ocean parameters will be implemented on the new ECCO website. Users will be provided links to the appropriate EOSDIS DAAC, such as PO.DAAC and the National Snow and Ice Data Center DAAC (NSIDC DAAC), to access the original ocean and ice data products used in ECCO. This will also enable access through NASA's common interface, the Earthdata enterprise Common Metadata Repository (CMR) and the Earthdata Search capability.
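As a rough illustration of what discovery through CMR can look like, the sketch below builds a granule search URL for the public CMR search endpoint. It only constructs the URL and does not send a request. The function name `build_granule_query`, the placeholder collection short name, and the date range are assumptions made for this example; they are not taken from the ECCO-Cloud code base.

```python
from urllib.parse import urlencode

# Public CMR granule search endpoint (JSON response format).
CMR_GRANULES_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"


def build_granule_query(short_name: str, start: str, end: str,
                        page_size: int = 10) -> str:
    """Build a CMR granule search URL (hypothetical helper for this sketch).

    short_name: collection short name as registered in CMR
    start, end: ISO 8601 timestamps bounding the temporal search window
    page_size:  maximum number of granules returned per page
    """
    params = {
        "short_name": short_name,
        "temporal": f"{start},{end}",  # CMR expects "start,end"
        "page_size": page_size,
    }
    return f"{CMR_GRANULES_URL}?{urlencode(params)}"


# "EXAMPLE_SHORT_NAME" is a placeholder, not a real collection identifier.
url = build_granule_query(
    "EXAMPLE_SHORT_NAME",
    "2019-01-01T00:00:00Z",
    "2019-01-31T23:59:59Z",
)
print(url)
```

The resulting URL could be fetched with any HTTP client; the response format and authentication needs depend on the collection being queried.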
Finally, now that ECCO's OGCM has been ported to run on Amazon EC2, we will provide an online front-end so that users can reproduce the full state estimate, formulate and conduct their own experimental simulations, and then seamlessly analyze the output of those simulations using the same data analysis tools. Our work will contribute to fuller utilization of NASA Earth System data, especially ocean data and ECCO products, by the research community.
First Year Update: October 2019
Summary of Accomplishments
During the project’s first year, the ECCO team refined the overall ECCO-Cloud architecture and established a fully automated, cloud-based, serverless data processing interface between the AWS Cloud and NASA Ames Pleiades.
The HPC processing workflow allows users to 1) query NASA CMR to find new relevant data products; 2) harvest the new data from the associated Distributed Active Archive Centers (e.g., NSIDC, PO.DAAC); 3) apply scientific algorithms to transform the data; 4) package the data; and 5) publish it to the Pleiades High Performance Computing (HPC) system at NASA’s Ames Research Center (ARC) for ECCO model runs. The new workflow system retains data lineage for reproducibility and enables users to download the input data associated with ECCO products produced by ECCO-Cloud. This workflow can also be run locally on a user’s machine, so it is not strictly reliant on AWS.
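The five steps and the lineage tracking described above can be sketched as a small staged pipeline. This is a minimal, in-memory illustration and not the ECCO-Cloud implementation: the `Product` class, the stage functions, and the placeholder payloads are all hypothetical, and each stage simply records its name and a SHA-256 digest of its output so that a run can be compared against a later rerun.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Product:
    """A data product moving through the workflow, with its lineage trail."""
    name: str
    payload: bytes
    lineage: List[dict] = field(default_factory=list)


def _record(product: Product, stage: str, payload: bytes) -> Product:
    """Return a new Product whose lineage notes the stage and output digest."""
    entry = {"stage": stage, "sha256": hashlib.sha256(payload).hexdigest()}
    return Product(product.name, payload, product.lineage + [entry])


# Hypothetical stand-ins for the five workflow steps. Real stages would
# query a catalog, download files, run science code, and copy to HPC storage.
def query(p: Product) -> Product:
    return _record(p, "query", p.payload)

def harvest(p: Product) -> Product:
    return _record(p, "harvest", b"raw:" + p.name.encode())

def transform(p: Product) -> Product:
    return _record(p, "transform", p.payload.upper())

def package(p: Product) -> Product:
    body = json.dumps({"name": p.name, "data": p.payload.decode()})
    return _record(p, "package", body.encode())

def publish(p: Product) -> Product:
    return _record(p, "publish", p.payload)


STAGES: List[Callable[[Product], Product]] = [
    query, harvest, transform, package, publish,
]


def run_workflow(name: str) -> Product:
    """Run every stage in order, accumulating lineage along the way."""
    product = Product(name, b"")
    for stage in STAGES:
        product = stage(product)
    return product


result = run_workflow("sea_surface_temperature")
for entry in result.lineage:
    print(entry["stage"], entry["sha256"][:12])
```

Because every stage here is deterministic, two runs with the same input produce identical lineage records, which is the property that makes a rerun checkable. Nothing in the sketch depends on a cloud service, mirroring the note that the workflow can also run on a local machine.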
The automated workflow system has been validated with Sea Surface Temperature, Sea-ice Concentration and Ocean Bottom Pressure. Validation with Sea Surface Salinity and Sea Surface Height is in progress.
The ECCO team is continuing to work with NASA's Physical Oceanography Distributed Active Archive Center (PO.DAAC), which is the long-term archive for the ECCO products. The ECCO-Cloud project will deliver 80 different parameters to the long-term archive, at daily and monthly intervals and on both native and re-projected grids. These data products will be available for public access.
To facilitate use of these products, Dr. Patrick Heimbach and Dr. Ian Fenty organized and hosted the first ECCO Summer School from May 19 to 31, 2019, at Friday Harbor Laboratories, University of Washington, Friday Harbor, WA. Approximately 50 participants attended the summer school, which targeted early career scientists (graduate students and early postdocs) working on diverse aspects of physical oceanography and ocean state estimation. See a list of tutorial and tool resources.
Invited Talks and Presentations
Huang, T. (2019, July). Open Source Data-Intensive Platform for the Cloud. Presented at ESIP Summer Meeting, Tacoma, WA.
Huang, T. (2019, July). Analytics Center Framework for Estimating the Circulation and Climate of the Ocean. Presented at IGARSS 2019, Yokohama, Japan.
Ford, E. & Huang, T. (2019, July). Analytics Center Framework for Estimating the Circulation and Climate of the Ocean. Presented at NASA ESDIS System Engineering Technical Interchange Meeting, Goddard Space Flight Center, Greenbelt, MD.
Greguska, F., Wilson, B. & Huang, T. (2019, September). Apache Science Data Analytics Platform (SDAP). Presented at ApacheCon North America, Las Vegas, NV.
Huang, T. (2019, September). Advancing Technology Through Open Source. Presented at OceanObs 2019, Honolulu, HI.
Huang, T. (2019, September). From Data to Insights: Shift Toward Data Analytics. Presented at Data Analytics for Canadian Climate Services (DACCS) Kickoff Meeting by CRIM, Centre de Recherche Informatique de Montréal.
Huang, T. (2019, October). Analysis Ready Storage using Apache SDAP. Presented at CEOS 48th Meeting of the Working Group on Information Systems & Services (WGISS-48), Hanoi, Vietnam.
Huang, T. (2019, December). Aiming for Autonomously Sustainable Solution for Spatiotemporal Analysis. Presented at American Geophysical Union Fall Meeting, San Francisco, CA.
Huang, T. (2020, February). Autonomously Sustainable Solution for Big Ocean Science. To be presented at Ocean Sciences, San Diego, CA.
Last Updated: Nov 8, 2019 at 10:12 AM EST