ESDS Program

Systematic Data Transformation to Enable Web Coverage Services (WCS) and ArcGIS Image Services within ESDIS Cumulus Cloud

Principal Investigator (PI): Jeff Walter, NASA’s Langley Research Center, Hampton, VA

Co-Investigators: A. Jason Barnett and Brian Tisdale, contractors, Booz Allen Hamilton, NASA’s Atmospheric Science Data Center (ASDC) at NASA Langley; Hyo-Kyung (Joe) Lee, The HDF Group; Matthew Tisdale, ASDC

Project Background

NASA’s Earth Science Data and Information System (ESDIS) Project launched the Cumulus effort to develop a scalable cloud-based platform using Amazon Web Services (AWS) to ingest, archive, distribute, and manage Earth science data in NASA’s Earth Observing System Data and Information System (EOSDIS) collection. Cloud computing has enabled both EOSDIS Distributed Active Archive Centers (DAACs) and Earthdata users to handle large data collections by scaling out resources in an unprecedented manner. Infrastructure as a service (IaaS) and function as a service (FaaS) have accelerated the move of data management from on-premises systems to the cloud for both DAACs and users. As cloud computing technology advances, demand has increased for analysis-ready data (ARD) that work well with large-scale data analytics solutions in the cloud.

Earth science data are geospatial in nature. According to the Geospatial Data Abstraction Library (GDAL) Enhancements for ESDIS (GEE) Assessment, however, many Earth science data products (e.g., data from the Measurement of Pollution in the Troposphere (MOPITT) and Clouds and the Earth’s Radiant Energy System (CERES) instruments, to name two) are difficult to access and use within commercial off-the-shelf (COTS) and open source geographic information system (GIS) software, such as Esri’s ArcGIS and QGIS.

Illustration of WCS architecture.

According to the 2018 EOSDIS American Customer Satisfaction Index (ACSI) survey, ArcGIS is the most widely used software package for working with NASA Earth science data (64%), followed by QGIS (42%), ENVI (32%), and Excel (29%).

In addition, ArcGIS Image Services and Web Coverage Services provide easy access to actual data values, not just static images, and remove the need for complex, storage-intensive raw data downloads. The importance of delivering geospatially enabled web services is emphasized through initiatives such as the establishment of NASA’s Geospatial Web Services Working Group (GWSWG), part of the Earth Science Data System Working Group (ESDSWG); NASA’s Big Earth Data Initiative (BEDI); the NASA Earth Science Data Systems Geographic Information Systems Team (EGIST); and the ESDIS Cumulus Seamless 360 Degrees of Services, which includes the implementation of a common user-facing application programming interface (API) such as Web Coverage Service (WCS).
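To make "access to actual data values" concrete: a WCS client asks the server for a subset of a coverage through a GetCoverage request. The sketch below builds a WCS 2.0.1 GetCoverage URL in key-value-pair form. The endpoint, the coverage ID, and the axis labels `Long`, `Lat`, and `time` are assumptions (axis labels depend on how the server and coverage are configured), so treat this as an illustration rather than a call against a real project service.

```python
from urllib.parse import urlencode


def build_getcoverage_url(endpoint, coverage_id, bbox, time=None, fmt="image/tiff"):
    """Build an OGC WCS 2.0.1 GetCoverage request URL (KVP encoding).

    bbox is (min_lon, min_lat, max_lon, max_lat) in degrees. The axis labels
    used below are assumptions; check the server's DescribeCoverage response.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
        ("format", fmt),
        # One subset parameter per axis trims the coverage to the area of interest.
        ("subset", f"Long({min_lon},{max_lon})"),
        ("subset", f"Lat({min_lat},{max_lat})"),
    ]
    if time is not None:
        # Time positions are given as quoted ISO 8601 strings.
        params.append(("subset", f'time("{time}")'))
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical GeoServer endpoint and coverage ID, for illustration only.
url = build_getcoverage_url(
    "https://example.com/geoserver/wcs",
    "mopitt__co_total_column",
    (-10, 30, 10, 50),
    time="2020-01-01T00:00:00Z",
)
print(url)
```

The response to such a request is a GeoTIFF of data values for the requested window, which GIS software can open directly instead of the user downloading and reprocessing the full granule.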

Project Overview

This project developed geospatial data transformation plugins for the ESDIS Cumulus environment that serve transformed MOPITT and CERES data products as Open Geospatial Consortium (OGC) WCS and Esri ArcGIS Image Services, which can easily be consumed by COTS GIS software such as ArcGIS and QGIS. The plugins fix data issues (e.g., incorrect image sizes, orientation, multidimensional variable interpretation, and georeferencing metadata recognition), and the extensible framework can be expanded with new plugins as additional issues are identified and addressed. Because each product is served correctly, users no longer need to transform NASA data products after download, which makes them more accessible to the Earth science community.
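The extensible plugin idea can be sketched as a registry: each fix is a function registered under a name, and a product's processing recipe is just an ordered list of fix names. The example below is hypothetical and works on small in-memory grids rather than real files; the names `FIXES`, `plugin`, `flip_vertical`, `shift_longitude`, and `apply_fixes` are illustrative and are not taken from the project code.

```python
from typing import Callable, Dict, List

Grid = List[List[float]]

# Hypothetical registry mapping a fix name to the function that applies it.
FIXES: Dict[str, Callable[[Grid], Grid]] = {}


def plugin(name: str):
    """Decorator that registers a transformation plugin under a name."""
    def register(func: Callable[[Grid], Grid]) -> Callable[[Grid], Grid]:
        FIXES[name] = func
        return func
    return register


@plugin("flip_vertical")
def flip_vertical(grid: Grid) -> Grid:
    # Reverse row order, e.g. for data stored south-up instead of north-up.
    return [row[:] for row in reversed(grid)]


@plugin("shift_longitude")
def shift_longitude(grid: Grid) -> Grid:
    # Rotate columns by half the width, e.g. to move a 0..360 longitude
    # grid (even number of columns) onto -180..180.
    half = len(grid[0]) // 2 if grid else 0
    return [row[half:] + row[:half] for row in grid]


def apply_fixes(grid: Grid, names: List[str]) -> Grid:
    """Apply the named plugins in order."""
    for name in names:
        grid = FIXES[name](grid)
    return grid


print(apply_fixes([[0, 1, 2, 3], [4, 5, 6, 7]], ["flip_vertical", "shift_longitude"]))
```

Adding support for a newly identified issue then only requires writing one more decorated function; existing recipes are unaffected.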

This project uses AWS Step Functions and Lambda functions, which are also used in Cumulus.

AWS Lambda and Step Functions

A major goal of this work was to develop a series of lambda functions that can easily be linked together within AWS Step Functions and then used as Cumulus workflows. Currently, there are 12 developmental lambda functions, built with the Serverless Framework, covering a variety of transformation options.

Lambda Function Description
cmr2ncml Generates NcML based on Common Metadata Repository (CMR) search results. Writes the NcML to the ncml-lambda bucket to trigger the ncml lambda function.
cog2geos Updates the GeoServer Mosaic dataset in AWS RDS via a GeoServer REST API call.
mdcs Runs the MDCS batch script to create a mosaic dataset on the ArcGIS Windows Server.
mop02j2tif Converts the MOP02 Swath product on S3 to GeoTIFF using the latest (unreleased) GDAL VSIS3.
mop2tif Converts the MOP03 Grid product on S3 to GeoTIFF using the GEE VSIS3 driver.
mrf2vrt Creates 24 individual VRT files from a 24-hour MRF for the GeoServer Mosaic dataset.
ncml2mrf2 Converts a CERES NcML to MRF and uploads it to the ceres-mrf S3 bucket.
ncml2mrf Converts an hourly NcML to MRF and uploads it to a user-specified S3 bucket.
ncml2tif Converts a CERES NcML to GeoTIFF using Hyrax and the GDAL DODS driver.
ncml Makes an SSH connection to the Hyrax server and synchronizes the local NcML directory with the ncml2tif and ncml2mrf S3 buckets.
rasterproxy Creates 24 individual raster proxy XML files from a 24-hour MRF and uploads them to a user-specified S3 bucket.
tif2cog Converts a standard GeoTIFF to a Cloud Optimized GeoTIFF (COG).
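As an illustration of the cmr2ncml step, the sketch below pulls data links out of a CMR granule search response in JSON format and writes an NcML document that aggregates the granules. It assumes the standard CMR JSON layout (`feed.entry[].links[]`, with data links tagged by the `.../data#` rel value) and assumes a `joinExisting` aggregation along a `time` dimension; the aggregation type the project actually uses is not stated, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
DATA_REL = "http://esipfed.org/ns/fedsearch/1.1/data#"


def granule_urls(cmr_response: dict) -> list:
    """Collect data links from a CMR granule search response (JSON format)."""
    urls = []
    for entry in cmr_response.get("feed", {}).get("entry", []):
        for link in entry.get("links", []):
            if link.get("rel") == DATA_REL:
                urls.append(link["href"])
    return urls


def build_ncml(urls: list, dim_name: str = "time") -> str:
    """Return an NcML document joining the granules along an existing dimension."""
    ET.register_namespace("", NCML_NS)  # emit NcML as the default namespace
    root = ET.Element(f"{{{NCML_NS}}}netcdf")
    agg = ET.SubElement(root, f"{{{NCML_NS}}}aggregation",
                        {"type": "joinExisting", "dimName": dim_name})
    for url in urls:
        ET.SubElement(agg, f"{{{NCML_NS}}}netcdf", {"location": url})
    return ET.tostring(root, encoding="unicode")


# Hypothetical search result with two granules.
response = {"feed": {"entry": [
    {"links": [{"rel": DATA_REL, "href": "https://example.com/data/a.nc"}]},
    {"links": [{"rel": DATA_REL, "href": "https://example.com/data/b.nc"}]},
]}}
print(build_ncml(granule_urls(response)))
```

A server such as Hyrax can then present the aggregated NcML as a single virtual dataset, which is what downstream functions like ncml2tif and ncml2mrf consume.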

In addition, the team implemented two lambda function layers that enable version control of the GDAL libraries used in the lambda functions and save time and space when deploying commonly used libraries such as netCDF4 for Python.

The team successfully created a complete lambda-to-lambda workflow that generated a GeoServer Mosaic dataset from a MOPITT Level 3 product. Placing a granule in an S3 bucket triggered the workflow, which transformed the granule into a new file format and added it to the GeoServer Mosaic. The mosaic was served as an OGC Web Coverage Service from GeoServer, and QGIS was used to verify that the time information was stored correctly in Amazon Relational Database Service (RDS).
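The trigger step of such a workflow can be sketched as a Lambda entry point that reads the bucket and object key from the S3 event notification and decides where the converted file should go. The `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the S3 event message structure; the `geotiff/` output prefix and the granule name in the usage example are hypothetical, and the actual conversion (e.g. with GDAL) is left out.

```python
import posixpath
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Sketch of an S3-triggered Lambda entry point.

    For each uploaded object, record its S3 location and derive the key of
    the GeoTIFF a conversion step would write (hypothetical naming scheme).
    """
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        stem, _ext = posixpath.splitext(posixpath.basename(key))
        jobs.append({
            "source": f"s3://{bucket}/{key}",
            "target_key": f"geotiff/{stem}.tif",  # hypothetical output prefix
        })
    # A real function would run the conversion here and then register the
    # output with the GeoServer mosaic.
    return {"jobs": jobs}


sample_event = {"Records": [{"s3": {"bucket": {"name": "mopitt-l3-inbox"},
                                    "object": {"key": "incoming/MOP03J-20200101.he5"}}}]}
print(handler(sample_event))
```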

The team also chained several of the new lambda functions into an AWS Step Functions state machine. This serverless workflow makes a web service call to OPeNDAP Hyrax to retrieve a CERES SYN1Deg data product, and it ends by running a raster mosaic management script on an EC2 instance and updating an ArcGIS Image Service.
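Chaining lambda functions in Step Functions comes down to an Amazon States Language (ASL) definition in which each Task state names a function and the state that follows it. The sketch below generates a strictly sequential definition. The ARNs, account ID, region, and the particular chain of functions are assumptions for illustration; the project's actual state machine may branch, retry, or pass parameters differently.

```python
import json


def chain_state_machine(steps, comment="Transformation workflow"):
    """Build an ASL definition that runs Lambda tasks one after another.

    steps is a list of (state_name, lambda_arn) pairs.
    """
    if not steps:
        raise ValueError("at least one step is required")
    states = {}
    for i, (name, arn) in enumerate(steps):
        state = {"Type": "Task", "Resource": arn}
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1][0]  # output feeds the next function
        else:
            state["End"] = True
        states[name] = state
    return json.dumps({"Comment": comment, "StartAt": steps[0][0], "States": states},
                      indent=2)


# Hypothetical ARNs; the chain order is illustrative only.
prefix = "arn:aws:lambda:us-east-1:123456789012:function:"
print(chain_state_machine([(n, prefix + n) for n in ("cmr2ncml", "ncml2mrf", "rasterproxy", "mdcs")]))
```

The resulting JSON can be supplied as the definition when creating the state machine; keeping each transformation in its own function means a new workflow is just a different list of steps.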

Products Developed and Deployed

Numerous products were developed and deployed over the course of this project:

Web Mapping Applications

1. Data Access Analytics Viewer (DAAV): A lightweight geospatial web mapping application that ingests Image Services and uses the ArcGIS API for Python.

2. NASA COVID Dashboard: A dashboard application that presents location-based analytics using interactive data visualizations.

ArcGIS Server

1. ArcGIS Image Server/OGC: Provides a distributed computing and storage system that powers the analytical processing and serving of large imagery collections.

2. Raster Analytics: Enables distributed processing and storage to perform image analysis or persist a new raster product.

Portal for ArcGIS

Home: A component of ArcGIS Enterprise that allows users to share maps, scenes, apps, and other geographic information with other people in their organization via a front-end application.

Notebook Server

1. Notebook Server: An integrated platform to create, share, and run data science, data management, and administrative scripts.

2. CERES Notebook: Subsetting, export, clipping, and temporal profile.

3. MOPITT Notebook: Subsetting, anomaly detection, trend and prediction, and temporal profile.

Service Management

Mosaic Dataset Configuration Script: A Python script that reads parameters stored in an XML file to create, configure, and populate a mosaic dataset.

Documentation

Wiki Space and Docs: Location of data transformation processes, enterprise architecture, ‘How To’ documentation, and best practices.

AWS Lambda Functions

Transformation Functions: An event-driven, serverless computing platform that runs code in response to events and automatically manages the computing resources required by that code.
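The Mosaic Dataset Configuration Script is driven by an XML parameter file. The sketch below shows only that reading step, using a made-up, much simpler schema; Esri's MDCS uses its own richer XML format, and every element name, path, and value here is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical parameter file; not the actual MDCS schema.
SAMPLE = """<MosaicConfig>
  <Workspace>C:/data/ceres.gdb</Workspace>
  <MosaicDataset>syn1deg_hourly</MosaicDataset>
  <SpatialReference>4326</SpatialReference>
  <Sources>
    <Path>D:/rasterproxy/2020</Path>
    <Path>D:/rasterproxy/2021</Path>
  </Sources>
</MosaicConfig>"""


def read_config(xml_text):
    """Parse the parameters needed to create and populate a mosaic dataset."""
    root = ET.fromstring(xml_text)
    return {
        "workspace": root.findtext("Workspace"),
        "name": root.findtext("MosaicDataset"),
        # EPSG/WKID code of the coordinate system; defaults to WGS 84.
        "wkid": int(root.findtext("SpatialReference", default="4326")),
        "sources": [p.text for p in root.iterfind("Sources/Path")],
    }


print(read_config(SAMPLE))
```

In an ArcGIS environment, values like these would then be passed to geoprocessing tools such as Create Mosaic Dataset and Add Rasters To Mosaic Dataset; keeping them in XML lets the same script configure different products without code changes.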