Systematic Data Transformation to Enable Web Coverage Services (WCS) and ArcGIS Image Services within ESDIS Cumulus Cloud
Principal Investigator (PI): Jeff Walter, NASA's Langley Research Center
Co-Investigators: A. Jason Barnett, Brian Tisdale, Booz Allen Hamilton (BAH), NASA LaRC Atmospheric Science Data Center (ASDC); Hyo-Kyung (Joe) Lee, The HDF Group; Matthew Tisdale, NASA LaRC Atmospheric Science Data Center (ASDC) DAAC
NASA's Earth Science Data and Information System (ESDIS) Project has recently launched the Cumulus prototype, which provides a scalable cloud-based platform to ingest, archive, distribute, and manage Earth science data within the Amazon cloud environment. One of the goals of the Cumulus project is to increase the flexibility and effectiveness of NASA Earth science data access and distribution to end users.
Earth science data is geospatial in nature. However, according to the Geospatial Data Abstraction Library (GDAL) Enhancements for ESDIS (GEE) Assessment ("GDAL Enhancements," 2016), many Earth science data products (e.g., Measurement of Pollution in the Troposphere (MOPITT) and Clouds and the Earth's Radiant Energy System (CERES)) are difficult to access and use within commercial off-the-shelf (COTS) and open source geographic information systems (GIS) software, such as Esri's ArcGIS and the open source QGIS software.
According to the 2017 American Customer Satisfaction Index (ACSI) survey, ArcGIS is the most used software tool/package for working with NASA Earth science data at 64%, followed by QGIS (37%), ENVI (32%), and Excel (27%) ("ACSI Reports," 2018). We are developing geospatial data transformation plugins that can be used within the ESDIS Cumulus environment to serve transformed MOPITT and CERES data products as OGC Web Coverage Services (WCS) and Esri ArcGIS Image Services that can be easily consumed by COTS GIS software such as ArcGIS and QGIS. These plugins will transform the data to fix any data issues (e.g., incorrect image sizes, orientation, multidimensional variable interpretation, georeferenced metadata recognition, etc.), and this extensible framework can be expanded with new plugins as additional issues are identified and addressed.
In addition, ArcGIS Image Services and Web Coverage Services provide easy access to actual data values, not just static images, and alleviate the need to navigate complex and storage-intensive processes of raw data downloads. The importance of delivering geospatially enabled web services is emphasized through initiatives like the establishment of NASA's Geospatial Web Services Working Group (GWSWG) (part of the Earth Science Data System Working Group (ESDSWG)), NASA's Big Earth Data Initiative (BEDI), and the ESDIS Cumulus Seamless 360 Degrees of Services that includes the implementation of a common user-facing API such as WCS. Developing these geospatial data transformation plugins and subsequently providing the corrected data as OGC and Esri web services will remove the barrier of transforming each NASA data product after download, since the data will be served correctly, making NASA data products more easily accessible to the Earth science community.
This project will use Amazon Web Services (AWS) Step Functions and Lambda functions, which are also used by the ESDIS Cumulus prototype. If the ESDIS Cumulus system is not operationalized, the project can be decoupled and operate independently within a commercial cloud environment. We will approach this problem in three phases. Phase one, planning, will focus on the AWS workflow structure. Phase two, development and implementation, will deploy the AWS Step Functions and Lambda functions used to orchestrate a workflow of customized micro-services executing GDAL transformations in order to geospatially enable and serve a new cloud-optimized MetaRaster Format (MRF) product as an OGC Web Coverage Service and ArcGIS Image Service. Phase three, testing, will allow us to measure the discoverability of NASA Earth science data web services within major data catalogs (e.g., Earthdata Search, GeoPlatform, Atmospheric Science Data Center (ASDC) ArcGIS Portal, etc.), and their use, speed, accessibility, and analytical capabilities within QGIS, ArcGIS, custom web mapping applications, and Data Cubes.
Update October 2019
AWS Lambda and Step Functions
One of the major goals of this work was to develop a series of Lambda functions that can easily be linked together within AWS Step Functions and used as Cumulus workflows. Currently, there are 12 developmental Lambda functions, built with the Serverless Framework, covering a variety of transformation options (Table 1).
Table 1. Developmental Lambda functions, in alphabetical order.
|Function|Description|
|---|---|
|cmr2ncml|Generate NcML based on a CMR search result. Write the NcML to the ncml-lambda bucket to trigger the ncml Lambda function.|
|cog2geos|Update the GeoServer mosaic dataset in AWS RDS via a GeoServer REST API call.|
|mdcs|Run the MDCS batch script to create a mosaic dataset on the ArcGIS Windows Server.|
|mop02j2tif|Convert a MOP02 Swath product on S3 to GeoTIFF using the latest (unreleased) GDAL VSIS3 driver.|
|mop2tif|Convert a MOP03 Grid product on S3 to GeoTIFF using the GEE VSIS3 driver.|
|mrf2vrt|Create 24 individual VRT files from a 24-hour MRF for the GeoServer mosaic dataset.|
|ncml2mrf2|Convert a CERES NcML to MRF and upload it to the ceres-mrf S3 bucket.|
|ncml2mrf|Convert an hourly NcML to MRF and upload it to a user-specified S3 bucket.|
|ncml2tif|Convert a CERES NcML to GeoTIFF using Hyrax and the GDAL DODS driver.|
|ncml|Make an SSH connection to the Hyrax server and synchronize the local ncml directory with the ncml2tif and ncml2mrf S3 buckets.|
|rasterproxy|Create 24 individual raster proxy XML files from a 24-hour MRF and upload them to a user-specified S3 bucket.|
|tif2cog|Convert a standard GeoTIFF to a Cloud Optimized GeoTIFF (COG).|
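To illustrate the kind of GDAL call that a function such as tif2cog might wrap, the following minimal sketch builds the two commands of the classic Cloud Optimized GeoTIFF recipe: `gdaladdo` to add overviews, then `gdal_translate` with tiling and `COPY_SRC_OVERVIEWS=YES`. The helper name `build_cog_commands`, the overview levels, and the DEFLATE compression choice are assumptions made for illustration; the project's actual implementation is not described here.

```python
from typing import List, Sequence


def build_cog_commands(
    src_tif: str,
    dst_tif: str,
    overview_levels: Sequence[int] = (2, 4, 8, 16),  # assumed levels
) -> List[List[str]]:
    """Return the argument lists for a two-step GeoTIFF-to-COG conversion.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    # Step 1: add overviews to the source GeoTIFF (gdaladdo edits it in place).
    add_overviews = ["gdaladdo", "-r", "average", src_tif]
    add_overviews += [str(level) for level in overview_levels]

    # Step 2: rewrite as a tiled GeoTIFF, carrying the overviews across.
    translate = [
        "gdal_translate", src_tif, dst_tif,
        "-co", "TILED=YES",
        "-co", "COPY_SRC_OVERVIEWS=YES",
        "-co", "COMPRESS=DEFLATE",  # assumed compression choice
    ]
    return [add_overviews, translate]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` in an environment where the GDAL command-line tools are installed, for example one provided by a Lambda layer.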
In addition, the team has implemented two Lambda layers that allow version control of the GDAL libraries used in the Lambda functions and save deployment time and space for commonly used libraries such as NetCDF-4 for Python.
The team successfully created a complete Lambda-to-Lambda workflow that generated a GeoServer mosaic dataset from a MOPITT Level 3 product. The workflow was triggered when a granule was placed in an S3 bucket; the granule was then transformed into a new file format and written to the GeoServer mosaic. The mosaic was served as an OGC Web Coverage Service from GeoServer, and QGIS was used to verify that the time information was stored correctly within the Amazon Relational Database Service (RDS) database.
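A workflow triggered by a granule landing in S3 typically starts with a Lambda handler that reads the bucket and object key out of the S3 event notification. The sketch below shows that first step only. The function names and the returned payload shape are hypothetical, and the bucket and key in the usage note are invented; the event layout (`Records[].s3.bucket.name` and `Records[].s3.object.key`, with URL-encoded keys) follows the standard S3 notification format.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def extract_s3_objects(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for every S3 record in a Lambda event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if s3 is None:
            continue  # skip records that did not come from S3
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded, with spaces encoded as '+'.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Hypothetical entry point: list the granules to be transformed.

    A real function would go on to convert each granule and write the
    result to its destination; that part is omitted in this sketch.
    """
    granules = extract_s3_objects(event)
    return {"granules": [{"bucket": b, "key": k} for b, k in granules]}
```

For local experimentation, the handler can be called with a hand-written event such as `{"Records": [{"s3": {"bucket": {"name": "example-bucket"}, "object": {"key": "incoming/granule.he5"}}}]}` and `None` as the context.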
The team was also successful in chaining several of the new Lambda functions into an AWS Step Functions state machine. This serverless workflow makes a web service call to OPeNDAP Hyrax to retrieve a CERES SYN1Deg data product. The workflow ends by running a raster mosaic management script on an EC2 instance and updating an ArcGIS Image Service.
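Chaining Lambda functions in Step Functions amounts to writing a state machine definition in the Amazon States Language, where each `Task` state names a Lambda ARN and points to the next state. The following sketch generates such a definition for a linear chain. The helper name, the choice and order of steps, and the ARNs (which use a placeholder account ID and region) are assumptions for illustration, not the project's actual workflow definition.

```python
import json
from typing import Any, Dict, List, Tuple


def build_state_machine(steps: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Build a linear Amazon States Language definition.

    `steps` is an ordered list of (state name, Lambda ARN) pairs.
    """
    if not steps:
        raise ValueError("at least one step is required")

    states: Dict[str, Any] = {}
    for index, (name, arn) in enumerate(steps):
        state: Dict[str, Any] = {"Type": "Task", "Resource": arn}
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1][0]  # hand off to the next step
        else:
            state["End"] = True  # last step terminates the execution
        states[name] = state

    return {
        "Comment": "Illustrative linear chain of Lambda functions",
        "StartAt": steps[0][0],
        "States": states,
    }


# Hypothetical ordering and placeholder ARNs.
ARN_PREFIX = "arn:aws:lambda:us-east-1:123456789012:function:"
definition = build_state_machine([
    ("ncml2mrf", ARN_PREFIX + "ncml2mrf"),
    ("rasterproxy", ARN_PREFIX + "rasterproxy"),
    ("mdcs", ARN_PREFIX + "mdcs"),
])
definition_json = json.dumps(definition, indent=2)
```

The resulting JSON string is what would be supplied as the state machine definition when creating it, whether through the console, an infrastructure template, or an SDK call.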
Last Updated: Nov 12, 2019 at 8:40 AM EST