STARE: SpatioTemporal Adaptive-Resolution Encoding to Unify Diverse Earth Science Data for Integrative Analysis
Principal Investigator (PI): Michael Rilee, Rilee Systems Technologies, LLC.
Co-Investigators: Kwo-sen Kuo, Bayesics, LLC; James Frew, UC Santa Barbara; James Gallagher, OPeNDAP
With SpatioTemporal Adaptive-Resolution Encoding (STARE) we address ACCESS Focus 2.1.3, "Cloud Optimized Preprocessing and Data Transformation." Current Earth Science data processing centers on large, centralized archives providing exceptional browse and search capabilities: researchers identify and then download data, in file form, to local compute/storage resources for preprocessing and integration prior to analysis. This data flow forces end users to devote scarce resources and considerable time to the transfer, storage, and management of archived data, and to acquire specialist expertise in the various kinds of data sets in their research domains. We propose to simplify this flow by moving preprocessing activities to the archived data, eliminating the costs of creating, transferring, and maintaining redundant, idiosyncratic local archives built by researchers who are generally neither archivists nor the expert producers of the original data. With STARE providing a unifying platform for diverse data models (swath, point, grid), Earth Observing System Data and Information System (EOSDIS) data archives will be able to produce higher-level products made to order for end-user researchers.
The critical new technology we apply is STARE. STARE's spatial component (SC) descends from the Hierarchical Triangular Mesh (HTM) spherical indexing originally developed for the Sloan Digital Sky Survey, for which storage and computational efficiency were key. STARE/SC recursively divides the Earth's surface into a set of quad-trees, allowing any point on Earth to be identified with a single number. The STARE temporal component (TC) has similar properties. For observations, these STARE indices contain both location and resolution information, promoting efficient data placement on distributed cloud resources and minimizing costly data transport between nodes for operations such as joining, intersecting, (conditionally) subsetting, and re-gridding diverse datasets.
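The idea of packing both location and resolution into one integer can be sketched in a few lines. The following is a simplified lat-lon quadtree analogue, not STARE's actual HTM-based encoding: cell bits are left-aligned so a coarser cell's bits are a prefix of any finer cell it contains, and the resolution level occupies the low bits. All names here (`encode`, `contains`, `MAX_LEVEL`) are hypothetical.

```python
# Toy sketch of a STARE-like index (NOT the real HTM-based encoding):
# a single integer carries location (high bits) and resolution (low bits).

MAX_LEVEL = 27  # maximum quadtree refinement level in this toy scheme

def encode(lat, lon, level):
    """Pack a quadtree cell id and its resolution level into one integer."""
    # Normalize coordinates to [0, 1).
    y = (lat + 90.0) / 180.0
    x = (lon + 180.0) / 360.0
    cell = 0
    for _ in range(level):
        y *= 2; x *= 2
        qy, qx = int(y), int(x)
        y -= qy; x -= qx
        cell = (cell << 2) | (qy << 1) | qx   # two bits per level
    # Left-align the cell bits so coarse cells are prefixes of finer ones,
    # then store the level itself in the low 5 bits.
    return (cell << (2 * (MAX_LEVEL - level) + 5)) | level

def contains(coarse, fine):
    """True if the coarse cell spatially contains the fine cell."""
    lc, lf = coarse & 31, fine & 31     # levels from the low 5 bits
    if lc > lf:
        return False
    shift = 2 * (MAX_LEVEL - lc) + 5    # keep only the coarse cell's bits
    return (coarse >> shift) == (fine >> shift)
```

Because containment is a prefix test on integers, range queries and sorting on these indices naturally group spatially adjacent cells, which is the property the placement and intersection operations above rely on.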
STARE automatically co-aligns diverse data in the cloud, placing spatiotemporally close data on the same compute/storage node at a relatively small cost in metadata. STARE thus allows diverse data to be efficiently integrated for analysis in the cloud, providing a foundation on which existing tools and processing methods can be built. A great deal of capability (e.g., preprocessing, searching, and visualization) has already been developed to support researchers' use of Earth Science data. In the course of the proposed work, we will show how existing tools and methods benefit from the STARE-enabled platform. This can happen via tight integration, as has been done, for example, by incorporating STARE into the distributed array database SciDB along with re-gridding functions and fast, parallel geographic intersections. Alternatively, current tools may simply be applied to the results of STARE-enabled distributed processing, e.g., fast granule intersection, in a more conventional but cloud-based data processing flow.
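To make the co-alignment and intersection claims concrete, here is an illustrative sketch (not production STARE code, and `node_for`/`intersect` are hypothetical names): once every record carries an integer index whose high bits encode location, co-aligned placement reduces to partitioning on an index prefix, and granule intersection reduces to a merge of sorted index lists.

```python
# Illustrative sketch: placement and intersection driven by an integer
# spatial index whose high bits encode location.

def node_for(index, prefix_bits=8, n_nodes=4, index_bits=64):
    """Route a record to a compute/storage node by the high bits of its
    index, so records with shared prefixes (i.e., spatially close data)
    land on the same node."""
    return (index >> (index_bits - prefix_bits)) % n_nodes

def intersect(a, b):
    """Index values common to two datasets, via a sorted-merge scan."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out
```

Because both datasets sharing a prefix are routed to the same node, the merge in `intersect` can run node-locally, which is why index-aligned placement avoids the data shuffling that dominates the cost of conventional distributed joins.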
As a unifying platform, STARE supports conventional processing, analysis, and visualization tools, bringing the opportunity to massively increase the amount of data researchers can use. In the longer term, as tools evolve to take greater advantage of STARE's integrative capabilities, we can move from the current focus on expensive, low-level manipulation of data files to interacting with Earth Science data at a higher level, with query-based declarative tools and user interfaces that favor scientific inquiry rather than data management. At the very least, STARE helps automate critical spatiotemporal functions while making efficient use of cloud computing, promising to eliminate the need for researchers to devote time, money, and expertise to the redundant transfer of archived data to their own local systems. The time and effort saved improve scientific quality and the productivity of current researchers and reduce the cost of entry for others who might seek value in EOSDIS data resources.
October 2019 Update
The STARE data-variety-harmonizing technology is maturing on schedule and is now ready for testing by interested developers and users. STARE has demonstrated its potential to address challenges associated with the variety and volume of big data, and it adapts to different compute-storage architectures. STARE will be tested in use cases during year 2. The technology will reduce processing time and is poised to flip the notorious 80/20 split, in which data preparation consumes most of the effort in data science endeavors.
Year one accomplishments:
- Core STARE library and API functions are in place.
- Many science usability functions have been implemented, mostly spatial.
- PySTARE is functional for experimental scientific work.
- OPeNDAP Hyrax integration has started.
- UCSB snow cover science use case is in progress.
Publications & Presentations
Kuo, K.S., Yu, H., Pan, Y., & Rilee, M. (2019). Leveraging STARE for Co-aligned Data Locality with netCDF and Python MPI. Presented at IGARSS, Yokohama, Japan.
Rilee, M., Kuo, K.S., Frew, J., Griessbaum, N., Gallagher, J., & Neumiller, K. (2019). STARE Compatibility. Presented at ESIP Summer Meeting, Tacoma, WA.
Kuo, K.S. et al. (2019). Best-value Data-intensive Analysis Architecture Deduced Using ‘Geo-lly’ Beans. Presented at ESIP Summer Meeting, Tacoma, WA.
Kuo, K.S. et al. (2019). STARE and data packaging. Presented at ESIP Summer Meeting, Tacoma, WA.
Rilee, M. L., & Kuo, K. S. (2018). The Impact on Quality and Uncertainty of Regridding Diverse Earth Science Data for Integrative Analysis. Presented at AGU Fall Meeting.
Bauer, M., Kuo, K. S., Oloso, A., & Rilee, M. L. (2018). Exploring the Spatio-temporal Connectivity of Blizzard Conditions and Mid-latitude Cyclones: A Template for a Process-based Workflow. Presented at AGU Fall Meeting.
Last Updated: Nov 20, 2019 at 12:38 PM EST