Earth Institute Research Projects

Marine Geoscience Data System 2020: Optimizing Established Data Infrastructure for the Future

Lead PI: Vicki Ferrini

Unit Affiliation: Marine Geology & Geophysics, Lamont-Doherty Earth Observatory (LDEO)

November 2019 - October 2022
Project Type: Research

DESCRIPTION: The preservation of and access to scientific research data are critical aspects of the modern scientific research enterprise: they facilitate reproducibility and enable follow-on research to leverage pre-existing data products and results. We live in a data-rich era, but in many scientific disciplines, including the marine geosciences, there are vast gaps in observational data. In addition to scientific reproducibility, the spatial and temporal sparseness of marine geoscience data and the costs of generating data products (from acquisition to processing and synthesis) are strong drivers for preserving and documenting data for future re-use. Although some marine geoscience data produced by scientific researchers are unique and small in volume, and are therefore considered "long-tail" data, they can be aggregated into curated collections and transformed into "big data" where the whole is greater than the sum of its parts.

The Marine Geoscience Data System (MGDS) was established in 2003 as a trusted marine geoscience data repository that provides free public access to a curated collection of marine geophysical data products and complementary data related to understanding the formation and evolution of the seafloor and sub-seafloor. The system ensures that NSF-funded marine geoscience research data are Findable, Accessible, Interoperable and Reusable (FAIR), and the richness of the MGDS metadata catalog adds value to its data holdings by providing extensive disciplinary information and context that more readily enable data discovery and re-use. MGDS makes available a digital library of more than 80 terabytes of data files, and is designed to serve the needs of its science user community with a focus on (1) facilitating data discovery and access, (2) curating coherent disciplinary data collections that facilitate the creation of global syntheses, (3) ensuring the long-term preservation and stewardship of research data products, and (4) facilitating compliance with data management and sharing obligations.

This project focuses on the continued operation of the MGDS, both to provide ongoing public access to its curated data collections and to evolve the system to support efficient, scalable and sustainable operations. Technical enhancements will include modernizing and optimizing the back-end components and technical stack that drive the data system, optimizing tools and workflows for integrating new data contributions, and streamlining the front-end interfaces that enable discovery and access. The planned enhancements will yield operational efficiencies in data curation efforts, data storage costs, and system maintenance, ensuring that highly valuable observational marine geoscience data collections remain maintained and accessible for ongoing use.