(HS 1) Toward Seamless Environmental Modeling: Integration of HydroShare with Server-side Methods for Exposing Large Datasets to Models
Source: DataONE | Updated: 2024-10-15 | Indexed: 2025-04-26
Download link:
https://search.dataone.org/view/sha256:fd9fbead857a4e69cf5011fcd13edb83260fe3158405cad5bf4a213d987a3689
Resource description:
This HydroShare resource was created to support the study presented in Choi et al. (2024), titled "Toward Reproducible and Interoperable Environmental Modeling: Integration of HydroShare with Server-side Methods for Exposing Large-Extent Spatial Datasets to Models." Ensuring the reproducibility of scientific studies is crucial for advancing research, and effective data management is a cornerstone of that goal. In hydrologic and environmental modeling, spatial data are used as model input, and sharing these data is a key step in the data management process. However, by focusing only on sharing data at the file level through small files, rather than providing the ability to Find, Access, Interoperate with, and directly Reuse subsets of larger datasets, online data repositories have missed an opportunity to foster more reproducible science. This has led to challenges when accommodating large files that benefit from consistent data quality and seamless geographic extent.
To take advantage of large datasets, the objective of the Choi et al. (2024) study was to create and test an approach for exposing large extent spatial (LES) datasets to support catchment-scale hydrologic modeling needs. GeoServer and THREDDS Data Server, both connected to HydroShare, were used to provide seamless access to LES datasets. The approach was demonstrated using the Regional Hydro-Ecologic Simulation System (RHESSys) for three watersheds of different sizes in the US. Data consistency was assessed across three data acquisition approaches: the 'conventional' approach, which involved sharing data at the file level through small files, and the two server-side approaches, GeoServer and THREDDS Data Server. The assessment used RHESSys to evaluate differences in simulated streamflow output. This approach provided an opportunity to serve the datasets needed to create catchment models in a consistent way, so that they could be accessed and processed to serve individual modeling needs. For full details on the methods and approach, please refer to Choi et al. (2024). This HydroShare resource provides access to the data and workflows that were integral to the study.
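As a rough illustration of what server-side subsetting of an LES dataset can look like, the sketch below builds two request URLs for a watershed bounding box: a WCS 2.0.1 GetCoverage request, which GeoServer can serve, and a NetCDF Subset Service (NCSS) request, which THREDDS Data Server can serve. The host names, coverage ID, dataset path, variable names, and axis labels are hypothetical placeholders and are not taken from the study. The axis labels in particular depend on how a coverage is published, so this is a minimal sketch under those assumptions rather than the workflow used in Choi et al. (2024).

```python
from urllib.parse import urlencode


def wcs_subset_url(base_url, coverage_id, west, south, east, north,
                   x_axis="Long", y_axis="Lat", fmt="image/tiff"):
    """Build a WCS 2.0.1 GetCoverage URL that requests only a bounding box.

    x_axis and y_axis are assumptions: the real axis labels come from the
    coverage description published by the server.
    """
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
        # One 'subset' parameter per axis trims the coverage to the box.
        ("subset", f"{x_axis}({west},{east})"),
        ("subset", f"{y_axis}({south},{north})"),
        ("format", fmt),
    ]
    return f"{base_url}?{urlencode(params)}"


def ncss_subset_url(dataset_url, variables, west, south, east, north,
                    accept="netcdf"):
    """Build a THREDDS NetCDF Subset Service URL for a lat/lon bounding box."""
    params = {
        "var": ",".join(variables),
        "north": north,
        "south": south,
        "east": east,
        "west": west,
        "accept": accept,
    }
    return f"{dataset_url}?{urlencode(params)}"


# Hypothetical endpoints and identifiers, for illustration only.
wcs_url = wcs_subset_url(
    "https://example.org/geoserver/wcs", "les__dem",
    west=-79.5, south=38.0, east=-79.0, north=38.5,
)
ncss_url = ncss_subset_url(
    "https://example.org/thredds/ncss/grid/les/forcing.nc", ["prcp", "tmax"],
    west=-79.5, south=38.0, east=-79.0, north=38.5,
)
print(wcs_url)
print(ncss_url)
```

Either URL could then be fetched with any HTTP client, so that a model workflow downloads only the portion of the state-scale dataset that covers its watershed.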
This collection resource (HS 1) comprises 7 individual HydroShare resources (HS 2-8), each containing different datasets or workflows. These 7 HydroShare resources consist of the following: three resources for three state-scale LES datasets (HS 2-4), one resource with Jupyter notebooks for three different approaches and three different watersheds (HS 5), one resource for RHESSys model instances (i.e., input) of the conventional approach and observation data for all data access approaches in three different watersheds (HS 6), one resource with Jupyter notebooks for automated workflows to create LES datasets (HS 7), and finally one resource with Jupyter notebooks for the evaluation of data consistency (HS 8). More information on each resource is provided within it.
Created:
2024-10-19



