Creation of a Public Dataspace for Earth System Data at Jülich Supercomputing Centre

DataCite Commons · Updated 2026-05-12 · Indexed 2026-05-16
Download link:
https://b2share.fz-juelich.de/records/0045c0d4041641f4afc92c3f99cd8a02
Description:
Jülich Supercomputing Centre (JSC) is building a public dataspace for Earth system data. Data will be made available on both storage clusters at JSC, ExaStore and Jülich Storage Cluster (JUST), which provide petabyte-scale storage to the exascale system JUPITER and our pre-exascale systems, respectively. We provide insights into the ongoing implementation of new data management services and the tools selected for data access. This also covers the creation of a metadata catalog based on the SpatioTemporal Asset Catalog (STAC) specification.

Background: Improvements in computational speed lead to better simulations in Earth System Modeling (ESM) by allowing them to resolve scales of a few kilometers. The volume of the resulting data increases greatly with resolution and poses challenges for data processing and storage. An increasingly common use case in ESM is the training of machine learning (ML) models for weather and climate applications. Training requires fast access to datasets, which is supported by a dedicated structure within the datasets; anemoi-zarr is a prominent example of such a file structure. Both numerical and ML applications demand easy, FAIR (findable, accessible, interoperable, reusable) access to datasets. Simplifying subsequent data processing and analysis requires access that does not depend on creating individual local copies, either through shared storage or over the web.

JSC is a multipurpose high performance computing (HPC) center, and ESM is one of its major user groups. With Europe's first exascale system, JUPITER, JSC has become the host of a second HPC infrastructure, which includes the dedicated storage cluster ExaStore. ExaStore is designed to provide the high bandwidth, low latency and scalability required to efficiently support data-intensive workloads on JUPITER. Jülich MeteoCloud is a central repository for meteorological data on JUST, which is accessible from our pre-exascale systems, such as JUWELS and JURECA-DC. It covers a wide range of datasets, from reanalysis data to satellite observations, and currently holds about 4 PB in total. With the extension to ExaStore we introduce a new branch for ML-ready datasets. The limited overall storage capacity at JSC calls for a reduction of duplicated data, in particular across project data spaces, and requires services for data movement and for on-demand staging of ML-ready datasets.

Within the WarmWorld Easier project, JSC and the German Climate Computing Center (DKRZ) co-develop and deploy services for data access. A core aspect is the findability of data, which is ensured with STAC. Each asset of a catalog entry provides the information needed to open the described dataset in one specific way, for example a file path when accessing from disk or a URL when accessing through a web service. By combining these approaches we will improve the infrastructure for Earth system sciences at JSC and provide reliable, low-latency access to stored datasets. As a first use case we will include ML-ready datasets for the WeatherGenerator project in the MeteoCloud.
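The idea that one catalog entry can carry several assets, each describing one way to open the same dataset, can be illustrated with a small sketch. This is not the JSC catalog layout: the asset keys `disk` and `web`, the item id, the file path, and the URL below are all hypothetical placeholders, and only the top-level field names follow the STAC Item specification.

```python
import json


def make_item(item_id, disk_path, web_url, when):
    """Build a minimal STAC-Item-shaped dict with two routes to the same data."""
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": None,  # with a null geometry, the bbox field may be omitted
        "properties": {"datetime": when},
        "links": [],
        "assets": {
            # Hypothetical asset keys; a real catalog may name them differently.
            "disk": {
                "href": disk_path,
                "roles": ["data"],
                "title": "Access from the HPC file system",
            },
            "web": {
                "href": web_url,
                "roles": ["data"],
                "title": "Access through a web service",
            },
        },
    }


def select_href(item, on_cluster):
    """Return the file path when running on the cluster, otherwise the URL."""
    key = "disk" if on_cluster else "web"
    return item["assets"][key]["href"]


# All values below are placeholders for illustration only.
item = make_item(
    "example-ml-ready-dataset",
    "/path/to/example-dataset.zarr",
    "https://example.org/data/example-dataset.zarr",
    "2026-01-01T00:00:00Z",
)

print(json.dumps(item["assets"], indent=2))
print(select_href(item, on_cluster=True))
```

A client running on a compute node would call `select_href(item, on_cluster=True)` and open the returned path directly from shared storage, while an external client would use the URL, so neither needs its own local copy of the data.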
Provider:
https://b2share.eudat.eu
Created:
2026-05-12