
shermansiu/dm_graphcast_datasets

Source: Hugging Face. Updated 2023-12-29; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/shermansiu/dm_graphcast_datasets
Resource description:
---
license: cc-by-4.0
tags:
- weather-forecasting
- climate
language:
- en
pretty_name: ECMWF's ERA5, HRES, (and fake) data, formatted for DeepMind GraphCast
configs:
- config_name: source-era5_date-2022-01-01_res-0.25_levels-13_steps-01
  data_files: "dataset/source-era5_date-2022-01-01_res-0.25_levels-13_steps-01.nc"
- config_name: source-era5_date-2022-01-01_res-0.25_levels-13_steps-04
  data_files: "dataset/source-era5_date-2022-01-01_res-0.25_levels-13_steps-04.nc"
- config_name: source-era5_date-2022-01-01_res-0.25_levels-13_steps-12
  data_files: "dataset/source-era5_date-2022-01-01_res-0.25_levels-13_steps-12.nc"
- config_name: source-era5_date-2022-01-01_res-0.25_levels-37_steps-01
  data_files: "dataset/source-era5_date-2022-01-01_res-0.25_levels-37_steps-01.nc"
- config_name: source-era5_date-2022-01-01_res-0.25_levels-37_steps-04
  data_files: "dataset/source-era5_date-2022-01-01_res-0.25_levels-37_steps-04.nc"
- config_name: source-era5_date-2022-01-01_res-0.25_levels-37_steps-12
  data_files: "dataset/source-era5_date-2022-01-01_res-0.25_levels-37_steps-12.nc"
- config_name: source-era5_date-2022-01-01_res-1.0_levels-13_steps-01
  data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-13_steps-01.nc"
- config_name: source-era5_date-2022-01-01_res-1.0_levels-13_steps-04
  data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-13_steps-04.nc"
- config_name: source-era5_date-2022-01-01_res-1.0_levels-13_steps-12
  data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-13_steps-12.nc"
- config_name: source-era5_date-2022-01-01_res-1.0_levels-13_steps-20
  data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-13_steps-20.nc"
- config_name: source-era5_date-2022-01-01_res-1.0_levels-13_steps-40
  data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-13_steps-40.nc"
- config_name: source-era5_date-2022-01-01_res-1.0_levels-37_steps-01
  data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-37_steps-01.nc"
- config_name: source-era5_date-2022-01-01_res-1.0_levels-37_steps-04
  data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-37_steps-04.nc"
- config_name: source-era5_date-2022-01-01_res-1.0_levels-37_steps-12
  data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-37_steps-12.nc"
- config_name: source-era5_date-2022-01-01_res-1.0_levels-37_steps-20
  data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-37_steps-20.nc"
---

# ECMWF's ERA5, HRES, (and fake) data, formatted for DeepMind GraphCast

Original files are from this Google Cloud Bucket: https://console.cloud.google.com/storage/browser/dm_graphcast

This repo contains both the `dataset` and `stats` files needed for GraphCast inference.

## License and Attribution

ECMWF data products are subject to the following terms:

1. Copyright statement: Copyright "© 2023 European Centre for Medium-Range Weather Forecasts (ECMWF)".
2. Source: www.ecmwf.int
3. Licence Statement: ECMWF data is published under a Creative Commons Attribution 4.0 International (CC BY 4.0). https://creativecommons.org/licenses/by/4.0/
4. Disclaimer: ECMWF does not accept any liability whatsoever for any error or omission in the data, their availability, or for any loss or damage arising from their use.

## Usage

Use the Huggingface Hub file system to load files. The `datasets` library doesn't support netCDF files yet.
```python
from huggingface_hub import HfFileSystem, hf_hub_download
import xarray

# List the netCDF file names in the repo's `dataset` directory.
fs = HfFileSystem()
files = [
    file.rsplit("/", 1)[1]
    for file in fs.ls("datasets/shermansiu/dm_graphcast_datasets/dataset", detail=False)
]

# Download the first listed file into the local Hugging Face cache.
local_file: str = hf_hub_download(
    repo_id="shermansiu/dm_graphcast_datasets",
    filename=f"dataset/{files[0]}",
    repo_type="dataset",
)

# Load the whole file into memory as an xarray.Dataset.
with open(local_file, "rb") as f:
    example_batch = xarray.load_dataset(f).compute()
```

## Citation

- Paper: https://www.science.org/doi/10.1126/science.adi2336
- Preprint: https://arxiv.org/abs/2212.12794

```
@article{doi:10.1126/science.adi2336,
author = {Remi Lam and Alvaro Sanchez-Gonzalez and Matthew Willson and Peter Wirnsberger and Meire Fortunato and Ferran Alet and Suman Ravuri and Timo Ewalds and Zach Eaton-Rosen and Weihua Hu and Alexander Merose and Stephan Hoyer and George Holland and Oriol Vinyals and Jacklynn Stott and Alexander Pritzel and Shakir Mohamed and Peter Battaglia},
title = {Learning skillful medium-range global weather forecasting},
journal = {Science},
volume = {382},
number = {6677},
pages = {1416-1421},
year = {2023},
doi = {10.1126/science.adi2336},
URL = {https://www.science.org/doi/abs/10.1126/science.adi2336},
eprint = {https://www.science.org/doi/pdf/10.1126/science.adi2336},
abstract = {Global medium-range weather forecasting is critical to decision-making across many social and economic domains. Traditional numerical weather prediction uses increased compute resources to improve forecast accuracy but does not directly use historical weather data to improve the underlying model. Here, we introduce GraphCast, a machine learning--based method trained directly from reanalysis data. It predicts hundreds of weather variables for the next 10 days at 0.25° resolution globally in under 1 minute. GraphCast significantly outperforms the most accurate operational deterministic systems on 90\% of 1380 verification targets, and its forecasts support better severe event prediction, including tropical cyclone tracking, atmospheric rivers, and extreme temperatures. GraphCast is a key advance in accurate and efficient weather forecasting and helps realize the promise of machine learning for modeling complex dynamical systems. The numerical models used to predict weather are large, complex, and computationally demanding and do not learn from past weather patterns. Lam et al. introduced a machine learning--based method that has been trained directly from reanalysis data of past atmospheric conditions. In this way, the authors were able to quickly predict hundreds of weather variables globally up to 10 days in advance and at high resolution. Their predictions were more accurate than those of traditional weather models in 90\% of tested cases and displayed better severe event prediction for tropical cyclones, atmospheric rivers, and extreme temperatures. ---H. Jesse Smith Machine learning leads to better, faster, and cheaper weather forecasting.}}
```

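Provider:
shermansiu
Summary of original information

ECMWF's ERA5, HRES, (and fake) data, formatted for DeepMind GraphCast

Dataset overview

The dataset contains ECMWF's ERA5 and HRES data, plus fake (synthetic) data, formatted for DeepMind GraphCast. It is split into multiple configurations, each identified by its date, resolution, number of vertical levels, and number of time steps.

Dataset configurations

The configurations are listed below: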

  • config_name: source-era5_date-2022-01-01_res-0.25_levels-13_steps-01

    • data_files: "dataset/source-era5_date-2022-01-01_res-0.25_levels-13_steps-01.nc"
  • config_name: source-era5_date-2022-01-01_res-0.25_levels-13_steps-04

    • data_files: "dataset/source-era5_date-2022-01-01_res-0.25_levels-13_steps-04.nc"
  • config_name: source-era5_date-2022-01-01_res-0.25_levels-13_steps-12

    • data_files: "dataset/source-era5_date-2022-01-01_res-0.25_levels-13_steps-12.nc"
  • config_name: source-era5_date-2022-01-01_res-0.25_levels-37_steps-01

    • data_files: "dataset/source-era5_date-2022-01-01_res-0.25_levels-37_steps-01.nc"
  • config_name: source-era5_date-2022-01-01_res-0.25_levels-37_steps-04

    • data_files: "dataset/source-era5_date-2022-01-01_res-0.25_levels-37_steps-04.nc"
  • config_name: source-era5_date-2022-01-01_res-0.25_levels-37_steps-12

    • data_files: "dataset/source-era5_date-2022-01-01_res-0.25_levels-37_steps-12.nc"
  • config_name: source-era5_date-2022-01-01_res-1.0_levels-13_steps-01

    • data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-13_steps-01.nc"
  • config_name: source-era5_date-2022-01-01_res-1.0_levels-13_steps-04

    • data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-13_steps-04.nc"
  • config_name: source-era5_date-2022-01-01_res-1.0_levels-13_steps-12

    • data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-13_steps-12.nc"
  • config_name: source-era5_date-2022-01-01_res-1.0_levels-13_steps-20

    • data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-13_steps-20.nc"
  • config_name: source-era5_date-2022-01-01_res-1.0_levels-13_steps-40

    • data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-13_steps-40.nc"
  • config_name: source-era5_date-2022-01-01_res-1.0_levels-37_steps-01

    • data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-37_steps-01.nc"
  • config_name: source-era5_date-2022-01-01_res-1.0_levels-37_steps-04

    • data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-37_steps-04.nc"
  • config_name: source-era5_date-2022-01-01_res-1.0_levels-37_steps-12

    • data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-37_steps-12.nc"
  • config_name: source-era5_date-2022-01-01_res-1.0_levels-37_steps-20

    • data_files: "dataset/source-era5_date-2022-01-01_res-1.0_levels-37_steps-20.nc"
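
The configuration names above follow a regular `source-…_date-…_res-…_levels-…_steps-…` pattern, so they can be split into fields when choosing a file programmatically. The following is a minimal sketch, not part of the original dataset card: the regular expression is inferred from the names listed above, and `GraphCastConfig` and `parse_config_name` are hypothetical helper names.

```python
import re
from dataclasses import dataclass

# Pattern inferred from the config names listed above (an assumption,
# not a documented naming specification).
CONFIG_PATTERN = re.compile(
    r"source-(?P<source>[a-z0-9]+)"
    r"_date-(?P<date>\d{4}-\d{2}-\d{2})"
    r"_res-(?P<res>\d+\.\d+)"
    r"_levels-(?P<levels>\d+)"
    r"_steps-(?P<steps>\d+)"
)


@dataclass(frozen=True)
class GraphCastConfig:
    """Hypothetical container for the fields encoded in a config name."""
    source: str
    date: str
    resolution_deg: float
    levels: int
    steps: int


def parse_config_name(name: str) -> GraphCastConfig:
    """Parse a config name or a `dataset/....nc` path into its fields."""
    stem = name.rsplit("/", 1)[-1]
    if stem.endswith(".nc"):
        stem = stem[: -len(".nc")]
    match = CONFIG_PATTERN.fullmatch(stem)
    if match is None:
        raise ValueError(f"Unrecognized config name: {name!r}")
    return GraphCastConfig(
        source=match["source"],
        date=match["date"],
        resolution_deg=float(match["res"]),
        levels=int(match["levels"]),
        steps=int(match["steps"]),
    )


example = parse_config_name(
    "dataset/source-era5_date-2022-01-01_res-0.25_levels-13_steps-04.nc"
)
print(example)
```

Parsing into a small dataclass makes it straightforward to filter a file listing, for example keeping only the 1.0° configurations with 13 levels.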

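License and attribution

ECMWF data products are subject to the following terms:

  1. Copyright statement: Copyright "© 2023 European Centre for Medium-Range Weather Forecasts (ECMWF)".
  2. Source: www.ecmwf.int
  3. Licence statement: ECMWF data is published under a Creative Commons Attribution 4.0 International (CC BY 4.0) licence.
  4. Disclaimer: ECMWF does not accept any liability whatsoever for any error or omission in the data, their availability, or for any loss or damage arising from their use.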
Collected summary
Dataset introduction
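Construction
In meteorology, accurate numerical weather prediction relies on mining large volumes of historical data. This dataset draws on ECMWF's ERA5 reanalysis and High Resolution Forecast (HRES) data, restructured and formatted specifically for the DeepMind GraphCast model. The global gridded fields are stored as standard NetCDF files, and the configurations combine several spatial and temporal settings: 0.25° and 1.0° horizontal resolution, 13 or 37 vertical levels, and 1 to 40 time steps. Together they give machine learning models a structured basis for training and validation.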
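
To give a sense of scale for these settings, the sketch below works out the grid size for each resolution and the lead time for a given number of steps. It rests on two assumptions that the dataset card does not state: a regular latitude-longitude grid that includes both poles, and the 6-hour time step described in the GraphCast paper. The function names are illustrative only.

```python
from typing import Tuple

# Assumption: each forecast step spans 6 hours, as in the GraphCast paper.
STEP_HOURS = 6


def grid_shape(resolution_deg: float) -> Tuple[int, int]:
    """Return (n_lat, n_lon) for a regular lat-lon grid with both poles."""
    n_lat = round(180 / resolution_deg) + 1  # -90° to +90° inclusive
    n_lon = round(360 / resolution_deg)      # 0° to 360°, end excluded
    return n_lat, n_lon


def lead_time_hours(steps: int) -> int:
    """Forecast lead time covered by `steps` steps of STEP_HOURS each."""
    return steps * STEP_HOURS


for res in (0.25, 1.0):
    n_lat, n_lon = grid_shape(res)
    print(f"{res}°: {n_lat} x {n_lon} = {n_lat * n_lon} grid points per level")

for steps in (1, 4, 12, 20, 40):
    print(f"{steps} steps -> {lead_time_hours(steps)} hours")
```

Under these assumptions, the 0.25° grid holds roughly sixteen times as many points per level as the 1.0° grid, and 40 steps correspond to the 10-day horizon mentioned in the paper's abstract.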
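Features
The dataset has several characteristics specific to meteorological machine learning. It uses a multi-level gridded structure that combines surface and upper-air atmospheric variables, covering thermodynamic, dynamic, and moisture processes. Data files are stored in standard NetCDF format, which is compatible with mainstream meteorological analysis tools, and the repository also ships the `stats` files used for standardized preprocessing during model inference. By offering different combinations of spatial resolution, vertical levels, and time steps, the dataset suits studies of weather systems at multiple scales, and its gridded organization fits graph neural network architectures such as GraphCast.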
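Usage
To use the dataset, access the repository through the Hugging Face Hub file system interface and download the NetCDF file for the chosen configuration with `hf_hub_download`. Because the `datasets` library does not yet support this format, read and decode the files with xarray: `xarray.load_dataset` loads the meteorological variables into a multidimensional dataset. In practice, choose the configuration whose resolution, vertical levels, and number of time steps match the research goal, then normalize the data following the preprocessing described in the GraphCast paper to obtain batches suitable for neural network training and inference.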
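
Choosing a configuration by resolution, levels, and steps can be sketched as follows. This is an illustrative sketch rather than the dataset author's code: `config_filename` and `load_config` are hypothetical helpers, the default `source` and `date` values are taken from the configuration names listed on this page, and `load_config` needs network access plus `huggingface_hub`, `xarray`, and a NetCDF backend installed.

```python
REPO_ID = "shermansiu/dm_graphcast_datasets"


def config_filename(
    resolution_deg: float,
    levels: int,
    steps: int,
    source: str = "era5",      # assumption: matches the listed configs
    date: str = "2022-01-01",  # assumption: matches the listed configs
) -> str:
    """Build the repo-relative path of a configuration's NetCDF file."""
    return (
        f"dataset/source-{source}_date-{date}"
        f"_res-{float(resolution_deg)}_levels-{levels}_steps-{steps:02d}.nc"
    )


def load_config(resolution_deg: float, levels: int, steps: int):
    """Download one configuration and load it as an xarray.Dataset."""
    # Imported lazily so config_filename works without these packages.
    from huggingface_hub import hf_hub_download
    import xarray

    local_file = hf_hub_download(
        repo_id=REPO_ID,
        filename=config_filename(resolution_deg, levels, steps),
        repo_type="dataset",
    )
    return xarray.load_dataset(local_file)


# Only the filename is built here; call load_config(...) to fetch the data.
print(config_filename(1.0, 13, 4))
```

Not every combination exists in the repository, so a requested file should be checked against the configuration list before downloading.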
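Background and challenges
Background overview
At the intersection of meteorology and artificial intelligence, improving the accuracy of global medium-range weather forecasts is a central research topic. In 2023, the DeepMind team introduced the GraphCast model, which is trained on ECMWF's ERA5 reanalysis data. The files in this dataset originate from the GraphCast work led by Remi Lam and colleagues, which aims to improve forecasting directly from historical weather data instead of relying on ever-growing compute, as traditional numerical prediction does. With spatial resolutions of 0.25° and 1.0°, 13 or 37 vertical levels, and several time-step configurations, the dataset provides a structured basis for predicting global weather variables. It supports data-driven modeling of complex dynamical systems and better prediction of extreme events such as tropical cyclone tracks.
Current challenges
The dataset targets high-resolution, multivariate global medium-range forecasting. The core challenge is to extract useful spatiotemporal patterns from massive historical reanalysis data so as to improve forecast accuracy while reducing compute cost. Building the dataset involves several obstacles. The raw ERA5 data are very large and must be reformatted into NetCDF files suited to GraphCast's graph neural network architecture. Configurations with different resolutions and vertical levels require careful alignment and consistency checks. The data are licensed under CC BY 4.0, so source attribution and the disclaimer must be stated properly for compliant academic use. In addition, current tooling such as the Hugging Face `datasets` library does not yet support NetCDF, which complicates loading and distribution.
Common scenarios
Typical use cases
Accurate and efficient global weather forecasting has long been a central challenge in meteorology. This dataset is designed for the DeepMind GraphCast model and combines ECMWF's ERA5 reanalysis and HRES data in a standardized format, providing multi-resolution, multi-level sequences of atmospheric variables. Its typical use is training and validating machine learning models based on graph neural networks for global short- to medium-range weather prediction. Researchers use it to build spatiotemporal graph structures, model atmospheric dynamics, and learn complex weather patterns from historical data, producing global forecasts for the next 10 days at 0.25° resolution with improved speed and accuracy.
Academic problems addressed
Traditional numerical weather prediction depends on physical equations and large compute budgets, and cannot easily improve its models by learning directly from historical data. By providing structured, high-quality atmospheric reanalysis data, this dataset addresses the standardization and accessibility of data for machine learning in meteorological modeling. It supports research on how graph neural networks and related methods capture spatiotemporal dependencies among atmospheric variables, easing the compute-efficiency and accuracy bottlenecks of traditional approaches. It thereby advances data-driven meteorology, offers a new paradigm for modeling complex dynamical systems, encourages the convergence of artificial intelligence and Earth science, and helps improve prediction of extreme weather events such as tropical cyclones and atmospheric rivers.
Derived and related work
The best-known work built on this data is DeepMind's GraphCast model, published in Science, which marked a breakthrough for machine learning in global weather forecasting. Related research has extended graph neural networks for spatiotemporal prediction and led to follow-up improvements such as better graph structure design, fusion of multiple data sources, and more efficient training algorithms. This work has advanced AI models for meteorology and has inspired similar methods in other Earth science fields such as oceanography and climate modeling. The open data format and model have also enabled broad validation and innovation in the academic community, shaping a data-driven research ecosystem in meteorology.
The content above was collected and summarized by 遇见数据集.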