WeatherBench 2 | Weather forecasting dataset | Data-driven model dataset

Source: arXiv. Updated 2024-01-26. Indexed 2024-06-21.

Weather forecasting

Data-driven models

Download link:

https://sites.research.google/weatherbench


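Overview:

WeatherBench 2 is a benchmark dataset for data-driven global weather models, intended to accelerate progress in weather modeling. It comprises an open-source evaluation framework; publicly available training, ground-truth, and baseline data; and a continuously updated website that reports the latest metrics and state-of-the-art models. It supports higher-resolution data and evaluation, and adds further metrics for assessing global, medium-range (1-14 day) weather forecasts.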

Provided by:

European Centre for Medium-Range Weather Forecasts (ECMWF)

Created:

2023-08-30

AI-collected summary

Dataset introduction

Construction

WeatherBench 2 is a benchmark for evaluating data-driven global weather models, with a focus on medium-range forecasts of 1 to 14 days. It combines an open-source evaluation framework, publicly accessible training and ground-truth data, and a continuously updated website that reports the latest metrics and state-of-the-art models. The evaluation framework follows established practice for assessing weather forecasts at leading operational weather centers, which gives a standardized basis for comparing models.

Features

WeatherBench 2 supports higher-resolution data and adds evaluation metrics beyond those of the original WeatherBench. It emphasizes probabilistic prediction, because chaotic error growth makes weather forecasts inherently uncertain. Probabilistic metrics such as the Continuous Ranked Probability Score (CRPS) and the spread-skill ratio are included alongside deterministic ones, so the benchmark can evaluate both deterministic and probabilistic forecasts.
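To make the two probabilistic metrics named above concrete, here is a minimal NumPy sketch. It is not the WeatherBench 2 implementation: the function names `ensemble_crps` and `spread_skill_ratio` are hypothetical, the CRPS uses the standard ensemble estimator E|X - y| - 0.5 E|X - X'| without any finite-ensemble correction, and area weighting over the globe is omitted for brevity.

```python
import numpy as np


def ensemble_crps(ens, obs):
    """CRPS of an ensemble forecast, one value per grid point or sample.

    ens: array of shape (members, ...); obs: array of shape (...).
    Uses the standard estimator E|X - y| - 0.5 * E|X - X'|.
    """
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    m = ens.shape[0]
    # Mean absolute error of the members against the observation.
    skill = np.abs(ens - obs).mean(axis=0)
    # Mean absolute difference over all ordered member pairs.
    pairwise = np.abs(ens[:, None] - ens[None, :]).sum(axis=(0, 1)) / (m * m)
    return skill - 0.5 * pairwise


def spread_skill_ratio(ens, obs):
    """Ratio of ensemble spread to the RMSE of the ensemble mean."""
    ens = np.asarray(ens, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # Spread: square root of the average (unbiased) ensemble variance.
    spread = np.sqrt(ens.var(axis=0, ddof=1).mean())
    # Skill: RMSE of the ensemble mean against the observation.
    skill = np.sqrt(((ens.mean(axis=0) - obs) ** 2).mean())
    return spread / skill


# Synthetic illustration (assumed data, not WeatherBench 2 output): the truth
# and every member are drawn from the same distribution around a shared signal.
rng = np.random.default_rng(0)
signal = rng.normal(size=1000)
truth = signal + rng.normal(size=1000)
ensemble = signal + rng.normal(size=(20, 1000))
print("mean CRPS:", ensemble_crps(ensemble, truth).mean())
print("spread-skill ratio:", spread_skill_ratio(ensemble, truth))
```

A lower CRPS is better, and for a single-member ensemble it reduces to the absolute error. A spread-skill ratio near 1 suggests the ensemble spread is consistent with its error; values well below 1 indicate an underdispersive (overconfident) ensemble.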

Usage

WeatherBench 2 is intended for researchers and practitioners in weather forecasting. Users can train models on the provided data and evaluate them against the ground truth with the open-source evaluation code. The data cover a wide range of variables and resolutions, which allows broad model assessment. Because the framework is open source, community contributions can keep the benchmark updated and improved.

Background and challenges

Background overview

WeatherBench 2, the successor to the original WeatherBench benchmark, was introduced to accelerate progress in data-driven weather modeling. Developed by a consortium led by Google Research and Google DeepMind in collaboration with the European Centre for Medium-Range Weather Forecasts (ECMWF), it provides an evaluation framework for global, medium-range (1-14 day) weather forecasts. It includes an open-source evaluation framework, publicly accessible training and ground-truth data, and a continuously updated website with the latest metrics and state-of-the-art models. The initiative reflects the growing role of machine learning in weather prediction and aims to bridge the gap between traditional physical models and data-driven approaches.

Current challenges

The primary challenge WeatherBench 2 addresses is evaluating data-driven weather models against traditional physical models, particularly for medium-range forecasts. This involves several key issues: 1) ensuring that data-driven models predict weather variables reliably and accurately over extended lead times; 2) handling the complexity and high dimensionality of weather data, which complicates model training and validation; 3) balancing the need for probabilistic forecasts, which account for the chaotic nature of weather, against the practical demand for deterministic predictions; and 4) establishing whether data-driven models can be initialized operationally with real-time data rather than reanalysis datasets such as ERA5, which are not available in live forecasting. These issues call for a comprehensive, regularly updated benchmarking framework to support continued improvement in data-driven weather forecasting.

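Common scenarios

Typical use cases

In weather forecasting, WeatherBench 2 is widely used to evaluate and compare global medium-range (1-14 day) forecast models. Typical use cases include benchmarking the forecast skill of physical and data-driven models side by side, especially on high-resolution data and complex meteorological variables. By providing a public evaluation framework and baseline data, WeatherBench 2 has helped data-driven weather modeling advance quickly.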

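Research problems addressed

WeatherBench 2 addresses research questions that recur in data-driven weather forecasting, such as forecast accuracy, uncertainty, and the ability to predict extreme weather events. It defines a set of headline scores, including root mean squared error (RMSE), anomaly correlation coefficient (ACC), and the Continuous Ranked Probability Score (CRPS), to give a comprehensive assessment of model performance. These metrics follow the practice of leading weather centers, which keeps the evaluation scientifically sound and reliable.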
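As an illustration of the deterministic scores listed above, the following NumPy sketch computes a latitude-weighted RMSE and ACC for a single field on a regular latitude-longitude grid. It is a simplified sketch, not the benchmark's own code: the function names are hypothetical, and cosine-of-latitude weights are assumed here as a common approximation to grid-cell area.

```python
import numpy as np


def latitude_weights(lat_deg):
    """Cosine-of-latitude weights, normalized to have mean 1 (an assumption)."""
    w = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))
    return w / w.mean()


def weighted_rmse(forecast, truth, lat_deg):
    """Latitude-weighted RMSE for fields of shape (lat, lon)."""
    w = latitude_weights(lat_deg)[:, None]  # broadcast over longitude
    return float(np.sqrt((w * (forecast - truth) ** 2).mean()))


def weighted_acc(forecast, truth, climatology, lat_deg):
    """Latitude-weighted anomaly correlation coefficient.

    Anomalies are taken relative to a climatology field of the same shape.
    """
    w = latitude_weights(lat_deg)[:, None]
    f_anom = forecast - climatology
    t_anom = truth - climatology
    numerator = (w * f_anom * t_anom).sum()
    denominator = np.sqrt((w * f_anom**2).sum() * (w * t_anom**2).sum())
    return float(numerator / denominator)


# Synthetic illustration on a coarse grid (assumed data, not benchmark output).
rng = np.random.default_rng(0)
lat = np.linspace(-87.5, 87.5, 36)
climatology = np.zeros((36, 72))
truth = rng.normal(size=(36, 72))
forecast = truth + 0.5 * rng.normal(size=(36, 72))
print("RMSE:", weighted_rmse(forecast, truth, lat))
print("ACC:", weighted_acc(forecast, truth, climatology, lat))
```

The weighting matters because cells on a regular latitude-longitude grid shrink toward the poles; without it, polar errors would be overrepresented in a global average. A perfect forecast gives an RMSE of 0 and an ACC of 1.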

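Derived and related work

WeatherBench 2 has become a common reference point for related work, particularly on deep learning and graph neural networks in weather forecasting. For example, models such as GraphCast and Pangu-Weather are evaluated on the benchmark, demonstrating the potential of data-driven methods. Hybrid machine learning-physics models such as NeuralGCM, which combine data-driven components with physical constraints to improve forecast accuracy and reliability, are also assessed with it.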

The content above was collected and summarized by AI.
