DABench | Weather Forecasting Dataset | Data Assimilation Dataset

arXiv · Updated 2024-08-21 · Indexed 2024-08-23
Weather forecasting
Data assimilation
Download link:
https://github.com/your-repo-link-here
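Overview:
DABench was created jointly by the National University of Defense Technology and Shanghai Artificial Intelligence Laboratory for research on data-driven weather data assimilation. Built on ERA5 reanalysis data, it contains simulated observations and background fields for multiple weather variables and supports multi-scale analysis from hourly to yearly. Its construction combines the observing system simulation experiment (OSSE) approach with neural network techniques, with the aim of supplying weather forecasting models with accurate initial fields and thereby improving forecast accuracy. DABench is applied mainly in weather forecasting, particularly medium- and long-range forecasting, and advances data-driven weather forecasting by providing standardized data and evaluation methods.
Provided by:
National University of Defense Technology, Changsha, China; Shanghai Artificial Intelligence Laboratory, Shanghai, China
Created:
2024-08-21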
AI-collected summary
Dataset Introduction
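Construction
DABench is built on ERA5 data and is intended as a benchmark for data-driven weather forecasting systems. It consists of sparse, noisy simulated observations, background fields, standardized evaluation metrics, and a strong baseline model, DaT. The observations are generated by adding Gaussian noise to ERA5 data in order to mimic real-world observation errors. The background fields are produced by Sformer, a pretrained weather forecasting model, and are used to assess how the assimilation results affect forecasts. DaT integrates prior knowledge from four-dimensional variational data assimilation (4DVar) into a Transformer model and outperforms the state-of-the-art 4DVarNet model in physical state reconstruction.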
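The observation-generation step described above (subsample a gridded field, then add Gaussian noise) can be sketched as follows. This is a minimal illustration, not the DABench pipeline: the function name `simulate_observations`, the random sampling pattern, the 10% coverage, the noise level, and the toy field that stands in for ERA5 are all assumptions.

```python
import numpy as np


def simulate_observations(truth, obs_fraction=0.1, noise_std=1.0, seed=0):
    """Return (observations, mask) sampled from a gridded truth field.

    truth        : 2-D array (lat, lon) standing in for one ERA5 variable.
    obs_fraction : fraction of grid points observed (hypothetical value).
    noise_std    : std. dev. of the additive Gaussian error (hypothetical value).
    """
    rng = np.random.default_rng(seed)
    # Boolean mask: True where a grid point is observed.
    mask = rng.random(truth.shape) < obs_fraction
    # Additive zero-mean Gaussian noise models the observation error.
    noisy = truth + rng.normal(0.0, noise_std, size=truth.shape)
    # Unobserved points are set to NaN so they cannot be used by accident.
    observations = np.where(mask, noisy, np.nan)
    return observations, mask


# Toy "truth" field: a smooth temperature-like pattern on a 32 x 64 grid.
lat = np.linspace(-90.0, 90.0, 32)
lon = np.linspace(0.0, 360.0, 64, endpoint=False)
truth = 280.0 + 20.0 * np.cos(np.deg2rad(lat))[:, None] * np.ones_like(lon)[None, :]

obs, mask = simulate_observations(truth, obs_fraction=0.1, noise_std=1.0, seed=0)
```

Returning the mask alongside the observations lets a downstream model distinguish "not observed" from "observed with value zero".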
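Features
DABench has four standard features: 1) sparse, noisy simulated observations; 2) a skillful pretrained weather forecasting model for generating background fields; 3) standardized evaluation metrics for model comparison; 4) a strong baseline model, DaT. DaT aggregates observational information through the gradient of the 4DVar cost function, which improves its ability to exploit observations.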
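To make "the gradient of the 4DVar cost function" concrete, the sketch below computes the cost and its gradient for a heavily simplified variational objective. It does not reproduce DaT or full 4DVar: it assumes a single time step (no forecast model inside the cost), diagonal error covariances, and an observation operator that simply selects observed grid points. The function name and all numbers are hypothetical.

```python
import numpy as np


def var_cost_and_grad(x, x_b, y, mask, sigma_b, sigma_o):
    """Cost and gradient of a simplified variational DA objective.

    Assumptions (not from the source): one time step, B = sigma_b**2 * I,
    R = sigma_o**2 * I, and observations located on the grid points
    flagged by `mask`.
    """
    bg_diff = x - x_b                      # departure from the background
    ob_diff = np.where(mask, x - y, 0.0)   # departure from observations, zero where unobserved
    cost = 0.5 * np.sum(bg_diff**2) / sigma_b**2 + 0.5 * np.sum(ob_diff**2) / sigma_o**2
    grad = bg_diff / sigma_b**2 + ob_diff / sigma_o**2
    return cost, grad


x_b = np.array([1.0, 2.0, 3.0, 4.0])           # background state
y = np.array([1.5, 0.0, 2.0, 0.0])             # observations (unobserved entries are ignored)
mask = np.array([True, False, True, False])    # which entries of y are real observations

# Plain gradient descent on the cost, starting from the background.
x = x_b.copy()
for _ in range(200):
    cost, grad = var_cost_and_grad(x, x_b, y, mask, sigma_b=1.0, sigma_o=0.5)
    x -= 0.1 * grad
```

The gradient is non-zero only where observations disagree with the current state, which is why it is a natural carrier of observational information.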
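Usage
DABench can be used to develop and test machine learning models, especially models for weather data assimilation. Because it bundles simulated observations, background fields, and standardized evaluation metrics, researchers can evaluate and compare different data-driven data assimilation algorithms on equal terms. DaT can serve as the baseline against which newly developed models are measured, which helps advance data-driven weather forecasting systems.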
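This page does not list the evaluation metrics themselves. As an illustration of what a standardized metric for gridded weather fields can look like, the sketch below implements latitude-weighted RMSE, which is widely used in weather forecasting benchmarks. Treating it as a DABench metric is an assumption, and the function name is hypothetical.

```python
import numpy as np


def lat_weighted_rmse(pred, target, lat_deg):
    """Latitude-weighted RMSE over a (lat, lon) grid.

    Weights are cos(latitude), normalised to mean 1, so that the densely
    packed grid points near the poles do not dominate the score.
    """
    weights = np.cos(np.deg2rad(lat_deg))
    weights = weights / weights.mean()
    sq_err = (pred - target) ** 2
    return float(np.sqrt((sq_err * weights[:, None]).mean()))


lat = np.linspace(-87.5, 87.5, 36)
target = np.zeros((36, 72))
pred = target + 2.0
# A uniform error of 2.0 yields a score of 2.0 (up to floating-point rounding).
score = lat_weighted_rmse(pred, target, lat)
```

Fixing one metric implementation and applying it to every model's analysis field is what makes scores comparable across algorithms.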
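Background and Challenges
Background Overview
DABench was created to address research challenges in data-driven weather data assimilation (DA). Proposed in 2024 by researchers at the National University of Defense Technology and Shanghai Artificial Intelligence Laboratory, it is intended as a standard benchmark dataset for data-driven DA algorithms. DABench uses ERA5 data as the ground truth and offers guidance for developing end-to-end data-driven weather forecasting systems. It provides four standard features: sparse, noisy simulated observations; a skillful forecasting model that produces the background fields; standardized evaluation metrics; and a strong baseline model, DaT. DaT integrates prior knowledge from four-dimensional variational DA into a Transformer model and outperforms the state-of-the-art 4DVarNet model in physical state reconstruction. DABench matters for the development of data-driven weather forecasting systems because it gives researchers a common platform for developing and comparing DA models.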
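Current Challenges
The main challenges associated with DABench are: 1) the simulated observations are sparse and noisy, so algorithms must cope with incomplete observational data; 2) during assimilation, the background field and the observations must be fused into an accurate initial field; 3) the lack of a standardized benchmark dataset has made it hard to evaluate different data-driven DA algorithms fairly; 4) DA algorithms must be integrated with forecasting models in an end-to-end data-driven weather forecasting system. To address these challenges, DABench provides simulated observations, standardized evaluation metrics, and a strong baseline model, giving researchers a platform for developing and comparing DA models.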
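The fusion step in challenge 2 has a simple closed form in the most idealized setting, which can serve as a sanity check for learned methods. The sketch below applies the pointwise best linear unbiased estimate, assuming uncorrelated background and observation errors with known standard deviations and observations located on grid points. These assumptions, the function name, and the numbers are illustrative and are not taken from DABench.

```python
import numpy as np


def analysis_update(x_b, y, mask, sigma_b, sigma_o):
    """Pointwise best linear unbiased estimate of the state.

    Assumes diagonal error covariances and observations on grid points,
    which is a strong simplification of operational data assimilation.
    """
    gain = sigma_b**2 / (sigma_b**2 + sigma_o**2)   # scalar Kalman gain
    innovation = np.where(mask, y - x_b, 0.0)       # observation minus background
    return x_b + gain * innovation


x_b = np.array([1.0, 2.0, 3.0, 4.0])           # background field
y = np.array([1.5, 0.0, 2.0, 0.0])             # observations (unobserved entries are ignored)
mask = np.array([True, False, True, False])    # which entries of y are real observations

x_a = analysis_update(x_b, y, mask, sigma_b=1.0, sigma_o=0.5)
```

With uncorrelated errors, unobserved points keep their background value. Spreading observational information to neighbouring points requires spatially correlated background errors or a learned model.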
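Common Use Cases
Typical Use Cases
As a standard test set for data-driven weather data assimilation, DABench is typically used as a unified platform on which researchers evaluate and compare the performance of different data-driven assimilation algorithms. The dataset includes sparse, noisy simulated observations, background fields generated by a pretrained weather forecasting model, standardized evaluation metrics, and a strong baseline model, DA Transformer (DaT). Researchers can use DABench to develop their own models and compare them against the established baseline, which helps advance data-driven weather forecasting systems.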
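Derived and Related Work
The release of DABench has stimulated research on data-driven assimilation. Algorithms evaluated on DABench include DaT, the baseline introduced with the benchmark, and 4DVarNet, the existing state-of-the-art method that DaT is compared against. These algorithms perform well in physical state reconstruction and provide more accurate initial fields for weather forecasting. DABench is also reported to have encouraged cross-disciplinary research linking data-driven assimilation with fields such as remote sensing image fusion and autonomous driving. Such work can improve the robustness and accuracy of data-driven assimilation algorithms and further advance weather forecasting technology.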
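Recent Research
Latest Research Directions
DABench fills a gap in data-driven weather data assimilation by giving researchers a standardized benchmark dataset for evaluating and comparing data-driven assimilation algorithms. It combines sparse, noisy simulated observations, a pretrained weather forecasting model, standardized evaluation metrics, and a strong baseline model, DA Transformer (DaT), in support of data-driven weather forecasting system development. DaT surpasses the existing state-of-the-art model, 4DVarNet, in physical state reconstruction, which indicates its potential for data assimilation tasks. Future research could explore training generative models under sparse and noisy observation conditions, as well as frameworks that combine traditional data assimilation methods with large weather models (LWMs). The release of DABench is expected to promote further development of data-driven weather forecasting systems and to improve the accuracy and reliability of weather forecasts.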
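Related Papers
  • 1
    DABench: A Benchmark Dataset for Data-Driven Weather Data Assimilation. National University of Defense Technology, Changsha, China; Shanghai Artificial Intelligence Laboratory, Shanghai, China · 2024
The content above was collected and summarized by AI.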