Atmospheric rivers dataset for machine learning training
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/12177338
Description:
A thorough description of the data and how it was created can be found at http://climate-cms.org/CNN-Atmospheric-Rivers/
A Jupyter notebook, CNN_AR_tutorial.ipynb, shows how to use this data to train a deep learning model that identifies whether an Integrated Vapor Transport (IVT) map contains an atmospheric river (AR). It has been published here:
Mesto, M., Hobeichi, S., & Green, S. (2024). CNN-Atmospheric-Rivers (v1.0.0). Zenodo. https://doi.org/10.5281/zenodo.12538779
The data is organised in three folders:
IVT_ERA5_2Deg: Contains global IVT data.
AR_Global: Contains polygons representing AR objects identified and hand-labelled in each IVT map.
Training_Testing_tiles: Contains tiles of IVT data with an annotation file that classifies each tile as one of the following: ‘Atmospheric River’, ‘Ambiguous’. The ‘Ambiguous’ class refers to objects that are not clearly identifiable as atmospheric rivers.
Integrated Vapor Transport maps:
These maps were computed as the magnitude of the vector whose components are the vertical integrals of the northward and eastward water vapour flux variables from ERA5. The IVT values are expressed in units of kg m^-1 s^-1. All the IVT TIFF files were loaded into ArcGIS software and displayed using a colour scheme that allows for the visual identification of atmospheric rivers. Data details:
Folder: IVT_ERA5_2Deg
File format: TIFF
Spatial resolution: 2 degrees
Spatial coverage: Global (longitude: -180 to 180, latitude: -90 to 90)
Geographic Coordinate System: GCS_WGS_1984
Temporal coverage: 1st – 5th day of January, April, July, October for 2010, 2013, 2015; these years correspond to La Niña, neutral, and El Niño years, respectively
Temporal resolution: Daily
Naming of files: ivt_2deg_ddmmyyyy.tif
Number of files: 60 (5 days × 4 months × 3 years)
Number of channels in each file: 1
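The IVT magnitude described above can be illustrated with a minimal sketch. It assumes the eastward and northward vertically integrated fluxes are already available as two small grids of equal shape; the function name and the sample values are hypothetical and are not taken from the dataset or from the authors' processing code.

```python
import math

def ivt_magnitude(eastward_flux, northward_flux):
    """Per-cell IVT magnitude (kg m^-1 s^-1) from two equally shaped 2-D grids.

    Each grid is a list of rows; the result has the same shape.
    """
    return [
        [math.hypot(east, north) for east, north in zip(row_east, row_north)]
        for row_east, row_north in zip(eastward_flux, northward_flux)
    ]

# Hypothetical sample values, chosen only to make the arithmetic easy to follow.
east = [[300.0, 0.0], [-120.0, 50.0]]
north = [[400.0, 250.0], [160.0, -120.0]]

ivt = ivt_magnitude(east, north)
for row in ivt:
    print(row)
```

The sign of each component does not affect the result, because only the length of the flux vector is kept.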
Atmospheric Rivers in IVT maps:
The annotation tool 'Label Objects for Deep Learning' was used to draw polygons covering the shape of atmospheric rivers on each IVT map. Each polygon was assigned one of two labels: 'Atmospheric Rivers' or 'Ambiguous'. The polygons were drawn based on visual identification of the shape of atmospheric rivers, guided by IVT values close to 500 kg m^-1 s^-1 as in Reid et al. (2020). The 'Ambiguous' label was assigned to objects that could not be clearly classified as ARs: objects that were shorter or wider, had slightly lower IVT values, or for which it was hard to tell whether they were ARs or tropical cyclones in the early stages of their formation. Data details:
Folder: AR_Global
File format: SHP (shapefile)
Spatial coverage: Global
Geographic Coordinate System: GCS_WGS_1984
Temporal coverage: 1st – 5th day of January, April, July, October for 2010, 2013, 2015 (corresponding to La Niña, neutral, and El Niño years, respectively)
Temporal resolution: Daily
Naming of files: ivt_2deg_ddmmyyyy_labelled.shp
Number of files: 60 (5 days × 4 months × 3 years)
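The naming patterns and temporal coverage listed above determine all 60 expected file names. The sketch below enumerates them and recovers the date from a name. It assumes that dd and mm are zero-padded, as the ddmmyyyy pattern suggests; the function names are hypothetical.

```python
from datetime import date, datetime
from itertools import product

# Years and ENSO phases as stated under "Temporal coverage".
ENSO_PHASE = {2010: "La Niña", 2013: "neutral", 2015: "El Niño"}
MONTHS = (1, 4, 7, 10)   # January, April, July, October
DAYS = range(1, 6)       # 1st to 5th day of each month

def expected_pairs():
    """Return (IVT raster name, shapefile name) for each covered date."""
    pairs = []
    for year, month, day in product(ENSO_PHASE, MONTHS, DAYS):
        stamp = date(year, month, day).strftime("%d%m%Y")
        pairs.append((f"ivt_2deg_{stamp}.tif", f"ivt_2deg_{stamp}_labelled.shp"))
    return pairs

def date_from_name(filename):
    """Recover the calendar date from either naming pattern."""
    stem = filename.rsplit(".", 1)[0]
    token = stem.split("_")[2]  # the ddmmyyyy part
    return datetime.strptime(token, "%d%m%Y").date()

pairs = expected_pairs()
print(len(pairs))
print(pairs[0])
day = date_from_name(pairs[0][1])
print(day, ENSO_PHASE[day.year])
```

Comparing this list with the files actually present in IVT_ERA5_2Deg and AR_Global is one way to check that a download is complete.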
Dataset for deep learning training:
The tool 'Export Training Data for Deep Learning' uses the IVT maps in the 'IVT_ERA5_2Deg' folder and the shapefiles in the 'AR_Global' folder to create labelled tiles for deep learning training. Each tile is assigned a label, 'Atmospheric River' or 'Ambiguous', or no label if it doesn't contain any AR or ambiguous shape. The generated map chips are stored in the folder Training_Testing_tiles/ RCNN_Masks_All_Tiles (no 3-10-15)/images, and the labels are provided in the text file 'map.txt' in the folder Training_Testing_tiles/ RCNN_Masks_All_Tiles (no 3-10-15)/. Data details:
Folder: Training_Testing_tiles/ RCNN_Masks_All_Tiles (no 3-10-15)/images
File format: TIFF
Spatial resolution: 2 degrees
Spatial coverage: varies. Width of tile = 40 grid cells. Height of tile = 20 grid cells
Geographic Coordinate System: GCS_WGS_1984
Temporal coverage: 1st – 5th day of January, April, July, October for 2010, 2013, 2015. These years correspond to La Niña, neutral, and El Niño years, respectively. Please note that data for certain days are missing; these omissions correspond to days with no or only a single atmospheric river detected.
Temporal resolution: Daily
Number of files: varies
Number of channels in each file: 1
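The tile size and spatial resolution given above fix the geographic extent of each tile. A minimal sketch of that arithmetic follows; the helper names are hypothetical, and the example corner is arbitrary, since the description does not state where individual tiles are placed.

```python
CELL_SIZE_DEG = 2       # spatial resolution from the data details
TILE_WIDTH_CELLS = 40   # tile width in grid cells
TILE_HEIGHT_CELLS = 20  # tile height in grid cells

def tile_extent_deg(width_cells=TILE_WIDTH_CELLS,
                    height_cells=TILE_HEIGHT_CELLS,
                    cell_size_deg=CELL_SIZE_DEG):
    """Return (longitude span, latitude span) of one tile in degrees."""
    return width_cells * cell_size_deg, height_cells * cell_size_deg

def tile_bounds(west_lon, north_lat):
    """Bounding box (west, south, east, north) for a tile, given its
    upper-left corner. The corner is an input, not something read from the data."""
    width, height = tile_extent_deg()
    return (west_lon, north_lat - height, west_lon + width, north_lat)

print(tile_extent_deg())      # (80, 40)
print(tile_bounds(-180, 90))  # (-180, 50, -100, 90)
```

Each tile therefore spans 80 degrees of longitude by 40 degrees of latitude.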
Created:
2024-06-26



