Part 2 of real-time testing data for: "Identifying data sources and physical strategies used by neural networks to predict TC rapid intensification"
Indexed by the NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/13272876
Description:
Each file in the dataset contains machine-learning-ready data for one tropical cyclone (TC) from the real-time testing dataset. "Machine-learning-ready" means that all data-processing methods described in the journal paper have already been applied. These are: cropping satellite images so that they are TC-centered; rotating satellite images to align them with TC motion (TC motion always points in the +x-direction, i.e., the direction of increasing column number); flipping Southern Hemisphere satellite images upside-down; and normalizing the data via the paper's two-step procedure.
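The files are already rotated and flipped, so nothing here needs to be reapplied. The sketch below spells out the geometry for readers who want it. It assumes numpy, storm-relative east/north offsets, and storm-motion components like satellite_storm_u_motion_m_s01 and satellite_storm_v_motion_m_s01. It rotates coordinates by minus the motion heading so that the motion vector lands on +x, and it reverses the row order for Southern Hemisphere storms. Both helpers are hypothetical, and the sketch does not reproduce ml4tc's actual image rotation, interpolation, or order of operations.

```python
import numpy as np


def rotate_to_motion_frame(x, y, u_motion, v_motion):
    """Rotate storm-relative coordinates so that storm motion points along +x.

    Hypothetical helper (not the ml4tc implementation).
    x, y: eastward / northward offsets from the storm centre (any units).
    u_motion, v_motion: eastward / northward storm motion (m s^-1).
    """
    # Heading of the motion vector, measured counterclockwise from east.
    heading = np.arctan2(v_motion, u_motion)
    cos_h, sin_h = np.cos(heading), np.sin(heading)
    # Rotating by -heading maps the motion direction onto the +x axis.
    x_rot = x * cos_h + y * sin_h
    y_rot = -x * sin_h + y * cos_h
    return x_rot, y_rot


def flip_if_southern(image, storm_latitude_deg_n):
    """Flip a 2-D image upside-down (reverse row order) for Southern Hemisphere storms."""
    image = np.asarray(image)
    return np.flipud(image) if storm_latitude_deg_n < 0 else image
```

For a storm moving due north, rotating its own motion vector, `rotate_to_motion_frame(0.0, 5.0, 0.0, 5.0)`, gives approximately `(5.0, 0.0)`: the motion now points along +x.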
The file name gives the unique identifier of the TC. For example, "learning_examples_2010AL01.nc.gz" contains data for storm 2010AL01, the first North Atlantic storm of the 2010 season. Each file can be read with the method `example_io.read_file` in the ml4tc Python library (https://zenodo.org/doi/10.5281/zenodo.10268620). Because `example_io.read_file` is a lightweight wrapper for `xarray.open_dataset`, you can equivalently use `xarray.open_dataset` directly. The variables in each file are listed below, as produced by `print(xarray_table)` for one example file; dimension lengths such as the number of valid times vary from storm to storm:
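Two practical details are sketched below using only the standard library.

First, the storm ID can be read back out of the file name. The regular expression assumes that every file follows the pattern of the single example above: a 4-digit year, a 2-letter basin, and a 2-digit storm number.

Second, xarray opens a `.gz` path directly only for netCDF3 files, through its scipy backend. This description does not state which netCDF format these files use, so decompressing to a temporary file first is a conservative route if a direct open fails. `parse_cyclone_id` and `gunzip_to_temp` are hypothetical helpers, not ml4tc functions.

```python
import gzip
import os
import re
import shutil
import tempfile

# Pattern inferred from the single example "learning_examples_2010AL01.nc.gz"
# (assumption): 4-digit year, 2-letter basin code, 2-digit storm number.
FILE_NAME_PATTERN = re.compile(
    r"^learning_examples_((\d{4})([A-Z]{2})(\d{2}))\.nc(\.gz)?$"
)


def parse_cyclone_id(path):
    """Return (cyclone_id, year, basin, storm_number) parsed from a file name."""
    match = FILE_NAME_PATTERN.match(os.path.basename(path))
    if match is None:
        raise ValueError(f"Unexpected file name: {path}")
    cyclone_id, year, basin, number = match.group(1, 2, 3, 4)
    return cyclone_id, int(year), basin, int(number)


def gunzip_to_temp(gz_path):
    """Decompress a .gz file into a temporary .nc file and return its path.

    The caller is responsible for deleting the temporary file.
    """
    fd, out_path = tempfile.mkstemp(suffix=".nc")
    with os.fdopen(fd, "wb") as dst, gzip.open(gz_path, "rb") as src:
        shutil.copyfileobj(src, dst)
    return out_path
```

For example, `xarray.open_dataset(gunzip_to_temp("learning_examples_2010AL01.nc.gz"))` should then open the file. Because xarray reads lazily, delete the temporary file only after the data have been loaded or the dataset has been closed.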
Dimensions:
* satellite_valid_time_unix_sec: 289
* satellite_grid_row: 380
* satellite_grid_column: 540
* satellite_predictor_name_gridded: 1
* satellite_predictor_name_ungridded: 16
* ships_valid_time_unix_sec: 19
* ships_storm_object_index: 19
* ships_forecast_hour: 23
* ships_intensity_threshold_m_s01: 21
* ships_lag_time_hours: 5
* ships_predictor_name_lagged: 17
* ships_predictor_name_forecast: 129
Coordinates:
* satellite_grid_row (satellite_grid_row) int32 2kB ...
* satellite_grid_column (satellite_grid_column) int32 2kB ...
* satellite_valid_time_unix_sec (satellite_valid_time_unix_sec) int32 1kB ...
* ships_lag_time_hours (ships_lag_time_hours) float64 40B ...
* ships_intensity_threshold_m_s01 (ships_intensity_threshold_m_s01) float64 168B ...
* ships_forecast_hour (ships_forecast_hour) int32 92B ...
* satellite_predictor_name_gridded (satellite_predictor_name_gridded) object 8B ...
* satellite_predictor_name_ungridded (satellite_predictor_name_ungridded) object 128B ...
* ships_valid_time_unix_sec (ships_valid_time_unix_sec) int32 76B ...
* ships_predictor_name_lagged (ships_predictor_name_lagged) object 136B ...
* ships_predictor_name_forecast (ships_predictor_name_forecast) object 1kB ...
Dimensions without coordinates: ships_storm_object_index
Data variables:
* satellite_number (satellite_valid_time_unix_sec) int32 1kB ...
* satellite_band_number (satellite_valid_time_unix_sec) int32 1kB ...
* satellite_band_wavelength_micrometres (satellite_valid_time_unix_sec) float64 2kB ...
* satellite_longitude_deg_e (satellite_valid_time_unix_sec) float64 2kB ...
* satellite_cyclone_id_string (satellite_valid_time_unix_sec) |S8 2kB ...
* satellite_storm_type_string (satellite_valid_time_unix_sec) |S2 578B ...
* satellite_storm_name (satellite_valid_time_unix_sec) |S10 3kB ...
* satellite_storm_latitude_deg_n (satellite_valid_time_unix_sec) float64 2kB ...
* satellite_storm_longitude_deg_e (satellite_valid_time_unix_sec) float64 2kB ...
* satellite_storm_intensity_number (satellite_valid_time_unix_sec) float64 2kB ...
* satellite_storm_u_motion_m_s01 (satellite_valid_time_unix_sec) float64 2kB ...
* satellite_storm_v_motion_m_s01 (satellite_valid_time_unix_sec) float64 2kB ...
* satellite_predictors_gridded (satellite_valid_time_unix_sec, satellite_grid_row, satellite_grid_column, satellite_predictor_name_gridded) float64 474MB ...
* satellite_grid_latitude_deg_n (satellite_valid_time_unix_sec, satellite_grid_row, satellite_grid_column) float64 474MB ...
* satellite_grid_longitude_deg_e (satellite_valid_time_unix_sec, satellite_grid_row, satellite_grid_column) float64 474MB ...
* satellite_predictors_ungridded (satellite_valid_time_unix_sec, satellite_predictor_name_ungridded) float64 37kB ...
* ships_storm_intensity_m_s01 (ships_valid_time_unix_sec) float64 152B ...
* ships_storm_type_enum (ships_storm_object_index, ships_forecast_hour) int32 2kB ...
* ships_forecast_latitude_deg_n (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
* ships_forecast_longitude_deg_e (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
* ships_v_wind_200mb_0to500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
* ships_vorticity_850mb_0to1000km_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
* ships_vortex_latitude_deg_n (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
* ships_vortex_longitude_deg_e (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
* ships_mean_tangential_wind_850mb_0to600km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
* ships_max_tangential_wind_850mb_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
* ships_mean_tangential_wind_1000mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
* ships_mean_tangential_wind_850mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
* ships_mean_tangential_wind_500mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
* ships_mean_tangential_wind_300mb_at500km_m_s01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
* ships_srh_1000to700mb_200to800km_j_kg01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
* ships_srh_1000to500mb_200to800km_j_kg01 (ships_storm_object_index, ships_forecast_hour) float64 3kB ...
* ships_threshold_exceedance_num_6hour_periods (ships_storm_object_index, ships_intensity_threshold_m_s01) int32 2kB ...
* ships_v_motion_observed_m_s01 (ships_storm_object_index) float64 152B ...
* ships_v_motion_1000to100mb_flow_m_s01 (ships_storm_object_index) float64 152B ...
* ships_v_motion_optimal_flow_m_s01 (ships_storm_object_index) float64 152B ...
* ships_cyclone_id_string (ships_storm_object_index) object 152B ...
* ships_storm_latitude_deg_n (ships_storm_object_index) float64 152B ...
* ships_storm_longitude_deg_e (ships_storm_object_index) float64 152B ...
* ships_predictors_lagged (ships_valid_time_unix_sec, ships_lag_time_hours, ships_predictor_name_lagged) float64 13kB ...
* ships_predictors_forecast (ships_valid_time_unix_sec, ships_forecast_hour, ships_predictor_name_forecast) float64 451kB ...
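The sizes in this printout are uncompressed in-memory sizes. They follow directly from the array shape times the bytes per element, expressed in decimal units (1 kB = 1000 B). For example, satellite_predictors_gridded holds 289 × 380 × 540 × 1 float64 values at 8 bytes each, which is 474,422,400 B, or about 474 MB. The two grid-coordinate arrays have the same shape, so fully loading all three would take roughly 1.4 GB for this one storm. The sketch below reproduces this arithmetic; the helper names are illustrative only.

```python
from math import prod


def array_nbytes(shape, itemsize):
    """Number of bytes needed to hold an array of this shape in memory."""
    return prod(shape) * itemsize


def human_readable(nbytes):
    """Format a byte count with decimal (SI) prefixes, rounded to an integer."""
    for unit, divisor in (("GB", 1e9), ("MB", 1e6), ("kB", 1e3)):
        if nbytes >= divisor:
            return f"{round(nbytes / divisor)}{unit}"
    return f"{nbytes}B"


# float64 uses 8 bytes per element; int32 uses 4.
gridded = array_nbytes((289, 380, 540, 1), 8)
print(gridded, human_readable(gridded))  # 474422400 474MB
```

The printed sizes match decimal rather than binary prefixes: with 1024-based units, 474,422,400 B would be about 452 MiB.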
Variable names are meant to be as self-explanatory as possible. Potentially confusing ones are listed below.
The dimension ships_storm_object_index is redundant with the dimension ships_valid_time_unix_sec and can be ignored.
ships_forecast_hour extends to values that we do not use in the paper: the maximum forecast hour used in machine learning is 24.
The dimension ships_intensity_threshold_m_s01 (and any variable including this dimension) can be ignored.
ships_lag_time_hours gives the lag times for the satellite-based predictors of SHIPS (the Statistical Hurricane Intensity Prediction Scheme). The only lag time we use in machine learning is "NaN", a stand-in for the best available lag time. See the discussion of the "priority list" in the paper for details.
Most of the data variables can be ignored, unless you're doing a deep dive into storm properties. The important variables are satellite_predictors_gridded (full satellite images), ships_predictors_lagged (satellite-based SHIPS predictors), and ships_predictors_forecast (environmental and storm-history-based SHIPS predictors). These variables are all discussed in the paper.
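The notes above map onto a few xarray operations. This is a minimal sketch with several assumptions: the file has already been opened as an `xarray.Dataset`, and ships_lag_time_hours contains exactly one NaN entry. `subset_for_ml` is a hypothetical helper, not part of ml4tc. The sketch uses positional `isel` rather than label-based `sel` because NaN never compares equal to itself, which makes a label lookup of the NaN lag time fragile.

```python
import numpy as np
import xarray as xr

# Variables the description identifies as the important ones.
CORE_VARIABLES = (
    "satellite_predictors_gridded",
    "ships_predictors_lagged",
    "ships_predictors_forecast",
)
MAX_FORECAST_HOUR = 24  # largest SHIPS forecast hour used for machine learning


def subset_for_ml(ds: xr.Dataset, variables=CORE_VARIABLES) -> xr.Dataset:
    """Hypothetical helper: keep only the parts of one storm file used for ML."""
    # Drop the intensity-threshold dimension and every variable defined on it.
    if "ships_intensity_threshold_m_s01" in ds.sizes:
        ds = ds.drop_dims("ships_intensity_threshold_m_s01")

    # Keep forecast hours <= 24, selected by position.
    hours = ds["ships_forecast_hour"].values
    ds = ds.isel(ships_forecast_hour=np.flatnonzero(hours <= MAX_FORECAST_HOUR))

    # Keep only the NaN lag time (stand-in for the best available lag time).
    # NaN != NaN, so locate it with np.isnan instead of a label lookup.
    nan_positions = np.flatnonzero(np.isnan(ds["ships_lag_time_hours"].values))
    if nan_positions.size != 1:
        raise ValueError("expected exactly one NaN entry in ships_lag_time_hours")
    ds = ds.isel(ships_lag_time_hours=int(nan_positions[0]))

    return ds[list(variables)]
```

`drop_dims` removes ships_intensity_threshold_m_s01 together with every variable defined on it, such as ships_threshold_exceedance_num_6hour_periods. The redundant ships_storm_object_index dimension is simply left alone.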
Every variable name with physical units (including elements of the coordinate lists ships_predictor_name_lagged and ships_predictor_name_forecast) ends with those units. For example, "m_s01" = metres per second; "deg_n" = degrees north; "deg_e" = degrees east; "j_kg01" = joules per kilogram; and so on.
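Because units are encoded as name suffixes, they can be looked up mechanically. The first four suffixes below come from the description above. The rest (s01, hour/hours, unix_sec, micrometres) are assumptions read off the variable names in the printout; the "01" in "m_s01" appears to denote an exponent of -1, as in m s^-1. `unit_of` is a hypothetical helper. Names with no unit suffix, such as satellite_number, return None.

```python
# Suffix -> unit. First four from the description; the rest are inferred
# from variable names in the printout (assumption).
UNIT_SUFFIXES = {
    "m_s01": "metres per second",
    "deg_n": "degrees north",
    "deg_e": "degrees east",
    "j_kg01": "joules per kilogram",
    "s01": "per second",  # inferred, e.g. ships_vorticity_850mb_0to1000km_s01
    "hours": "hours",
    "hour": "hours",
    "unix_sec": "seconds since the Unix epoch",
    "micrometres": "micrometres",
}


def unit_of(name):
    """Return the unit encoded at the end of a variable name, or None."""
    # Check longer suffixes first so that "m_s01" wins over "s01".
    for suffix in sorted(UNIT_SUFFIXES, key=len, reverse=True):
        if name == suffix or name.endswith("_" + suffix):
            return UNIT_SUFFIXES[suffix]
    return None
```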
Created:
2024-08-08



