surya-bench-flare-forecasting
Source: ModelScope community (updated 2025-12-13, indexed 2025-09-20)
Download link:
https://modelscope.cn/datasets/nasa-ibm-ai4science/surya-bench-flare-forecasting
Resource description:
# Full-disk Solar Flare Forecasting Dataset
## Dataset Summary
This dataset provides labels for solar flare forecasting derived from NOAA GOES flare events from May 2010 to December 2024. Labels are constructed using a 24h rolling prediction window sampled at an hourly cadence. Each window is annotated with both **max GOES class** (based on peak X-ray flux) and **cumulative flare index**.
Two derived binary labels are included for forecasting tasks:
- **`label_max`**: 1 if the maximum flare intensity in the window is ≥ M1.0.
- **`label_cum`**: 1 if the cumulative flare intensity in the window is ≥ 10.
For completeness, we also include (1) max GOES class, which is determined from the peak X-ray flux of the most intense flare in the prediction window; and (2) cumulative index determined from all ≥C-class flares in the prediction window.
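The rolling-window construction described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the window is forward-looking and half-open, `[t, t + 24h)`, that a flare belongs to a window by its event time, and that each flare's magnitude is already expressed in C1.0 units (so M1.0 is 10.0). The function names and the sample events are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)   # prediction window length
STEP = timedelta(hours=1)      # hourly cadence


def label_window(issue_time, events):
    """Summarise flares with issue_time <= event_time < issue_time + 24h.

    events: iterable of (event_time, magnitude), magnitude in C1.0 units
    (assumed convention: C5.2 -> 5.2, M2.0 -> 20.0, X3.5 -> 350.0).
    """
    end = issue_time + WINDOW
    in_window = [mag for t, mag in events if issue_time <= t < end]
    # Only flares of class C or above (magnitude >= 1.0) enter the cumulative index.
    cumulative = round(sum(m for m in in_window if m >= 1.0), 1)
    strongest = max(in_window, default=0.0)
    return {
        "timestamp": issue_time,
        "max_magnitude": strongest,
        "cumulative_index": cumulative,
        "label_max": int(strongest >= 10.0),   # >= M1.0
        "label_cum": int(cumulative >= 10.0),
    }


def hourly_labels(start, end, events):
    """Produce one labelled window per hour from start to end (inclusive)."""
    rows, t = [], start
    while t <= end:
        rows.append(label_window(t, events))
        t += STEP
    return rows


# Made-up events for illustration only (not taken from the dataset).
events = [
    (datetime(2015, 3, 1, 9, 30), 5.2),    # a C5.2 flare
    (datetime(2015, 3, 1, 20, 10), 20.0),  # an M2.0 flare
]
rows = hourly_labels(datetime(2015, 3, 1, 0), datetime(2015, 3, 1, 2), events)
```

Because consecutive windows overlap by 23 hours, neighbouring rows share most of their flares, so their labels are strongly correlated.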
## Supported Tasks and Applications
- `binary-classification`: Predict whether a time window will contain significant flaring activity.
- `ordinal-classification`: Predict flare class of a given instance.
- `regression`: Predict cumulative flare index of a given instance.
## Dataset Structure
### Data Files
- `train.csv`: Instances from Feb 15 to Dec 31 in each year between 2010–2019
- `validation.csv`: Instances from Jan 15–31 of each year between 2010–2019
- `test.csv`: All instances from each year between 2020–2024
- `leaky_validation.csv`: Instances from Jan 1–14 and Feb 1–14 of each year between 2010–2019
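The date ranges above amount to a simple calendar rule. The sketch below maps a timestamp to the file it would fall in; `assign_split` is a hypothetical helper written from the ranges listed here, not code shipped with the dataset.

```python
from datetime import datetime


def assign_split(ts):
    """Return the split name for a timestamp, or None if it is out of range."""
    if 2020 <= ts.year <= 2024:
        return "test"
    if 2010 <= ts.year <= 2019:
        if ts.month == 1:
            # Jan 1-14 -> leaky_validation, Jan 15-31 -> validation
            return "leaky_validation" if ts.day <= 14 else "validation"
        if ts.month == 2 and ts.day <= 14:
            # Feb 1-14 -> leaky_validation
            return "leaky_validation"
        # Feb 15 - Dec 31 -> train
        return "train"
    return None
```

For instance, a timestamp of 2011-02-14 03:00:00 lands in `leaky_validation.csv` under this rule, while 2011-02-15 00:00:00 lands in `train.csv`.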
### Features
Each record includes four label fields:
- **`max_goes_class`**: Maximum GOES flare class (e.g., C5.2, M1.0, X3.2) in the prediction window, or `FQ` if no flares are present.
- **`cumulative_index`**: Weighted sum of flare subclasses ≥C-class in the prediction window.
  - C-class contributes weight ×1, M-class ×10, X-class ×100.
  - For example, an M2.0 flare adds 20, while an X3.5 flare adds 350.
- **`label_max`**: Binary label, 1 if `goes_class` ≥ M1.0, else 0.
- **`label_cum`**: Binary label, 1 if `cumulative_index` ≥ 10, else 0.
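The weighting and thresholds above can be written out in a few lines. This is a sketch under stated assumptions: class strings follow the `<letter><subclass>` pattern shown above (for example `M2.0`), classes below C contribute nothing, and the helper names are hypothetical rather than part of the dataset.

```python
# Weight applied to the numeric subclass of each GOES class letter.
WEIGHTS = {"C": 1.0, "M": 10.0, "X": 100.0}


def flare_index(goes_class):
    """Contribution of a single flare, e.g. 'M2.0' -> 20.0; 0.0 for 'FQ' or below C."""
    if goes_class == "FQ":
        return 0.0
    letter, subclass = goes_class[0], float(goes_class[1:])
    return WEIGHTS.get(letter, 0.0) * subclass


def cumulative_index(goes_classes):
    """Weighted sum over all flares in a window, rounded to one decimal."""
    return round(sum(flare_index(c) for c in goes_classes), 1)


def label_max(max_goes_class):
    """1 if the strongest flare is M1.0 or above, i.e. class letter M or X."""
    return int(max_goes_class[:1] in ("M", "X"))


def label_cum(cum_index):
    """1 if the cumulative index reaches 10."""
    return int(cum_index >= 10)
```

Note that a window can have `label_cum = 1` while `label_max = 0`: two C5.0 flares sum to 10 without any single flare reaching M1.0.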
Example entry (in JSON format):
```json
{
"timestamp": "2011-02-14 03:00:00",
"goes_class": "X2.2",
"cumulative_index": 297.1,
"label_max": 1,
"label_cum": 1
}
```
## Dataset Details
| Field | Description |
|------------------------|---------------------------------------------|
| **Temporal Coverage** | May 13, 2010 – Dec 31, 2024 |
| **Data Format** | CSV (.csv), string-based schema |
| **Data Shape** | (1, 4) per instance |
| **Data Size** | Total 128,328 instances |
| **Cadence** | 1 hour |
| **Total File Size** | ~4.4MB |
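Since the CSV schema is string-based, values need converting after loading. The sketch below uses only the standard library and an inline sample row copied from the example entry; the column names are an assumption taken from that entry, and because the feature list names the class column `max_goes_class`, the parser accepts either spelling.

```python
import csv
import io
from datetime import datetime

# Inline sample mirroring the example entry; a real file would be opened instead.
SAMPLE = """timestamp,goes_class,cumulative_index,label_max,label_cum
2011-02-14 03:00:00,X2.2,297.1,1,1
"""


def parse_row(row):
    """Convert one row of strings into typed values."""
    return {
        "timestamp": datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S"),
        # Column name is assumed; accept both spellings seen in this card.
        "goes_class": row.get("max_goes_class") or row.get("goes_class"),
        "cumulative_index": float(row["cumulative_index"]),
        "label_max": int(row["label_max"]),
        "label_cum": int(row["label_cum"]),
    }


def load_rows(handle):
    """Read every record from an open CSV file handle."""
    return [parse_row(r) for r in csv.DictReader(handle)]


rows = load_rows(io.StringIO(SAMPLE))
```

With a downloaded file, the same function would be called as `load_rows(open("train.csv", newline=""))`, assuming the header matches.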
## Authors
- Jinsu Hong, [jhong36@gsu.edu](mailto:jhong36@gsu.edu)
- Kang Yang, [kyang30@gsu.edu](mailto:kyang30@gsu.edu)
- Berkay Aydin, [baydin2@gsu.edu](mailto:baydin2@gsu.edu)
Provider: maas
Created: 2025-08-21



