
List of near-Mercury regions predicted by an LSTM machine learning model during the MESSENGER era

Indexed by NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/10276909
Description:
This dataset is the output of the supervised machine learning approach of Smith & Jackman et al. [submitted, JGR, 2023] for classifying near-Mercury regions. The list spans the full near-Mercury exploration by the MESSENGER spacecraft, from March 23rd 2011 to April 30th 2015, with one entry per second.

The machine learning model is a Long Short Term Memory (LSTM) Recurrent Neural Network (RNN), trained on the list of magnetospheric regions available at Sun, 2023 (https://zenodo.org/records/8298647) and discussed in Sun et al., 2020. We refer to this list as S2020 hereafter. It includes manual labels for three key regions, (1) magnetosphere, (2) magnetosheath and (3) solar wind, and for two key transition regions, (4) magnetopause and (5) bow shock. Our LSTM model returns probabilities for the three key regions only.

We ran the model on a test set of several months of complete MESSENGER orbits, and the results are shown in the paper. For the full 4-year duration of MESSENGER's near-Mercury exploration, we had very high confidence in our model performance far from the "transition regions" near the magnetospheric boundaries (bow shock and magnetopause). We therefore ran the full model only on the transition regions. Where the LSTM has not made a prediction, the predicted region defaults to the S2020 label, and the two cases are distinguished by their flags: LSTM-predicted regions are labelled (1)=Magnetosphere, (2)=Magnetosheath, (3)=Solar Wind, and cases where the prediction defaults to the S2020 manual label are marked (1a)=Magnetosphere, (2a)=Magnetosheath, (3a)=Solar Wind.

When the LSTM is run on a given time step, it returns a probability for each of the three regions, and all three are listed in the output file. The region with the highest probability is assigned as the model prediction. We further add a model confidence flag marking each prediction as good or poor. The flag values are given in the column list below. These flags are based on the criteria outlined in the Smith & Jackman et al. [submitted, 2023] paper.
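The labelling rule described above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' code: the function name `region_label`, the use of `None` to stand for "the LSTM was not run on this time step", and the error raised for S2020 labels 4 and 5 are all hypothetical choices, since the description does not say how such cases are encoded in the file.

```python
from typing import Optional, Sequence

# Region codes as given in the dataset description.
REGION_NAMES = {1: "Magnetosphere", 2: "Magnetosheath", 3: "Solar Wind"}


def region_label(probs: Optional[Sequence[float]], s2020_label: int) -> str:
    """Return '1', '2' or '3' for an LSTM prediction, or '1a', '2a' or '3a'
    when the label defaults to the S2020 manual label.

    probs is (prob_MSP, prob_SH, prob_SW), or None if the LSTM was not run
    (assumption: how a missing prediction is stored is not documented).
    """
    if probs is not None:
        if len(probs) != 3:
            raise ValueError("expected exactly three region probabilities")
        # Pick the index of the highest probability; region codes are 1-based.
        best_index = max(range(3), key=lambda i: probs[i])
        return str(best_index + 1)
    if s2020_label in REGION_NAMES:
        # No LSTM prediction: fall back to the manual label, marked with 'a'.
        return f"{s2020_label}a"
    # Assumption: boundary labels (4, 5) have no fallback among the 3 regions.
    raise ValueError(f"no fallback defined for S2020 label {s2020_label}")
```

For instance, `region_label([0.05, 0.9, 0.05], 2)` gives the LSTM label for the magnetosheath, while `region_label(None, 3)` gives the S2020 fallback label for the solar wind.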
The key criteria are that (a) the highest region probability must exceed 0.9, (b) the other two region probabilities must be significantly distinct from the highest probability, with values below 0.8, and (c) anything inside of 1.33 RM (Mercury radii) is magnetopause and anything outside of 7.15 RM is solar wind (based on extensive empirical examination).

The dataset presented here includes the following columns:
Date [Day:Month:Year Hour:Minute:Second]
S2020 label [1=Magnetosphere, 2=Magnetosheath, 3=Solar Wind, 4=Magnetopause, 5=Bow Shock]
LSTM Model Probability (1) Magnetosphere
LSTM Model Probability (2) Magnetosheath
LSTM Model Probability (3) Solar Wind
LSTM Model Predicted Region [(1)=Magnetosphere, (2)=Magnetosheath, (3)=Solar Wind]
Model quality flag [(1)=Good, (0)=Poor]

Here is an example of an entry in the list:
Date              S2020  prob_MSP  prob_SH   prob_SW  lstm_prediction  qf
26/03/2011 01:14  2      3.81E-07  8.90E-09  1        2                1

We suggest that users calculate their own Ground Truth Flag to compare against any given manually labelled list of near-magnetosphere regions at Mercury. We note that S2020 (and similar lists) generally label 5 regions, whereas our model returns just 3 possible region predictions.

The current version was updated on December 6th 2023.

This work was supported by Science Foundation Ireland Grant 18/FRL/6199 (PI Caitriona Jackman), Irish Research Council Laureate Consolidator Award (SOLMEX, PI Caitriona Jackman), NASA Discovery Data Analysis Program (DDAP) Grant #80NSSC22K1061 (PI Weijie Sun), NSF Grant 2321595 (Weijie Sun), and NERC Independent Research Fellowship NE/W009129/1 (PI Andy Smith).
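As a rough illustration of criteria (a) and (b) and of the suggested Ground Truth Flag, the Python sketch below may help. It rests on several assumptions: the function names are hypothetical, poor quality is coded as 0 following the column list, criterion (c) is left out because the file has no radial-distance column, and the Ground Truth Flag definition (match, mismatch, or undefined for boundary labels 4 and 5) is one possible choice, not one given in the description.

```python
from typing import Optional


def quality_flag(prob_msp: float, prob_sh: float, prob_sw: float,
                 high: float = 0.9, low: float = 0.8) -> int:
    """Return 1 (good) if criteria (a) and (b) hold, otherwise 0 (poor).

    (a) the highest probability exceeds `high`;
    (b) the other two probabilities are below `low`.
    Criterion (c), which depends on radial distance, is not applied here.
    """
    ordered = sorted((prob_msp, prob_sh, prob_sw), reverse=True)
    top, others = ordered[0], ordered[1:]
    return 1 if top > high and all(p < low for p in others) else 0


def ground_truth_flag(lstm_prediction: str, manual_label: int) -> Optional[bool]:
    """Compare a predicted region with a manual label (hypothetical definition).

    Returns True on agreement, False on disagreement, and None when the
    manual label is a boundary (4=Magnetopause, 5=Bow Shock), for which the
    3-region model has no counterpart.
    """
    # Strip the 'a' suffix that marks a default to the S2020 label.
    predicted = int(lstm_prediction.rstrip("a"))
    if manual_label in (4, 5):
        return None
    return predicted == manual_label
```

Returning `None` for boundary labels keeps the 5-region versus 3-region mismatch explicit, so that users can decide for themselves how to score time steps that a manual list places on a boundary.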

Created:
2023-12-06