Outlier classification using autoencoders: application for fluctuation driven flows in fusion plasmas

Indexed in NIAID Data Ecosystem: 2026-03-12

Download link:

https://doi.org/10.7910/DVN/SKEHRJ

Description:

Understanding the statistics of fluctuation-driven flows in the boundary layer of magnetically confined plasmas is necessary to accurately model the lifetime of vacuum vessel components. Mirror Langmuir probes (MLPs) are a novel diagnostic that uniquely allows us to sample the plasma parameters on a time scale shorter than the characteristic time scale of their fluctuations. Sudden large-amplitude fluctuations in the plasma degrade the precision and accuracy of the plasma parameters reported by MLPs when the probe bias range is of insufficient amplitude. While some data samples can readily be classified as valid or invalid, we find that such a classification may be ambiguous for up to 40% of the data sampled for the plasma parameters and bias voltages considered in this study. In this contribution, we employ an autoencoder (AE) to learn a low-dimensional representation of valid data samples. By definition, the coordinates in this space are the features that best characterize valid data. Ambiguous data samples are classified in this space using standard classifiers for vectorial data. In this way, we avoid defining complicated threshold rules to identify outliers, which require strong assumptions and introduce biases into the analysis. By removing the outliers identified in the latent low-dimensional space of the AE, we find that the average conductive and convective radial heat fluxes are approximately 5% to 15% lower than when removing outliers identified by threshold values. For contributions to the radial heat flux due to triple correlations, the difference is up to 40%.
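The description does not specify the AE architecture or which classifier is used, so the following is only a minimal sketch of the idea under stated assumptions. A linear autoencoder stands in for the study's AE (with squared-error loss, the optimal linear autoencoder spans the leading principal subspace, so an SVD replaces training). A k-nearest-neighbour vote stands in for the "standard classifier", trained on clearly valid and clearly invalid samples mapped into the latent space. All function names, the latent dimension, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(valid, latent_dim):
    """Fit a linear autoencoder to valid samples (one sample per row).

    For squared-error loss the optimal linear autoencoder spans the leading
    principal subspace, so the weights are read off an SVD instead of
    being trained by gradient descent (a stand-in for a nonlinear AE).
    """
    mean = valid.mean(axis=0)
    _, _, vt = np.linalg.svd(valid - mean, full_matrices=False)
    return mean, vt[:latent_dim]  # components: (latent_dim, n_features)

def encode(x, mean, components):
    """Map samples to latent coordinates."""
    return (x - mean) @ components.T

def decode(z, mean, components):
    """Map latent coordinates back to feature space."""
    return z @ components + mean

def knn_classify(train_z, train_labels, query_z, k=3):
    """Label each query by majority vote of its k nearest latent neighbours."""
    dist = np.linalg.norm(query_z[:, None, :] - train_z[None, :, :], axis=-1)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return (train_labels[nearest].mean(axis=1) > 0.5).astype(int)

# --- Synthetic, purely illustrative data (not from the dataset) ---
rng = np.random.default_rng(0)
n_features, latent_dim = 8, 2
basis = rng.normal(size=(latent_dim, n_features))

def sample(center, spread, count):
    """Draw samples lying near a 2-D subspace of the 8-D feature space."""
    z = center + spread * rng.normal(size=(count, latent_dim))
    return z @ basis + 0.01 * rng.normal(size=(count, n_features))

valid = sample(0.0, 1.0, 200)    # clearly valid samples
invalid = sample(8.0, 0.5, 100)  # clearly invalid: large-amplitude excursions

# The AE only ever sees valid data.
mean, components = fit_linear_autoencoder(valid, latent_dim)
train_z = encode(np.vstack([valid, invalid]), mean, components)
train_labels = np.r_[np.ones(len(valid), int), np.zeros(len(invalid), int)]

# Two "ambiguous" samples, classified in latent space.
ambiguous = np.array([[0.2, -0.1], [8.0, 8.0]]) @ basis
pred = knn_classify(train_z, train_labels, encode(ambiguous, mean, components))
# pred[i] == 1 -> keep sample i as valid, 0 -> discard it as an outlier
```

Swapping in a nonlinear AE (for example, a small neural network) would change only `fit_linear_autoencoder`, `encode` and `decode`; the latent-space classification step stays the same.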

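The description also does not define the conductive, convective and triple-correlation contributions to the radial heat flux. A common convention in edge-turbulence analyses, assumed here rather than taken from the dataset, Reynolds-decomposes density n, temperature T and radial velocity v_r into means and fluctuations and names the terms (3/2) n̄⟨T̃ṽ_r⟩ conductive, (3/2) T̄⟨ñṽ_r⟩ convective, and (3/2)⟨ñT̃ṽ_r⟩ triple correlation. The sketch below evaluates these terms after an outlier mask is applied. The function names, the mask, and the toy series are hypothetical.

```python
import numpy as np

def heat_flux_terms(n, T, v_r, keep=None):
    """Return the conductive, convective and triple-correlation parts of
    the radial heat flux: (3/2) n_mean <dT dv>, (3/2) T_mean <dn dv> and
    (3/2) <dn dT dv>, where d* are fluctuations about the mean of the
    retained samples. `keep` is a boolean mask (True = sample retained
    after outlier removal); None keeps every sample.
    """
    n, T, v_r = (np.asarray(a, dtype=float) for a in (n, T, v_r))
    if keep is not None:
        keep = np.asarray(keep, dtype=bool)
        n, T, v_r = n[keep], T[keep], v_r[keep]
    dn, dT, dv = n - n.mean(), T - T.mean(), v_r - v_r.mean()
    return {
        "conductive": 1.5 * n.mean() * np.mean(dT * dv),
        "convective": 1.5 * T.mean() * np.mean(dn * dv),
        "triple": 1.5 * np.mean(dn * dT * dv),
    }

def relative_difference(new, reference):
    """Fractional change of each term in `new` with respect to `reference`."""
    return {k: (new[k] - reference[k]) / reference[k] for k in new}

# Toy series in arbitrary units, purely illustrative.
n = [1.0, 2.0, 3.0, 2.0]
T = [2.0, 2.0, 4.0, 4.0]
v_r = [-1.0, 1.0, 1.0, -1.0]

all_samples = heat_flux_terms(n, T, v_r)
# Hypothetical mask: pretend the last sample was flagged as an outlier.
masked = heat_flux_terms(n, T, v_r, keep=[True, True, True, False])
```

With this toy series the convective term is 2.25 when all samples are kept. In the setting described above, `keep` would come either from a threshold rule or from the latent-space classifier, and `relative_difference` would compare the fluxes obtained with the two masks.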

Created: 2021-06-02
