Outlier classification using autoencoders: application for fluctuation driven flows in fusion plasmas

Source: DataONE · Updated 2021-06-02 · Indexed 2024-06-08

Download link:

https://search.dataone.org/view/sha256:03f824346376ea7611514d9a6aceb4d1d25e3a4090132a84569ae955fd7c5463

Description:

Understanding the statistics of fluctuation-driven flows in the boundary layer of magnetically confined plasmas is needed to accurately model the lifetime of vacuum vessel components. Mirror Langmuir probes (MLPs) are a novel diagnostic that uniquely allows us to sample the plasma parameters on a time scale shorter than the characteristic time scale of their fluctuations. Sudden large-amplitude fluctuations in the plasma degrade the precision and accuracy of the plasma parameters reported by MLPs when the probe bias range has insufficient amplitude. While some data samples can readily be classified as valid or invalid, we find that such a classification may be ambiguous for up to 40% of the data sampled for the plasma parameters and bias voltages considered in this study. In this contribution, we employ an autoencoder (AE) to learn a low-dimensional representation of valid data samples. By definition, the coordinates in this space are the features that most strongly characterize valid data. Ambiguous data samples are then classified in this space using standard classifiers for vectorial data. In this way, we avoid defining complicated threshold rules to identify outliers, which require strong assumptions and introduce biases into the analysis. When removing the outliers identified in the low-dimensional latent space of the AE, we find that the average conductive and convective radial heat fluxes are approximately 5% to 15% lower than when removing outliers identified by threshold values. For the contributions to the radial heat flux due to triple correlations, the difference is up to 40%.
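The classification idea in the abstract (learn a latent space from valid samples only, then classify ambiguous samples there with an ordinary classifier) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: it swaps in a linear autoencoder trained by plain gradient descent (the study's AE architecture is not described on this page) and a k-nearest-neighbour vote as the "standard classifier". The synthetic six-point "sweeps", the helper names (`train_linear_ae`, `encode`, `knn_classify`, `demo`) and all parameter values are hypothetical.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def train_linear_ae(samples, latent_dim, epochs=300, lr=0.05, seed=0):
    """Fit x ~ W_dec @ W_enc @ (x - mean) + mean by full-batch gradient descent.

    Only valid samples are passed in, so the latent space describes valid data.
    Returns the training mean, the encoder matrix and the loss per epoch.
    """
    rng = random.Random(seed)
    n, dim = len(samples), len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(dim)]
    centred = [[x[j] - mean[j] for j in range(dim)] for x in samples]
    w_enc = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(latent_dim)]
    w_dec = [[rng.gauss(0.0, 0.1) for _ in range(latent_dim)] for _ in range(dim)]
    losses = []
    for _ in range(epochs):
        g_enc = [[0.0] * dim for _ in range(latent_dim)]
        g_dec = [[0.0] * latent_dim for _ in range(dim)]
        loss = 0.0
        for c in centred:
            z = matvec(w_enc, c)                                  # latent code
            r = [xh - xc for xh, xc in zip(matvec(w_dec, z), c)]  # reconstruction residual
            loss += sum(e * e for e in r) / n
            back = [sum(w_dec[i][k] * r[i] for i in range(dim)) for k in range(latent_dim)]
            for i in range(dim):
                for k in range(latent_dim):
                    g_dec[i][k] += 2.0 * r[i] * z[k] / n          # dL/dW_dec = 2 r z^T / n
            for k in range(latent_dim):
                for j in range(dim):
                    g_enc[k][j] += 2.0 * back[k] * c[j] / n       # dL/dW_enc = 2 (W_dec^T r) c^T / n
        losses.append(loss)
        for i in range(dim):
            for k in range(latent_dim):
                w_dec[i][k] -= lr * g_dec[i][k]
        for k in range(latent_dim):
            for j in range(dim):
                w_enc[k][j] -= lr * g_enc[k][j]
    return mean, w_enc, losses


def encode(x, mean, w_enc):
    """Map a sample into the low-dimensional latent space learned from valid data."""
    return matvec(w_enc, [a - m for a, m in zip(x, mean)])


def knn_classify(query, codes, labels, k=5):
    """Majority vote among the k nearest labelled latent codes (Euclidean distance)."""
    nearest = sorted(range(len(codes)), key=lambda i: math.dist(query, codes[i]))[:k]
    valid_votes = sum(1 for i in nearest if labels[i] == "valid")
    return "valid" if 2 * valid_votes > k else "invalid"


def demo(seed=1):
    """Hypothetical synthetic data: 6-point 'sweeps' lying near a 2-D subspace."""
    rng = random.Random(seed)
    s = 1.0 / math.sqrt(6.0)
    u1, u2 = [s] * 6, [s, -s, s, -s, s, -s]  # orthonormal basis vectors

    def sample(a, b):
        return [a * p + b * q + rng.gauss(0.0, 0.05) for p, q in zip(u1, u2)]

    valid = [sample(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(80)]
    invalid = [sample(rng.gauss(6, 0.5), rng.gauss(0, 1)) for _ in range(40)]
    mean, w_enc, losses = train_linear_ae(valid, latent_dim=2)  # AE sees valid data only
    codes = [encode(x, mean, w_enc) for x in valid + invalid]
    labels = ["valid"] * len(valid) + ["invalid"] * len(invalid)
    ambiguous = {"near_valid": sample(0.0, 0.5), "near_invalid": sample(6.0, 0.0)}
    verdicts = {name: knn_classify(encode(x, mean, w_enc), codes, labels)
                for name, x in ambiguous.items()}
    return losses, verdicts


if __name__ == "__main__":
    losses, verdicts = demo()
    print(f"reconstruction loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
    print(verdicts)
```

Because the autoencoder is trained on valid samples alone, its latent coordinates describe valid data; the readily labelled valid and invalid samples and the ambiguous ones are then projected into that space and compared there, so no hand-tuned threshold on the raw signal is needed. A nonlinear AE (and any other vector classifier) would slot into the same two steps.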

Created:

2023-11-14
