Analysis of the performance of test statistics for detection of Outliers (Additive, Innovative, Transient and Level Shift) in AR (1) processes

DataCite Commons, updated 2026-02-16, indexed 2024-07-25
Download link:
https://tandf.figshare.com/articles/dataset/Analysis_of_the_performance_of_test_statistics_for_detection_of_Outliers_Additive_Innovative_Transient_and_Level_Shift_in_AR_1_processes/1569134/1
Resource description:
Outlier detection has long interested researchers and data miners, and it has been well researched across knowledge and application domains. This study explores how reliably the most commonly applied test statistics identify outliers. We evaluate the test statistics for additive outliers (AO), innovative outliers (IO), level shifts (LS) and transient changes (TC) on three criteria: vulnerability to spurious outliers, measured by the empirical level of significance; sensitivity in detecting changes, measured by the power of the test; and vulnerability to masking of outliers, measured by misspecification frequencies. We observe that, for the AR(1) model, the sampling distribution of the test statistic η<sub><i>tp</i></sub> (<i>tp</i> = <i>AO</i>, <i>IO</i>, <i>LS</i>, <i>TC</i>) depends on the sample size n and the parameter φ. The sampling distribution of η<sub><i>TC</i></sub> is less concentrated than the sampling distributions of η<sub><i>AO</i></sub>, η<sub><i>IO</i></sub> and η<sub><i>LS</i></sub>. For the AR(1) process, the empirical critical values at the 1%, 5% and 10% upper percentiles are higher than those generally used. We also find evidence that the test statistic for transient change (TC) needs to be revisited, because η<sub><i>TC</i></sub> is eclipsed by η<sub><i>AO</i></sub>, η<sub><i>LS</i></sub> and η<sub><i>IO</i></sub> at different δ values. TC is repeatedly confused with IO and AO, and at extreme δ values it coincides with AO and LS.
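To make the four outlier types concrete, here is a minimal Python sketch of how each one perturbs a simulated AR(1) series. It is an illustration only, not the authors' code. It assumes the standard intervention formulation from the time-series literature, in which an outlier of size ω at time T adds ω times a response pattern: a single spike for AO, φ^k for IO in an AR(1) model, δ^k for TC, and a permanent step for LS. The function names `simulate_ar1` and `outlier_effect`, the unit-variance Gaussian noise, and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def simulate_ar1(n, phi, rng, burn_in=200):
    """Simulate x_t = phi * x_{t-1} + e_t with N(0, 1) noise (assumed setup)."""
    e = rng.standard_normal(n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + e[t]
    # Drop the burn-in so the returned series is close to stationary.
    return x[burn_in:]


def outlier_effect(kind, n, T, omega, phi, delta=0.7):
    """Deterministic effect of one outlier of size omega at index T.

    Assumed response patterns (k = t - T, zero before the onset):
      AO: 1 at k = 0 only      IO: phi ** k
      TC: delta ** k           LS: 1 for all k >= 0
    """
    k = np.arange(n) - T          # lag relative to the outlier onset
    after = k >= 0                # positions at or after the onset
    effect = np.zeros(n)
    if kind == "AO":
        effect[k == 0] = 1.0
    elif kind == "IO":
        effect[after] = phi ** k[after]
    elif kind == "TC":
        effect[after] = delta ** k[after]
    elif kind == "LS":
        effect[after] = 1.0
    else:
        raise ValueError(f"unknown outlier type: {kind}")
    return omega * effect


# Illustrative values only; none of these come from the dataset.
rng = np.random.default_rng(0)
n, T, omega, phi = 100, 50, 4.0, 0.5
x = simulate_ar1(n, phi, rng)
contaminated = {
    kind: x + outlier_effect(kind, n, T, omega, phi)
    for kind in ("AO", "IO", "TC", "LS")
}
```

Under these assumed patterns, the TC effect reduces to the AO effect when δ = 0, to the LS effect when δ = 1, and to the AR(1) IO effect when δ = φ. That overlap is one plausible reading of why the abstract reports TC being confused with IO and AO and coinciding with AO and LS at extreme δ values.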

Provider:
Taylor & Francis
Created:
2016-01-20