Error and anomaly detection for intra-participant time-series data
Source: Mendeley Data. Updated: 2024-06-25. Indexed: 2024-06-27.
Download link:
https://tandf.figshare.com/articles/dataset/Error_and_anomaly_detection_for_intra-participant_time-series_data/5189002/1
Description:
Identifying errors or anomalous values, collectively considered outliers, assists in exploring data or, through removal of the outliers, improves statistical analysis. In biomechanics, outlier detection methods have explored the ‘shape’ of entire cycles, although exploring fewer points using a ‘moving window’ may be advantageous. Hence, the aim was to develop a moving-window method for detecting trials with outliers in intra-participant time-series data. Outliers were detected in two stages for the strides (mean 38 cycles) from treadmill running. Cycles were removed in stage 1 for one-dimensional (spatial) outliers at each time point using the median absolute deviation, and in stage 2 for two-dimensional (spatial–temporal) outliers using a moving-window standard deviation. Significance levels of the t-statistic were used for scaling. Fewer cycles were removed with smaller scaling and smaller window size, requiring more stringent scaling at stage 1 (mean 3.5 cycles removed for 0.0001 scaling) than at stage 2 (mean 2.6 cycles removed for 0.01 scaling with a window size of 1). Settings in the supplied Matlab code should be customised to each data set, and outliers assessed to justify whether to retain or remove those cycles. The method is effective in identifying trials with outliers in intra-participant time-series data.
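The two-stage procedure described above can be illustrated with a minimal Python sketch. This is not the supplied Matlab code, and several details are assumptions: the function names are hypothetical, the multipliers k1 and k2 are passed in directly (the description derives its scaling from significance levels of the t-statistic, which is not reproduced here), the MAD is scaled by the usual normal-consistency constant 1.4826, and the stage 2 window is taken to be a centred run of time points pooled across all remaining cycles.

```python
from statistics import mean, median, stdev

def mad_outlier_cycles(cycles, k):
    """Stage 1 (assumed form): flag cycles with any point further than
    k * scaled MAD from the across-cycle median at that time point."""
    flagged = set()
    for t in range(len(cycles[0])):
        column = [c[t] for c in cycles]
        med = median(column)
        mad = 1.4826 * median([abs(v - med) for v in column])
        if mad == 0:
            continue  # no spread at this time point, nothing to compare against
        flagged.update(i for i, v in enumerate(column) if abs(v - med) > k * mad)
    return flagged

def window_sd_outlier_cycles(cycles, k, window):
    """Stage 2 (assumed form): flag cycles with any point further than
    k * SD from the mean of all values inside a centred moving window."""
    flagged = set()
    n_points = len(cycles[0])
    half = window // 2  # window=1 reduces to a point-by-point SD test
    for t in range(n_points):
        lo, hi = max(0, t - half), min(n_points, t + half + 1)
        pooled = [c[j] for c in cycles for j in range(lo, hi)]
        centre, spread = mean(pooled), stdev(pooled)
        if spread == 0:
            continue
        flagged.update(i for i, c in enumerate(cycles)
                       if abs(c[t] - centre) > k * spread)
    return flagged

def detect_outlier_cycles(cycles, k1, k2, window):
    """Run stage 1, drop its cycles, then run stage 2 on what remains.
    Both returned sets hold indices into the original list of cycles."""
    stage1 = mad_outlier_cycles(cycles, k1)
    kept = [i for i in range(len(cycles)) if i not in stage1]
    local = window_sd_outlier_cycles([cycles[i] for i in kept], k2, window)
    return stage1, {kept[i] for i in local}

# Synthetic data: nine similar five-point cycles plus one with a spike.
offsets = (-0.2, -0.1, 0.0, 0.1, 0.2, -0.15, 0.15, 0.05, -0.05)
cycles = [[0.0 + d, 1.0 + d, 2.0 + d, 1.0 + d, 0.0 + d] for d in offsets]
cycles.append([0.0, 1.0, 9.0, 1.0, 0.0])

stage1, stage2 = detect_outlier_cycles(cycles, k1=3.0, k2=3.0, window=3)
print("stage 1 removed:", sorted(stage1))
print("stage 2 removed:", sorted(stage2))
```

Running stage 2 only on the cycles that survive stage 1 keeps gross spatial outliers from inflating the window standard deviation. As the description notes, the multipliers and window size need tuning per data set, and flagged cycles should be inspected before they are discarded.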
Created:
2023-06-28



