Systematic review of validation of supervised machine learning models in accelerometer-based animal behaviour classification literature

DataONE. Updated 2025-06-24; indexed 2025-06-28.

Download link:

https://search.dataone.org/view/sha256:b74cde6b1949fa074617b2f82f578bff3bc11400e801f56bdad70724e673ba8d


Description:

Supervised machine learning has been used to detect fine-scale animal behaviour from accelerometer data, but a standardised protocol for implementing this workflow is currently lacking. As the application of machine learning to ecological problems expands, it is essential to establish technical protocols and validation standards that align with those in other "big data" fields. Overfitting is a prevalent and often misunderstood challenge in machine learning. Overfit models adapt too closely to the training data, memorising specific instances rather than discerning the underlying signal. Associated results can indicate high performance on the training set, yet these models are unlikely to generalise to new data. Overfitting can be detected through rigorous validation using independent test sets. Our systematic review of 119 studies using accelerometer-based supervised machine learning to classify animal behaviour reveals that 79% (94 papers) did not validate their models sufficiently wel...

We defined eligibility criteria as 'peer-reviewed primary research papers published 2013-present that use supervised machine learning to identify specific behaviours from raw, non-livestock animal accelerometer data'. We elected to exclude analyses of livestock behaviour, as agricultural methods often operate within different constraints from the analyses conducted on wild animals, and this body of literature has mostly developed in isolation from wild animal research. Our search was conducted on 27/09/2024. An initial keyword search across 3 databases (Google Scholar, PubMed, and Scopus) yielded 249 unique papers. Papers outside the search criteria (including hardware and software advances, non-ML analysis, insufficient accelerometry application (e.g., research focused on other sensors with accelerometry providing minimal support), unsupervised methods, and research limited to activity intensity or active and inactive states) were excluded, resulting in 119 papers.

# Systematic review of validation of supervised machine learning models in accelerometer-based animal behaviour classification literature

[https://doi.org/10.5061/dryad.fxpnvx14d](https://doi.org/10.5061/dryad.fxpnvx14d)

## Description of the data and file structure

### Files and variables

#### File: Systematic\_Review\_Supplementary.xlsx

**Description:** Methods information from animal accelerometer-based behaviour classification literature utilising supervised machine learning techniques.

#### Variables

* **Citation:** Citation information for paper
* **Title:** Extracted title from citation information
* **Year:** Year of publication
* **ModelCategory:** General category of the supervised machine learning model used (e.g., all Support Vector Machines are listed as SVM)
  * DT: Decision Tree
  * EM: Expectation Maximisation
  * Ensemble: Ensemble methods (e.g., boosting, bagging)
  * HMM: Hidden Markov Model
  * Isolation Forest: Anomaly detection using Isolation Forest
  * ...
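The description's point that overfitting shows up as high training-set performance that fails to carry over to an independent test set can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are simulated, the function names are hypothetical, and the 1-nearest-neighbour classifier is chosen only because it memorises its training data; none of this is taken from the reviewed papers or from the dataset itself. The test set is formed from animals that contributed no training windows, which is one way of keeping it independent.

```python
import random


def simulate_individual(rng, offset, n_windows=40):
    """Simulate one animal: each window has one feature and a behaviour label.

    The feature is a behaviour-specific centre plus a per-animal offset plus noise
    (all values are made up for illustration).
    """
    rows = []
    for _ in range(n_windows):
        label = rng.choice(["rest", "walk"])
        centre = 0.0 if label == "rest" else 1.0
        rows.append((centre + offset + rng.gauss(0.0, 0.6), label))
    return rows


def predict_1nn(train, x):
    """Return the label of the training window whose feature is closest to x."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]


def accuracy(train, data):
    """Fraction of windows in `data` that a 1-NN model built on `train` labels correctly."""
    correct = sum(predict_1nn(train, x) == label for x, label in data)
    return correct / len(data)


rng = random.Random(0)
individuals = [simulate_individual(rng, rng.gauss(0.0, 0.3)) for _ in range(6)]

# Split by individual: animals 0-3 train the model, animals 4-5 are never seen in training.
train = [row for animal in individuals[:4] for row in animal]
test = [row for animal in individuals[4:] for row in animal]

train_acc = accuracy(train, train)  # scored on the same windows the model memorised
test_acc = accuracy(train, test)    # scored on independent individuals

print(f"training accuracy: {train_acc:.2f}")
print(f"independent test accuracy: {test_acc:.2f}")
```

Because each training window is its own nearest neighbour, the training score says nothing about generalisation; only the score on the held-out animals does. The gap between the two numbers is the symptom that independent test sets are meant to expose.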


Created:

2025-06-25
