EMGBench

Name: EMGBench
Creator: Carnegie Mellon University
Published: 2024-10-31 12:24:03
License: No description available

arXiv · Updated 2024-10-31 · Indexed 2024-11-02

Download link:

https://github.com/jehanyang/emgbench

Resource summary:

EMGBench is a benchmark developed by Carnegie Mellon University for evaluating the out-of-distribution performance of electromyography (EMG) classification algorithms. It comprises nine EMG datasets that cover diverse electrode configurations and sensor placements. The benchmark is designed to assess how well models generalize and adapt across users and over time. Its data include recordings from an easy-to-wear high-density EMG sensor, followed by standardized preprocessing. EMGBench targets control of assistive technology, such as computers, prosthetics, and mobile robotic arms, and aims to address the generalization problems that EMG signal classification faces in real-world use.

Providing institution:

Carnegie Mellon University

Created:

2024-10-31

Original information summary

EMGBench: Benchmarking Out-of-Distribution Generalization and Adaptation for Electromyography

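Dataset overview

EMGBench is a benchmark for evaluating out-of-distribution generalization and adaptation on electromyography (EMG) datasets. It consists of several sub-datasets, including: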

capgmyo
hyser
myoarmbanddataset
ninapro-db5
uciemg
flexwear-hd

Using the dataset

Installation and setup

Install Miniforge version >= Miniforge3-22.3.1-0.
Run on Linux x86_64 (amd64); Ubuntu 20.04 is recommended.
Install the required packages:

```console
$ sudo apt update
$ sudo apt install git jq git-lfs
```

Create and activate the virtual environment:

```console
$ git clone https://github.com/maxwellsoh/emgBenchmarking.git
$ cd emgBenchmarking/
$ git lfs install
$ mamba env create -n emgbench -f environment.yml
$ conda activate emgbench
```

Benchmark datasets

CNN_EMG.py automatically downloads the datasets that each run needs.
The Hyser dataset may take several hours to download.

Reproducing the tables

Using configuration files

Configure the dataset and other parameters in the YAML files at ./config/table{i}.yaml.
After configuring, run:

```console
python run_CNN_EMG.py --table{i}
```

Running manually

To reproduce the first table, run the following shell script:

```bash
starting_index=1
ending_index=10          # set to the maximum number of participants for the dataset
current_dataset=capgmyo  # set to the dataset you want to run with
number_windows=50        # set to 1/20 of sampling rate or 1/16 of sampling rate for Hyser

for subj in $(seq $starting_index $ending_index)
do
    # raw EMG
    python CNN_EMG.py --dataset=$current_dataset --seed=0 --model=resnet18 --epochs=100 \
        --project_name_suffix=__preprocessing-comparison --turn_off_scaler_normalization=True \
        --leftout_subject=$subj --leave_one_subject_out=True --transfer_learning=True \
        --train_test_split_for_time_series=True --save_images=True --learning_rate=5e-4 \
        --proportion_transfer_learning=0.2 --proportion_data_from_training_subjects=1.0 \
        --finetuning_epochs=750 --pretrain_and_finetune=True --partial_dataset_ninapro=True
    # RMS
    python CNN_EMG.py --dataset=$current_dataset --seed=0 --model=resnet18 --epochs=100 \
        --project_name_suffix=__preprocessing-comparison --turn_off_scaler_normalization=True \
        --leftout_subject=$subj --leave_one_subject_out=True --transfer_learning=True \
        --train_test_split_for_time_series=True --save_images=True --learning_rate=5e-4 \
        --proportion_transfer_learning=0.2 --proportion_data_from_training_subjects=1.0 \
        --turn_on_rms=True --rms_input_windowsize=$number_windows \
        --finetuning_epochs=750 --pretrain_and_finetune=True --partial_dataset_ninapro=True
    # STFT (spectrogram)
    python CNN_EMG.py --dataset=$current_dataset --seed=0 --model=resnet18 --epochs=100 \
        --project_name_suffix=__preprocessing-comparison --turn_off_scaler_normalization=True \
        --leftout_subject=$subj --leave_one_subject_out=True --transfer_learning=True \
        --train_test_split_for_time_series=True --save_images=True --learning_rate=5e-4 \
        --proportion_transfer_learning=0.2 --proportion_data_from_training_subjects=1.0 \
        --turn_on_spectrogram=True \
        --finetuning_epochs=750 --pretrain_and_finetune=True --partial_dataset_ninapro=True
    # CWT
    python CNN_EMG.py --dataset=$current_dataset --seed=0 --model=resnet18 --epochs=100 \
        --project_name_suffix=__preprocessing-comparison --turn_off_scaler_normalization=True \
        --leftout_subject=$subj --leave_one_subject_out=True --transfer_learning=True \
        --train_test_split_for_time_series=True --save_images=True --learning_rate=5e-4 \
        --proportion_transfer_learning=0.2 --proportion_data_from_training_subjects=1.0 \
        --turn_on_cwt=True \
        --finetuning_epochs=750 --pretrain_and_finetune=True --partial_dataset_ninapro=True
    wait
done
```
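The same four runs can also be generated from Python, which makes the 1/20 (or 1/16 for Hyser) rule for number_windows explicit. This is a minimal sketch, not part of the repository. The helper names rms_window_size and preprocessing_commands are hypothetical. The flags are copied from the script above. It assumes the dataset identifier for Hyser is the lowercase hyser shown in the sub-dataset list. The 1000 Hz rate used for CapgMyo is an assumption, based on the script's number_windows=50 under the 1/20 rule.

```python
import shlex

# Flags shared by all four runs in the first table, copied from the shell script.
COMMON_FLAGS = [
    "--seed=0", "--model=resnet18", "--epochs=100",
    "--project_name_suffix=__preprocessing-comparison",
    "--turn_off_scaler_normalization=True",
    "--leave_one_subject_out=True", "--transfer_learning=True",
    "--train_test_split_for_time_series=True", "--save_images=True",
    "--learning_rate=5e-4", "--proportion_transfer_learning=0.2",
    "--proportion_data_from_training_subjects=1.0",
    "--finetuning_epochs=750", "--pretrain_and_finetune=True",
    "--partial_dataset_ninapro=True",
]


def rms_window_size(sampling_rate_hz: int, dataset: str) -> int:
    """number_windows rule from the script comment: 1/20 of the rate, or 1/16 for Hyser."""
    divisor = 16 if dataset == "hyser" else 20
    return sampling_rate_hz // divisor


def preprocessing_commands(dataset: str, subject: int, sampling_rate_hz: int) -> list:
    """Build the raw, RMS, STFT and CWT CNN_EMG.py commands for one left-out subject."""
    base = ["python", "CNN_EMG.py", f"--dataset={dataset}",
            f"--leftout_subject={subject}", *COMMON_FLAGS]
    window = rms_window_size(sampling_rate_hz, dataset)
    variants = [
        [],                                                           # raw EMG
        ["--turn_on_rms=True", f"--rms_input_windowsize={window}"],  # RMS
        ["--turn_on_spectrogram=True"],                               # STFT
        ["--turn_on_cwt=True"],                                       # CWT
    ]
    return [base + extra for extra in variants]


if __name__ == "__main__":
    # Print the commands for participant 1; pass each list to
    # subprocess.run(cmd, check=True) to actually launch them.
    for cmd in preprocessing_commands("capgmyo", 1, 1000):
        print(shlex.join(cmd))
```

Building argument lists rather than shell strings avoids quoting problems. It also keeps the only per-run differences (the preprocessing flags) in one place.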
To reproduce the second table, run the following shell script:

```bash
starting_index=1
ending_index=10          # set to the maximum number of participants for the dataset
current_dataset=capgmyo  # set to the dataset you want to run with
# set to "" for raw, "--turn_on_cwt=True" for cwt, or "--turn_on_spectrogram=True" for stft,
# depending on which preprocessing method was the best for the dataset
preprocessing="--turn_on_cwt=True"

for subj in $(seq $starting_index $ending_index)
do
    python CNN_EMG.py --dataset=$current_dataset $preprocessing --seed=0 --model=resnet18 --epochs=50 \
        --project_name_suffix=__model-comparison_one-session --turn_off_scaler_normalization=True \
        --leftout_subject=$subj --leave_one_subject_out=True --transfer_learning=True \
        --train_test_split_for_time_series=True --save_images=True --learning_rate=5e-4 \
        --proportion_transfer_learning=0.2 --proportion_data_from_training_subjects=1.0 \
        --finetuning_epochs=375 --pretrain_and_finetune=True --partial_dataset_ninapro=True
    python CNN_EMG.py --dataset=$current_dataset $preprocessing --seed=0 --model=vit_tiny_patch16_224 --epochs=50 \
        --project_name_suffix=__model-comparison_one-session --turn_off_scaler_normalization=True \
        --leftout_subject=$subj --leave_one_subject_out=True --transfer_learning=True \
        --train_test_split_for_time_series=True --save_images=True --learning_rate=5e-4 \
        --proportion_transfer_learning=0.2 --proportion_data_from_training_subjects=1.0 \
        --finetuning_epochs=375 --pretrain_and_finetune=True --partial_dataset_ninapro=True
    python CNN_EMG.py --dataset=$current_dataset $preprocessing --seed=0 --model=efficientnet_b0 --epochs=50 \
        --project_name_suffix=__model-comparison_one-session --turn_off_scaler_normalization=True \
        --leftout_subject=$subj --leave_one_subject_out=True --transfer_learning=True \
        --train_test_split_for_time_series=True --save_images=True --learning_rate=5e-4 \
        --proportion_transfer_learning=0.2 --proportion_data_from_training_subjects=1.0 \
        --finetuning_epochs=375 --pretrain_and_finetune=True --partial_dataset_ninapro=True
    python CNN_EMG.py --dataset=$current_dataset $preprocessing --seed=0 --model=efficientvit_b0 --epochs=50 \
        --project_name_suffix=__model-comparison_one-session --turn_off_scaler_normalization=True \
        --leftout_subject=$subj --leave_one_subject_out=True --transfer_learning=True \
        --train_test_split_for_time_series=True --save_images=True --learning_rate=5e-4 \
        --proportion_transfer_learning=0.2 --proportion_data_from_training_subjects=1.0 \
        --finetuning_epochs=375 --pretrain_and_finetune=True --partial_dataset_ninapro=True
    wait
done
```
To reproduce the third table (the proportion comparison), run the following shell script:

```bash
starting_index=1
ending_index=10          # set to the maximum number of participants for the dataset
current_dataset=capgmyo  # set to the dataset you want to run with
# set to "" for raw, "--turn_on_cwt=True" for cwt, or "--turn_on_spectrogram=True" for stft,
# depending on which preprocessing method was the best for the dataset
preprocessing="--turn_on_cwt=True"
best_model=resnet18      # set to the model that performed best for the dataset

for subj in $(seq $starting_index $ending_index)
do
    python CNN_EMG.py --dataset=$current_dataset --seed=0 --model=$best_model $preprocessing --epochs=50 \
        --project_name_suffix=__proportion-comparison --turn_off_scaler_normalization=True \
        --leftout_subject=$subj --leave_one_subject_out=True --transfer_learning=True \
        --train_test_split_for_time_series=True --save_images=True --learning_rate=5e-4 \
        --proportion_transfer_learning=0.2 --proportion_data_from_training_subjects=1.0 \
        --finetuning_epochs=375 --pretrain_and_finetune=True --partial_dataset_ninapro=True
    python CNN_EMG.py --dataset=$current_dataset --seed=0 --model=$best_model $preprocessing --epochs=50 \
        --project_name_suffix=__proportion-comparison --turn_off_scaler_normalization=True \
        --leftout_subject=$subj --leave_one_subject_out=True --transfer_learning=True \
        --train_test_split_for_time_series=True --save_images=True --learning_rate=5e-4 \
        --proportion_transfer_learning=0.4 --proportion_data_from_training_subjects=1.0 \
        --finetuning_epochs=375 --pretrain_and_finetune=True --partial_dataset_ninapro=True
    python CNN_EMG.py --dataset=$current_dataset --seed=0 --model=$best_model $preprocessing --epochs=50 \
        --project_name_suffix=__proportion-comparison --turn_off_scaler_normalization=True \
        --leftout_subject=$subj --leave_one_subject_out=True --transfer_learning=True \
        --train_test_split_for_time_series=True --save_images=True --learning_rate=5e-4 \
        --proportion_transfer_learning=0.6 --proportion_data_from_training_subjects=1.0 \
        --finetuning_epochs=375 --pretrain_and_finetune=True --partial_dataset_ninapro=True
    python CNN_EMG.py --dataset=$current_dataset --seed=0 --model=$best_model $preprocessing --epochs=50 \
        --project_name_suffix=__proportion-comparison --turn_off_scaler_normalization=True \
        --leftout_subject=$subj --leave_one_subject_out=True --transfer_learning=True \
        --train_test_split_for_time_series=True --save_images=True --learning_rate=5e-4 \
        --proportion_transfer_learning=0.8 --proportion_data_from_training_subjects=1.0 \
        --finetuning_epochs=375 --pretrain_and_finetune=True --partial_dataset_ninapro=True
    wait
done
```
For datasets with multiple sessions, run the following shell script:

```bash
starting_index=1
ending_index=10          # set to the maximum number of participants for the dataset
current_dataset=capgmyo  # set to the dataset you want to run with
# set to "" for raw, "--turn_on_cwt=True" for cwt, or "--turn_on_spectrogram=True" for stft,
# depending on which preprocessing method was the best for the dataset
preprocessing="--turn_on_cwt=True"
best_model=resnet18      # set to the model that performed best for the dataset

for subj in $(seq $starting_index $ending_index)
do
    python CNN_EMG.py --dataset=$current_dataset --seed=0 --model=$best_model $preprocessing --epochs=50 \
        --project_name_suffix=__intersession-comparison --turn_off_scaler_normalization=True \
        --leftout_subject=$subj --leave_one_subject_out=True --leave_one_session_out=True \
        --train_test_split_for_time_series=True --save_images=True --learning_rate=5e-4 \
        --proportion_data_from_training_subjects=1.0 --finetuning_epochs=375 \
        --pretrain_and_finetune=True --partial_dataset_ninapro=True \
        --proportion_unlabeled_data_from_leftout_subject=0.75
    wait
done
```

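Dataset customization

Adding a dataset

A new dataset can be benchmarked with CNN_EMG.py once it has been processed into HDF5 files saved as DatasetsProcessed_hdf5/[DATASET-NAME]/p[N]/participant_[N].hdf5, where N is the participant number, from 1 to the number of participants.
The keys of each HDF5 file should be the gesture names, and each gesture's data should be stored with shape [# TRIALS, # ELECTRODES, # TIMESTEPS].
Create a file DatasetsProcessed_hdf5/[DATASET-NAME]/frequency.txt containing only the dataset's sampling frequency in Hz.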
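The directory layout and frequency file above can be sketched with the standard library alone. The helper names are hypothetical and not part of CNN_EMG.py. Writing the gesture arrays themselves needs an HDF5 library such as h5py, with one create_dataset call per gesture name. That step is left out here so the sketch runs without third-party packages. The shape check works on plain nested lists as a stand-in for arrays.

```python
from pathlib import Path


def participant_hdf5_path(root: Path, dataset: str, n: int) -> Path:
    """Path for participant n (1-based): DatasetsProcessed_hdf5/<dataset>/p<n>/participant_<n>.hdf5."""
    if n < 1:
        raise ValueError("participant numbers start at 1")
    return root / "DatasetsProcessed_hdf5" / dataset / f"p{n}" / f"participant_{n}.hdf5"


def write_frequency_file(root: Path, dataset: str, sampling_rate_hz: int) -> Path:
    """Write frequency.txt containing only the sampling rate in Hz."""
    path = root / "DatasetsProcessed_hdf5" / dataset / "frequency.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(str(sampling_rate_hz))
    return path


def gesture_shape(trials) -> tuple:
    """Check nested data laid out as [# TRIALS][# ELECTRODES][# TIMESTEPS]; return its shape."""
    if not trials or not trials[0] or not trials[0][0]:
        raise ValueError("gesture data must be non-empty")
    n_electrodes, n_timesteps = len(trials[0]), len(trials[0][0])
    for trial in trials:
        # Every trial must have the same number of electrodes and timesteps.
        if len(trial) != n_electrodes or any(len(ch) != n_timesteps for ch in trial):
            raise ValueError("every trial needs the same electrodes x timesteps shape")
    return len(trials), n_electrodes, n_timesteps
```

For example, participant_hdf5_path(Path("."), "mydataset", 3) points at DatasetsProcessed_hdf5/mydataset/p3/participant_3.hdf5. In that path, "mydataset" is a placeholder name.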

Creating custom runs

run_CNN_EMG.py also accepts configuration files. Create a new YAML file and run:

```console
python run_CNN_EMG.py --config config/example.yaml
```

Troubleshooting

If you encounter OSError: [Errno 24] Too many open files, run:

```console
$ ulimit -n 65536
```

If you encounter the following error:

```text
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jehan/emgBenchmarking/CNN_EMG.py", line 486, in <module>
    emg = emg_async.get() # (SUBJECT, TRIAL, CHANNEL, TIME)
  File "/home/jehan/miniforge3/envs/emgbench/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
OSError: Unable to synchronously open file (file signature not found)
```

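git-lfs is probably not installed, so the data files were checked out as Git LFS pointer files rather than HDF5 files. Install git-lfs and try again.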
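You can confirm the git-lfs diagnosis by inspecting the first bytes of the .hdf5 files. The sketch below is a hypothetical helper, not part of the repository. It relies on two documented formats. An HDF5 file begins with the 8-byte signature \x89HDF\r\n\x1a\n; the sketch checks only offset 0, so a file with an HDF5 user block would be reported as unknown. A Git LFS pointer file begins with the line version https://git-lfs.github.com/spec/v1.

```python
from pathlib import Path

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"                          # first 8 bytes of an HDF5 file
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"  # first line of an LFS pointer


def classify_data_file(path: Path) -> str:
    """Return 'hdf5', 'lfs-pointer', or 'unknown' based on the file's leading bytes."""
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"
    if head.startswith(LFS_POINTER_PREFIX):
        return "lfs-pointer"
    return "unknown"


def find_lfs_pointers(root: Path) -> list:
    """List *.hdf5 files under root that are still Git LFS pointer text files."""
    return sorted(p for p in root.rglob("*.hdf5") if classify_data_file(p) == "lfs-pointer")
```

Run find_lfs_pointers on the directory that holds the .hdf5 files. A non-empty result supports the missing git-lfs explanation. An empty result suggests the error has another cause.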

Development

Update the virtual environment:

```console
$ mamba env update --file environment.yml --prune
```

Save the virtual environment:

```console
$ mamba env export --no-builds > environment.yml
```

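Collected summary

Dataset introduction

Construction

EMGBench combines nine EMG datasets from different sources into the most comprehensive EMG benchmark to date. It also introduces a new dataset recorded with a high-density wearable EMG sensor that is simple in design and easy to put on, which makes data collection more convenient. The benchmark is built around cross-subject classification and time-series train-test splits. These tasks evaluate how well models generalize and adapt to data from different distributions.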

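Usage

EMGBench can be used to evaluate and compare a wide range of machine learning models. Researchers can obtain the code and data from emgbench.github.io and use the provided benchmark codebase for standardized evaluation. The benchmark supports evaluation on cross-subject classification and time-series adaptation tasks. Researchers can pretrain and fine-tune models on these tasks to measure their performance under different data distributions. In addition, the high-density wearable sensor data are a rich resource for developing personalized models.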
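The cross-subject task with a time-series split can be pictured as a three-way partition of the data. This is a minimal sketch of the idea, not the repository's implementation. It assumes each subject's samples are already in recording order. It also assumes the earliest fraction of the left-out subject's data is used for fine-tuning and the rest is held out for testing. This is in the spirit of the --leave_one_subject_out, --train_test_split_for_time_series and --proportion_transfer_learning flags in the scripts, but the exact behavior of those flags is not confirmed here. The function name and sample format are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical sample type: (EMG window, gesture label).
Sample = Tuple[object, str]


def loso_time_series_split(
    data: Dict[int, Sequence[Sample]],
    leftout_subject: int,
    finetune_fraction: float = 0.2,
) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Return (pretrain, finetune, test) for one left-out subject.

    pretrain: every sample from all other subjects (cross-subject training)
    finetune: the earliest finetune_fraction of the left-out subject's samples
    test:     the left-out subject's remaining, later samples
    Each subject's samples must already be in recording (time) order.
    """
    if leftout_subject not in data:
        raise KeyError(f"unknown subject {leftout_subject}")
    if not 0.0 <= finetune_fraction < 1.0:
        raise ValueError("finetune_fraction must be in [0, 1)")
    pretrain: List[Sample] = [
        s for subj, samples in data.items() if subj != leftout_subject for s in samples
    ]
    target = list(data[leftout_subject])
    cut = int(len(target) * finetune_fraction)
    # No shuffling: the test samples always come after the fine-tuning samples in time.
    return pretrain, target[:cut], target[cut:]
```

Sweeping finetune_fraction over 0.2, 0.4, 0.6 and 0.8 mirrors the proportions used in the third table.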

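Background and challenges

Background overview

EMGBench was created in 2024 by researchers at Carnegie Mellon University. It uses machine learning to evaluate how well EMG classification algorithms generalize and adapt to out-of-distribution data. Its core research question is how to make EMG classifiers handle inputs that differ from the training distribution, which is critical for deploying them as real-world control interfaces. By predicting the gestures a user intends, such classifiers can enable wearable solutions for controlling assistive technologies such as computers, prosthetics, and mobile manipulator robots. The benchmark includes two main tasks: cross-subject classification, and adaptation using a time-series train-test split. It spans nine datasets, the largest collection of EMG datasets assembled to date. EMGBench fills the gap left by the lack of open-source benchmarks in the EMG research community. It gives researchers practical measures of out-of-distribution performance on EMG datasets.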

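Current challenges

The challenges EMGBench faces lie both in the problem domain and in building the benchmark. First, EMG signals are non-stationary, which makes it hard for learning-based methods to generalize. Second, EMG signals are affected by many factors, such as muscle location, arm size, skin impedance, and electrode placement. These factors cause concept drift: the conditional distribution of gestures given the signal changes between the training and test sets. Third, the lack of a standardized benchmark makes accuracy results hard to compare across papers. During construction, the researchers had to handle differences in hardware and sensor placement while keeping the collection diverse and representative. EMGBench tackles these challenges by introducing a new high-density wearable EMG sensor and a cross-subject classification task. Its success still depends on further research and refinement.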
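The concept drift described above can be written as a formula. Here x is an EMG input window and y its gesture label; this notation is introduced for illustration and is not taken from the benchmark's code. The conditional distribution differs between the two sets:

```latex
P_{\text{train}}(y \mid x) \neq P_{\text{test}}(y \mid x)
```

The same recorded signal x can therefore correspond to different gestures for different users or sessions. As a result, a classifier that fits the training distribution well can still fail on a new user.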

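Common use cases

Typical use cases

EMGBench is typically used to evaluate how well EMG classification algorithms generalize to out-of-distribution data. By predicting the gestures a user intends, these classifiers support wearable solutions for controlling assistive technologies such as computers, prosthetics, and mobile manipulators. The specific tasks are cross-subject classification and adaptation using a time-series train-test split. Together they span nine datasets, the largest collection of EMG datasets assembled to date.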

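Academic problems addressed

EMGBench addresses the lack of an open-source benchmark in EMG research, a gap that has made accuracy results hard to compare across papers. By providing a standardized evaluation framework, it gives researchers practical measures of out-of-distribution performance on EMG datasets. This supports research on robust and adaptive EMG classification algorithms and advances the practical use of wearable control interfaces.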

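Practical applications

In practice, EMGBench supports the development of more robust and adaptable EMG control interfaces for assistive technologies such as prosthetics, computers, and mobile manipulators. It evaluates generalization across subjects and across sessions, which helps shorten the setup process of EMG interfaces and makes them easier for new users to adopt. This matters especially for people with upper- or lower-limb amputations and for people paralyzed by stroke or spinal cord injury.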

Recent research using the dataset