HA4M - Human Action Multi-Modal Monitoring in Manufacturing

Name: HA4M - Human Action Multi-Modal Monitoring in Manufacturing
Creator: Science Data Bank
Published: 2025-04-27 21:28:08
License: No description available

DataCite Commons; updated 2025-04-27; indexed 2025-05-18

Download link:

https://www.scidb.cn/detail?dataSetId=c8d743ad2ea549dfa938cea320a38c46

Resource description:

Overview

The HA4M dataset is a collection of multi-modal data on actions performed by different subjects in an assembly scenario for manufacturing. It has been collected to provide a test-bed for developing, validating and testing techniques and methodologies for the recognition of assembly actions. To the best of the authors' knowledge, few vision-based datasets exist in the context of object assembly.

The HA4M dataset provides a considerable variety of multi-modal data compared to existing datasets. Six types of simultaneous data are supplied: RGB frames, Depth maps, IR frames, RGB-Depth-Aligned frames, Point Clouds and Skeleton data.

These data allow the scientific community to make consistent comparisons among processing approaches or machine learning approaches by using one or more data modalities. Researchers in computer vision, pattern recognition and machine learning can use/reuse the data for different investigations in different application domains such as motion analysis, human-robot cooperation, action recognition, and so on.

Dataset details

The dataset includes 12 assembly actions performed by 41 subjects for building an Epicyclic Gear Train (EGT).

The assembly task involves three phases: first the assembly of Block 1 and Block 2 separately, then the final joining of the two blocks to build the EGT. The EGT is made up of a total of 12 components divided into two sets: the first eight components for building Block 1 and the remaining four components for Block 2. Finally, two screws are fixed with an Allen Key to assemble the two blocks and thus obtain the EGT.

Acquisition setup

The acquisition experiment took place in two laboratories (one in Italy and one in Spain), where an acquisition area was reserved for the experimental setup. A Microsoft Azure Kinect camera acquires videos during the execution of the assembly task. It is placed in front of the operator and the table on which the components are spread out. The camera is placed on a tripod at a height of 1.54 m and a distance of 1.78 m. The camera is down-tilted by an angle of 17 degrees.

Technical information

The HA4M dataset contains 217 videos of the assembly task performed by 41 subjects (15 females and 26 males). Their ages ranged from 23 to 60. All the subjects participated voluntarily and were provided with a written description of the experiment. Each subject was asked to execute the task several times and to perform the actions as they found most convenient (e.g. with both hands), regardless of their dominant hand. The HA4M project is ongoing: new acquisitions, planned for the near future, will expand the current dataset.

Actions

Twelve actions are considered in HA4M. Actions 1 to 4 are needed to build Block 1, actions 5 to 8 to build Block 2, and actions 9 to 12 to complete the EGT. Actions are listed below:

1. Pick up/Place Carrier
2. Pick up/Place Gear Bearings (x3)
3. Pick up/Place Planet Gears (x3)
4. Pick up/Place Carrier Shaft
5. Pick up/Place Sun Shaft
6. Pick up/Place Sun Gear
7. Pick up/Place Sun Gear Bearing
8. Pick up/Place Ring Bear
9. Pick up Block 2 and place it on Block 1
10. Pick up/Place Cover
11. Pick up/Place Screws (x2)
12. Pick up/Place Allen Key, Turn Screws, Return Allen Key and EGT

Annotation

Data annotation concerns the labeling of the different actions in the video sequences.

The annotation of the actions was done manually by observing the RGB videos, frame by frame. The start frame of each action is the frame in which the subject starts to move the arm toward the component to be grasped. The end frame is recorded when the subject releases the component, so the next frame becomes the start frame of the subsequent action.

The total number of actions annotated in this study is 4123, including the “don't care” action (ID=0) and the action repetitions in the case of actions 2, 3 and 11.

Available code

The dataset has been acquired using the Multiple Azure Kinect GUI software, available at https://gitlab.com/roberto.marani/multiple-azure-kinect-gui, based on the Azure Kinect Sensor SDK v1.4.1 and Azure Kinect Body Tracking SDK v1.1.2.

The software records device data to a Matroska (.mkv) file, containing video tracks, IMU samples, and device calibration. In this work, IMU samples are not considered.

The same Multiple Azure Kinect GUI software processes the Matroska file and returns the different types of data provided with our dataset: RGB images, RGB-depth-Aligned (RGB-A) images, Depth images, IR images, Point Cloud and Skeleton data.
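
The action numbering and the start/end-frame annotation scheme described above can be illustrated with a minimal Python sketch. The action IDs, names and block grouping come from the description; everything else is an assumption made for illustration, since the on-disk annotation format is not described here. In particular, the `Segment` type, the `frame_labels` and `phase_of` helpers, the 0-based inclusive frame indices, and the choice to label unannotated frames as 0 ("don't care") are all hypothetical.

```python
from typing import Dict, List, NamedTuple

# Action IDs and names as listed in the dataset description.
# ID 0 is the "don't care" action mentioned in the annotation notes.
ACTIONS: Dict[int, str] = {
    0: "Don't care",
    1: "Pick up/Place Carrier",
    2: "Pick up/Place Gear Bearings (x3)",
    3: "Pick up/Place Planet Gears (x3)",
    4: "Pick up/Place Carrier Shaft",
    5: "Pick up/Place Sun Shaft",
    6: "Pick up/Place Sun Gear",
    7: "Pick up/Place Sun Gear Bearing",
    8: "Pick up/Place Ring Bear",
    9: "Pick up Block 2 and place it on Block 1",
    10: "Pick up/Place Cover",
    11: "Pick up/Place Screws (x2)",
    12: "Pick up/Place Allen Key, Turn Screws, Return Allen Key and EGT",
}


def phase_of(action_id: int) -> str:
    """Map an action ID to its assembly phase (1-4, 5-8, 9-12)."""
    if action_id == 0:
        return "don't care"
    if 1 <= action_id <= 4:
        return "Block 1"
    if 5 <= action_id <= 8:
        return "Block 2"
    if 9 <= action_id <= 12:
        return "EGT"
    raise ValueError(f"unknown action ID: {action_id}")


class Segment(NamedTuple):
    """Hypothetical annotation record: one action instance in a video."""
    action_id: int
    start: int  # first frame of the action (assumed 0-based, inclusive)
    end: int    # last frame of the action (assumed inclusive)


def frame_labels(segments: List[Segment], num_frames: int) -> List[int]:
    """Expand segment annotations into one action ID per frame.

    Frames not covered by any segment are labeled 0; this default is an
    assumption, not something the dataset description specifies.
    """
    labels = [0] * num_frames
    for seg in segments:
        if seg.action_id not in ACTIONS:
            raise ValueError(f"unknown action ID: {seg.action_id}")
        if not 0 <= seg.start <= seg.end < num_frames:
            raise ValueError(f"segment out of range: {seg}")
        for frame in range(seg.start, seg.end + 1):
            labels[frame] = seg.action_id
    return labels


# Toy example: the frame after one action's end starts the next action.
toy_segments = [Segment(1, 0, 2), Segment(2, 3, 5)]
toy_labels = frame_labels(toy_segments, num_frames=8)
```

Per-frame labels of this kind are a common input for frame-wise action segmentation models, while the segment form is closer to how the annotation is described; which one is more convenient depends on the method being evaluated.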

Provider:

Science Data Bank

Created:

2022-07-06

Collected summary

Dataset introduction