HA4M - Human Action Multi-Modal Monitoring in Manufacturing
www.doi.org, indexed 2025-03-25
Download link:
https://www.doi.org/10.57760/sciencedb.01872
Resource description:
Overview
The HA4M dataset is a collection of multi-modal data related to actions performed by different subjects in an assembly scenario for manufacturing. It was collected to provide a test-bed for developing, validating and testing techniques and methodologies for the recognition of assembly actions. To the best of the authors' knowledge, few vision-based datasets exist in the context of object assembly. Compared to existing datasets, HA4M provides a considerable variety of multi-modal data. Six types of simultaneously acquired data are supplied: RGB frames, Depth maps, IR frames, RGB-Depth-Aligned frames, Point Clouds and Skeleton data. These data allow the scientific community to make consistent comparisons among processing or machine learning approaches that use one or more data modalities. Researchers in computer vision, pattern recognition and machine learning can use or reuse the data for investigations in different application domains, such as motion analysis, human-robot cooperation and action recognition.

Dataset details
The dataset includes 12 assembly actions performed by 41 subjects to build an Epicyclic Gear Train (EGT). The assembly task involves three phases: first, Block 1 and Block 2 are assembled separately; then both blocks are set up together to build the EGT. The EGT is made up of 12 components divided into two sets: the first eight components build Block 1 and the remaining four build Block 2. Finally, two screws are fixed with an Allen Key to join the two blocks and obtain the EGT.

Acquisition setup
The acquisition experiment took place in two laboratories (one in Italy and one in Spain), where an acquisition area was reserved for the experimental setup. A Microsoft Azure Kinect camera records videos during the execution of the assembly task. It is placed in front of the operator and of the table on which the components are spread. The camera is mounted on a tripod at a height h of 1.54 m and a distance of 1.78 m, and is tilted downward by 17 degrees.

Technical information
The HA4M dataset contains 217 videos of the assembly task performed by 41 subjects (15 females and 26 males), aged 23 to 60. All subjects participated voluntarily and were provided with a written description of the experiment. Each subject was asked to execute the task several times and to perform the actions in whatever way was most convenient for them (e.g. with both hands), regardless of their dominant hand. HA4M is a growing project: new acquisitions, planned for the near future, will expand the current dataset.

Actions
Twelve actions are considered in HA4M. Actions 1 to 4 build Block 1, actions 5 to 8 build Block 2, and actions 9 to 12 complete the EGT. The actions are:
1. Pick up/Place Carrier
2. Pick up/Place Gear Bearings (x3)
3. Pick up/Place Planet Gears (x3)
4. Pick up/Place Carrier Shaft
5. Pick up/Place Sun Shaft
6. Pick up/Place Sun Gear
7. Pick up/Place Sun Gear Bearing
8. Pick up/Place Ring Bearing
9. Pick up Block 2 and place it on Block 1
10. Pick up/Place Cover
11. Pick up/Place Screws (x2)
12. Pick up/Place Allen Key, Turn Screws, Return Allen Key and EGT

Annotation
Data annotation consists of labeling the different actions in the video sequences. The actions were annotated manually by observing the RGB videos frame by frame. The start frame of each action is the frame in which the subject starts to move the arm toward the component to be grasped. The end frame is the frame in which the subject releases the component, so the next frame becomes the start frame of the subsequent action. In total, 4123 actions are annotated, including the "don't care" action (ID=0) and the repetitions of actions 2, 3 and 11.

Available code
The dataset was acquired with the Multiple Azure Kinect GUI software, available at https://gitlab.com/roberto.marani/multiple-azure-kinect-gui, which is based on the Azure Kinect Sensor SDK v1.4.1 and the Azure Kinect Body Tracking SDK v1.1.2. The software records device data to a Matroska (.mkv) file containing video tracks, IMU samples and device calibration; IMU samples are not considered in this work. The same software then processes the Matroska file and returns the data types provided with the dataset: RGB images, RGB-Depth-Aligned (RGB-A) images, Depth images, IR images, Point Clouds and Skeleton data.
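The mounting values given under Acquisition setup (height h = 1.54 m, downward tilt of 17 degrees) can be used to express camera-frame points, such as skeleton joints or point-cloud samples, in a floor-referenced frame. The following is a minimal sketch under stated assumptions, not part of the dataset tooling: it assumes camera coordinates with x to the right, y pointing down and z along the optical axis, coordinates in metres (Azure Kinect SDK outputs are typically in millimetres, so divide by 1000 first), h measured from the floor, and a pure downward pitch with no roll. The function name `camera_to_floor_frame` is hypothetical. Since the Azure Kinect depth camera is tilted slightly relative to its colour camera, the effective angle may differ from the nominal 17 degrees depending on which camera frame the points are expressed in.

```python
import math
from typing import Tuple

Point3 = Tuple[float, float, float]

# Nominal mounting values from the Acquisition setup section (assumed: h above the floor).
CAMERA_HEIGHT_M = 1.54
CAMERA_TILT_DEG = 17.0


def camera_to_floor_frame(
    p: Point3,
    height_m: float = CAMERA_HEIGHT_M,
    tilt_deg: float = CAMERA_TILT_DEG,
) -> Point3:
    """Map a camera-frame point (x right, y down, z forward; metres) to a
    floor frame (X right, Y up, Z horizontal forward) whose origin lies on
    the floor directly below the camera. Assumes a pure downward pitch."""
    x, y, z = p
    t = math.radians(tilt_deg)
    # Flip y so it points up, then rotate by the camera pitch about the x axis
    # so that the optical axis (0, 0, 1) maps to (0, -sin t, cos t).
    up = -y * math.cos(t) - z * math.sin(t)
    forward = -y * math.sin(t) + z * math.cos(t)
    # Shift by the camera height so that Y = 0 is the floor.
    return (x, height_m + up, forward)


if __name__ == "__main__":
    # A point 2 m along the optical axis lies below camera height and
    # slightly less than 2 m ahead horizontally.
    print(camera_to_floor_frame((0.0, 0.0, 2.0)))
```

Keeping the tilt and height as parameters makes it easy to substitute a calibrated extrinsic estimate if the nominal mounting values turn out to be too coarse.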
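For action recognition experiments it can be convenient to keep the list from the Actions section as a lookup table together with the assembly phase of each action. The sketch below assumes that the numeric IDs follow the order in which the actions are listed, with ID 0 reserved for the "don't care" label mentioned under Annotation; the names `ACTIONS` and `assembly_phase` are hypothetical and not part of the released files.

```python
from typing import Dict

# Action IDs in the order of the Actions section; 0 is the "don't care" label.
ACTIONS: Dict[int, str] = {
    0: "Don't care",
    1: "Pick up/Place Carrier",
    2: "Pick up/Place Gear Bearings (x3)",
    3: "Pick up/Place Planet Gears (x3)",
    4: "Pick up/Place Carrier Shaft",
    5: "Pick up/Place Sun Shaft",
    6: "Pick up/Place Sun Gear",
    7: "Pick up/Place Sun Gear Bearing",
    8: "Pick up/Place Ring Bearing",
    9: "Pick up Block 2 and place it on Block 1",
    10: "Pick up/Place Cover",
    11: "Pick up/Place Screws (x2)",
    12: "Pick up/Place Allen Key, Turn Screws, Return Allen Key and EGT",
}


def assembly_phase(action_id: int) -> str:
    """Return the phase of an action: 1-4 build Block 1, 5-8 build Block 2,
    9-12 complete the EGT."""
    if action_id not in ACTIONS:
        raise ValueError(f"unknown action ID: {action_id}")
    if action_id == 0:
        return "Don't care"
    if action_id <= 4:
        return "Block 1"
    if action_id <= 8:
        return "Block 2"
    return "EGT"


if __name__ == "__main__":
    for action_id in (0, 3, 7, 11):
        print(action_id, assembly_phase(action_id), ACTIONS[action_id])
```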
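Because, as described in the Annotation section, each action ends on the frame immediately before the next action starts, an annotated video can be stored as a list of start frames alone and expanded into one label per frame. The sketch below assumes a hypothetical in-memory format, a list of `(start_frame, action_id)` pairs sorted by start frame plus the total number of frames in the video; it does not reflect the actual layout of the HA4M annotation files. Frames before the first annotated segment receive the "don't care" label (ID=0), which is an additional assumption. The function name `segments_to_frame_labels` is hypothetical.

```python
from typing import List, Sequence, Tuple

DONT_CARE = 0  # "don't care" action ID from the Annotation section


def segments_to_frame_labels(
    segments: Sequence[Tuple[int, int]], num_frames: int
) -> List[int]:
    """Expand contiguous (start_frame, action_id) segments into per-frame labels.

    Each segment runs from its start frame up to the frame before the next
    segment's start; the last segment runs to the end of the video.
    """
    labels = [DONT_CARE] * num_frames
    for i, (start, action_id) in enumerate(segments):
        if not 0 <= start < num_frames:
            raise ValueError(f"start frame {start} outside video of {num_frames} frames")
        end = segments[i + 1][0] if i + 1 < len(segments) else num_frames
        if end <= start:
            raise ValueError("segments must have strictly increasing start frames")
        labels[start:end] = [action_id] * (end - start)
    return labels


if __name__ == "__main__":
    # Hypothetical 10-frame video: frames 0-1 don't care, 2-5 action 1, 6-9 action 2.
    print(segments_to_frame_labels([(0, 0), (2, 1), (6, 2)], 10))
    # -> [0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

Storing only start frames makes the contiguity of the annotation explicit: no frame can be left unlabeled or assigned to two actions once the list is expanded.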
Provided by:
www.doi.org



