EGOFALLS: A visual-audio dataset and benchmark for fall detection using egocentric cameras

Mendeley Data, updated 2024-03-27, indexed 2024-06-29

Download link:

https://dataverse.nl/citation?persistentId=doi:10.34894/HO5GE3
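
The description states that the files total roughly 200 GB and advises confirming free space before downloading and unzipping. A minimal Python sketch of such a check follows. The 200 GB figure comes from the description; the function name, the target directory, and the factor-of-two headroom (room for an archive and its extracted contents at once) are assumptions for illustration.

```python
import shutil

GB = 10**9
# Approximate total size quoted in the dataset description.
DATASET_SIZE_BYTES = 200 * GB


def has_enough_space(target_dir, required_bytes, headroom=2.0):
    """Return True if target_dir's filesystem has required_bytes * headroom free.

    headroom=2.0 is an assumption: it leaves room for a zip archive and its
    extracted contents to coexist on disk.
    """
    free_bytes = shutil.disk_usage(target_dir).free
    return free_bytes >= required_bytes * headroom


if __name__ == "__main__":
    ok = has_enough_space(".", DATASET_SIZE_BYTES)
    print("enough space" if ok else "not enough space for the full dataset")
```

Since the description recommends downloading one file at a time, the same function can be called per file with that file's size instead of the full total.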

Resource description:

We provide a readme.pdf that explains how to use the dataset; the key points are repeated here. The files are large (approx. 200 GB in total), so make sure there is ample storage space for downloading and unzipping, and download one file at a time.

Dataset: We describe the cameras, subjects, environments, and data simulation guidelines used in our collection process. With 10,948 clips, the dataset is the largest among those that focus on falls recorded with egocentric cameras.

Equipment: Data was collected with two types of wearable camera: the OnReal G1 and the CAMMHD Bodycam. The OnReal G1 is a compact mini action camera with dimensions of 420 x 420 x 200 mm that records video at up to 1080p at 30 fps. The CAMMHD Bodycam is larger, measuring 800 x 500 x 300 mm, and is fitted with infrared sensors for night vision. The cameras were attached to the body at places such as the waist and neck, capturing visual, motion, and audio information across varied environments. The standard capture setting was 1080p video at 30 frames per second. Note that OnReal G1 frames have distinct R, G, and B channels, whereas CAMMHD Bodycam frames have three identical grayscale channels. The dataset is therefore a pivotal resource for this thesis, supporting a thorough analysis of different events and activities.

Subject: The study had 14 volunteer participants: 12 males and 2 females, comprising 12 young, healthy individuals and 2 elderly subjects. All participants gave informed consent, understanding that their data might be used for research and potentially made public.
Most subjects (11 of 14) completed the full data collection, which covered four types of falls and nine types of non-falls, both indoors and outdoors. The other three participants could not complete it for personal reasons. The study yields insights into fall and non-fall behaviors and reflects the dedication of most of our participants.

Environment: We aimed to cover both indoor and outdoor environments, capturing data in 14 different outdoor settings and 15 different indoor spaces. To introduce variability, participants were asked to change position or direction after each activity, which diversifies the dataset and supports more reliable conclusions.

Data Collection: Data collection covers two modalities: visual and auditory. For visual data, we followed guidelines from existing literature, in which typical falls and related activities last 1-3 seconds. We proposed an exhaustive set of trials covering 20 types of falls, each varying in direction and object interaction. By contrast, specific guidelines for audio data are scarce, as past research largely focused on visual cues. Our audio data comprises three categories: subject audio, subject-object audio, and environment audio. To give participants a realistic sense of falls, we showed them online videos of real-world fall incidents, which convey the auditory and visual elements of these events. After manually inspecting all clips, we identified prevalent audio patterns. For falls, subject audio includes yelling and moaning, subject-object audio includes impact sounds, and environment audio includes background noise such as traffic or television. Not all clips contain every sound type. Non-fall activities were divided into three groups based on audio intensity.
Our findings shed light on the audio patterns across activities, potentially enhancing subsequent research in this domain.
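
The Equipment notes say that OnReal G1 frames have distinct R, G, B channels while CAMMHD Bodycam frames carry three identical grayscale channels. The sketch below shows one way to test for that property. It assumes a frame has already been decoded into a packed 8-bit RGB buffer (bytes ordered R, G, B per pixel); the function names and the buffer layout are illustrative assumptions, not part of the dataset's tooling.

```python
def has_identical_channels(rgb: bytes) -> bool:
    """Return True if every pixel in a packed RGB buffer has R == G == B."""
    if len(rgb) % 3 != 0:
        raise ValueError("buffer length must be a multiple of 3 (R, G, B per pixel)")
    # Strided slices pull each channel out of the interleaved buffer.
    red, green, blue = rgb[0::3], rgb[1::3], rgb[2::3]
    return red == green == blue


def to_single_channel(rgb: bytes) -> bytes:
    """Collapse a frame with identical channels to one grayscale channel."""
    if not has_identical_channels(rgb):
        raise ValueError("channels differ; frame is not grayscale stored as RGB")
    return rgb[0::3]


if __name__ == "__main__":
    gray_frame = bytes([10, 10, 10, 200, 200, 200])  # two gray pixels
    color_frame = bytes([255, 0, 0, 0, 255, 0])      # one red, one green pixel
    print(has_identical_channels(gray_frame))   # True
    print(has_identical_channels(color_frame))  # False
```

Collapsing such frames to one channel stores a third of the bytes per frame. A single frame passing this check does not by itself identify the camera, so it is safer to treat the result as a property of the frame rather than of the device.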

Created:

2023-09-20
