Egocentric Video Dataset — 4,050 Hours of First-Person Videos for Physical AI, Robotics & ...

Name: Egocentric Video Dataset — 4,050 Hours of First-Person Videos for Physical AI, Robotics & ...
Creator: Unidata
License: No description available

Source: Databricks, indexed 2026-05-16

Download link:

https://marketplace.databricks.com/details/ef2acb9d-e296-4309-8332-99963a375785/Unidata_Egocentric-Video-Dataset-—-4,050-Hours-of-First-Person-Videos-for-Physical-AI,-Robotics-&-


Description:

Overview

The Egocentric Video Dataset is a large-scale, multimodal collection produced by Unidata, containing 4,050 hours of first-person egocentric videos of daily household activities recorded in home environments. The dataset is purpose-built for Physical AI development, robotic systems training, manipulation research, egocentric vision benchmarking, and human motion analysis.

Unlike egocentric datasets that rely on synthetic data or laboratory setups, this collection captures genuine human motion variability in naturalistic home settings, covering kitchens, bathrooms, living rooms, and other everyday spaces. Varied recording speeds, hardware configurations, and hand visibility conditions make it a practical foundation for building models that generalize to real-world deployment.

Hardware Configuration

The dataset was captured using two complementary hardware setups:

- Pico + Motion Trackers (2,321 hours, 57.3% of total footage): egocentric video from the Pico 4 Ultra VR headset combined with body motion tracker data, covering natural-speed domestic scenarios, slow-motion recordings with both hands always in frame, and real-speed object transfer tasks.
- Zed + Pico + Motion Trackers (1,729 hours, 42.7% of total footage): scripted object transfer scenarios captured simultaneously via stereo Zed cameras and the Pico VR headset, providing both spatial depth and an egocentric first-person perspective in a single synchronized recording.

Sensor & Orientation Data

Quaternion-based orientation is derived from onboard sensor fusion across all recordings, providing 3D pose estimates without the need for external optical tracking. IMU signals are captured alongside video, enabling researchers to analyze angular velocity, linear acceleration, and motion trajectories in parallel with visual data. This multimodal structure supports egocentric tracking, hand-object interaction analysis, and 3D reconstruction from first-person perspectives.
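As a rough illustration of how quaternion orientation samples like those described above might be consumed, the sketch below converts a quaternion to a rotation matrix and measures the rotation angle between two samples. The listing does not document the file layout or the component order, so the (w, x, y, z) ordering and the function names here are assumptions made for illustration, not the dataset's actual format.

```python
import math

# ASSUMPTION: quaternions are given as (w, x, y, z) tuples (scalar first).
# The dataset description does not state the component order.

def normalize(q):
    """Scale a quaternion (w, x, y, z) to unit length."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    if n == 0.0:
        raise ValueError("zero-norm quaternion has no orientation")
    return (w / n, x / n, y / n, z / n)

def to_rotation_matrix(q):
    """Return the 3x3 rotation matrix (nested lists, row-major) for q."""
    w, x, y, z = normalize(q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def angle_between(q1, q2):
    """Smallest rotation angle in radians between orientations q1 and q2."""
    a, b = normalize(q1), normalize(q2)
    # abs() treats q and -q as the same orientation.
    dot = abs(sum(p * r for p, r in zip(a, b)))
    return 2.0 * math.acos(min(1.0, dot))

# Example: identity pose versus a 90-degree turn about the z axis.
identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
```

Dividing `angle_between` for two consecutive samples by their time difference gives a mean angular speed, which could be compared against the gyroscope channel as a sanity check once the real sample rate and units are known.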
Scripted Activity Scenarios

The dataset covers 13 structured daily activity scenarios, 12 of which are itemized below (4,000 hours in total). Each is designed for cyclical repetition, enabling high-density data collection per session and consistent action boundaries for annotation.

- Sorting unsorted items to designated locations (800 hours)
- Arranging new products by category on a display shelf (800 hours)
- Collecting items from a table into a container (400 hours)
- Transferring items from a drawer to a table (400 hours)
- Wardrobe, table, and bag: three-point object transfer (400 hours)
- Transport box and display table (400 hours)
- Folding fabric items into a stack (400 hours)
- Lids, cookware, and kitchen drawers (200 hours)
- Transferring bulk goods with a spoon (50 hours)
- Transferring objects with tongs (50 hours)
- Two-handed sorting (50 hours)
- Assembly and disassembly of small-part constructions (50 hours)

Technical Specifications

- Total footage: 4,050 hours
- Number of scenarios: 13
- Video sources: Pico 4 Ultra VR headset (egocentric); Zed stereo cameras (spatial depth)
- Hardware setups: Pico + Trackers (2,321 hours); Zed + Pico + Trackers (1,729 hours)
- Recording speeds: real-time and slow-motion
- Hand visibility: as needed; both hands always in frame in the slow-motion subset
- Orientation format: quaternions from onboard sensor fusion
- Sensor data: IMU signals (accelerometer, gyroscope, magnetometer)
- Annotation format: .txt per recording
- Environments: kitchen, bathroom, living room, and other home settings

Use Cases

- Physical AI & Robotics. First-person video paired with quaternion orientation and IMU signals gives robotic systems the spatial and temporal context needed for learning manipulation tasks. The slow-motion recordings expose detailed hand kinematics that are hard to resolve in standard-speed footage, making the dataset directly applicable to training robotic arms for pick-and-place and object transfer operations. The thirteen scripted scenarios, from spoon transfers to assembly tasks, cover a range of daily household actions that real-world robotic systems must recognize and replicate.
- Egocentric Vision & Action Recognition. Researchers building egocentric action recognition and first-person video understanding models benefit from recordings across different speeds, hand visibility conditions, and hardware setups, a level of intra-dataset variability that most existing egocentric datasets do not offer. The combination of hand-object interactions, multimodal data, and 3D pose estimates supports more accurate activity recognition across dynamic scenes in home environments.
- Healthcare & Motor Rehabilitation. Detailed hand motion data across scripted and naturalistic scenarios is applicable to occupational therapy and motor rehabilitation research. The slow-motion recordings make fine-grained movement patterns visible in ways that real-time recording misses. Pose estimates and egocentric tracking of hand-object interactions provide quantifiable visual data for assessing motor performance and patient progress.
- Augmented & Virtual Reality Development. VR and AR developers training hand tracking and gesture recognition algorithms can use this egocentric data to build models grounded in real home conditions. IMU signals and quaternion orientation from Pico VR headsets reflect how the wearer moves during daily activity, supporting the first-person perspectives and motion fidelity that immersive virtual applications require.

Compliance & Data Storage

All recordings were collected with informed consent from participants. The dataset complies with GDPR and applicable data protection regulations. Data is stored on AWS cloud infrastructure certified to ISO 27001 and ISO 27701 standards.
Summary

The Egocentric Video Dataset is a 4,050-hour multimodal collection of first-person egocentric videos and synchronized sensor data, produced by Unidata for Physical AI, robotics, egocentric vision, and human motion research. Thirteen activity scenarios covering sorting, transferring, folding, assembly, two-handed manipulation, and tool use, combined with quaternion orientation, IMU signals, and dual hardware coverage (Pico + Zed), make this dataset a comprehensive resource for teams building the next generation of Physical AI and robotic systems.
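The hour figures quoted in the description can be cross-checked with a few lines of arithmetic. The sketch below tallies the per-scenario hours that the description itemizes and recomputes the hardware split. The dictionary keys are shortened labels chosen for illustration, and the variable names are hypothetical; only the numbers come from the description.

```python
# Hours per scenario as itemized in the description (12 entries are listed,
# although the description states 13 scenarios). Keys are shortened labels.
SCENARIO_HOURS = {
    "sorting to designated locations": 800,
    "arranging products on display shelf": 800,
    "collecting items into container": 400,
    "drawer to table transfer": 400,
    "wardrobe, table, bag transfer": 400,
    "transport box and display table": 400,
    "folding fabric items": 400,
    "lids, cookware, kitchen drawers": 200,
    "bulk goods with spoon": 50,
    "objects with tongs": 50,
    "two-handed sorting": 50,
    "small-part assembly and disassembly": 50,
}

# Hours per hardware setup and the stated overall total.
SETUP_HOURS = {
    "Pico + Motion Trackers": 2321,
    "Zed + Pico + Motion Trackers": 1729,
}
STATED_TOTAL = 4050

def share(hours, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * hours / total, 1)

listed_total = sum(SCENARIO_HOURS.values())
unlisted_hours = STATED_TOTAL - listed_total
setup_shares = {name: share(h, STATED_TOTAL) for name, h in SETUP_HOURS.items()}

print(f"listed scenarios: {len(SCENARIO_HOURS)}, hours: {listed_total}")
print(f"hours not attributed to a listed scenario: {unlisted_hours}")
print(f"hardware split: {setup_shares}")
```

The two hardware setups add up to the stated 4,050 hours, while the itemized scenarios account for 4,000 hours, which is consistent with one scenario not being itemized in the description.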

Provider:

Unidata

5,000+ high-quality datasets

54 task types
