1K+ Hours of Selfie Video Data | AI Training Data | Annotated Video for AI | Bounding Boxes, ...
Source: Databricks | Indexed: 2026-01-27
Download link:
https://marketplace.databricks.com/details/3784465c-399c-4fe8-a4ff-dedbf2360b19/Data-Seeds_1K+-Hours-of-Selfie-Video-Data-AI-Training-Data-Annotated-Video-for-AI-Bounding-Boxes,-
Resource description:
This dataset contains over 1,000 hours of facial-expression selfie video recorded by contributors worldwide. Designed for AI and machine-learning applications, it offers richly annotated, context-dense video suited to training vision-language models, action-recognition systems, and multimodal reasoning models.
Key Features
1. Comprehensive Video Annotation Layers
Each video includes synchronized metadata across visual and audio channels, such as:
Object annotations (bounding boxes, segmentation masks)
Action labels and activity timelines
Temporal event boundaries
Transcripts for scenes containing speech
Visual scene descriptions covering environment, objects, actions, and context
Camera metadata (motion type, angle, field of view, lighting conditions)
This enables training for activity detection, video captioning, tracking, VLM grounding, and multimodal understanding.
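To make the layers above concrete, here is a minimal sketch of how one video's annotations might be held in memory. The listing does not publish a schema, so every class name, field name, and unit below is an assumption made for illustration, and segmentation masks are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BoundingBox:
    """One object annotation on one frame (hypothetical pixel-coordinate layout)."""
    frame_index: int
    label: str
    x: float
    y: float
    width: float
    height: float


@dataclass
class TimedSpan:
    """A labeled interval in seconds; reused for actions, events, and transcript lines."""
    start_s: float
    end_s: float
    text: str


@dataclass
class CameraMetadata:
    """Camera fields named after the listing; the value vocabulary is assumed."""
    motion_type: str
    angle: str
    field_of_view: str
    lighting: str


@dataclass
class VideoAnnotation:
    """All annotation layers for a single video (hypothetical container)."""
    video_id: str
    camera: CameraMetadata
    scene_description: str = ""
    boxes: List[BoundingBox] = field(default_factory=list)
    actions: List[TimedSpan] = field(default_factory=list)
    events: List[TimedSpan] = field(default_factory=list)
    transcript: List[TimedSpan] = field(default_factory=list)

    def actions_at(self, t_s: float) -> List[str]:
        """Return labels of all actions whose interval contains time t_s."""
        return [a.text for a in self.actions if a.start_s <= t_s < a.end_s]


# Made-up example record, not taken from the dataset.
example = VideoAnnotation(
    video_id="clip-0001",
    camera=CameraMetadata("handheld", "eye-level", "wide", "daylight"),
    scene_description="Person smiling at the camera in a kitchen.",
    boxes=[BoundingBox(0, "face", 120.0, 80.0, 200.0, 240.0)],
    actions=[TimedSpan(0.0, 2.5, "smile"), TimedSpan(2.0, 4.0, "wave")],
)

print(example.actions_at(2.2))  # ['smile', 'wave']
```

Keeping timed layers as start/end spans lets overlapping actions coexist, which is what an activity timeline needs; the actual delivery format should be confirmed with the provider.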
2. Unique Sourcing Capabilities
Videos are collected through controlled contribution pipelines designed to generate authentic, unscripted real-world footage. This provides:
Natural human movement and behavior
Diverse environments and camera devices
Continuous flow of fresh recordings
Ability to generate custom datasets (e.g., specific actions, locations, lighting, demographics, or motion patterns)
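As a rough illustration of what assembling a custom subset could look like on the consumer side, the sketch below filters a list of clip records by metadata. The record keys (`action`, `lighting`, `location`) and the `select_clips` helper are hypothetical; the listing does not describe the real metadata fields or any query interface.

```python
from typing import Dict, Iterable, List


def select_clips(clips: Iterable[Dict[str, str]], **criteria: str) -> List[Dict[str, str]]:
    """Keep clips whose metadata matches every given key/value pair."""
    return [
        clip for clip in clips
        if all(clip.get(key) == value for key, value in criteria.items())
    ]


# Made-up catalog entries used only to exercise the filter.
catalog = [
    {"id": "a", "action": "smile", "lighting": "daylight", "location": "indoor"},
    {"id": "b", "action": "smile", "lighting": "low-light", "location": "outdoor"},
    {"id": "c", "action": "frown", "lighting": "low-light", "location": "indoor"},
]

subset = select_clips(catalog, action="smile", lighting="low-light")
print([clip["id"] for clip in subset])  # ['b']
```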
3. Global Visual & Cultural Diversity
Contributors from 100+ countries supply:
Indoor and outdoor recordings
Urban, rural, and specialized environments
Varied cultural behaviors, activities, and settings
Multiple languages and speaking styles where speech is present
This breadth supports robust generalization for global deployment.
4. High-Quality, Realistic Video Capture
The data covers a wide range of capture conditions:
4K, HD, and consumer-grade recordings
Static, handheld, and moving cameras
Low-light, daylight, and variable lighting
Clean and noisy audio channels
Natural occlusions, motion blur, and complex backgrounds
This diversity supports training models for real-world reliability and robustness.
5. AI-Ready Dataset Architecture
Optimized for modern ML workflows, enabling:
Video classification and action recognition
Video captioning and summarization
Vision-language model (VLM) alignment
Multimodal reasoning and grounding
Safety, moderation, and risk detection
Tracking, segmentation, and object detection
Compatible with leading ML frameworks and training pipelines.
6. Licensing & Compliance
Fully compliant with global privacy standards
Explicit contributor consent for video usage
Documented rights and usage permissions
Vetted for commercial and research use
Use Cases
Training video classification and action-recognition models
Vision-language model pretraining
Multimodal AI for enterprise and consumer applications
Safety, moderation, and anomaly detection
Video captioning, retrieval, and summarization
Research in activity analysis, human behavior, and multimodal grounding
Provider:
Data Seeds



