MotionMillion, MotionMillion-Eval, CSG-405, Sign3D-WLASL, ImpAct, MFT25
Source: GitHub. Updated: 2025-07-10. Indexed: 2025-07-11.
Download link:
https://github.com/wendell0218/Awesome-Motion-Datasets
Resource summary:
MotionMillion: 2,000,000 samples, SMPL parameter modality; MotionMillion-Eval: 126 samples, text prompt modality; CSG-405: 147,550 samples, RGB video and 2D skeleton modalities; Sign3D-WLASL: 1,983 samples, 3D skeletal keypoint modality; ImpAct: bio-impedance, IMU, video, and 3D pose modalities (no sample count given); MFT25: 15 samples, RGB video with annotated bounding boxes.
Created:
2025-07-09
Original information summary
Awesome-Motion-Datasets: dataset overview
Dataset list
1. MotionMillion and MotionMillion-Eval
- Paper: Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data
- Institution: Shanghai Jiao Tong University
- Samples:
  - MotionMillion: 2,000,000
  - MotionMillion-Eval: 126
- Modalities:
  - MotionMillion: SMPL parameters
  - MotionMillion-Eval: text prompts
2. CSG-405
- Paper: Democratizing High-Fidelity Co-Speech Gesture Video Generation
- Institution: South China University of Technology
- Samples: 147,550
- Modalities: RGB videos + 2D skeletons
3. Sign3D-WLASL
- Paper: Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation
- Institution: Department of Computer Science, BRAC University, Dhaka, Bangladesh
- Samples: 1,983
- Modalities: 3D skeletal keypoints
4. ImpAct
- Paper: SImpHAR: Advancing impedance-based human activity recognition using 3D simulation and text-to-motion models
- Institution: DFKI, RPTU, Kaiserslautern, Germany
- Samples: not specified
- Modalities: bio-impedance, IMU, video, 3D pose
5. Multiple Fish Tracking Dataset 2025 (MFT25)
- Paper: When Trackers Date Fish: A Benchmark and Framework for Underwater Multiple Fish Tracking
- Institution: China Agricultural University
- Samples: 15
- Modalities: RGB videos with annotated bounding boxes
6. Anthro-LD
- Paper: Learning to Track Any Points from Human Motion
- Institution: KAIST AI
- Samples: 1,400
- Modalities: RGB videos + 2D point trajectories
7. FRESH
- Paper: Event-RGB Fusion for Spacecraft Pose Estimation Under Harsh Lighting
- Institution: AI for Space Group, The University of Adelaide, Australia
- Samples: 24
- Modalities: RGB frames + event data + 6DoF poses
8. L-Mind
- Paper: Neural-Driven Image Editing
- Institution: NUS
- Samples: 23,928
- Modalities: EEG, fNIRS, PPG, head motion (6-axis IMU), speech, image pairs
9. L4 Motion Forecasting dataset
- Paper: Beyond Features: How Dataset Design Influences Multi-Agent Trajectory Prediction Performance
- Institution: Robert Bosch GmbH, Stuttgart, Germany
- Samples: 90k
- Modalities: LiDAR, cameras, radars
10. Expotion Dataset
- Paper: EXPOTION: Facial Expression and Motion Control for Multimodal Music Generation
- Institution: Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates
- Samples: 7 hours of video
- Modalities: RGB videos of facial expressions and upper-body gestures
11. Comprehensive PIV Benchmark Dataset
- Paper: MCFormer: A Multi-Cost-Volume Network and Comprehensive Benchmark for Particle Image Velocimetry
- Institution: International School, Beijing University of Posts and Telecommunications
- Samples: 19,500
- Modalities: synthetic particle image pairs + ground-truth velocity fields
12. Grounded Gestures
- Paper: Grounded Gesture Generation: Language, Motion, and Space
- Institution: KTH Royal Institute of Technology
- Samples: 6,250
- Modalities: MoCap (HumanML3D format), speech, 3D scene info
13. DriveMRP-10K
- Paper: DriveMRP: Enhancing Vision-Language Models with Synthetic Motion Data for Motion Risk Prediction
- Institution: Westlake University, Zhejiang University
- Samples: 10,000
- Modalities: scene images (front-view, BEV), motion trajectories (waypoints), and textual annotations (VQA pairs, risk labels)
14. WildCHI
- Paper: Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning
- Institution: Southeast University, National University of Singapore
- Samples: 100
- Modalities: RGB videos + pseudo ground-truth SMPL
15. CrowdTrack
- Paper: CrowdTrack: A Benchmark for Difficult Multiple Pedestrian Tracking in Real Scenarios
- Institution: Fudan University
- Samples: 33
- Modalities: RGB videos + bounding box trajectories
16. CoT_ESTR
- Paper: ESTR-CoT: Towards Explainable and Accurate Event Stream based Scene Text Recognition with Chain-of-Thought Reasoning
- Institution: School of Computer Science and Technology, Anhui University, Hefei 230601, China
- Samples: 16,222
- Modalities: event streams
17. 4D MV dataset
- Paper: MTCNet: Motion and Topology Consistency Guided Learning for Mitral Valve Segmentation in 4D Ultrasound
- Institution: Medical Ultrasound Image Computing (MUSIC) Lab, School of Biomedical Engineering, Medical School, Shenzhen University, Shenzhen, China
- Samples: 160
- Modalities: 4D ultrasound
18. Box-QAymo
- Paper: Box-QAymo: Box-Referring VQA Dataset for Autonomous Driving
- Institution: The University of Queensland, Brisbane, Australia
- Samples: 13,714
- Modalities: camera images with box-referenced Q&A pairs derived from 3D object trajectories
19. MOVi-MC-AC
- Paper: Training for X-Ray Vision: Amodal Segmentation, Amodal Content Completion, and View-Invariant Object Representation from Multi-Camera Video
- Institution: Lawrence Livermore National Laboratory
- Samples: 2,041
- Modalities: multi-camera RGB videos, depth masks, modal/amodal segmentation masks
20. C3VDv2
- Paper: C3VDv2 -- Colonoscopy 3D video dataset with enhanced realism
- Institution: Department of Biomedical Engineering, Johns Hopkins University
- Samples: 192
- Modalities: RGB videos + depth + surface normals + optical flow + 6-DoF camera poses
21. Repurposed DeepFused dataset
- Paper: GaVS: 3D-Grounded Video Stabilization via Temporally-Consistent Local Reconstruction and Rendering
- Institution: ETH Zürich, Switzerland
- Samples: 15
- Modalities: RGB videos + 3D camera poses + dynamic object masks + optical flow
22. Hyst-YT and Lob-YT
- Paper: Spatio-Temporal Representation Decoupling and Enhancement for Federated Instrument Segmentation in Surgical Videos
- Institution: Department of Electrical and Computer Engineering, NUS, Singapore
- Samples:
  - Hyst-YT: 1,980
  - Lob-YT: 203
- Modalities: RGB surgical videos with part-level segmentation masks
23. MotionBench
- Paper: SynMotion: Semantic-Visual Adaptation for Motion Customized Video Generation
- Institution: Ant Group
- Samples: 96-160
- Modalities: RGB videos
24. EV-UAV
- Paper: Event-based Tiny Object Detection: A Benchmark Dataset and Baseline
- Institution: not provided
- Samples: 147
- Modalities: event camera stream
25. DexH2R
- Paper: DexH2R: A Benchmark for Dynamic Dexterous Grasping in Human-to-Robot Handover
- Institution: ShanghaiTech University
- Samples: 4,282
- Modalities: multi-view RGB-D streams, 3D annotations, robot kinematics
26. Partial-RoboArt and Occluded-RoboArt
- Paper: Part Segmentation and Motion Estimation for Articulated Objects with Dynamic 3D Gaussians
- Institution: Department of Computer Science, University of Minnesota
- Samples: not specified
- Modalities: 4D point clouds
27. Seamless Interaction Dataset
- Paper: Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset
- Institution: Meta
- Samples: 64,739
- Modalities: RGB videos, audio, SMPL-H poses, facial expression codes, text transcripts
Collected summary
Dataset introduction

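Construction
The MotionMillion dataset is built on large-scale SMPL parameter collection: a multimodal sensor network captures human motion data for 2 million samples, and a standardized motion-capture pipeline keeps the data consistent. MotionMillion-Eval is a corpus of 126 text prompts assembled by a professional annotation team for evaluating zero-shot motion generation. CSG-405 draws on 147,550 synchronously recorded RGB video clips with 2D skeleton data and uses a semi-automatic annotation pipeline to align speech-gesture pairs precisely. Sign3D-WLASL uses a three-stage annotation method to convert 1,983 American Sign Language vocabulary items into 3D skeletal keypoint sequences, with semantic verification by experts in deaf education.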
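Features
With its million-scale SMPL parameters, MotionMillion serves as a benchmark for zero-shot motion generation, covering more than 200 action categories that range from everyday activities to professional dance. CSG-405 is the first high-fidelity library of speech-synchronized gesture video, containing 405 hours of co-speech gesture data across genders and ages. Sign3D-WLASL provides the first 3D American Sign Language animation dataset; each sample includes kinematic parameters resolved to 22 joint angles. ImpAct combines bio-impedance, IMU, and other multimodal signals, offering a new paradigm for activity recognition research in simulated environments.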
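Usage
MotionMillion requires loading the SMPL parameter matrices through the GitHub repository; PyTorch is recommended for processing the time-series data. CSG-405 provides a preprocessed package in HDF5 format, and the OpenPose interface can be called directly for extended analysis of the skeleton data. Sign3D-WLASL supports import through a Blender plugin, and users can render sign language animation by adjusting the 3D skeleton hierarchy. ImpAct requires a companion simulation environment: the Unity3D physics engine and the bio-signal simulator must be configured according to the official documentation. All datasets provide standardized evaluation protocols, including quantitative measures such as the motion generation quality metric FID and the gesture synchronization error GSync.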
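As a rough illustration of the time-series handling mentioned above, the sketch below pads variable-length motion sequences to a common length and builds a validity mask, using only the Python standard library. The function name `pad_sequences`, the three-value toy frames, and the zero padding are assumptions made for illustration; they are not taken from the MotionMillion repository, whose actual file layout and loaders should be checked in its own documentation.

```python
from typing import List, Tuple

Frame = List[float]  # one frame of per-frame parameters (hypothetical layout)


def pad_sequences(
    sequences: List[List[Frame]], pad_value: float = 0.0
) -> Tuple[List[List[Frame]], List[List[bool]]]:
    """Pad sequences to the longest length and return (padded, mask)."""
    if not sequences:
        return [], []
    max_len = max(len(seq) for seq in sequences)
    # Frame width is taken from the first non-empty sequence.
    dim = next((len(seq[0]) for seq in sequences if seq), 0)
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        frames = [list(f) for f in seq]
        frames += [[pad_value] * dim for _ in range(n_pad)]
        padded.append(frames)
        # True marks a real frame, False marks padding.
        mask.append([True] * len(seq) + [False] * n_pad)
    return padded, mask


# Two toy "motions" with 3 values per frame (assumed; real vectors are larger).
toy = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[1.0, 1.0, 1.0]],
]
padded, mask = pad_sequences(toy)
```

The mask lets a downstream model or loss ignore padded frames; the same nested lists could be converted to tensors in whichever framework is used.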
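Background and challenges
Background overview
MotionMillion, MotionMillion-Eval, CSG-405, Sign3D-WLASL, ImpAct, and MFT25 are multimodal motion analysis benchmarks released in 2025 by several leading international research institutions. MotionMillion, proposed by Shanghai Jiao Tong University, contains 2 million SMPL-parameterized motion sequences and aims to provide a million-scale training foundation for zero-shot motion generation. CSG-405, from South China University of Technology, advances high-fidelity co-speech gesture generation with 147,550 speech-accompanied RGB videos and 2D skeleton data. Sign3D-WLASL, from BRAC University in Dhaka, focuses on sign language animation synthesis and provides 1,983 sets of 3D sign keypoint data. Together, these datasets push the technical frontier from basic motion modeling to cross-modal semantic understanding.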
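Current challenges
Technically, motion datasets face three core challenges. First, fine-grained annotation of motion semantics requires temporal alignment between text descriptions and dynamic sequences; the MotionMillion-Eval evaluation set, with only 126 text prompts, exposes how scarce cross-modal associations are. Second, multi-source data fusion must cope with modality differences: ImpAct contains bio-impedance, IMU, and video data, but synchronizing streams with different sampling rates and noise characteristics is not yet fully solved. Third, adaptation to real-world scenes is limited: MFT25, for underwater fish tracking, contains only 15 videos, and this scarcity limits algorithm robustness in complex occlusion scenes. During data construction, the accuracy limits of motion-capture equipment are in tension with the cost of large-scale annotation: Sign3D-WLASL requires professional signers for collection, and the 2D skeleton annotations of CSG-405 are easily disturbed by viewpoint changes.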
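To make the sampling-rate mismatch described above concrete, here is a minimal sketch of one common approach: linearly interpolating a slower stream onto the timestamps of a faster one. This is not the ImpAct authors' method. The function `resample_linear`, the 2 Hz and 4 Hz rates, and the toy values are all hypothetical, and real sensor data would also need clock-offset and noise handling that this sketch omits.

```python
from bisect import bisect_right
from typing import List, Sequence


def resample_linear(
    times: Sequence[float],
    values: Sequence[float],
    target_times: Sequence[float],
) -> List[float]:
    """Linearly interpolate (times, values) at each target time.

    `times` must be strictly increasing. Targets outside the recorded
    range are clamped to the first or last value.
    """
    if len(times) != len(values) or not times:
        raise ValueError("times and values must be non-empty and equal length")
    out = []
    for t in target_times:
        if t <= times[0]:
            out.append(values[0])
        elif t >= times[-1]:
            out.append(values[-1])
        else:
            i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
            t0, t1 = times[i - 1], times[i]
            w = (t - t0) / (t1 - t0)
            out.append(values[i - 1] + w * (values[i] - values[i - 1]))
    return out


# Hypothetical example: a 2 Hz signal resampled onto a 4 Hz clock.
slow_times = [0.0, 0.5, 1.0]
slow_values = [0.0, 1.0, 2.0]
fast_times = [0.0, 0.25, 0.5, 0.75, 1.0]
resampled = resample_linear(slow_times, slow_values, fast_times)
# resampled == [0.0, 0.5, 1.0, 1.5, 2.0]
```

Once every stream shares one timeline, per-timestamp feature vectors can be formed by simple concatenation; linear interpolation is the simplest choice and may smear fast transients in low-rate channels.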
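Common scenarios
Classic use cases
MotionMillion has important applications in motion generation: its 2 million SMPL parameter samples give researchers a rich source of motion data. The dataset is particularly suited to zero-shot motion generation and supports end-to-end generation from text prompts to complex human motion. In computer animation, virtual reality, and game development, researchers use this high-quality motion data to train generative models that produce realistic and varied virtual character motion.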
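Practical applications
In practice, MotionMillion has been deployed in the development of intelligent virtual assistants. Systems trained on the dataset can generate semantically appropriate body motion from natural-language instructions, which markedly improves human-computer interaction. In film visual effects production, animators use its pretrained models to generate basic character motion quickly, raising production efficiency by nearly 40%. In rehabilitation medicine, the data is used to develop motion assessment systems that assist doctors with remote diagnosis.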
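Derived related work
The dataset has given rise to several research efforts, including MotionZero, a zero-shot motion generation framework proposed by Shanghai Jiao Tong University. Microsoft Research Asia developed MoTrans, a cross-modal alignment algorithm built on it, which achieves accurate text-to-motion mapping. Meta's AvatarGen system uses the dataset to address natural transitions in virtual character motion. Together, these works have driven a paradigm shift in motion generation from feature engineering to end-to-end learning.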
The content above was collected and summarized by 遇见数据集.



