NTU RGB-D Public Dataset

Name: NTU RGB-D Public Dataset
Creator: Shanxi University
License: not specified

Indexed by the National Basic Science Data Center (国家基础学科公共科学数据中心) on 2026-01-30

Download link:

https://nbsdc.cn/general/dataDetail?id=67d5118a195d260905af9ff5&type=1

Resource description:

The NTU RGB-D dataset was released in 2016 by Nanyang Technological University (NTU), Singapore, for human action recognition research. It was recorded with Microsoft Kinect v2 sensors from three camera viewpoints with 40 participants of different ages, and it targets action analysis in complex scenes and cross-view generalization. As a benchmark for multimodal action recognition, it supplies depth and 3D skeleton data that plain RGB video lacks, supports applications such as medical monitoring and human-computer interaction, and provides a standardized evaluation benchmark for cross-view and cross-subject action recognition models.

The dataset also has considerable potential for multi-agent systems, in particular for hypergraph representations that fuse multiple kinds of knowledge. In this project, combining the modalities allows a richer model of the complex relationships between agents. Information from different sources (RGB images, depth maps, skeleton data, and so on) can be fused within a hypergraph framework to capture the dependencies and interactions among agents. Such a representation is intended to mitigate the uncertainty and heterogeneity of data in multi-agent systems and to improve the robustness and generalization of action recognition models.

During collection, participants performed 60 action classes. Each action was captured synchronously by three Kinect devices, producing RGB video, depth maps, infrared sequences, and 3D skeleton coordinates for 25 joints. Each sample therefore contains four kinds of data: RGB frames, depth map sequences, spatiotemporal skeleton joint coordinates (x/y/z plus pose parameters), and infrared video. The data is stored as .skeleton text files and video files.

The original release contains 56,880 samples and is evaluated under two standard protocols, cross-subject and cross-view. The cross-subject split has 40,320 training samples and 16,560 test samples; the cross-view split, which divides the samples by camera, has 37,920 training samples and 18,960 test samples. The extended version (NTU RGB+D 120) contains 114,480 samples, with 120 action classes and 106 participants, enabling research on more complex behaviors.
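As an illustration of the .skeleton text format mentioned above, the following is a minimal parsing sketch. The file layout it assumes is not given on this page: a frame count, then for each frame a body count, then for each body one metadata line, a joint count (25), and one line per joint whose first three fields are x, y, z. This layout follows commonly circulated reader scripts and should be checked against the official dataset documentation. The names `Body` and `parse_skeleton` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Body:
    body_id: str                               # first field of the body metadata line
    joints: List[Tuple[float, float, float]]   # (x, y, z) for each joint


def parse_skeleton(text: str) -> List[List[Body]]:
    """Parse .skeleton text into a list of frames, each a list of bodies.

    Assumed layout (not confirmed by this page):
      frame_count
      per frame:  body_count
      per body:   metadata line, joint_count, then joint_count joint lines
      per joint:  x y z followed by further fields that are ignored here
    """
    lines = iter(text.strip().splitlines())
    frame_count = int(next(lines))
    frames: List[List[Body]] = []
    for _ in range(frame_count):
        body_count = int(next(lines))
        bodies: List[Body] = []
        for _ in range(body_count):
            info = next(lines).split()
            joint_count = int(next(lines))
            joints = []
            for _ in range(joint_count):
                fields = next(lines).split()
                # Keep only the 3D position; drop the remaining fields.
                x, y, z = (float(v) for v in fields[:3])
                joints.append((x, y, z))
            bodies.append(Body(body_id=info[0], joints=joints))
        frames.append(bodies)
    return frames
```

In use, one would read a file with `open(path).read()` and pass the text to `parse_skeleton`; frames in which no body was tracked come back as empty lists.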

Provided by:

Shanxi University

Collected summary

Dataset introduction

Background and challenges

Background overview

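NTU RGB-D is a multimodal dataset for human action recognition that contains RGB video, depth maps, infrared sequences, and 3D skeleton coordinates. The original version has 56,880 samples and the extended version has 114,480 samples; both support cross-view and cross-subject action recognition research.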
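The cross-subject and cross-view protocols can be sketched as a function of the sample file name. This is a sketch under assumptions that do not come from this page: file names follow the pattern SsssCcccPpppRrrrAaaa (setup, camera, performer, replication, action), the cross-view protocol trains on cameras 2 and 3 and tests on camera 1, and the cross-subject training subjects are the 20 IDs listed in the code, as reported for the original 60-class release. The names `SampleId`, `parse_name`, and `split_of` are hypothetical.

```python
import re
from typing import NamedTuple

# Assumption: training subject IDs for the cross-subject protocol
# of the original 60-class release; all other subjects are test.
CS_TRAIN_SUBJECTS = {1, 2, 4, 5, 8, 9, 13, 14, 15, 16,
                     17, 18, 19, 25, 27, 28, 31, 34, 35, 38}
# Assumption: cross-view trains on cameras 2 and 3, tests on camera 1.
CV_TRAIN_CAMERAS = {2, 3}

_NAME = re.compile(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})")


class SampleId(NamedTuple):
    setup: int
    camera: int
    subject: int
    replication: int
    action: int


def parse_name(name: str) -> SampleId:
    """Extract the numeric IDs from a name such as S001C002P003R002A013."""
    match = _NAME.search(name)
    if match is None:
        raise ValueError(f"unrecognized sample name: {name!r}")
    return SampleId(*(int(group) for group in match.groups()))


def split_of(name: str, protocol: str) -> str:
    """Return 'train' or 'test' for the given protocol."""
    sample = parse_name(name)
    if protocol == "cross_subject":
        return "train" if sample.subject in CS_TRAIN_SUBJECTS else "test"
    if protocol == "cross_view":
        return "train" if sample.camera in CV_TRAIN_CAMERAS else "test"
    raise ValueError(f"unknown protocol: {protocol!r}")
```

For example, `split_of("S001C002P003R002A013.skeleton", "cross_view")` places the sample in the training set because it was recorded by camera 2, while the same sample falls in the cross-subject test set because subject 3 is not among the training IDs.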

The content above was collected and summarized by 遇见数据集.