RICH

Indexed by OpenXLab: 2026-04-18
Download link:
https://openxlab.org.cn/datasets/OpenDataLab/RICH
Description:
Inferring human-scene contact (HSC) is the first step toward understanding how humans interact with their surroundings. While detecting 2D human-object interaction (HOI) and reconstructing 3D human pose and shape (HPS) have enjoyed significant progress, reasoning about 3D human-scene contact from a single image is still challenging. Existing HSC detection methods consider only a few types of predefined contact, often reduce body and scene to a small number of primitives, and even overlook image evidence. To predict human-scene contact from a single image, we address the limitations above from both data and algorithmic perspectives. We capture a new dataset called RICH, for "Real scenes, Interaction, Contact and Humans." RICH contains multiview outdoor/indoor video sequences at 4K resolution, ground-truth 3D human bodies captured using markerless motion capture, 3D body scans, and high-resolution 3D scene scans. A key feature of RICH is that it also contains accurate vertex-level contact labels on the body. Using RICH, we train a network that predicts dense body-scene contacts from a single RGB image. Our key insight is that regions in contact are always occluded, so the network needs the ability to explore the whole image for evidence. We use a transformer to learn such non-local relationships and propose a new Body-Scene contact TRansfOrmer (BSTRO). Very few methods explore 3D contact; those that do focus on the feet only, detect foot contact as a post-processing step, or infer contact from body pose without looking at the scene. To our knowledge, BSTRO is the first method to directly estimate 3D body-scene contact from a single image. We demonstrate that BSTRO significantly outperforms the prior art.
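Vertex-level contact labels of this kind can be represented as one binary flag per body-mesh vertex, and a dense contact predictor outputs one score per vertex. The sketch below is a hypothetical illustration, not RICH's file format or the BSTRO evaluation code. It assumes labels are plain boolean lists indexed by vertex, thresholds hypothetical per-vertex contact probabilities at 0.5, and scores the result with standard precision, recall and F1. The helper names `threshold` and `contact_prf` are illustrative only.

```python
from __future__ import annotations

from typing import Sequence


def threshold(probs: Sequence[float], t: float = 0.5) -> list[bool]:
    """Turn hypothetical per-vertex contact probabilities into binary labels."""
    return [p >= t for p in probs]


def contact_prf(pred: Sequence[bool], gt: Sequence[bool]) -> tuple[float, float, float]:
    """Precision, recall and F1 of per-vertex binary contact labels.

    Assumes pred[i] and gt[i] refer to the same body-mesh vertex i.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must cover the same vertices")
    tp = sum(1 for p, g in zip(pred, gt) if p and g)        # contact predicted and present
    fp = sum(1 for p, g in zip(pred, gt) if p and not g)    # contact predicted but absent
    fn = sum(1 for p, g in zip(pred, gt) if g and not p)    # contact missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example over four vertices (made-up numbers, not RICH data).
pred = threshold([0.9, 0.2, 0.7, 0.1])
gt = [True, False, False, True]
print(contact_prf(pred, gt))  # (0.5, 0.5, 0.5)
```

Per-vertex F1 is used here only because it is a common, easy-to-read way to compare binary masks; a real evaluation might weight vertices or use a different metric.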

Provider:
OpenDataLab
Created:
2024-05-14