PoseScript

Name: PoseScript
Creator: OpenDataLab
Published: 2026-05-17 06:30:33
License: not specified

OpenDataLab, updated 2026-05-17, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/PoseScript

Description:

Natural language is used in many computer vision tasks, such as image captioning, cross-modal retrieval, and visual question answering, to provide fine-grained semantic information. Although human pose is key to understanding humans, existing 3D human pose datasets lack detailed language descriptions. In this work, we introduce the PoseScript dataset, which pairs a few thousand 3D human poses from AMASS with rich human-annotated descriptions of the body parts and their spatial relationships. To scale the dataset to a size compatible with typical data-hungry learning algorithms, we propose a carefully designed captioning pipeline that generates automatic synthetic descriptions in natural language from given 3D keypoints. The pipeline applies a set of simple but generic rules to the 3D keypoints to extract low-level pose information (posecodes). The posecodes are then combined into higher-level textual descriptions using syntactic rules. Automatic annotation substantially increases the amount of available data and makes it possible to pretrain deep models effectively before fine-tuning them on human-written captions. To demonstrate the potential of annotated poses, we show applications of the PoseScript dataset to retrieving relevant poses from large-scale datasets and to generating synthetic poses from a textual pose description.
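The two-stage captioning idea described above (rules over 3D keypoints produce posecodes, then syntactic rules turn posecodes into text) can be illustrated with a minimal Python sketch. This is not the PoseScript authors' implementation: the joint names, the y-up coordinate convention, the angle thresholds, and the helper names `angle_deg`, `extract_posecodes`, and `describe` are all assumptions chosen for illustration.

```python
import math


def angle_deg(a, b, c):
    """Return the angle at joint b (in degrees) formed by the points a-b-c."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def extract_posecodes(pose):
    """Apply simple hand-written rules to keypoints and return (part, state) pairs.

    `pose` maps hypothetical joint names to (x, y, z) coordinates with y pointing up.
    The 100 and 160 degree thresholds are illustrative assumptions.
    """
    codes = []
    for side in ("left", "right"):
        elbow = angle_deg(
            pose[f"{side}_shoulder"], pose[f"{side}_elbow"], pose[f"{side}_wrist"]
        )
        if elbow < 100:
            codes.append((f"{side} elbow", "bent"))
        elif elbow > 160:
            codes.append((f"{side} elbow", "straight"))
        if pose[f"{side}_wrist"][1] > pose["head"][1]:
            codes.append((f"{side} hand", "above the head"))
    return codes


def describe(codes):
    """Combine posecodes into one sentence with a fixed syntactic template."""
    if not codes:
        return "No rule fired for this pose."
    phrases = [f"the {part} is {state}" for part, state in codes]
    if len(phrases) == 1:
        joined = phrases[0]
    else:
        joined = ", ".join(phrases[:-1]) + " and " + phrases[-1]
    return joined[0].upper() + joined[1:] + "."


# Made-up example pose: right arm raised straight up, left forearm bent forward.
example_pose = {
    "head": (0.0, 1.6, 0.0),
    "left_shoulder": (-0.2, 1.4, 0.0),
    "left_elbow": (-0.2, 1.1, 0.0),
    "left_wrist": (0.0, 1.1, 0.0),
    "right_shoulder": (0.2, 1.4, 0.0),
    "right_elbow": (0.2, 1.7, 0.0),
    "right_wrist": (0.2, 2.0, 0.0),
}

print(describe(extract_posecodes(example_pose)))
```

Keeping posecode extraction separate from sentence assembly mirrors the split the description draws between low-level pose information and syntactic rules, so either stage could be swapped out independently in a sketch like this.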

Provided by:

OpenDataLab

Created:

2022-11-17

Collection summary

Dataset introduction