CATER-GENs

Name: CATER-GENs
Creator: OpenDataLab
License: No description available

Indexed by OpenXLab on 2026-04-18

Download link:

https://openxlab.org.cn/datasets/OpenDataLab/CATER-GENs

Description:

CATER-GEN-v1 is a simpler version inherited from CATER. It consists of two objects (a cone and a snitch) on a large "table" plane. There are four atomic actions: "rotate", "contain", "pick-place" and "slide", and each video randomly contains one or two of them. To generate descriptions, we design a predefined sentence template that is filled in with a subject, an action and an optional object. For the "pick-place" and "slide" actions, the final position is also given. Specifying that final position either as precise coordinates or as a quadrant region yields, respectively, explicit descriptions for deterministic video generation and ambiguous descriptions for diverse video generation.

CATER-GEN-v2 is a more complex dataset in which each video contains 3 to 8 objects. Each object has four attributes, randomly chosen from five shapes, three sizes, nine colors and two materials. The atomic actions are the same as in CATER-GEN-v1. To make the text descriptions ambiguous, we not only replace the final coordinates but also randomly drop attributes of each object, so a referring expression may no longer identify a unique object.
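The description does not give the actual template wording, coordinate system or attribute values, so the following Python sketch is illustrative only. It assumes a table centred at (0, 0), a hypothetical quadrant naming, and hypothetical helpers (`describe`, `refer`, `num_referents`, `sample_object`); the attribute values are assumptions modelled on CATER/CLEVR, with only their counts taken from the description, and "pick-place" is CATER's action of picking an object up and placing it elsewhere. It shows how one template can yield an explicit description (exact coordinates) or an ambiguous one (quadrant) in the v1 style, and how randomly dropping attributes in the v2 style can make a referring expression match more than one object.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Attribute vocabularies for v2-style scenes. The counts (5 shapes, 3 sizes,
# 9 colors, 2 materials) follow the dataset description; the concrete values
# are illustrative assumptions modelled on CATER/CLEVR.
SHAPES = ["cube", "sphere", "cylinder", "cone", "snitch"]
SIZES = ["small", "medium", "large"]
COLORS = ["gray", "red", "blue", "green", "brown", "purple", "cyan", "yellow", "gold"]
MATERIALS = ["rubber", "metal"]
ATTRS = ("size", "color", "material", "shape")

# Hypothetical template fragments for the four atomic actions.
VERBS = {
    "rotate": "rotates",
    "contain": "contains",
    "pick-place": "is picked up and placed at",
    "slide": "slides to",
}


@dataclass
class Obj:
    size: str
    color: str
    material: str
    shape: str


def sample_object(rng: random.Random) -> Obj:
    """Draw each of the four attributes independently and uniformly."""
    return Obj(rng.choice(SIZES), rng.choice(COLORS), rng.choice(MATERIALS), rng.choice(SHAPES))


def quadrant(x: float, y: float) -> str:
    """Map a point on a table centred at (0, 0) to a named quadrant (assumed naming)."""
    vertical = "top" if y >= 0 else "bottom"
    horizontal = "right" if x >= 0 else "left"
    return f"the {vertical}-{horizontal} quadrant"


def describe(subject: str, action: str, obj: Optional[str] = None,
             final: Optional[Tuple[float, float]] = None, explicit: bool = True) -> str:
    """Fill one hypothetical template: subject + action (+ object) (+ final position)."""
    sentence = f"{subject} {VERBS[action]}"
    if action == "contain":
        if obj is None:
            raise ValueError("'contain' needs an object")
        sentence += f" {obj}"
    if action in ("pick-place", "slide"):
        if final is None:
            raise ValueError(f"'{action}' needs a final position")
        x, y = final
        # Explicit: exact coordinates (deterministic). Ambiguous: quadrant (diverse).
        sentence += f" ({x}, {y})" if explicit else f" {quadrant(x, y)}"
    return sentence[0].upper() + sentence[1:] + "."


def refer(obj: Obj, rng: random.Random, drop_prob: float = 0.5) -> Tuple[str, Dict[str, str]]:
    """Build a referring expression, dropping each attribute with probability drop_prob."""
    kept = {a: getattr(obj, a) for a in ATTRS if rng.random() >= drop_prob}
    modifiers = [kept[a] for a in ATTRS[:-1] if a in kept]
    head = kept.get("shape", "object")  # fall back to a generic noun if shape is dropped
    return "the " + " ".join(modifiers + [head]), kept


def num_referents(kept: Dict[str, str], scene: List[Obj]) -> int:
    """Count scene objects consistent with the kept attributes (>1 means ambiguous)."""
    return sum(all(getattr(o, a) == v for a, v in kept.items()) for o in scene)


if __name__ == "__main__":
    # v1 style: the same slide rendered as an explicit and an ambiguous description.
    print(describe("the cone", "slide", final=(1, -2), explicit=True))
    print(describe("the cone", "slide", final=(1, -2), explicit=False))
    print(describe("the cone", "contain", obj="the snitch"))

    # v2 style: 3 to 8 random objects; the phrase may match several of them.
    rng = random.Random(0)
    scene = [sample_object(rng) for _ in range(rng.randint(3, 8))]
    phrase, kept = refer(scene[0], rng)
    print(describe(phrase, "rotate"), "->", num_referents(kept, scene), "matching object(s)")
```

For example, `describe("the cone", "slide", final=(1, -2), explicit=False)` returns "The cone slides to the bottom-right quadrant.", while `explicit=True` gives "The cone slides to (1, -2).". Letting the shape be dropped too (falling back to "object") is a choice of this sketch; the description only states that attributes are dropped at random.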

Provider:

OpenDataLab

Created:

2023-02-13