docci

Name: docci
Creator: maas
Published: 2025-12-05 12:14:07
License: No description provided

ModelScope community. Updated 2025-12-05; indexed 2025-04-26.

Download link:

https://modelscope.cn/datasets/google/docci

Resource description:

# Dataset Card for DOCCI

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks](#supported-tasks)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** https://google.github.io/docci
- **Paper:** [arXiv](https://arxiv.org/pdf/2404.19753)
- **Data Explorer:** [Check images and descriptions](https://google.github.io/docci/viz.html?c=&p=1)
- **Point of Contact:** docci-dataset@google.com
- **Report an Error:** [Google Forms](https://forms.gle/v8sUoXWHvuqrWyfe9)

### Dataset Summary

DOCCI (Descriptions of Connected and Contrasting Images) is a collection of images paired with detailed descriptions. The descriptions explain the key elements of the images, as well as secondary information such as background, lighting, and settings. The images are specifically taken to help assess the precise visual properties of images. DOCCI also includes many related images that vary in having key differences from the others. All descriptions are manually annotated to ensure they adequately distinguish each image from its counterparts.
### Supported Tasks

Text-to-Image and Image-to-Text generation

### Languages

English

## Dataset Structure

### Data Instances

```
{
  'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1536x2048>,
  'example_id': 'qual_dev_00000',
  'description': 'An indoor angled down medium close-up front view of a real sized stuffed dog with white and black colored fur wearing a blue hard hat with a light on it. A couple inches to the right of the dog is a real sized black and white penguin that is also wearing a blue hard hat with a light on it. The dog is sitting, and is facing slightly towards the right while looking to its right with its mouth slightly open, showing its pink tongue. The dog and penguin are placed on a gray and white carpet, and placed against a white drawer that has a large gray cushion on top of it. Behind the gray cushion is a transparent window showing green trees on the outside.'
}
```

### Data Fields

Name | Explanation
--- | ---
`image` | PIL.JpegImagePlugin.JpegImageFile
`example_id` | The unique ID of an example follows this format: `<SPLIT_NAME>_<EXAMPLE_NUMBER>`.
`description` | Text description of the associated image.

### Data Splits

Dataset | Train | Test | Qual Dev | Qual Test
--- | ---: | ---: | ---: | ---:
DOCCI | 9,647 | 5,000 | 100 | 100
DOCCI-AAR | 4,932 | 5,000 | -- | --

## Dataset Creation

### Curation Rationale

DOCCI is designed as an evaluation dataset for both text-to-image (T2I) and image-to-text (I2T) generation. Please see our paper for more details.

### Source Data

#### Initial Data Collection

All images were taken by one of the authors and their family.

### Annotations

#### Annotation process

All text descriptions were written by human annotators. We do not rely on any automated process in our data annotation pipeline. Please see Appendix A of [our paper](https://arxiv.org/pdf/2404.19753) for details about image curation.
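Returning to the `example_id` format given under Data Fields: because a split name such as `qual_dev` itself contains an underscore, splitting on the first underscore would misparse the ID. The following is a minimal sketch, not part of the dataset's own tooling. The helper `parse_example_id`, the `ExampleId` type, and the lowercase split names `train`, `test`, and `qual_test` are assumptions inferred from the Data Splits table and the single `qual_dev_00000` example; only `qual_dev` is confirmed by the card.

```python
from typing import NamedTuple

# Assumed split names: only 'qual_dev' appears verbatim in the card; the
# others are guessed from the Data Splits column headers.
KNOWN_SPLITS = ("train", "test", "qual_dev", "qual_test")


class ExampleId(NamedTuple):
    """Parsed form of '<SPLIT_NAME>_<EXAMPLE_NUMBER>' (hypothetical helper type)."""
    split: str
    number: int


def parse_example_id(example_id: str) -> ExampleId:
    """Split an example ID on its last underscore.

    Using the last underscore keeps multi-word split names such as
    'qual_dev' intact.
    """
    split, sep, number = example_id.rpartition("_")
    if not sep or not number.isdigit():
        raise ValueError(f"unexpected example_id format: {example_id!r}")
    if split not in KNOWN_SPLITS:
        raise ValueError(f"unknown split name: {split!r}")
    return ExampleId(split, int(number))


# The ID from the Data Instances example above.
print(parse_example_id("qual_dev_00000"))
# ExampleId(split='qual_dev', number=0)
```

Validating against a fixed list of split names is a design choice that surfaces unexpected IDs early; drop that check if the actual split names turn out to differ from the assumed ones.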
### Personal and Sensitive Information

We manually reviewed all images for personally identifiable information (PII), removing some images and blurring detected faces, phone numbers, and URLs to protect privacy. For text descriptions, we instructed annotators to exclude any PII, such as people's names, phone numbers, and URLs. After the annotation phase, we employed automatic tools to scan for PII, ensuring the descriptions remained free of such information.

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

### Licensing Information

CC BY 4.0

### Citation Information

```
@inproceedings{OnoeDocci2024,
  author = {Yasumasa Onoe and Sunayana Rane and Zachary Berger and Yonatan Bitton and Jaemin Cho and Roopal Garg and Alexander Ku and Zarana Parekh and Jordi Pont-Tuset and Garrett Tanzer and Su Wang and Jason Baldridge},
  title = {{DOCCI: Descriptions of Connected and Contrasting Images}},
  booktitle = {ECCV},
  year = {2024}
}
```


Provider:

maas

Created:

2025-04-21
