imageinwords

Name: imageinwords
Creator: maas
Published: 2025-12-05 12:14:07
License: no description provided

ModelScope community; updated 2025-12-05, indexed 2025-04-26

Download link:

https://modelscope.cn/datasets/google/imageinwords


Description:

<h2>ImageInWords: Unlocking Hyper-Detailed Image Descriptions</h2>

Please visit the [webpage](https://google.github.io/imageinwords) for all the information about the IIW project, data downloads, visualizations, and much more.

<img src="https://github.com/google/imageinwords/blob/main/static/images/Abstract/1_white_background.png?raw=true">
<img src="https://github.com/google/imageinwords/blob/main/static/images/Abstract/2_white_background.png?raw=true">

Please reach out to iiw-dataset@google.com for thoughts/feedback/questions/collaborations.

<h3>🤗Hugging Face🤗</h3>

<li><a href="https://huggingface.co/datasets/google/imageinwords">IIW-Benchmark Eval Dataset</a></li>

```python
from datasets import load_dataset

# `name` can be one of: IIW-400, DCI_Test, DOCCI_Test, CM_3600, LocNar_Eval
# refer: https://github.com/google/imageinwords/tree/main/datasets
dataset = load_dataset('google/imageinwords', token=None, name="IIW-400", trust_remote_code=True)
```

<li><a href="https://huggingface.co/spaces/google/imageinwords-explorer">Dataset-Explorer</a></li>

## Dataset Description

- **Paper:** [arXiv](https://arxiv.org/abs/2405.02793)
- **Homepage:** https://google.github.io/imageinwords/
- **Point of Contact:** iiw-dataset@google.com
- **Dataset Explorer:** [ImageInWords-Explorer](https://huggingface.co/spaces/google/imageinwords-explorer)

### Dataset Summary

ImageInWords (IIW) is a carefully designed human-in-the-loop annotation framework for curating hyper-detailed image descriptions, together with a new dataset resulting from this process. We validate the framework through evaluations focused on the quality of the dataset and its utility for fine-tuning, with considerations for readability, comprehensiveness, specificity, hallucinations, and human-likeness.

This Data Card describes **IIW-Benchmark: Eval Datasets**, a mixture of human-annotated and machine-generated data intended to help create and capture rich, hyper-detailed image descriptions.

The IIW dataset has two parts: human annotations and model outputs. The main purposes of this dataset are:

1) to provide samples of SoTA human-authored outputs to promote discussion on annotation guidelines and further improve their quality;
2) to provide human side-by-side (SxS) results and model outputs to promote the development of automatic metrics that mimic human SxS judgements.

### Supported Tasks

Text-to-Image, Image-to-Text, Object Detection

### Languages

English

## Dataset Structure

### Data Instances

### Data Fields

For details on the datasets and output keys, please refer to our [GitHub data](https://github.com/google/imageinwords/tree/main/datasets) page inside the individual folders.

IIW-400:
- `image/key`
- `image/url`
- `IIW`: Human generated image description
- `IIW-P5B`: Machine generated image description
- `iiw-human-sxs-gpt4v` and `iiw-human-sxs-iiw-p5b`: human SxS metrics
  - metrics/Comprehensiveness
  - metrics/Specificity
  - metrics/Hallucination
  - metrics/First few line(s) as tldr
  - metrics/Human Like

DCI_Test:
- `image`
- `image/url`
- `ex_id`
- `IIW`: Human authored image description
- `metrics/Comprehensiveness`
- `metrics/Specificity`
- `metrics/Hallucination`
- `metrics/First few line(s) as tldr`
- `metrics/Human Like`

DOCCI_Test:
- `image`
- `image/thumbnail_url`
- `IIW`: Human generated image description
- `DOCCI`: Image description from DOCCI
- `metrics/Comprehensiveness`
- `metrics/Specificity`
- `metrics/Hallucination`
- `metrics/First few line(s) as tldr`
- `metrics/Human Like`

LocNar_Eval:
- `image/key`
- `image/url`
- `IIW-P5B`: Machine generated image description

CM_3600:
- `image/key`
- `image/url`
- `IIW-P5B`: Machine generated image description

Please note that all fields are strings.

### Data Splits

Dataset | Size
---| ---:
IIW-400 | 400
DCI_Test | 112
DOCCI_Test | 100
LocNar_Eval | 1000
CM_3600 | 1000

### Annotations

#### Annotation process

Some text descriptions were written by human annotators and some were generated by machine models. The metrics are all from human SxS.

### Personal and Sensitive Information

The images used for the descriptions and the machine-generated text descriptions were checked (by algorithmic methods and manual inspection) for S/PII, pornographic content, and violence, and any found to possibly contain such information have been filtered out. We asked human annotators to use objective and respectful language in the image descriptions.

### Licensing Information

CC BY 4.0

### Citation Information

```
@misc{garg2024imageinwords,
      title={ImageInWords: Unlocking Hyper-Detailed Image Descriptions},
      author={Roopal Garg and Andrea Burns and Burcu Karagol Ayan and Yonatan Bitton and Ceslee Montgomery and Yasumasa Onoe and Andrew Bunner and Ranjay Krishna and Jason Baldridge and Radu Soricut},
      year={2024},
      eprint={2405.02793},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
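
As a rough illustration of how the Data Fields and Data Splits sections above could be used, the sketch below checks a single record against the top-level fields this card lists for each subset. The helper name `missing_fields` and the two lookup tables are hypothetical and not part of any IIW or `datasets` API; the field names and sizes are copied from the card, and the sketch assumes each record behaves like a plain `dict` keyed by those names.

```python
# Sketch only: field names and sizes are copied from this data card.
# EXPECTED_FIELDS, EXPECTED_SIZES and missing_fields are hypothetical helpers.

METRIC_FIELDS = [
    "metrics/Comprehensiveness",
    "metrics/Specificity",
    "metrics/Hallucination",
    "metrics/First few line(s) as tldr",
    "metrics/Human Like",
]

# Top-level keys per subset, as listed under "Data Fields".
# For IIW-400 the SxS metrics sit under the two `iiw-human-sxs-*` keys,
# so only those parent keys are checked here (an assumption about nesting).
EXPECTED_FIELDS = {
    "IIW-400": ["image/key", "image/url", "IIW", "IIW-P5B",
                "iiw-human-sxs-gpt4v", "iiw-human-sxs-iiw-p5b"],
    "DCI_Test": ["image", "image/url", "ex_id", "IIW"] + METRIC_FIELDS,
    "DOCCI_Test": ["image", "image/thumbnail_url", "IIW", "DOCCI"] + METRIC_FIELDS,
    "LocNar_Eval": ["image/key", "image/url", "IIW-P5B"],
    "CM_3600": ["image/key", "image/url", "IIW-P5B"],
}

# Row counts per subset, as listed under "Data Splits".
EXPECTED_SIZES = {
    "IIW-400": 400,
    "DCI_Test": 112,
    "DOCCI_Test": 100,
    "LocNar_Eval": 1000,
    "CM_3600": 1000,
}


def missing_fields(name, record):
    """Return the documented top-level fields that `record` does not contain."""
    return [field for field in EXPECTED_FIELDS[name] if field not in record]
```

A record obtained from `load_dataset(...)` could be passed to `missing_fields` together with its subset name; an empty list would mean every documented top-level field is present. The loaded row counts could be compared against `EXPECTED_SIZES` in the same spirit.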


Provider: maas

Created: 2025-04-21
