Fhrozen/openimages-narratives-v2

Name: Fhrozen/openimages-narratives-v2
Creator: Fhrozen
Published: 2025-11-17 23:19:20
License: apache-2.0 (as declared in the dataset card metadata)

Source: Hugging Face. Updated 2025-11-17. Indexed 2025-12-20.

Download link:

https://hf-mirror.com/datasets/Fhrozen/openimages-narratives-v2
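A minimal sketch of how one shard might be loaded with the Hugging Face `datasets` library. The dataset card's configuration lists a combined `train` split plus sixteen shard splits named `train_0` through `train_f`; the helper names `shard_split_names` and `load_shard` are hypothetical, and streaming is an assumption chosen here to avoid downloading all shards at once.

```python
REPO_ID = "Fhrozen/openimages-narratives-v2"


def shard_split_names():
    """Return the sixteen shard split names listed in the card: train_0 ... train_f."""
    return [f"train_{digit}" for digit in "0123456789abcdef"]


def load_shard(split="train_0", streaming=True):
    """Load a single split; requires network access and the `datasets` package."""
    # Imported lazily so shard_split_names() works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=streaming)
```

With network access, `next(iter(load_shard("train_a")))` should yield the first row of that shard; the exact column names beyond `caption` and `negatives` are not confirmed here.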

Resource description:

---
license: apache-2.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train_*
  - split: train_0
    path: data/train_0-*
  - split: train_1
    path: data/train_1-*
  - split: train_2
    path: data/train_2-*
  - split: train_3
    path: data/train_3-*
  - split: train_4
    path: data/train_4-*
  - split: train_5
    path: data/train_5-*
  - split: train_6
    path: data/train_6-*
  - split: train_7
    path: data/train_7-*
  - split: train_8
    path: data/train_8-*
  - split: train_9
    path: data/train_9-*
  - split: train_a
    path: data/train_a-*
  - split: train_b
    path: data/train_b-*
  - split: train_c
    path: data/train_c-*
  - split: train_d
    path: data/train_d-*
  - split: train_e
    path: data/train_e-*
  - split: train_f
    path: data/train_f-*
task_categories:
- image-text-to-text
language:
- en
tags:
- image
size_categories:
- 1M<n<10M
---

# Open Images Narratives v2

[Original Source](https://storage.googleapis.com/openimages/web/index.html) | [Google Localized Narrative](https://google.github.io/localized-narratives/)

## 📌 Introduction

This dataset comprises images and annotations from the original Open Images Dataset V7. Out of the 9M images, a subset of 1.9M images has been annotated with automatic methods (image-text-to-text models).

## Description

This dataset comprises all 1.9M [images with bounding box annotations](https://github.com/cvdfoundation/open-images-dataset?tab=readme-ov-file#download-images-with-bounding-boxes-annotations) from the Open Images V7 project.

### Captions

The annotations include a `caption` column, which is a string description of the image obtained from a Qwen3 VLM ([Qwen3-VL-30B-A3B-Thinking-FP8](https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Thinking-FP8)). The request prompt used to obtain the description is:

```python
prompt = (
    'Describe the image using raw text as output. '
    'The description should contain: - Focus on concrete objects '
    '(e.g. cow, grass, person, kite, road, sky). '
    '- Do not comment on things you cannot directly see in the image '
    '(e.g., feelings that the image evokes, or what might happen in the future). '
    '- Indicate an object roughly specifying its location and size. '
    '- Say the relationship between two objects, e.g., "a man `is flying` a kite", '
    '"a bottle `is on` the table". - If relevant, also mention attributes of the objects (e.g., `old` car)'
)
```

The request JSON is:

```python
# `sys_prompt` (the system prompt) and `encoded_image` (the base64-encoded JPEG)
# are not shown in this card; both must be defined before this snippet runs.
data = {
    "model": "llm-model",
    "messages": [
        {"role": "system", "content": [{"type": "text", "text": sys_prompt}]},
        {"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"}},
        ]},
    ],
    "stream": False,
    "temperature": 0.7,
    "max_completion_tokens": 256,
}
```

### Negatives

In addition, a `negatives` column is provided. These negatives can be used to fine-tune a model with DPO training. They are formatted as a list of dictionaries, each with a `positive` word, which appears in the caption string, and a `negative` word that changes the meaning of the caption. The negatives were obtained with an LLM ([gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b)) using the following prompt:

```python
prompt = (
    "I will give you a text paragraph. "
    "From the paragraph, select three to ten words, mainly sustantives and adjectives."
    "Verbs are also allowed. For each selected word, provide a `negative` word that "
    "will change the meaning of the text. Output the selected words in JSON format as: "
    "`{'word 1': 'negative 1', 'word 2': 'negative 2', ..., 'word n': 'negative n'}`."
    "Provide as output ONLY the JSON format. "
    f"The text is:\n{data['caption']}"
)
```

## 🙏 Acknowledgement

All credits to the original Open Images Dataset V7 team.

## 📜 Cite

Please consider citing the following related papers:

1. ["Extreme clicking for efficient object annotation"](https://arxiv.org/abs/1708.02750), Papadopoulos et al., ICCV 2017.
2. ["We don't need no bounding-boxes: Training object class detectors using only human verification"](https://arxiv.org/abs/1602.08405), Papadopoulos et al., CVPR 2016.
3. ["The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale"](https://arxiv.org/abs/1811.00982), Kuznetsova et al., arXiv:1811.00982 2018.
4. ["Large-scale interactive object segmentation with human annotators"](https://arxiv.org/pdf/1903.10830), Benenson et al., CVPR 2019.
5. ["Natural Vocabulary Emerges from Free-Form Annotations"](https://arxiv.org/abs/1906.01542), Pont-Tuset et al., arXiv 2019.
6. ["From colouring-in to pointillism: revisiting semantic segmentation supervision"](https://storage.googleapis.com/openimages/web_v7/2022_pointillism_arxiv.pdf), Benenson et al., arXiv 2022.
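As a rough illustration of how the `negatives` column could feed DPO training, the sketch below swaps each `positive` word in a caption for its `negative` counterpart to produce a rejected caption. It assumes each entry is a dictionary with literal `positive` and `negative` keys, as the Negatives section describes; the function name `build_dpo_pairs`, the `chosen`/`rejected` output keys, and the sample caption are hypothetical and not taken from the dataset.

```python
import re


def build_dpo_pairs(caption, negatives):
    """Return one {"chosen", "rejected"} pair per negative entry found in the caption."""
    pairs = []
    for entry in negatives:
        positive, negative = entry["positive"], entry["negative"]
        # Match on word boundaries so that "car" does not also hit "cart".
        pattern = re.compile(rf"\b{re.escape(positive)}\b")
        # A callable replacement keeps backslashes in `negative` literal.
        rejected, count = pattern.subn(lambda _match: negative, caption, count=1)
        if count:  # skip entries whose positive word is absent from the caption
            pairs.append({"chosen": caption, "rejected": rejected})
    return pairs


# Hypothetical row, shaped like the description in the Negatives section.
example_caption = "A red car is parked on the road next to a cart."
example_negatives = [
    {"positive": "red", "negative": "blue"},
    {"positive": "car", "negative": "bus"},
    {"positive": "kite", "negative": "drone"},
]
pairs = build_dpo_pairs(example_caption, example_negatives)
```

Only the first occurrence of each positive word is replaced, which is a design choice of this sketch; replacing every occurrence, or combining several substitutions into a single rejected caption, would be equally plausible depending on the training setup.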

Provided by:

Fhrozen
