Fhrozen/sbucaptions-narratives

Name: Fhrozen/sbucaptions-narratives
Creator: Fhrozen
Published: 2025-11-17 23:18:43
License: Apache-2.0 (as declared in the dataset card)

Source: Hugging Face. Updated 2025-11-17; indexed 2025-12-20.

Download link:

https://hf-mirror.com/datasets/Fhrozen/sbucaptions-narratives

Description:

---
license: apache-2.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: key
    dtype: string
  - name: descript
    dtype: string
  - name: caption
    dtype: string
  - name: width
    dtype: int64
  - name: height
    dtype: int64
  - name: image
    dtype: image
  - name: negatives
    list:
    - name: negative
      dtype: string
    - name: positive
      dtype: string
  splits:
  - name: train
    num_bytes: 20518053392
    num_examples: 840417
  download_size: 20154096126
  dataset_size: 20518053392
task_categories:
- image-text-to-text
language:
- en
tags:
- image
size_categories:
- 100K<n<1M
---

# sbuCaptions Narratives

SBU captions: images and captions [Original Source](https://www.kaggle.com/datasets/akashnuka/sbucaptions)

This version adds image descriptions generated by a Qwen VLM and word-level negatives generated by an LLM.

### Captions

The annotations include a `caption` column, which is a text description of the image generated by a Qwen3 VLM (https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Thinking-FP8).
The prompt used to request the description is:

```python
prompt = (
    'Describe the image using raw text as output. '
    'The description should contain: - Focus on concrete objects '
    '(e.g. cow, grass, person, kite, road, sky). '
    '- Do not comment on things you cannot directly see in the image '
    '(e.g., feelings that the image evokes, or what might happen in the future). '
    '- Indicate an object roughly specifying its location and size. '
    '- Say the relationship between two objects, e.g., "a man `is flying` a kite", '
    '"a bottle `is on` the table". \n'
    '- If relevant, also mention attributes of the objects (e.g., `old` car)'
)
```

The request payload is:

```python
# `sys_prompt` (system prompt text) and `encoded_image` (base64-encoded JPEG)
# are defined elsewhere and are not given in this card.
data = {
    "model": "llm-model",
    "messages": [
        {"role": "system", "content": [{"type": "text", "text": sys_prompt}]},
        {"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"}},
        ]},
    ],
    "stream": False,
    "temperature": 0.7,
    "max_completion_tokens": 256,
}
```

### Negatives

The dataset also includes a `negatives` column. These negatives can be used to fine-tune a model with DPO training.
The column is a list of dictionaries, each with a `positive` word that appears in the caption string and a `negative` word that changes the meaning of the caption.
The negatives were generated with an LLM ([GPT](https://huggingface.co/openai/gpt-oss-20b)) using the following prompt:

```python
prompt = (
    "I will give you a text paragraph. "
    "From the paragraph, select three to ten words, mainly sustantives and adjectives."
    "Verbs are also allowed. For each selected word, provide a `negative` word that "
    "will change the meaning of the text. Output the selected words in JSON format as: "
    "`{'word 1': 'negative 1', 'word 2': 'negative 2', ..., 'word n': 'negative n'}`."
    "Provide as output ONLY the JSON format. "
    f"The text is:\n{data['caption']}"
)
```

## 📌 Introduction

This dataset collects the images and annotations from the original SBUcaptions project.

## 🙏 Acknowledgement

All credit goes to the original SBUcaptions project team.
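
## Example: DPO pairs from `negatives`

The card says the `negatives` column can be used for DPO training but does not show how. The sketch below is one possible reading, not the author's pipeline: it parses a reply in the `{'word': 'negative'}` shape requested by the prompt into the `positive`/`negative` list layout, then swaps each positive word in the caption for its negative to build a rejected caption. The function names and the `prompt`/`chosen`/`rejected` keys are assumptions chosen to match the usual preference-pair layout; whole-word, case-sensitive matching is also an assumption.

```python
import ast
import re


def parse_negatives(raw: str) -> list[dict[str, str]]:
    """Turn a reply like "{'old': 'new'}" into [{'positive': 'old', 'negative': 'new'}]."""
    # The prompt asks for single-quoted pairs, which is a Python literal rather
    # than strict JSON, so ast.literal_eval is used instead of json.loads.
    pairs = ast.literal_eval(raw.strip())
    if not isinstance(pairs, dict):
        raise ValueError("expected a mapping of word -> negative")
    return [{"positive": str(p), "negative": str(n)} for p, n in pairs.items()]


def make_rejected(caption: str, negatives: list[dict[str, str]]) -> str:
    """Replace every `positive` word in the caption with its `negative` (hypothetical helper)."""
    swaps = {pair["positive"]: pair["negative"] for pair in negatives if pair["positive"]}
    if not swaps:
        return caption
    # Try longer entries first, and substitute in a single pass so that a
    # freshly inserted negative is never replaced again by another pair.
    words = sorted(swaps, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b")
    return pattern.sub(lambda match: swaps[match.group(0)], caption)


def to_dpo_example(prompt: str, caption: str, negatives: list[dict[str, str]]) -> dict[str, str]:
    """Build one preference pair: the original caption is chosen, the altered one rejected."""
    return {
        "prompt": prompt,
        "chosen": caption,
        "rejected": make_rejected(caption, negatives),
    }
```

For a row of the dataset, `to_dpo_example(prompt, row["caption"], row["negatives"])` would give one text-only preference pair; the image would still need to be attached in whatever form the training framework expects.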


Provider:

Fhrozen
