Fhrozen/sbucaptions-narratives

Source: Hugging Face. Updated 2025-11-17; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Fhrozen/sbucaptions-narratives
Resource description:
---
license: apache-2.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: key
    dtype: string
  - name: descript
    dtype: string
  - name: caption
    dtype: string
  - name: width
    dtype: int64
  - name: height
    dtype: int64
  - name: image
    dtype: image
  - name: negatives
    list:
    - name: negative
      dtype: string
    - name: positive
      dtype: string
  splits:
  - name: train
    num_bytes: 20518053392
    num_examples: 840417
  download_size: 20154096126
  dataset_size: 20518053392
task_categories:
- image-text-to-text
language:
- en
tags:
- image
size_categories:
- 100K<n<1M
---

# sbuCaptions Narratives

SBU captions: images and captions. [Original Source](https://www.kaggle.com/datasets/akashnuka/sbucaptions)

This version adds image descriptions generated by a Qwen VLM and negatives generated by an LLM.

### Captions

The annotations include a `caption` column, a string description of the image generated by a Qwen3 VLM (https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Thinking-FP8).
The prompt used to request the description is:

```python
prompt = (
    'Describe the image using raw text as output. '
    'The description should contain: - Focus on concrete objects '
    '(e.g. cow, grass, person, kite, road, sky). '
    '- Do not comment on things you cannot directly see in the image '
    '(e.g., feelings that the image evokes, or what might happen in the future). '
    '- Indicate an object roughly specifying its location and size. '
    '- Say the relationship between two objects, e.g., "a man `is flying` a kite", '
    '"a bottle `is on` the table". \n'
    '- If relevant, also mention attributes of the objects (e.g., `old` car)'
)
```

The request JSON is:

```python
# sys_prompt (the system prompt text) and encoded_image (a base64-encoded JPEG)
# must be defined before this dictionary is built.
data = {
    "model": "llm-model",
    "messages": [
        {"role": "system", "content": [{"type": "text", "text": sys_prompt}]},
        {"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"}},
        ]},
    ],
    "stream": False,
    "temperature": 0.7,
    "max_completion_tokens": 256,
}
```

### Negatives

In addition, the dataset provides a `negatives` column. These negatives can be used to fine-tune a model with DPO training.
They are formatted as a list of dictionaries, each with a `positive` word that appears in the caption string and a `negative` word that changes the meaning of the caption.
The negatives were obtained from an LLM ([gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b)) using the following prompt:

```python
prompt = (
    "I will give you a text paragraph. "
    "From the paragraph, select three to ten words, mainly sustantives and adjectives."
    "Verbs are also allowed. For each selected word, provide a `negative` word that "
    "will change the meaning of the text. Output the selected words in JSON format as: "
    "`{'word 1': 'negative 1', 'word 2': 'negative 2', ..., 'word n': 'negative n'}`."
    "Provide as output ONLY the JSON format. "
    f"The text is:\n{data['caption']}"
)
```

## 📌 Introduction

This dataset collects the images and annotations from the original SBUcaptions project.

## 🙏 Acknowledgement

All credit goes to the original SBUcaptions project teams.
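The card says the `negatives` column can be used for DPO training but does not show how a preference pair is formed. The following is a minimal sketch of one possible approach, not the author's pipeline: each `positive` word is swapped for its `negative` in the caption to produce a rejected text, with the original caption as the chosen text. The function name `build_dpo_pairs`, the default prompt string, the `prompt`/`chosen`/`rejected` field names, and the example record are all assumptions made for illustration.

```python
import re


def build_dpo_pairs(caption, negatives, prompt="Describe the image."):
    """Return one {prompt, chosen, rejected} dict per usable negative entry.

    Hypothetical helper: the field names and the default prompt are
    assumptions, not something defined by the dataset card.
    """
    pairs = []
    for entry in negatives:
        positive, negative = entry["positive"], entry["negative"]
        # Whole-word match so that "car" does not rewrite "carpet".
        pattern = re.compile(rf"\b{re.escape(positive)}\b")
        # A callable replacement keeps backslashes in `negative` literal.
        rejected, n_subs = pattern.subn(lambda _m: negative, caption, count=1)
        if n_subs == 0:
            continue  # the positive word does not occur verbatim; skip it
        pairs.append({"prompt": prompt, "chosen": caption, "rejected": rejected})
    return pairs


# Made-up record shaped like the dataset schema (not real data).
example = {
    "caption": "The old red car is parked on the road near a carpet shop.",
    "negatives": [
        {"positive": "old", "negative": "new"},
        {"positive": "car", "negative": "boat"},
        {"positive": "dog", "negative": "cat"},  # absent from the caption
    ],
}

pairs = build_dpo_pairs(example["caption"], example["negatives"])
for pair in pairs:
    print(pair["rejected"])
```

Only the first occurrence of each word is replaced, and entries whose `positive` word is not found verbatim are dropped. Whether to swap one word per pair or several at once is a design choice that the card leaves open.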

Provider:
Fhrozen