Fhrozen/sbucaptions-narratives
Source: Hugging Face · Updated 2025-11-17 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/Fhrozen/sbucaptions-narratives
Resource summary:
---
license: apache-2.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: key
    dtype: string
  - name: descript
    dtype: string
  - name: caption
    dtype: string
  - name: width
    dtype: int64
  - name: height
    dtype: int64
  - name: image
    dtype: image
  - name: negatives
    list:
    - name: negative
      dtype: string
    - name: positive
      dtype: string
  splits:
  - name: train
    num_bytes: 20518053392
    num_examples: 840417
  download_size: 20154096126
  dataset_size: 20518053392
task_categories:
- image-text-to-text
language:
- en
tags:
- image
size_categories:
- 100K<n<1M
---
# sbuCaptions Narratives
SBU Captions: images and their captions.
[Original Source](https://www.kaggle.com/datasets/akashnuka/sbucaptions)
This version adds image descriptions generated by a Qwen3 VLM and negatives generated by an LLM (gpt-oss-20b).
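Each row follows the `dataset_info.features` schema in the card's YAML header. The sketch below expresses that record shape as Python `TypedDict`s. It also adds a hypothetical `check_record` helper for sanity-checking rows.

Some parts of this are assumptions and are not stated by the card:
- The image field is typed loosely, because its decoded type depends on how the dataset is loaded.
- The requirement that width and height be positive is a sanity assumption. The card only declares them as `int64`.

```python
from typing import TypedDict


class NegativePair(TypedDict):
    negative: str  # replacement word that changes the caption's meaning
    positive: str  # word taken from the caption


class Record(TypedDict):
    key: str
    descript: str
    caption: str
    width: int
    height: int
    image: object  # decoded by the loader (e.g. a PIL image with `datasets`)
    negatives: list[NegativePair]


def check_record(row: dict) -> list[str]:
    """Return schema problems for one row; an empty list means it looks valid."""
    problems = []
    for name in ("key", "descript", "caption"):
        if not isinstance(row.get(name), str):
            problems.append(f"{name} should be a string")
    for name in ("width", "height"):
        value = row.get(name)
        # bool is a subclass of int, so exclude it; positivity is an assumption.
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            problems.append(f"{name} should be a positive integer")
    negatives = row.get("negatives")
    if not isinstance(negatives, list):
        problems.append("negatives should be a list")
    else:
        for i, pair in enumerate(negatives):
            ok = (
                isinstance(pair, dict)
                and isinstance(pair.get("positive"), str)
                and isinstance(pair.get("negative"), str)
            )
            if not ok:
                problems.append(f"negatives[{i}] should have string 'positive' and 'negative'")
    return problems
```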
### Captions
The annotations include a `caption` column: a plain-text description of the image generated by a Qwen3 VLM ([Qwen3-VL-30B-A3B-Thinking-FP8](https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Thinking-FP8)).
The prompt used to request the description is:
```python
prompt = (
'Describe the image using raw text as output. '
'The description should contain: - Focus on concrete objects '
'(e.g. cow, grass, person, kite, road, sky). '
'- Do not comment on things you cannot directly see in the image '
'(e.g., feelings that the image evokes, or what might happen in the future). '
'- Indicate an object roughly specifying its location and size. '
'- Say the relationship between two objects, e.g., "a man `is flying` a kite", '
'"a bottle `is on` the table". - If relevant, also mention attributes of the objects (e.g., `old` car)'
)
```
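The request payload is shown below. It uses two variables:
- `sys_prompt` holds the system prompt; its text is not given in this card.
- `encoded_image` holds the base64-encoded JPEG image.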
```python
data = {
"model": "llm-model",
"messages": [
{"role": "system", "content": [{"type": "text", "text": sys_prompt}]},
{"role": "user", "content": [
{"type": "text", "text": prompt},
{"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"} }
]}
],
"stream": False,
"temperature": 0.7,
"max_completion_tokens": 256,
}
```
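Below is a minimal sketch of how such a payload could be assembled from raw image bytes, using only the standard library.

The names and setup here are illustrative, not taken from the card:
- `build_chat_request` is a hypothetical name.
- The card does not say which server handled the request. The payload shape matches OpenAI-compatible chat-completions endpoints, but the serving setup is an assumption.
- The image bytes are assumed to already be JPEG, to match the `image/jpeg` MIME type in the data URL.

```python
import base64


def build_chat_request(
    image_bytes: bytes,
    prompt: str,
    sys_prompt: str,
    model: str = "llm-model",
) -> dict:
    """Build a chat payload shaped like the request above (hypothetical helper)."""
    # Base64-encode the raw image bytes into an ASCII string for the data URL.
    # Assumption: the bytes are already JPEG-encoded, matching "image/jpeg".
    encoded_image = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": [{"type": "text", "text": sys_prompt}]},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"},
                    },
                ],
            },
        ],
        "stream": False,
        "temperature": 0.7,
        "max_completion_tokens": 256,
    }
```

Keeping the sampling settings (`temperature`, `max_completion_tokens`) as literals mirrors the card's request. Adjust them if you reproduce the generation with a different model.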
### Negatives
In addition, a `negatives` column is included. These negatives can be used to fine-tune a model with DPO (Direct Preference Optimization).
The column holds a list of dictionaries. Each dictionary has two fields:
- `positive`: a word that appears in the caption.
- `negative`: a word that changes the caption's meaning when substituted for it.
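The card does not describe how the negatives were turned into training pairs. One plausible approach, sketched below under that assumption, works as follows:
1. Substitute each `negative` for its `positive` in the caption.
2. Treat the original caption as the preferred answer and the edited caption as the rejected one.

The helper `make_dpo_pairs` and the `prompt`/`chosen`/`rejected` keys are illustrative. Check the field names your DPO trainer expects. For image-text models, the image must also be supplied alongside the prompt.

```python
import re


def make_dpo_pairs(prompt: str, caption: str, negatives: list[dict]) -> list[dict]:
    """Build one (chosen, rejected) pair per negative by swapping a single word.

    Hypothetical helper: the original caption is the preferred answer, and each
    rejected answer replaces the first whole-word occurrence of `positive`
    with `negative`.
    """
    pairs = []
    for pair in negatives:
        # Word boundaries keep "cat" from matching inside "cathedral".
        match = re.search(r"\b" + re.escape(pair["positive"]) + r"\b", caption)
        if match is None:
            continue  # positive word not found as a whole word; skip this pair
        rejected = caption[: match.start()] + pair["negative"] + caption[match.end() :]
        pairs.append({"prompt": prompt, "chosen": caption, "rejected": rejected})
    return pairs
```

Swapping one word per pair keeps each chosen/rejected pair minimally different. This is the kind of contrast the `negatives` column seems designed to provide.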
The negatives were generated by an LLM ([gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b)) using the following prompt:
```python
prompt = (
"I will give you a text paragraph. "
"From the paragraph, select three to ten words, mainly sustantives and adjectives."
"Verbs are also allowed. For each selected word, provide a `negative` word that "
"will change the meaning of the text. Output the selected words in JSON format as: "
"`{'word 1': 'negative 1', 'word 2': 'negative 2', ..., 'word n': 'negative n'}`."
"Provide as output ONLY the JSON format. "
f"The text is:\n{data['caption']}"
)
```
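The card does not say how the model's reply was converted into the `negatives` list. Below is a minimal sketch of one way to do it, assuming the reply contains a single object of the form the prompt requests. The sketch:
1. Extracts the text between the first `{` and the last `}`.
2. Parses it as JSON.
3. Falls back to `ast.literal_eval` if JSON parsing fails. The fallback is needed because the prompt's own example uses single-quoted keys, which strict JSON rejects.

`parse_negatives` is a hypothetical name. Dropping pairs whose positive word is not a substring of the caption is an assumption based on the description of the `positive` field.

```python
import ast
import json


def parse_negatives(raw_output: str, caption: str) -> list[dict]:
    """Turn an LLM reply like {'word': 'negative', ...} into the `negatives` format.

    Hypothetical post-processing step; raises ValueError if no object is found.
    """
    start, end = raw_output.find("{"), raw_output.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    text = raw_output[start : end + 1]
    try:
        mapping = json.loads(text)
    except json.JSONDecodeError:
        # The prompt's example uses single quotes, which json.loads rejects.
        mapping = ast.literal_eval(text)
    if not isinstance(mapping, dict):
        raise ValueError("model output is not a word-to-negative mapping")
    negatives = []
    for positive, negative in mapping.items():
        # Keep only string pairs whose positive word occurs in the caption
        # (a case-sensitive substring check).
        if isinstance(positive, str) and isinstance(negative, str) and positive in caption:
            negatives.append({"negative": negative, "positive": positive})
    return negatives
```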
## 📌 Introduction
This dataset collects the images and annotations from the original SBUcaptions project.
## 🙏 Acknowledgement
All credit goes to the teams behind the original SBUcaptions project.
Provided by: Fhrozen