Caption3o-Opt-v2
Source: ModelScope community · Updated: 2025-12-03 · Indexed: 2025-07-12
Download link:
https://modelscope.cn/datasets/prithivMLmods/Caption3o-Opt-v2
Description:
# **Caption3o-Opt-v2**
**Caption3o-Opt-v2** is a high-quality, compact image-caption dataset designed for training and evaluating image-to-text models. Derived from the larger [BLIP3o/BLIP3o-Pretrain-Long-Caption](https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption), this optimized subset emphasizes long-form captions and covers a wide range of real-world and artistic scenes.
## Dataset Summary
* **Size**: 10,277 image-caption pairs
* **Format**: Parquet
* **Image resolution**: 512x512
* **Languages**: English
* **Modality**: Image-to-Text
* **License**: Apache-2.0
* **Split**: `train` (10.3k rows)
Each image is paired with a detailed, descriptive caption generated to support long-context understanding and fine-grained reasoning in vision-language tasks.
## Features
* `image`: 512x512 RGB image
* `caption`: Long-form English text (average length ~500 characters)
Example caption (truncated):
```text
The image depicts the upper section of a classical-style building, featuring a decorative frieze with relief sculptures...
```
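The two fields above can be checked per record before training. The following is a minimal sketch, not part of the dataset's tooling: `check_record`, `EXPECTED_SIZE`, and `EXPECTED_MODE` are hypothetical names, and the image is assumed to expose PIL-style `size` and `mode` attributes.

```python
from typing import Any, List, Mapping

# Values taken from the dataset card; the constant names are illustrative.
EXPECTED_SIZE = (512, 512)
EXPECTED_MODE = "RGB"


def check_record(record: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record matches the card."""
    problems: List[str] = []

    image = record.get("image")
    if image is None:
        problems.append("missing 'image'")
    else:
        # Assumes a PIL-style image object with .size and .mode attributes.
        size = getattr(image, "size", None)
        mode = getattr(image, "mode", None)
        if size != EXPECTED_SIZE:
            problems.append(f"unexpected image size: {size}")
        if mode != EXPECTED_MODE:
            problems.append(f"unexpected image mode: {mode}")

    caption = record.get("caption")
    if not isinstance(caption, str) or not caption.strip():
        problems.append("caption is missing or empty")

    return problems
```

Returning a list of problems rather than raising on the first mismatch makes it easy to log every issue found in a record during a single pass.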
## Use Cases
* Pretraining or finetuning vision-language models (e.g., BLIP, Flamingo, SigLIP)
* Evaluating long-form image captioning capabilities
* Enhancing datasets for visual storytelling, scene understanding, and artistic interpretation
## How to Use
You can load the dataset using the Hugging Face `datasets` library:
```python
from datasets import load_dataset
dataset = load_dataset("prithivMLmods/Caption3o-Opt-v2", split="train")
```
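Once the dataset is loaded, a quick sanity check is to compare caption lengths against the average of roughly 500 characters stated in the card. The helper below is an illustrative sketch; `caption_length_stats` is a hypothetical name, and it works on any iterable of strings.

```python
from statistics import mean, median
from typing import Dict, Iterable


def caption_length_stats(captions: Iterable[str]) -> Dict[str, float]:
    """Summarize caption lengths in characters."""
    lengths = [len(caption) for caption in captions]
    if not lengths:
        raise ValueError("no captions provided")
    return {
        "count": len(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }
```

With the `dataset` object from the snippet above, this could be called as `caption_length_stats(dataset["caption"])`. Selecting only the `caption` column should avoid decoding the images, which keeps the check fast.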
## Citation
If you use this dataset, please cite the original dataset:
> **BLIP3o/BLIP3o-Pretrain-Long-Caption**
> [https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption](https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption)
And reference this curated derivative:
> **Caption3o-Opt-v2 by prithivMLmods**
Provider:
maas
Created:
2025-07-10



