Caption3o-Opt-v2

Name: Caption3o-Opt-v2
Creator: maas
Published: 2025-12-03 17:17:25
License: No description provided

ModelScope Community. Updated: 2025-12-03. Indexed: 2025-07-12.

Download link:

https://modelscope.cn/datasets/prithivMLmods/Caption3o-Opt-v2


Resource description:

# **Caption3o-Opt-v2**

**Caption3o-Opt-v2** is a high-quality, compact image-caption dataset designed for training and evaluating image-to-text models. Derived from the larger [BLIP3o/BLIP3o-Pretrain-Long-Caption](https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption), this optimized subset emphasizes long-form captions and covers a wide range of real-world and artistic scenes.

## Dataset Summary

* **Size**: 10,277 image-caption pairs
* **Format**: Parquet
* **Image resolution**: 512x512
* **Languages**: English
* **Modality**: Image-to-Text
* **License**: Apache-2.0
* **Split**: `train` (10.3k rows)

Each image is paired with a detailed, descriptive caption generated to support long-context understanding and fine-grained reasoning in vision-language tasks.

## Features

* `image`: 512x512 RGB image
* `caption`: Long-form English text (average length ~500 characters)

Example:

```text
The image depicts the upper section of a classical-style building, featuring a decorative frieze with relief sculptures...
```

## Use Cases

* Pretraining or finetuning vision-language models (e.g., BLIP, Flamingo, SigLIP)
* Evaluating long-form image captioning capabilities
* Enhancing datasets for visual storytelling, scene understanding, and artistic interpretation

## How to Use

You can load the dataset using the Hugging Face `datasets` library:

```python
from datasets import load_dataset

dataset = load_dataset("prithivMLmods/Caption3o-Opt-v2", split="train")
```

## Citation

If you use this dataset, please cite the original dataset:

> **BLIP3o/BLIP3o-Pretrain-Long-Caption**
> [https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption](https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption)

And reference this curated derivative:

> **Caption3o-Opt-v2 by prithivMLmods**
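Once the dataset is loaded, a quick sanity check is to compare caption lengths against the roughly 500-character average stated in the card. The following is a minimal sketch, not part of the original dataset card: the helper names `caption_stats` and `summarize_remote` are hypothetical, and the sketch assumes the `caption` column holds plain strings as described under Features.

```python
from statistics import mean


def caption_stats(captions):
    """Return simple character-length statistics for a list of caption strings."""
    lengths = [len(text) for text in captions]
    if not lengths:
        raise ValueError("no captions provided")
    return {
        "count": len(lengths),
        "mean_chars": mean(lengths),
        "max_chars": max(lengths),
    }


def summarize_remote(sample_size=100):
    """Hypothetical helper: download the dataset and summarize a small sample.

    Requires the `datasets` package and network access, so the import is
    kept local to this function.
    """
    from datasets import load_dataset

    dataset = load_dataset("prithivMLmods/Caption3o-Opt-v2", split="train")
    # Read only the text column of a small sample so no images are decoded.
    sample = dataset.select(range(min(sample_size, len(dataset))))
    return caption_stats(list(sample["caption"]))
```

Calling `summarize_remote()` downloads the data on first use. `caption_stats` works on any list of strings, so it can also be applied to captions produced by a model under evaluation.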


Provider:

maas

Created:

2025-07-10
