Caption3o-LongCap-v4

Name: Caption3o-LongCap-v4
Creator: maas
Published: 2026-01-06 16:46:25
License: Apache-2.0 (as stated in the dataset card)

ModelScope community. Updated: 2026-01-06. Indexed: 2025-12-06.

Download link:

https://modelscope.cn/datasets/prithivMLmods/Caption3o-LongCap-v4

Resource overview:

![21.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/1EwrfiS5ea101LxDCeF1S.png)

# **Caption3o-LongCap-v4**

**Caption3o-LongCap-v4** is a large-scale, high-quality image-caption dataset designed for training and evaluating image-to-text models. Derived from [prithivMLmods/blip3o-caption-mini-arrow](https://huggingface.co/datasets/prithivMLmods/blip3o-caption-mini-arrow) and additional curated sources, this optimized version emphasizes long-form captions and covers a wide range of real-world and artistic scenes.

## Dataset Summary

* **Image resolution**: 512x512
* **Languages**: English
* **Modality**: Image-to-Text
* **License**: Apache-2.0
* **Split**: `train` (\~522,825 rows)

Each image is paired with a detailed, descriptive caption generated to support long-context understanding and fine-grained reasoning in vision-language tasks.

## Features

* `image`: 512x512 RGB image
* `caption`: Long-form English text (average length \~500 characters)

Example:

```text
The image depicts a serene cemetery with neatly arranged gravestones and headstones, set against a backdrop of lush green grass. The scene is framed by tall trees on either side, their leaves providing dappled shade over the area...
```

## Use Cases

1. Pretraining or finetuning vision-language models (e.g., BLIP, Flamingo, SigLIP)
2. Evaluating long-form image captioning capabilities
3. Enhancing datasets for visual storytelling, scene understanding, and artistic interpretation

## How to Use

You can load the dataset using the Hugging Face `datasets` library:

```python
from datasets import load_dataset

dataset = load_dataset("prithivMLmods/Caption3o-LongCap-v4", split="train")
```

## Citation

If you use this dataset, please cite the original dataset:

And reference this curated derivative:

> **Caption3o-LongCap-v4 by prithivMLmods**
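
## Example: Inspecting Caption Lengths

The card reports an average caption length of roughly 500 characters over about 522,825 rows. The following is a minimal sketch, not part of the original card, of how one might spot-check that figure on a sample without downloading the full split. The helper names `caption_length_stats` and `stream_train_split` and the `limit` parameter are hypothetical, introduced only for illustration. The sketch assumes each record exposes a `caption` string as described under Features and relies on the `streaming=True` option of `load_dataset`.

```python
from itertools import islice
from statistics import mean


def caption_length_stats(records, limit=1000):
    """Return (count, mean_chars, max_chars) over the first `limit` captions.

    `records` is any iterable of dicts with a "caption" string, which is the
    shape the dataset card describes (assumption: no missing captions).
    """
    lengths = [len(record["caption"]) for record in islice(records, limit)]
    if not lengths:
        raise ValueError("no records to summarize")
    return len(lengths), mean(lengths), max(lengths)


def stream_train_split():
    """Open the train split lazily instead of downloading every row first."""
    # Imported here so the helper above stays usable without `datasets`.
    from datasets import load_dataset

    return load_dataset(
        "prithivMLmods/Caption3o-LongCap-v4", split="train", streaming=True
    )


# Usage (needs the `datasets` library, Pillow for image decoding, and network access):
#   count, avg, longest = caption_length_stats(stream_train_split(), limit=1000)
#   print(f"{count} captions, mean {avg:.0f} chars, longest {longest} chars")
```

Streaming keeps the check cheap, but the first 1,000 rows are not a random sample, so the result is only a rough indication of the dataset-wide average.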

Provider: maas

Created: 2025-09-15
