
Caption3o-XL-v4

ModelScope community, updated 2025-12-03, indexed 2025-10-11
Download link:
https://modelscope.cn/datasets/prithivMLmods/Caption3o-XL-v4
Description:
![22.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/AbNlLWZMg-XbMHWZgCMTe.png)

# **Caption3o-XL-v4**

**Caption3o-XL-v4** is a large-scale, high-quality image-caption dataset designed for training and evaluating image-to-text models. Derived from [prithivMLmods/blip3o-caption-mini-arrow](https://huggingface.co/datasets/prithivMLmods/blip3o-caption-mini-arrow) and additional curated sources, this optimized version emphasizes long-form captions and covers a wide range of real-world and artistic scenes.

## Dataset Summary

* **Format**: Parquet
* **Image resolution**: 512x512
* **Languages**: English
* **Modality**: Image-to-Text
* **License**: Apache-2.0
* **Split**: `train` (~52,800 rows)

Each image is paired with a detailed, descriptive caption generated to support long-context understanding and fine-grained reasoning in vision-language tasks.

## Features

* `image`: 512x512 RGB image
* `caption`: Long-form English text (average length ~500 characters)

Example:

```text
The image depicts a serene cemetery with neatly arranged gravestones and headstones, set against a backdrop of lush green grass. The scene is framed by tall trees on either side, their leaves providing dappled shade over the area...
```

## Use Cases

1. Pretraining or finetuning vision-language models (e.g., BLIP, Flamingo, SigLIP)
2. Evaluating long-form image captioning capabilities
3. Enhancing datasets for visual storytelling, scene understanding, and artistic interpretation

## How to Use

You can load the dataset using the Hugging Face `datasets` library:

```python
from datasets import load_dataset

dataset = load_dataset("prithivMLmods/Caption3o-XL-v4", split="train")
```

## Citation

If you use this dataset, please cite the original dataset:

And reference this curated derivative:

> **Caption3o-XL-v4 by prithivMLmods**
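As a follow-up to the loading snippet in "How to Use", the sketch below shows one way to sanity-check the stated average caption length (about 500 characters) once records are available. The helper `caption_length_stats` and the two-record `sample` are hypothetical and only illustrate the idea; the sample captions are placeholders, not rows from the dataset. The only assumption taken from the card is that each record exposes a `caption` string field.

```python
from statistics import mean


def caption_length_stats(records, key="caption"):
    """Summarize caption lengths (in characters) over an iterable of dict-like records.

    Hypothetical helper for illustration; not part of the dataset or of `datasets`.
    """
    lengths = [len(record[key]) for record in records]
    if not lengths:
        raise ValueError("no records to summarize")
    return {
        "count": len(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


# Tiny in-memory stand-in so the sketch runs without network access.
# These captions are placeholders, not real dataset rows.
sample = [
    {"caption": "A red barn."},
    {"caption": "Tall trees frame a quiet lawn."},
]
stats = caption_length_stats(sample)
print(stats)

# With the real dataset (requires the `datasets` package and network access):
# from datasets import load_dataset
# dataset = load_dataset("prithivMLmods/Caption3o-XL-v4", split="train")
# print(caption_length_stats(dataset.select(range(1000))))
```

Running the helper on a subset, as in the commented lines, keeps the check quick. Note that iterating full records also decodes each image, so a small subset is usually enough for a rough estimate.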

Provider:
maas
Created:
2025-09-16