Caption3o-Opt

Name: Caption3o-Opt
Creator: maas
Published: 2025-12-03 17:17:25
License: No description provided

ModelScope community; updated 2025-12-03; indexed 2025-07-05

Download link:

https://modelscope.cn/datasets/prithivMLmods/Caption3o-Opt

Description:

# **Caption3o-Opt**

**Caption3o-Opt** is a compact, high-quality image-caption dataset derived from the original [BLIP3o/BLIP3o-Pretrain-Long-Caption](https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption). This refined subset focuses on optimized long-form captioning, curated for real-world and artistic image understanding across vision-language models.

## Overview

- **Total Samples**: 10,278
- **Modality**: Image ↔ Text
- **Format**: Arrow (auto-converted to Parquet)
- **License**: Apache 2.0
- **Language**: English
- **Size**: ~500 MB

## Dataset Structure

| Field   | Type   | Description                                  |
| ------- | ------ | -------------------------------------------- |
| image   | image  | Input image in binary format                 |
| caption | string | Long-form, descriptive caption for the image |

## Quickstart with 🤗 Datasets

```bash
pip install datasets
```

```python
from datasets import load_dataset

# Load dataset
dataset = load_dataset("prithivMLmods/Caption3o-Opt", split="train")

# View a sample
print(dataset[0])
```

## Example Entries

1. **Image**: Religious statue

   **Caption**: *The image depicts a religious figure adorned in elaborate, ornate attire, likely a statue or icon of a saint or Virgin Mary...*

2. **Image**: Historic building with clock tower

   **Caption**: *The image captures a grand, historic building under a clear blue sky. The structure features ornate architectural details...*

3. **Image**: South Asian temple entrance

   **Caption**: *The image depicts the entrance of a vibrant and ornate temple, likely of South Asian origin...*

4. **Image**: Outdoor grilling event

   **Caption**: *The image shows a close-up of a person grilling food outdoors. The individual is wearing an apron...*

5. **Image**: Scenic harbor

   **Caption**: *The image depicts a serene harbor scene under a clear blue sky with a few scattered clouds...*

## Use Cases

This dataset supports a variety of vision-language tasks:

* Long-form image captioning
* Visual scene understanding
* Multi-modal grounding and reasoning
* Fine-tuning VLMs like BLIP, IDEFICS, Flamingo, etc.

## Citation

If you use this dataset, please cite the original dataset:

> **BLIP3o/BLIP3o-Pretrain-Long-Caption**
> [https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption](https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption)

And reference this curated derivative:

> **Caption3o-Opt by prithivMLmods**
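
## Inspecting Caption Lengths (illustrative sketch)

Since the dataset is described as long-form, it can be useful to check how long the captions actually are before fine-tuning. The snippet below is a minimal sketch, not part of the dataset or of 🤗 Datasets: `caption_word_stats` and `stream_samples` are hypothetical helper names, and word counts are approximated by splitting on whitespace. It assumes the `image` and `caption` fields from the Dataset Structure table, that the repository ID from the Quickstart is reachable, and that Pillow is installed so streamed images can be decoded.

```python
from statistics import mean, median
from typing import Iterable, Iterator, Mapping


def caption_word_stats(records: Iterable[Mapping], field: str = "caption") -> dict:
    """Summarise caption lengths, counted as whitespace-separated words."""
    lengths = [len(str(record[field]).split()) for record in records]
    if not lengths:
        return {"count": 0, "min": 0, "max": 0, "mean": 0.0, "median": 0.0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
    }


def stream_samples(n: int = 100) -> Iterator[Mapping]:
    """Yield the first n records in streaming mode (hypothetical helper).

    Assumes the `datasets` package is installed and the Hub is reachable.
    """
    # Imported lazily so caption_word_stats can be used without `datasets`.
    from datasets import load_dataset

    stream = load_dataset(
        "prithivMLmods/Caption3o-Opt", split="train", streaming=True
    )
    yield from stream.take(n)


# Example usage (requires network access):
#   print(caption_word_stats(stream_samples(100)))
```

Streaming avoids downloading the full ~500 MB before looking at a handful of records; for statistics over all 10,278 samples, passing the fully loaded `dataset` from the Quickstart to `caption_word_stats` should work the same way.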

Provider: maas

Created: 2025-07-03
