Caption3o-Opt
Source: ModelScope community. Updated 2025-12-03; indexed 2025-07-05.
Download link:
https://modelscope.cn/datasets/prithivMLmods/Caption3o-Opt
Description:
# **Caption3o-Opt**
**Caption3o-Opt** is a compact, high-quality image-caption dataset derived from the original [BLIP3o/BLIP3o-Pretrain-Long-Caption](https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption). This refined subset focuses on optimized long-form captioning and is curated to help vision-language models understand both real-world and artistic images.
## Overview
- **Total Samples**: 10,278
- **Modality**: Image ↔ Text
- **Format**: Arrow (auto-converted to Parquet)
- **License**: Apache 2.0
- **Language**: English
- **Size**: ~500 MB
## Dataset Structure
| Field | Type | Description |
| ------- | ------ | ----------------------------------------------- |
| image | image | Input image in binary format |
| caption | string | Long-form, descriptive caption for the image |
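As a rough illustration of the two-field layout above, the sketch below checks that a record carries both fields and that the caption is usable text. The helper name `check_record` and the specific checks are assumptions made for this example; they are not part of the dataset or of any library.

```python
from typing import Any, List, Mapping

# Field names come from the table above; the checks themselves are illustrative.
EXPECTED_FIELDS = ("image", "caption")


def check_record(record: Mapping[str, Any]) -> List[str]:
    """Return a list of problems found in one record (an empty list means OK)."""
    problems = []
    for field in EXPECTED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
    caption = record.get("caption")
    if "caption" in record and (not isinstance(caption, str) or not caption.strip()):
        problems.append("caption must be a non-empty string")
    if "image" in record and record["image"] is None:
        problems.append("image is empty")
    return problems


# Stand-in record; a record loaded from the dataset would hold an image object instead of raw bytes.
sample = {"image": b"\x89PNG...", "caption": "The image depicts a serene harbor scene."}
print(check_record(sample))
print(check_record({"caption": "  "}))
```

A check like this can be run over a handful of rows before training to catch empty captions or missing images early.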
## Quickstart with 🤗 Datasets
```bash
pip install datasets
```
```python
from datasets import load_dataset
# Load dataset
dataset = load_dataset("prithivMLmods/Caption3o-Opt", split="train")
# View a sample
print(dataset[0])
```
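Once the dataset is loaded, a quick look at caption lengths can confirm that the captions really are long-form. The following is a minimal sketch using only the standard library; `caption_length_stats` is a hypothetical helper, and counting whitespace-separated words is a simplification rather than a real tokenizer.

```python
from statistics import mean, median
from typing import Dict, Iterable


def caption_length_stats(captions: Iterable[str]) -> Dict[str, float]:
    """Summarize caption lengths, measured in whitespace-separated words."""
    lengths = [len(c.split()) for c in captions]
    if not lengths:
        raise ValueError("no captions given")
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
    }


# Tiny in-memory stand-in for the caption column.
demo = [
    "The image depicts a serene harbor scene",
    "The image shows a close-up of a person grilling food outdoors",
]
stats = caption_length_stats(demo)
print(stats)
```

With the dataset loaded as in the Quickstart, passing `dataset["caption"]` in place of `demo` should work. Reading only the caption column also avoids decoding every image.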
## Example Entries
1. **Image**: Religious statue
**Caption**: *The image depicts a religious figure adorned in elaborate, ornate attire, likely a statue or icon of a saint or Virgin Mary...*
2. **Image**: Historic building with clock tower
**Caption**: *The image captures a grand, historic building under a clear blue sky. The structure features ornate architectural details...*
3. **Image**: South Asian temple entrance
**Caption**: *The image depicts the entrance of a vibrant and ornate temple, likely of South Asian origin...*
4. **Image**: Outdoor grilling event
**Caption**: *The image shows a close-up of a person grilling food outdoors. The individual is wearing an apron...*
5. **Image**: Scenic harbor
**Caption**: *The image depicts a serene harbor scene under a clear blue sky with a few scattered clouds...*
## Use Cases
This dataset supports a variety of vision-language tasks:
* Long-form image captioning
* Visual scene understanding
* Multi-modal grounding and reasoning
* Fine-tuning VLMs such as BLIP, IDEFICS, and Flamingo
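For fine-tuning, a held-out validation set is usually needed, and the Quickstart loads only a `train` split. One possible approach is to split row indices reproducibly, as sketched below. The helper `split_indices`, the 10% hold-out fraction, and the seed are assumptions chosen for illustration, not a recommendation from the dataset authors.

```python
import random
from typing import List, Tuple


def split_indices(n: int, val_fraction: float = 0.1, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle range(n) reproducibly and split it into (train, validation) index lists."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # a private RNG keeps the global random state untouched
    n_val = max(1, int(n * val_fraction))
    return indices[n_val:], indices[:n_val]


# 10,278 is the sample count from the Overview; the 10% hold-out is an arbitrary choice.
train_idx, val_idx = split_indices(10_278, val_fraction=0.1, seed=0)
print(len(train_idx), len(val_idx))
```

The index lists can then be passed to `dataset.select(train_idx)` and `dataset.select(val_idx)`. The 🤗 Datasets library also provides a `train_test_split` method on `Dataset`, which may be simpler when index-level control is not needed.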
## Citation
If you use this dataset, please cite the original dataset:
> **BLIP3o/BLIP3o-Pretrain-Long-Caption**
> [https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption](https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption)
And reference this curated derivative:
> **Caption3o-Opt by prithivMLmods**
Provider:
maas
Created:
2025-07-03



