Caption3o-Opt

Source: ModelScope community (updated 2025-12-03, indexed 2025-07-05)
Download link: https://modelscope.cn/datasets/prithivMLmods/Caption3o-Opt
Resource summary:
# **Caption3o-Opt**

**Caption3o-Opt** is a compact, high-quality image-caption dataset derived from the original [BLIP3o/BLIP3o-Pretrain-Long-Caption](https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption). This refined subset focuses on optimized long-form captioning, curated for real-world and artistic image understanding across vision-language models.

## Overview

- **Total Samples**: 10,278
- **Modality**: Image ↔ Text
- **Format**: Arrow (auto-converted to Parquet)
- **License**: Apache 2.0
- **Language**: English
- **Size**: ~500 MB

## Dataset Structure

| Field   | Type   | Description                                  |
| ------- | ------ | -------------------------------------------- |
| image   | image  | Input image in binary format                 |
| caption | string | Long-form, descriptive caption for the image |

## Quickstart with 🤗 Datasets

```bash
pip install datasets
```

```python
from datasets import load_dataset

# Load dataset
dataset = load_dataset("prithivMLmods/Caption3o-Opt", split="train")

# View a sample
print(dataset[0])
```

## Example Entries

1. **Image**: Religious statue
   **Caption**: *The image depicts a religious figure adorned in elaborate, ornate attire, likely a statue or icon of a saint or Virgin Mary...*

2. **Image**: Historic building with clock tower
   **Caption**: *The image captures a grand, historic building under a clear blue sky. The structure features ornate architectural details...*

3. **Image**: South Asian temple entrance
   **Caption**: *The image depicts the entrance of a vibrant and ornate temple, likely of South Asian origin...*

4. **Image**: Outdoor grilling event
   **Caption**: *The image shows a close-up of a person grilling food outdoors. The individual is wearing an apron...*

5. **Image**: Scenic harbor
   **Caption**: *The image depicts a serene harbor scene under a clear blue sky with a few scattered clouds...*

## Use Cases

This dataset supports a variety of vision-language tasks:

* Long-form image captioning
* Visual scene understanding
* Multi-modal grounding and reasoning
* Fine-tuning VLMs like BLIP, IDEFICS, Flamingo, etc.

## Citation

If you use this dataset, please cite the original dataset:

> **BLIP3o/BLIP3o-Pretrain-Long-Caption**
> [https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption](https://huggingface.co/datasets/BLIP3o/BLIP3o-Pretrain-Long-Caption)

And reference this curated derivative:

> **Caption3o-Opt by prithivMLmods**
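
As a follow-up to the Quickstart, the sketch below shows one possible way to check how long the captions are before using the data for fine-tuning. It is not part of the original dataset card. The helper names `caption_word_stats` and `load_rows` are hypothetical, the rows in `sample_rows` are stand-ins written for illustration rather than records copied from the dataset, and splitting on whitespace is only a rough proxy for caption length.

```python
from statistics import mean


def caption_word_stats(rows):
    """Return (count, min, mean, max) of whitespace-separated word counts
    over the `caption` field of each row."""
    lengths = [len(row["caption"].split()) for row in rows]
    if not lengths:
        raise ValueError("no rows supplied")
    return len(lengths), min(lengths), mean(lengths), max(lengths)


def load_rows(limit=100):
    """Hypothetical helper: load the first `limit` rows of the train split.

    Requires `pip install datasets` and network access.
    """
    # Imported lazily so caption_word_stats has no third-party dependency.
    from datasets import load_dataset

    dataset = load_dataset("prithivMLmods/Caption3o-Opt", split="train")
    # Keep only the caption column so images are not decoded while iterating.
    return dataset.select(range(limit)).select_columns(["caption"])


# Stand-in rows (illustrative only) so the helper can be tried offline.
sample_rows = [
    {"caption": "The image depicts a serene harbor scene"},
    {"caption": "The image shows a person grilling food outdoors"},
]

print(caption_word_stats(sample_rows))
```

With the dataset downloaded, `caption_word_stats(load_rows(100))` should give the same kind of summary for the first 100 real captions. A tokenizer-based count would be a better fit if the goal is to choose a maximum sequence length for a specific model.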

Provider: maas
Created: 2025-07-03