Source: ModelScope community (updated 2025-12-05, indexed 2025-02-15)
Download link: https://modelscope.cn/datasets/allenai/pixmo-cap
# PixMo-Cap
PixMo-Cap is a dataset of very long (roughly 200 words on average), detailed captions.
It can be used to pre-train and fine-tune vision-language models.
PixMo-Cap was created by recording annotators speaking about an image for 60-90 seconds and then using the [Claude large language model](https://claude.ai/) to turn the audio transcript(s) into a long caption.
The audio transcripts are also included.
PixMo-Cap is part of the [PixMo dataset collection](https://huggingface.co/collections/allenai/pixmo-674746ea613028006285687b) and was used to train the [Molmo family of models](https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19).
Quick links:
- 📃 [Paper](https://molmo.allenai.org/paper.pdf)
- 🎥 [Blog with Videos](https://molmo.allenai.org/blog)
## Loading
```python
import datasets
data = datasets.load_dataset("allenai/pixmo-cap", split="train")
```
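Once loaded, the records can be iterated like dictionaries. The following is a minimal sketch for checking the "roughly 200 words on average" figure on a sample of captions. The helper name `mean_caption_words` and the `limit` parameter are illustrative, not part of the dataset or the `datasets` library, and whitespace splitting is only an approximation of word counting.

```python
from statistics import mean


def mean_caption_words(rows, limit=1000):
    """Average whitespace-separated word count over the first `limit` captions.

    `rows` is any iterable of dict-like records that have a `caption` field.
    This helper is hypothetical and only meant for a quick sanity check.
    """
    counts = []
    for i, row in enumerate(rows):
        if i >= limit:
            break
        counts.append(len(row["caption"].split()))
    # Return 0.0 for an empty input instead of raising.
    return mean(counts) if counts else 0.0


# Usage (requires network access and the `datasets` package):
# import datasets
# data = datasets.load_dataset("allenai/pixmo-cap", split="train")
# print(mean_caption_words(data))
```

Sampling only the first `limit` rows keeps the check fast; pass a larger value for a closer estimate over the full split.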
## Data Format
Images are stored as URLs that will need to be downloaded separately.
The `transcripts` field contains one or more audio transcripts.
The `caption` field contains the caption from the LLM.
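Because only URLs are stored, each image has to be fetched before use. The sketch below shows one possible way to do that with the standard library. The column name `image_url` is an assumption (the text above names only `transcripts` and `caption`), and `fetch_image_bytes` and `load_example` are hypothetical helpers, not part of the dataset tooling. Failed downloads return `None` so that unreachable URLs can be skipped.

```python
import urllib.request


def fetch_image_bytes(url, timeout=10.0):
    """Return the raw bytes at `url`, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (OSError, ValueError):
        # OSError covers network and HTTP errors; ValueError covers malformed URLs.
        return None


def load_example(row, url_field="image_url"):
    """Bundle one record's image bytes, caption, and transcripts.

    `url_field` is an assumed column name; adjust it to the actual schema.
    """
    return {
        "image": fetch_image_bytes(row[url_field]),
        "caption": row["caption"],
        "transcripts": list(row["transcripts"]),
    }


# Usage (requires network access):
# example = load_example(data[0])
# if example["image"] is not None:
#     with open("example_image.bin", "wb") as f:
#         f.write(example["image"])
```

Keeping the download separate from dataset loading makes it easier to cache images locally or to retry failures later.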
## License
This dataset is licensed under ODC-BY-1.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use).
This dataset includes output data generated from Claude which are subject to Anthropic [terms of service](https://www.anthropic.com/legal/commercial-terms) and [usage policy](https://www.anthropic.com/legal/aup).
Provider: maas
Created: 2025-05-28



