pixmo-cap

Name: pixmo-cap
Creator: maas
Published: 2025-12-05 16:36:30
License: no description provided

ModelScope community. Updated 2025-12-05; indexed 2025-02-15.

Download link:

https://modelscope.cn/datasets/allenai/pixmo-cap

Resource description:

# PixMo-Cap

PixMo-Cap is a dataset of very long, detailed captions (roughly 200 words on average). It can be used to pre-train and fine-tune vision-language models.

PixMo-Cap was created by recording annotators speaking about an image for 60-90 seconds and then using the [Claude large language model](https://claude.ai/) to turn the audio transcript(s) into a long caption. The audio transcripts are also included.

PixMo-Cap is part of the [PixMo dataset collection](https://huggingface.co/collections/allenai/pixmo-674746ea613028006285687b) and was used to train the [Molmo family of models](https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19).

Quick links:
- 📃 [Paper](https://molmo.allenai.org/paper.pdf)
- 🎥 [Blog with Videos](https://molmo.allenai.org/blog)

## Loading

```python
import datasets

# Load the training split from the Hugging Face Hub.
data = datasets.load_dataset("allenai/pixmo-cap", split="train")
```

## Data Format

Images are stored as URLs that will need to be downloaded separately.

The `transcripts` field contains one or more audio transcripts.

The `caption` field contains the caption from the LLM.

## License

This dataset is licensed under ODC-BY-1.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). This dataset includes output data generated from Claude, which are subject to Anthropic's [terms of service](https://www.anthropic.com/legal/commercial-terms) and [usage policy](https://www.anthropic.com/legal/aup).
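Because the Data Format section says images are stored as URLs and must be downloaded separately, a small helper along the following lines may be useful. This is a minimal sketch, not part of the dataset's official tooling: the column name `image_url` is an assumption (the card only names `transcripts` and `caption`, so check `data.column_names` first), and `fetch_bytes` and `save_image` are hypothetical helper names introduced here for illustration.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable, Optional


def fetch_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw bytes at `url` using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def save_image(
    record: dict,
    out_dir: Path,
    url_field: str = "image_url",  # ASSUMED column name; verify with data.column_names
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> Optional[Path]:
    """Download one record's image; return the saved path, or None on failure."""
    url = record[url_field]
    # Name the file by a hash of the URL so repeated runs reuse the same path.
    path = out_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if path.exists():
        return path
    try:
        payload = fetch(url)
    except (OSError, ValueError):
        # Dead links, timeouts, and malformed URLs are skipped rather than raised.
        return None
    out_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)
    return path
```

With the dataset loaded as in the Loading section, something like `for record in data.select(range(10)): save_image(record, Path("images"))` would try the first ten rows. The `fetch` parameter is injectable so the helper can be exercised without network access, and returning `None` on failure reflects the expectation that some web-hosted images may no longer be reachable.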


Provider:

maas

Created:

2025-05-28
