
pixmo-cap

Source: ModelScope community. Updated 2025-12-05; indexed 2025-02-15.
Download link:
https://modelscope.cn/datasets/allenai/pixmo-cap
Description:
# PixMo-Cap

PixMo-Cap is a dataset of very long (roughly 200 words on average), detailed captions. It can be used to pre-train and fine-tune vision-language models. PixMo-Cap was created by recording annotators speaking about an image for 60-90 seconds and then using the [Claude large language model](https://claude.ai/) to turn the audio transcript(s) into a long caption. The audio transcripts are also included.

PixMo-Cap is part of the [PixMo dataset collection](https://huggingface.co/collections/allenai/pixmo-674746ea613028006285687b) and was used to train the [Molmo family of models](https://huggingface.co/collections/allenai/molmo-66f379e6fe3b8ef090a8ca19).

Quick links:
- 📃 [Paper](https://molmo.allenai.org/paper.pdf)
- 🎥 [Blog with Videos](https://molmo.allenai.org/blog)

## Loading

```python
import datasets

# Load the training split from the Hugging Face Hub.
data = datasets.load_dataset("allenai/pixmo-cap", split="train")
```

## Data Format

Images are stored as URLs that will need to be downloaded separately.

The `transcripts` field contains one or more audio transcripts.

The `caption` field contains the caption from the LLM.

## License

This dataset is licensed under ODC-BY-1.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use). This dataset includes output data generated by Claude, which are subject to Anthropic's [terms of service](https://www.anthropic.com/legal/commercial-terms) and [usage policy](https://www.anthropic.com/legal/aup).
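
Because the images are stored only as URLs, each record has to be fetched before it can be paired with its caption. The following is a minimal sketch of that download step, not an official loader. It assumes the URL lives in a field named `image_url` (the description above names only `transcripts` and `caption`, so check the actual column names first). The helpers `fetch_bytes` and `download_images` are hypothetical names introduced for illustration.

```python
import hashlib
import urllib.request
from pathlib import Path


def fetch_bytes(url, timeout=10.0):
    """Return the raw bytes served at `url` (standard library only)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_images(rows, out_dir, url_key="image_url", fetch=fetch_bytes):
    """Download the image for each row and return (saved, failed).

    `url_key` is an ASSUMED field name; adjust it to the real column.
    `saved` maps each URL to its local path, `failed` lists URLs that
    could not be fetched (dead links are expected in URL-based datasets).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved, failed = {}, []
    for row in rows:
        url = row[url_key]
        # Hash the URL so the file name is stable and filesystem-safe.
        path = out / hashlib.sha256(url.encode("utf-8")).hexdigest()
        if not path.exists():  # skip files already fetched on a previous run
            try:
                path.write_bytes(fetch(url))
            except (OSError, ValueError):
                # URLError/HTTPError/timeouts are OSError subclasses;
                # ValueError covers malformed URLs.
                failed.append(url)
                continue
        saved[url] = path
    return saved, failed
```

With the dataset loaded as `data`, a small trial run might look like `saved, failed = download_images(data.select(range(8)), "pixmo_images")`. Keeping the `failed` list makes it easy to drop records whose images are no longer reachable instead of aborting the whole run.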

Provider: maas
Created: 2025-05-28