
anthracite-org/pixmo-cap-images

Source: Hugging Face | Updated: 2024-11-30 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/anthracite-org/pixmo-cap-images
Description:
PixMo-Cap is a dataset of long, detailed image captions (roughly 200 words on average) for image-to-text tasks. It is used for pre-training and fine-tuning vision-language models. The captions were created by recording annotators describing an image aloud for 60-90 seconds and then using the Claude large language model to turn the audio transcripts into long captions. The audio transcripts are also included. PixMo-Cap is part of the PixMo dataset collection and was used to train the Molmo family of models. Each record contains the image, the image URL, the caption, and the transcripts; unlike the original release, the images are included in the dataset itself.
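As a rough illustration of how the records described above might be inspected, here is a minimal Python sketch. It assumes the Hugging Face `datasets` library is installed, that the repository exposes a `train` split, and that the caption text lives in a field named `caption`; none of these details are confirmed by this listing, and the helper names `mean_caption_words` and `stream_records` are hypothetical.

```python
from typing import Iterable, Iterator, Mapping


def mean_caption_words(records: Iterable[Mapping], field: str = "caption") -> float:
    """Average whitespace-delimited word count of `field` across records.

    Assumption: each record is a mapping with a text field named `field`.
    """
    counts = [len(str(record[field]).split()) for record in records]
    return sum(counts) / len(counts) if counts else 0.0


def stream_records(limit: int = 100) -> Iterator[Mapping]:
    """Yield up to `limit` records from the Hub without downloading everything.

    Requires `pip install datasets` and network access.
    """
    from datasets import load_dataset  # imported lazily so the helper above has no dependencies

    # Assumption: the repository has a "train" split.
    ds = load_dataset("anthracite-org/pixmo-cap-images", split="train", streaming=True)
    yield from ds.take(limit)


# Example use (needs network access):
#   print(mean_caption_words(stream_records(100)))
```

Streaming is used here because the images are embedded in the dataset, so a full download may be large. If the field names differ, inspecting the keys of the first streamed record should show the actual schema.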
Provided by:
anthracite-org