anthracite-org/pixmo-cap-images
Source: Hugging Face. Updated: 2024-11-30. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/anthracite-org/pixmo-cap-images
Description:
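PixMo-Cap is a dataset of detailed long-form captions, intended mainly for pre-training and fine-tuning vision-language models. The captions were produced by recording annotators describing each image aloud for 60-90 seconds and then using the Claude large language model to convert the audio transcripts into long captions. The audio transcripts are also included. The dataset is part of the PixMo dataset collection and was used to train the Molmo family of models. Each record contains an image, an image URL, a caption, and transcripts.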
PixMo-Cap is a dataset of very long (roughly 200 words on average), detailed captions for image-to-text tasks. It is used for pre-training and fine-tuning vision-language models. The dataset was created by recording annotators speaking about an image for 60-90 seconds and then using the Claude large language model to turn the audio transcripts into long captions. The audio transcripts are also included. PixMo-Cap is part of the PixMo dataset collection and was used to train the Molmo family of models. Unlike the original release, images are included in the dataset itself.
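As a rough illustration of how the records described above might be read, the following Python sketch streams the dataset with the Hugging Face `datasets` library and computes the mean caption length in words. The repository name comes from this listing. The column names (`caption`, `transcripts`, `image_url`) and the split name `train` are assumptions based on the fields the description mentions and should be checked against the dataset card. The helper functions are hypothetical, and the sample records in the demo are made up so the sketch runs offline.

```python
from typing import Any, Dict, Iterable, Iterator

REPO_ID = "anthracite-org/pixmo-cap-images"  # repository name from this listing

# Assumed column names, inferred from the fields the description lists
# (image, image URL, caption, transcripts). Verify them on the dataset card.
CAPTION_KEY = "caption"
TRANSCRIPTS_KEY = "transcripts"
IMAGE_URL_KEY = "image_url"


def stream_records(split: str = "train") -> Iterator[Dict[str, Any]]:
    """Lazily yield records from the Hub.

    Needs `pip install datasets` and network access. The split name
    "train" is an assumption.
    """
    from datasets import load_dataset  # imported here so the helpers below work offline

    yield from load_dataset(REPO_ID, split=split, streaming=True)


def caption_word_count(record: Dict[str, Any]) -> int:
    """Count whitespace-separated words in a record's caption (0 if missing)."""
    return len(str(record.get(CAPTION_KEY, "")).split())


def mean_caption_length(records: Iterable[Dict[str, Any]]) -> float:
    """Average caption length in words over the given records (0.0 if empty)."""
    counts = [caption_word_count(r) for r in records]
    return sum(counts) / len(counts) if counts else 0.0


if __name__ == "__main__":
    # Offline demo on made-up records; these are not rows from the dataset.
    sample = [
        {IMAGE_URL_KEY: "https://example.com/1.jpg",
         CAPTION_KEY: "a red bicycle leaning on a wall",
         TRANSCRIPTS_KEY: ["there is a red bicycle"]},
        {IMAGE_URL_KEY: "https://example.com/2.jpg",
         CAPTION_KEY: "two dogs",
         TRANSCRIPTS_KEY: ["two dogs"]},
    ]
    print(mean_caption_length(sample))
```

To check the figure of roughly 200 words per caption against real data, one could pass a bounded slice of the stream to the helper, for example `mean_caption_length(itertools.islice(stream_records(), 1000))`. Streaming avoids downloading the embedded images for the whole dataset up front.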
Provided by:
anthracite-org



