kyutai/Babillage
Source: Hugging Face | Updated: 2025-03-21 | Indexed: 2025-04-08
Download link:
https://hf-mirror.com/datasets/kyutai/Babillage
Description:
Babillage is a multimodal benchmark dataset for evaluating Vision Speech Models. It contains spoken versions of three common vision-language benchmarks: COCO-Captions, OCR-VQA, and VQAv2. Each sample includes the audio of the question, its text transcript, and time alignment information. For COCO-Captions and OCR-VQA, each sample also includes the audio, text transcript, and time alignment information of the answer.
Provider:
kyutai



