FusionAudio-1.2M
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/satsuki2486441738/FusionAudio
Description:
FusionAudio-1.2M is a large-scale dataset of 1.2 million detailed audio captions and 6 million question-answer pairs, built for fine-grained audio captioning. It aims to make captions more detailed by fusing contextual information from multiple modalities. Compared with other audio captioning datasets, its captions are longer and more semantically diverse, and over 50% of its samples integrate information from more than one modality.
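The comparative claims above (longer captions, over 50% of samples fusing several modalities) are dataset-level statistics. The following is a minimal sketch of how such statistics could be computed over caption records. It assumes a hypothetical record schema with `caption` and `modalities` fields, which is not the dataset's actual release format. It also uses word count as the length measure and treats a sample as multimodal when it lists more than one distinct source. Both choices are illustrative assumptions.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class CaptionRecord:
    # Hypothetical schema for illustration only; not the FusionAudio-1.2M file format.
    audio_id: str
    caption: str
    modalities: list[str] = field(default_factory=list)  # e.g. ["audio", "visual"]


def caption_stats(records: list[CaptionRecord]) -> dict[str, float]:
    """Return mean caption length in words and the share of records with >1 distinct modality."""
    if not records:
        raise ValueError("no records")
    lengths = [len(r.caption.split()) for r in records]
    # Assumption: "multimodal" means more than one distinct modality contributed to the caption.
    multimodal = sum(1 for r in records if len(set(r.modalities)) > 1)
    return {
        "mean_words": float(mean(lengths)),
        "multimodal_fraction": multimodal / len(records),
    }


if __name__ == "__main__":
    toy = [
        CaptionRecord("a1", "A dog barks twice near a busy road", ["audio", "visual"]),
        CaptionRecord("a2", "Rain falls steadily on a metal roof", ["audio"]),
        CaptionRecord("a3", "A man speaks while music plays softly", ["audio", "speech", "music"]),
    ]
    print(caption_stats(toy))
```

Word count is only a rough proxy for description length. Semantic diversity, which the description also mentions, would need a separate measure that the source does not specify.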



