VoiceAssistant-400K

arXiv, indexed 2025-09-30
Download link:
https://github.com/SJTU-OmniAgent/VocalNet
Description:
This dataset contains approximately 470,000 entries generated with GPT-4o, each providing transcripts of the query audio and the corresponding response. During curation, a cleaned version was created by removing instances with overly long responses, which reduced the dataset from roughly 470,000 entries to roughly 430,000 query-response pairs. The dataset targets speech generation and speech recognition.
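
The length-based cleaning step described above could be sketched roughly as follows. This is only an illustration and not the curators' actual procedure: the field names `query` and `response`, the word-count criterion, and the cutoff value are all assumptions, since the listing does not state how "overly long" was defined.

```python
from typing import Dict, List

# Hypothetical cutoff: the listing does not give the real threshold or its unit.
MAX_RESPONSE_WORDS = 200


def drop_long_responses(
    entries: List[Dict[str, str]],
    max_words: int = MAX_RESPONSE_WORDS,
) -> List[Dict[str, str]]:
    """Keep entries whose response has at most `max_words` whitespace-separated words.

    Each entry is assumed (hypothetically) to be a dict with "query" and
    "response" transcript strings.
    """
    return [e for e in entries if len(e["response"].split()) <= max_words]


if __name__ == "__main__":
    sample = [
        {"query": "What is the capital of France?", "response": "Paris."},
        {"query": "Tell me a story.", "response": "word " * 500},
    ]
    cleaned = drop_long_responses(sample)
    print(f"kept {len(cleaned)} of {len(sample)} entries")
```

A word count is used here only because it is easy to compute from transcripts; a token count or audio duration would be equally plausible criteria.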
Provider:
Mini-Omni