slprl/SpokenSwag
Source: Hugging Face. Updated 2025-02-25; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/slprl/SpokenSwag
Description:
SpokenSwag is a synthetic-speech dataset derived from the allenai/swag text dataset using text-to-speech synthesis. It provides training and validation splits intended to improve the semantic abilities of spoken language models. The audio is synthesized with four speaker voices (two male, two female). Each sample contains a prompt text, a chosen text, a rejected text, and the corresponding audio files, plus an auto-BLEU score that can be used to filter out repetitive samples.
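
As a rough illustration of the filtering step mentioned above, the sketch below drops samples whose auto-BLEU score exceeds a cutoff. It is not taken from the dataset authors: the field names (`speaker`, `prompt_text`, `chosen_text`, `rejected_text`, `auto_bleu`), the example records, and the 0.3 threshold are all assumptions made for the example, and it assumes that a higher auto-BLEU score means more internal repetition.

```python
from typing import Dict, List

# Hypothetical records shaped like the samples described above.
# Field names and values are illustrative assumptions, not the real schema.
SAMPLES: List[Dict] = [
    {"speaker": "voice_1", "prompt_text": "She opens the door",
     "chosen_text": "and walks into the kitchen.",
     "rejected_text": "and swims across the lake.", "auto_bleu": 0.05},
    {"speaker": "voice_2", "prompt_text": "He picks up the ball",
     "chosen_text": "and throws the ball and throws the ball.",
     "rejected_text": "and eats the sky.", "auto_bleu": 0.62},
    {"speaker": "voice_3", "prompt_text": "The dog runs outside",
     "chosen_text": "and chases a squirrel.",
     "rejected_text": "and reads a newspaper.", "auto_bleu": 0.10},
]


def filter_by_auto_bleu(samples: List[Dict], max_score: float = 0.3) -> List[Dict]:
    """Keep samples whose auto-BLEU score is at most max_score.

    Assumes a higher score indicates more repeated n-grams in the sample.
    """
    return [s for s in samples if s["auto_bleu"] <= max_score]


if __name__ == "__main__":
    kept = filter_by_auto_bleu(SAMPLES, max_score=0.3)
    for sample in kept:
        print(sample["speaker"], sample["prompt_text"], sample["chosen_text"])
```

With the Hugging Face `datasets` library, the real records could presumably be obtained with `load_dataset("slprl/SpokenSwag")` and filtered in the same way, once the actual column names have been checked against the dataset card.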
Provider:
slprl



