
NVSpeech170k

ModelScope (魔搭社区) · Updated 2025-08-20 · Indexed 2025-08-16
Download link:
https://modelscope.cn/datasets/Virgo-Internal/NVSpeech170k
Resource description:
# NVSpeech Dataset

## Overview

The NVSpeech dataset provides extensive annotations of paralinguistic vocalizations for Mandarin Chinese speech, aimed at enhancing the capabilities of automatic speech recognition (ASR) and text-to-speech (TTS) systems. The dataset features explicit word-level annotations for 18 categories of paralinguistic vocalizations, including non-verbal sounds like laughter and breathing, as well as lexicalized interjections like "uhm" and "oh."

## Dataset Description

* **NVSpeech**: An automatically annotated larger subset consisting of 174,179 utterances (573.4 hours of speech). Annotations in this set are generated by a state-of-the-art paralinguistic-aware ASR model, ensuring scalability and diversity for robust model training.

## Annotation Categories

The NVSpeech dataset includes annotations for the following paralinguistic vocalization categories:

* [Breathing]
* [Laughter]
* [Cough]
* [Sigh]
* [Confirmation-en]
* [Question-en]
* [Question-ah]
* [Question-oh]
* [Surprise-ah]
* [Surprise-oh]
* [Dissatisfaction-hnn]
* [Uhm]
* [Shh]
* [Crying]
* [Surprise-wa]
* [Surprise-yo]
* [Question-ei]
* [Question-yi]

## Usage

```py
from datasets import load_dataset

dataset = load_dataset("Hannie0813/NVSpeech170k")
```

### Intended Use

NVSpeech is designed to facilitate:

* Training and evaluation of paralinguistic-aware speech recognition models.
* Development of expressive and controllable TTS systems that can accurately synthesize human-like speech with inline paralinguistic cues.

### Tasks

* Automatic Speech Recognition (ASR)
* Text-to-Speech (TTS) Synthesis
* Paralinguistic Tagging

## Languages

* Mandarin Chinese

## Dataset Structure

* **Format**: Audio (WAV format) paired with text annotations including inline paralinguistic tokens.
* **Size**: 174,179 automatically annotated utterances, totaling over 573 hours.

## License

The NVSpeech dataset is available for research use under the Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA) license.

## Citation

If you use NVSpeech in your research, please cite:

```bibtex
```

## Contact

For further questions, please visit the [project webpage](https://nvspeech.github.io/) or contact the authors through the provided channels.
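
The Dataset Structure section says transcripts carry inline paralinguistic tokens. The sketch below shows one way such a transcript might be separated into plain text and a list of tags. It assumes that tags appear inline in square brackets using exactly the 18 category names listed under Annotation Categories; the card does not confirm the exact token syntax, and the helper name `split_tags` and the sample transcript are hypothetical.

```python
import re
from collections import Counter

# The 18 category names as listed in the dataset card.
CATEGORIES = {
    "Breathing", "Laughter", "Cough", "Sigh", "Confirmation-en",
    "Question-en", "Question-ah", "Question-oh", "Surprise-ah",
    "Surprise-oh", "Dissatisfaction-hnn", "Uhm", "Shh", "Crying",
    "Surprise-wa", "Surprise-yo", "Question-ei", "Question-yi",
}

# Assumption: a tag is written inline as "[Name]" (not confirmed by the card).
TAG_PATTERN = re.compile(r"\[([A-Za-z-]+)\]")


def split_tags(transcript):
    """Return (plain_text, tags) for a transcript with inline bracket tags.

    Only bracketed names found in CATEGORIES are treated as tags;
    any other bracketed text is left in place.
    """
    tags = [name for name in TAG_PATTERN.findall(transcript) if name in CATEGORIES]
    plain = TAG_PATTERN.sub(
        lambda m: "" if m.group(1) in CATEGORIES else m.group(0),
        transcript,
    )
    return plain, tags


# Hypothetical transcript, made up for illustration only.
sample = "[Laughter]你好[Breathing]今天天气不错[Uhm]"
plain, tags = split_tags(sample)
print(plain)
print(Counter(tags))
```

Keeping the category set explicit means unexpected bracketed text is left untouched rather than silently dropped, which may help when checking automatically generated annotations.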

Provider: maas
Created: 2025-08-14
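Dataset introduction
Background overview
NVSpeech170k is a dataset of paralinguistic vocalization annotations for Mandarin speech. It contains 174,179 automatically annotated utterances totaling 573.4 hours and is intended to strengthen automatic speech recognition and text-to-speech systems. It covers 18 categories of paralinguistic vocalizations, such as laughter and breathing, and is suited to training paralinguistic-aware ASR and TTS models as well as to paralinguistic tagging tasks.
Summary collected and generated by 遇见数据集.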