NVSpeech170k

Name: NVSpeech170k
Creator: maas
Published: 2025-08-20 17:53:02
License: No description provided

ModelScope community. Updated 2025-08-20; indexed 2025-08-16.

Download link:

https://modelscope.cn/datasets/Virgo-Internal/NVSpeech170k

Resource description:

# NVSpeech Dataset

## Overview

The NVSpeech dataset provides extensive annotations of paralinguistic vocalizations for Mandarin Chinese speech, aimed at enhancing the capabilities of automatic speech recognition (ASR) and text-to-speech (TTS) systems. The dataset features explicit word-level annotations for 18 categories of paralinguistic vocalizations, including non-verbal sounds like laughter and breathing, as well as lexicalized interjections like "uhm" and "oh."

## Dataset Description

* **NVSpeech**: An automatically annotated larger subset consisting of 174,179 utterances (573.4 hours of speech). Annotations in this set are generated by a state-of-the-art paralinguistic-aware ASR model, ensuring scalability and diversity for robust model training.

## Annotation Categories

The NVSpeech dataset includes annotations for the following paralinguistic vocalization categories:

* [Breathing]
* [Laughter]
* [Cough]
* [Sigh]
* [Confirmation-en]
* [Question-en]
* [Question-ah]
* [Question-oh]
* [Surprise-ah]
* [Surprise-oh]
* [Dissatisfaction-hnn]
* [Uhm]
* [Shh]
* [Crying]
* [Surprise-wa]
* [Surprise-yo]
* [Question-ei]
* [Question-yi]

## Usage

```py
from datasets import load_dataset

dataset = load_dataset("Hannie0813/NVSpeech170k")
```

### Intended Use

NVSpeech is designed to facilitate:

* Training and evaluation of paralinguistic-aware speech recognition models.
* Development of expressive and controllable TTS systems that can accurately synthesize human-like speech with inline paralinguistic cues.

### Tasks

* Automatic Speech Recognition (ASR)
* Text-to-Speech (TTS) Synthesis
* Paralinguistic Tagging

## Languages

* Mandarin Chinese

## Dataset Structure

* **Format**: Audio (WAV format) paired with text annotations including inline paralinguistic tokens.
* **Size**: 174,179 automatically annotated utterances, totaling over 573 hours.
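
Since the text annotations carry paralinguistic tokens inline, a common preprocessing step is to separate those tokens from the spoken text. The following is a minimal sketch, not part of the official tooling. It assumes the tokens appear in transcripts in the same bracketed form as the category list (for example `[Laughter]`); the card does not show a sample transcript, so that format, along with the helper names `split_tags` and `count_tags`, is an assumption.

```python
import re
from collections import Counter

# Category names copied from the "Annotation Categories" list in the card.
CATEGORIES = frozenset([
    "Breathing", "Laughter", "Cough", "Sigh",
    "Confirmation-en", "Question-en", "Question-ah", "Question-oh",
    "Surprise-ah", "Surprise-oh", "Dissatisfaction-hnn", "Uhm", "Shh",
    "Crying", "Surprise-wa", "Surprise-yo", "Question-ei", "Question-yi",
])

# Assumption: tags are written inline as "[Name]". Verify against real samples.
TAG_RE = re.compile(r"\[([^\[\]]+)\]")


def split_tags(transcript):
    """Separate known paralinguistic tags from the spoken text.

    Returns (plain_text, tags), with tags in order of appearance.
    Bracketed spans that are not known categories are left in the text.
    """
    tags = []

    def _strip(match):
        name = match.group(1)
        if name in CATEGORIES:
            tags.append(name)
            return ""
        return match.group(0)

    plain_text = TAG_RE.sub(_strip, transcript)
    return plain_text, tags


def count_tags(transcripts):
    """Count how often each known tag occurs across an iterable of transcripts."""
    counts = Counter()
    for transcript in transcripts:
        counts.update(split_tags(transcript)[1])
    return counts
```

With a made-up transcript such as `"[Laughter]你好[Breathing]"`, `split_tags` would return the plain text `"你好"` and the tag list `["Laughter", "Breathing"]`. Keeping unknown bracketed spans in the text is a deliberate choice here, so that unexpected markup stays visible instead of being silently dropped.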

## License

NVSpeech dataset is available for research use under the Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA) license.

## Citation

If you use NVSpeech in your research, please cite:

```bibtex
```

## Contact

For further questions, please visit the [project webpage](https://nvspeech.github.io/) or contact the authors through the provided channels.


Provider:

maas

Created:

2025-08-14
