NVSpeech170k
Source: ModelScope community (魔搭社区). Updated 2025-08-20; indexed 2025-08-16.
Download link: https://modelscope.cn/datasets/Virgo-Internal/NVSpeech170k
# NVSpeech Dataset
## Overview
The NVSpeech dataset provides extensive annotations of paralinguistic vocalizations for Mandarin Chinese speech, aimed at enhancing the capabilities of automatic speech recognition (ASR) and text-to-speech (TTS) systems. The dataset features explicit word-level annotations for 18 categories of paralinguistic vocalizations, including non-verbal sounds like laughter and breathing, as well as lexicalized interjections like "uhm" and "oh."
## Dataset Description
* **NVSpeech**: An automatically annotated, large-scale subset consisting of 174,179 utterances (573.4 hours of speech). Annotations in this set are generated by a state-of-the-art paralinguistic-aware ASR model, which provides the scale and diversity needed for robust model training.
## Annotation Categories
The NVSpeech dataset includes annotations for the following paralinguistic vocalization categories:
* [Breathing]
* [Laughter]
* [Cough]
* [Sigh]
* [Confirmation-en]
* [Question-en]
* [Question-ah]
* [Question-oh]
* [Surprise-ah]
* [Surprise-oh]
* [Dissatisfaction-hnn]
* [Uhm]
* [Shh]
* [Crying]
* [Surprise-wa]
* [Surprise-yo]
* [Question-ei]
* [Question-yi]
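The category labels above can be used to separate inline tags from the spoken text. The following is a minimal sketch, assuming that tags appear inside transcripts in the same bracketed form as in the list (for example `[Laughter]`); the helper name `split_tags` and the example sentence are made up for illustration and are not part of the dataset.

```python
import re

# The 18 category labels exactly as listed above.
NVSPEECH_TAGS = [
    "Breathing", "Laughter", "Cough", "Sigh",
    "Confirmation-en", "Question-en", "Question-ah", "Question-oh",
    "Surprise-ah", "Surprise-oh", "Dissatisfaction-hnn", "Uhm", "Shh",
    "Crying", "Surprise-wa", "Surprise-yo", "Question-ei", "Question-yi",
]

# Assumption: tags appear inline in transcripts as "[Label]".
TAG_PATTERN = re.compile(
    r"\[(" + "|".join(re.escape(tag) for tag in NVSPEECH_TAGS) + r")\]"
)


def split_tags(transcript: str) -> tuple[str, list[str]]:
    """Return (text with known tags removed, tag labels in order of appearance)."""
    tags = TAG_PATTERN.findall(transcript)
    plain = TAG_PATTERN.sub("", transcript)
    plain = " ".join(plain.split())  # collapse whitespace left behind by removed tags
    return plain, tags


# Hypothetical transcript, written only to illustrate the assumed format.
example = "[Laughter] 你真的来了 [Surprise-oh]"
print(split_tags(example))
```

Only the 18 listed labels are treated as tags, so any other bracketed text in a transcript is left untouched.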
## Usage
```py
from datasets import load_dataset
dataset = load_dataset("Hannie0813/NVSpeech170k")
```
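Because the dataset holds several hundred hours of audio, it may be convenient to inspect a few examples before downloading everything. The sketch below uses the `streaming=True` option of `load_dataset` for that purpose. The split names and column names are not documented in this card, so the code takes whichever split comes first instead of assuming one; the helper name `peek_examples` is hypothetical.

```python
from itertools import islice


def peek_examples(repo_id: str = "Hannie0813/NVSpeech170k", n: int = 3) -> list:
    """Stream the first n examples without downloading the full dataset."""
    # Imported lazily so the helper can be defined where `datasets` is not installed.
    from datasets import load_dataset

    # streaming=True returns iterable splits that fetch data on demand.
    splits = load_dataset(repo_id, streaming=True)
    # Split names are not documented above, so use the first one that exists.
    first_split = next(iter(splits))
    return list(islice(splits[first_split], n))
```

Calling `peek_examples()` and printing `example.keys()` for each returned item is one way to discover the actual column names. Decoding the audio column may require additional audio libraries, depending on the installed `datasets` version.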
### Intended Use
NVSpeech is designed to facilitate:
* Training and evaluation of paralinguistic-aware speech recognition models.
* Development of expressive and controllable TTS systems that can accurately synthesize human-like speech with inline paralinguistic cues.
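For the evaluation use case, one simple way to score the paralinguistic tags in a recognizer's output is a bag-of-tags precision, recall, and F1 per utterance. This is an illustrative sketch, not the metric used by the dataset authors. It assumes tags are written inline as `[Label]` or `[Label-suffix]`, and it ignores where in the sentence a tag occurs.

```python
import re
from collections import Counter

# Assumption: tags are written inline as "[Label]" or "[Label-suffix]".
TAG_RE = re.compile(r"\[([A-Za-z]+(?:-[a-z]+)?)\]")


def tag_prf(reference: str, hypothesis: str) -> tuple[float, float, float]:
    """Bag-of-tags precision, recall, and F1 for one utterance (position is ignored)."""
    ref = Counter(TAG_RE.findall(reference))
    hyp = Counter(TAG_RE.findall(hypothesis))
    matched = sum((ref & hyp).values())  # multiset intersection of tag counts
    precision = matched / sum(hyp.values()) if hyp else 0.0
    recall = matched / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical reference and hypothesis transcripts: the hypothesis misses one tag.
print(tag_prf("[Laughter] 你好 [Breathing]", "[Laughter] 你好"))
```

In this example the hypothesis recovers one of the two reference tags and adds none, so precision is 1.0 and recall is 0.5. A position-aware metric would need alignment between the two transcripts, which this sketch deliberately leaves out.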
### Tasks
* Automatic Speech Recognition (ASR)
* Text-to-Speech (TTS) Synthesis
* Paralinguistic Tagging
## Languages
* Mandarin Chinese
## Dataset Structure
* **Format**: Audio (WAV format) paired with text annotations including inline paralinguistic tokens.
* **Size**: 174,179 automatically annotated utterances, totaling over 573 hours.
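The two size figures above imply an average utterance length, which can be useful when planning batch sizes or storage. A quick check of the arithmetic, using only the numbers stated in this card:

```python
# Figures taken from the dataset description above.
utterances = 174_179
hours = 573.4

# Convert total hours to seconds and divide by the number of utterances.
mean_seconds = hours * 3600 / utterances
print(f"{mean_seconds:.2f} s per utterance on average")
```

This works out to roughly 11.85 seconds per utterance. It is a mean only; the card does not report the distribution of utterance lengths.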
## License
The NVSpeech dataset is available for research use under the Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA) license.
## Citation
If you use NVSpeech in your research, please cite:
```bibtex
```
## Contact
For further questions, please visit the [project webpage](https://nvspeech.github.io/) or contact the authors through the provided channels.
Provider: maas
Created: 2025-08-14
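Background overview
NVSpeech170k is a paralinguistic vocalization annotation dataset for Mandarin speech. It contains 174,179 automatically annotated utterances totaling 573.4 hours and is intended to strengthen automatic speech recognition and text-to-speech systems. It covers 18 categories of paralinguistic vocalizations, such as laughter and breathing, and is suited to training paralinguistic-aware ASR and TTS models as well as to paralinguistic tagging tasks.
Summary collected and generated by 遇见数据集.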



