NSC

Name: NSC
Creator: IMDA (Infocomm Media Development Authority)
License: No description available

Source: arXiv (indexed 2025-09-30)

Download link:

https://www.imda.gov.sg/about-imda/emerging-technologies-and-research/artificial-intelligence/national-speech-corpus


Description:

Named the "National Speech Corpus", this dataset contains approximately 10,600 hours of audio recordings of Singapore English speakers, divided into six parts that cover a wide range of topics and types of spoken discourse. It also includes detailed speaker metadata and has undergone quality validation and screening. The corpus is suitable for tasks such as speech recognition, conversational analysis, code-switching, and topic-based dialogue.

Provided by:

IMDA (Infocomm Media Development Authority)
