NPSC
Source: huggingface.co (indexed 2025-01-21)
Download link:
https://huggingface.co/datasets/NbAiLab/NPSC
Resource description:
The Norwegian Parliament Speech Corpus (NPSC) is a corpus for training Norwegian ASR (Automatic Speech Recognition) models. The corpus was created by Språkbanken at the National Library of Norway.
NPSC is based on sound recordings of meetings in the Norwegian Parliament. The speeches are orthographically transcribed into either Norwegian Bokmål or Norwegian Nynorsk. Beyond the data included in this dataset, the original corpus contains a significant amount of metadata. The speaker ID links to additional information about each speaker, such as gender, age, and place of birth (i.e., dialect). The proceedings ID links the corpus to the official proceedings of the meetings.
In total, the corpus comprises sound recordings from 40 full days of meetings. This amounts to 140 hours of speech, 65,000 sentences, or 1.2 million words.
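For a rough sense of scale, the headline totals above can be turned into per-sentence and per-day averages. The following is a minimal sketch that uses only the figures stated in the description; the function name `corpus_averages` and its dictionary keys are illustrative and not part of the dataset or any library. Because the published totals are rounded, the results are approximate.

```python
# Headline figures as stated in the corpus description (rounded values).
TOTAL_HOURS = 140
TOTAL_SENTENCES = 65_000
TOTAL_WORDS = 1_200_000
MEETING_DAYS = 40


def corpus_averages(hours, sentences, words, days):
    """Return rough averages derived from the corpus totals.

    Hypothetical helper for illustration only; the inputs are the
    rounded totals from the description, so the outputs are estimates.
    """
    total_seconds = hours * 3600
    return {
        "hours_per_day": hours / days,
        "seconds_per_sentence": total_seconds / sentences,
        "words_per_sentence": words / sentences,
        "words_per_minute": words / (hours * 60),
    }


if __name__ == "__main__":
    averages = corpus_averages(
        TOTAL_HOURS, TOTAL_SENTENCES, TOTAL_WORDS, MEETING_DAYS
    )
    for name, value in averages.items():
        print(f"{name}: {value:.1f}")
```

Under these figures, an average sentence is roughly 18 words and just under 8 seconds long, which gives an idea of typical utterance length when preparing ASR training batches.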
Provider:
Nasjonalbiblioteket AI Lab



