scribe-project/npsc_nb

Name: scribe-project/npsc_nb
Creator: scribe-project
Published: 2023-04-25 10:23:19
License: CC0

Source: Hugging Face. Updated 2023-04-25; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/scribe-project/npsc_nb

Description:

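This is the Bokmål part of the Norwegian Parliamentary Speech Corpus (NPSC), as used to train and test the STORTINGET model. It contains only segments shorter than 15 seconds. The language of the dataset is Norwegian Bokmål. The dataset was created by obtaining the data from the Norwegian Language Bank and processing it with the Spraakbanken downloader and standardization scripts. The dataset is licensed under CC0, and citation information is provided.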
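The 15-second cutoff mentioned in the description can be illustrated with a small filter. This is a minimal sketch and not the scribe-project's actual preprocessing: it assumes each segment is a plain dict whose `duration` field is in seconds, and the `MAX_DURATION_S` constant, the `keep_short_segments` helper, and the sample records are all hypothetical.

```python
from typing import Iterable, Iterator

# Assumption: `duration` is expressed in seconds, in line with the "< 15 sec." naming.
MAX_DURATION_S = 15.0


def keep_short_segments(
    segments: Iterable[dict], max_duration: float = MAX_DURATION_S
) -> Iterator[dict]:
    """Yield only the segments strictly shorter than `max_duration`."""
    for segment in segments:
        if segment["duration"] < max_duration:
            yield segment


# Hypothetical records for illustration only; they are not taken from the corpus.
sample = [
    {"utterance_id": "a", "duration": 4.2},
    {"utterance_id": "b", "duration": 15.0},
    {"utterance_id": "c", "duration": 14.99},
]
short_ids = [s["utterance_id"] for s in keep_short_segments(sample)]
print(short_ids)
```

The comparison is strict, so a segment of exactly 15.0 seconds is dropped; whether the original scripts treat the boundary the same way is not stated in the source.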

Provider:

scribe-project

Summary of original information

Dataset overview

Dataset name

NPSC Bokmål (< 15 sec. segments)

Dataset features

speaker_id: string
gender: string
utterance_id: string
language: string
raw_text: string
full_audio_file: string
original_data_split: string
region: string
duration: float
start: float
end: float
utterance_audio_file: audio
standardized_text: string
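The feature list above can be turned into a quick sanity check on individual records. The following is a rough sketch under stated assumptions: `EXPECTED_FIELDS`, `schema_problems`, `stream_split`, and the example record are hypothetical names and values introduced here. The `stream_split` helper uses `load_dataset` from the Hugging Face `datasets` library with `streaming=True` and assumes the repository can be loaded under the ID shown in the download link; access conditions and audio decoding requirements may differ in practice.

```python
from typing import List

# Field names and types as given in the feature list. The audio column is only
# checked for presence, since its decoded form depends on the `datasets` version.
EXPECTED_FIELDS = {
    "speaker_id": str,
    "gender": str,
    "utterance_id": str,
    "language": str,
    "raw_text": str,
    "full_audio_file": str,
    "original_data_split": str,
    "region": str,
    "duration": float,
    "start": float,
    "end": float,
    "utterance_audio_file": object,
    "standardized_text": str,
}


def schema_problems(record: dict) -> List[str]:
    """Return readable mismatches between `record` and EXPECTED_FIELDS."""
    problems = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            got = type(record[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {got}")
    return problems


def stream_split(split: str = "test"):
    """Stream one split from the Hub (needs the `datasets` package and network access)."""
    from datasets import load_dataset  # lazy import so the checker runs without it

    return load_dataset("scribe-project/npsc_nb", split=split, streaming=True)


# Hypothetical record; every value is invented for illustration.
example = {name: "x" for name, kind in EXPECTED_FIELDS.items() if kind is str}
example.update(duration=3.5, start=0.0, end=3.5, utterance_audio_file=None)
print(schema_problems(example))
```

With network access, something like `schema_problems(next(iter(stream_split("test"))))` would apply the same check to a real record.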

Dataset splits

train: 40,008 examples, 8190809957.84 bytes
test: 5,044 examples, 1026553338.856 bytes
validation: 5,461 examples, 1097030649.769 bytes

Dataset size

Download size: 10261847599 bytes
Total dataset size: 10314393946.465 bytes
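As a consistency check on the figures above, the per-split sizes can be summed and compared with the reported total. This is a small illustrative sketch: the numbers are copied from the listing, while the variable names are introduced here. `Decimal` is used so that the fractional byte values add up exactly.

```python
from decimal import Decimal

# Example counts and byte sizes copied from the split listing.
SPLITS = {
    "train": (40008, Decimal("8190809957.84")),
    "test": (5044, Decimal("1026553338.856")),
    "validation": (5461, Decimal("1097030649.769")),
}
REPORTED_TOTAL_BYTES = Decimal("10314393946.465")

total_examples = sum(count for count, _ in SPLITS.values())
total_bytes = sum(size for _, size in SPLITS.values())

for name, (count, size) in SPLITS.items():
    share = count / total_examples
    gib = size / Decimal(2**30)
    print(f"{name}: {share:.1%} of examples, {gib:.2f} GiB")

print("sum of splits matches reported total:", total_bytes == REPORTED_TOTAL_BYTES)
```

The three splits add up to 50,513 examples, and their byte sizes sum exactly to the reported total dataset size.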

Language

Norwegian Bokmål

License

CC0

Citation information

@inproceedings{solberg2023improving,
  title={Improving Generalization of Norwegian {ASR} with Limited Linguistic Resources},
  author={Per Erik Solberg and Pablo Ortiz and Phoebe Parsons and Torbj{\o}rn Svendsen and Giampiero Salvi},
  booktitle={The 24th Nordic Conference on Computational Linguistics},
  year={2023}
}
