SNuC: The Sheffield Numbers Spoken Language Corpus

Name: SNuC: The Sheffield Numbers Spoken Language Corpus
Creator: orda.shef.ac.uk
Published: 2024-03-13 00:00:00
License: No description available

orda.shef.ac.uk, updated 2024-03-13, indexed 2025-03-25

Download link:

https://orda.shef.ac.uk/articles/dataset/SNuC_The_Sheffield_Numbers_Spoken_Language_Corpus/19673772/3

Resource description:

SNuC is the first published corpus of spoken alphanumeric identifiers of the sort typically used as serial and part numbers in the manufacturing sector. The dataset contains recordings and transcriptions of over 50 native British English speakers, speaking over 13,000 multi-character alphanumeric sequences and totalling almost 20 hours of recorded speech. Ethical approval to use human participants to gather spoken data using the setup described above was sought and obtained via the University of Sheffield's Research Ethics Review procedures (application 031449). Please refer to the following paper for more information about this dataset: Barker, E., Barker, J., Gaizauskas, R., Ma, N., Paramita, M. L. 2022. SNuC: The Sheffield Numbers Spoken Language Corpus. In: Proceedings of LREC 2022 (forthcoming).

Provider:

orda.shef.ac.uk
