DewiBrynJones/banc-trawsgrifiadau-bangor-normalized
Hugging Face | Updated: 2025-12-17 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/DewiBrynJones/banc-trawsgrifiadau-bangor-normalized
Resource description:
The Bangor Transcription Bank is a dataset of 52 hours of natural speech segments from over 50 contributors, stored as mp3 files, with corresponding verbatim transcripts in .tsv files. Most of the speech is spontaneous. The dataset is intended as training data for speech recognition models, including wav2vec models, and follows a bespoke set of transcription conventions to keep the transcripts verbatim and consistent. The material is distributed under a CC0 open license. The README also describes how the resource was created, the anonymization practices applied, and future plans for refining the transcripts.
Provider:
DewiBrynJones
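
Since the description pairs mp3 clips with transcripts in .tsv files, a minimal sketch of reading such a file may help. The column names `audio_filename` and `transcript` and the function name `read_transcripts` are assumptions made for illustration, not taken from the dataset; check the real header row and adjust them before use.

```python
import csv
from pathlib import Path


def read_transcripts(tsv_path, audio_col="audio_filename", text_col="transcript"):
    """Return (audio file name, transcript) pairs from a tab-separated file.

    The default column names are hypothetical; inspect the actual .tsv
    header and pass the real names instead.
    """
    pairs = []
    with Path(tsv_path).open(encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside verbatim transcripts intact.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            text = (row[text_col] or "").strip()
            pairs.append((row[audio_col], text))
    return pairs
```

Calling `read_transcripts("clips.tsv")` on a file with those two columns would return a list of tuples that can then be matched against the mp3 files on disk.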



