
cymen-arfor/lleisiau-arfor

Hugging Face · Updated 2025-04-03 · Indexed 2024-12-21
Download link:
https://hf-mirror.com/datasets/cymen-arfor/lleisiau-arfor
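Resource summary:
This dataset was created by Cymen as part of an ARFOR-funded project to collect a large amount of high-quality Welsh speech data and the corresponding transcriptions, especially informal, conversational and spontaneous speech from the Arfor area. The dataset is intended to improve Welsh speech recognition and to keep Welsh usable in the latest technological developments. It has three main splits, `test`, `train` and `dev`, plus a `clean` version of each. The `train` split holds 80% of the data, and `test` and `dev` hold 10% each. In the `clean` versions, all language annotations and special characters have been removed to reduce the need for further data formatting. The dataset has four columns: `path`, `sentence`, `accent` and `language`.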
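To make the 80/10/10 proportions concrete, here is a minimal Python sketch that shuffles a list of rows and partitions it into `train`, `test` and `dev`. The function name `split_80_10_10` and the fixed seed are hypothetical illustrations, not the procedure Cymen used. A plain random split like this also ignores speaker overlap between splits, which a speech-recognition split may need to control.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_80_10_10(items: Sequence[T], seed: int = 0) -> dict[str, list[T]]:
    """Shuffle items deterministically and cut them into train/test/dev at 80/10/10."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n = len(shuffled)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding surprises
    n_test = n // 10
    return {
        "train": shuffled[:n_train],
        "test": shuffled[n_train:n_train + n_test],
        "dev": shuffled[n_train + n_test:],  # any remainder from rounding lands in dev
    }
```

For 100 rows this yields 80 `train`, 10 `test` and 10 `dev` items; for sizes not divisible by 10, the rounding remainder goes to `dev`.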

This dataset was created at Cymen as part of a project funded by ARFOR in collaboration with the Language Technologies Unit at Bangor University. The goal of the project was to collect a large amount of high-quality Welsh speech data and the corresponding transcriptions, with a particular focus on informal, conversational and spontaneous speech from the Arfor area. The dataset consists of three splits, `test`, `train` and `dev`, as well as a `clean` version of each split. The transcription style loosely follows the guidelines of the Language Technologies Unit’s Banc Trawsgrifiadau, particularly in punctuation and data formatting. The dataset has four columns: `path`, `sentence`, `accent` and `language`.
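The `clean` versions are described only as having language annotations and special characters removed. The Python sketch below shows what such a normalization step might look like on a row with the four columns. It assumes, hypothetically, that annotations are enclosed in square brackets and that only letters, digits, whitespace, apostrophes and hyphens should survive; the real Banc Trawsgrifiadau-style markup and the rules Cymen applied may differ, and `clean_sentence` / `clean_row` are illustrative names, not part of the dataset.

```python
import re

# ASSUMPTION: annotations are enclosed in square brackets, e.g. "[chwerthin]".
ANNOTATION = re.compile(r"\[[^\]]*\]")
# Any character that is not a word character (Unicode letters such as â, ŵ, ŷ,
# digits, underscore), whitespace, apostrophe or hyphen counts as "special".
SPECIAL = re.compile(r"[^\w\s'-]")
SPACES = re.compile(r"\s+")


def clean_sentence(sentence: str) -> str:
    """Strip bracketed annotations and special characters, then tidy spaces."""
    text = ANNOTATION.sub(" ", sentence)
    text = SPECIAL.sub("", text)
    return SPACES.sub(" ", text).strip()


def clean_row(row: dict[str, str]) -> dict[str, str]:
    """Copy a row (path, sentence, accent, language), cleaning only `sentence`."""
    return {**row, "sentence": clean_sentence(row["sentence"])}
```

For example, `clean_sentence("Wel, [chwerthin] dw i'n mynd i'r siop!")` returns `"Wel dw i'n mynd i'r siop"`. Apostrophes and hyphens are kept because Welsh contractions such as `i'r` depend on them.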
Provided by:
cymen-arfor