HiTZ/composite_corpus_eu_v2.1
Source: Hugging Face. Updated 2024-12-19. Indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/HiTZ/composite_corpus_eu_v2.1
Description:
This is version 2.1 of a composite dataset for Basque automatic speech recognition (ASR), assembled from publicly available sources: Mozilla Foundation's Common Voice 18.0, Basque Parliament 1, and the SLR76 subset of OpenSLR. It is divided into train, test, and dev splits, each further broken into subsets. The train split combines data from Common Voice, Basque Parliament, and OpenSLR; the test and dev splits are built from separate parts of the same sources. The README lists the source, total duration, and number of sentences for each part.
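Usage sketch: the snippet below is one hedged way to look at a few examples from the corpus with the Hugging Face `datasets` library. The repository id comes from this listing; the split name `"train"` is an assumption based on the description (the dataset card is the authority on exact split and subset names), and `peek` is a hypothetical helper, not part of any library. Streaming is used so the whole corpus is not downloaded. Decoding audio may need extra packages, and access may require authentication.

```python
from itertools import islice

# Repository id as shown in this listing.
REPO_ID = "HiTZ/composite_corpus_eu_v2.1"


def peek(split="train", n=3):
    """Return the first n examples of a split, streamed from the Hub.

    The default split name "train" is an assumption; check the dataset
    card for the names actually published.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return list(islice(stream, n))


if __name__ == "__main__":
    for example in peek():
        # Column names are not given in this listing, so only the keys are printed.
        print(sorted(example.keys()))
```

Running the file directly needs network access. Other splits can be inspected by passing a different `split` value once the names are confirmed on the dataset card.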
Provided by:
HiTZ



