mikelalda/common_voice_17_0_eu
Source: Hugging Face. Updated: 2025-10-02. Indexed: 2025-10-25.
Download link:
https://hf-mirror.com/datasets/mikelalda/common_voice_17_0_eu
Description:
This dataset is the Euskera (Basque) subset of Mozilla Common Voice 17.0. It contains audio clips with their corresponding transcriptions and is ready for use with libraries such as Hugging Face datasets. It combines the splits of the original Common Voice 17.0 release (train, test, dev, validated, invalidated, and other) and retains the original segment column so the data can be split again if needed. The data is provided in Parquet format. Its features include the speaker ID, audio file path, sentence ID, sentence transcription, community votes, speaker metadata, and language locale code. The dataset is intended for research and development in speech technology, particularly automatic speech recognition (ASR) for Euskera.
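As a rough illustration of the usage described above, the sketch below streams the dataset with the Hugging Face datasets library and groups rows by the segment column. Only the repository name comes from this listing. The split name "train", the column name "segment", and both helper functions are assumptions made for illustration, and the listing does not say which values the segment column holds, so check the dataset card before relying on any of them.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

# Repository name as given in this listing.
REPO_ID = "mikelalda/common_voice_17_0_eu"


def stream_rows(split: str = "train"):
    """Return a streaming iterable over the dataset (needs network access).

    Assumption: the repository exposes a split named "train"; the real
    split names may differ.
    """
    # Imported lazily so the rest of this sketch runs without the library.
    from datasets import load_dataset  # pip install datasets

    return load_dataset(REPO_ID, split=split, streaming=True)


def group_by_segment(
    rows: Iterable[Dict[str, Any]], column: str = "segment"
) -> Dict[str, List[Dict[str, Any]]]:
    """Group rows by the value of `column` (hypothetical helper).

    Rows with a missing or empty value are collected under "unlabelled".
    """
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        key = row.get(column) or "unlabelled"
        groups[key].append(row)
    return dict(groups)
```

One possible way to use it is to pass a small sample, for example `itertools.islice(stream_rows(), 100)`, to `group_by_segment` and inspect the resulting keys. Streaming avoids downloading every Parquet file before looking at the first rows.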
Provider:
mikelalda



