Persian Mozilla Common Voice

Name: Persian Mozilla Common Voice
Creator: Mozilla Foundation
License: No description available

Source: arXiv, indexed 2025-09-30

Download link:

https://commonvoice.mozilla.org/fa/datasets

Description:

This dataset is intended for training English automatic speech recognition (ASR) models. The models are first pre-trained on the Persian subset of the Mozilla Common Voice dataset and then fine-tuned on an in-domain dataset containing five English sentences and five Persian sentences. The dataset also includes free-text speech transcribed with the pre-trained ASR model, which is meant to improve the quality of the training data. It is also suitable for voice verification tasks.
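
The step in which free-text speech is transcribed by the pre-trained ASR model can be sketched roughly as follows. This is a minimal illustration, not the dataset authors' procedure: the `pseudo_label` function, the `transcribe` callable, and the confidence threshold are all hypothetical, and the description does not say whether or how transcripts were filtered.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class PseudoLabel(NamedTuple):
    audio_path: str
    transcript: str
    confidence: float


def pseudo_label(
    audio_paths: Iterable[str],
    transcribe: Callable[[str], Tuple[str, float]],
    min_confidence: float = 0.9,
) -> List[PseudoLabel]:
    """Transcribe unlabeled clips and keep confident, non-empty results.

    `transcribe` stands in for the pre-trained ASR model (hypothetical
    interface): it takes an audio path and returns (text, confidence).
    The confidence filter is an assumption, not something the source states.
    """
    kept: List[PseudoLabel] = []
    for path in audio_paths:
        text, confidence = transcribe(path)
        text = text.strip()
        # Skip empty transcripts and anything below the threshold.
        if text and confidence >= min_confidence:
            kept.append(PseudoLabel(path, text, confidence))
    return kept
```

Any real ASR model can be wrapped to match the `transcribe` signature. The retained pairs would then be added to the fine-tuning data alongside the ten scripted sentences.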

Provider:

Mozilla Foundation
