persian-voice-v1

Name: persian-voice-v1
Creator: maas
Published: 2026-01-06 16:43:57
License: No description available

ModelScope community · Updated 2026-01-06 · Indexed 2025-12-06

Download link:

https://modelscope.cn/datasets/vhdm/persian-voice-v1

Description:

# 🗣️ Common Voice 17 - Persian (Spelling-Corrected Edition)

This is a refined version of the **Persian subset** of Mozilla's Common Voice 17 dataset, specially curated to enhance the performance of ASR (Automatic Speech Recognition) systems in Persian.

## 🛠️ Why this matters

The original dataset contained a significant number of spelling inconsistencies and typographical errors, which negatively impacted transcription accuracy and model alignment.

## ✨ What’s improved

Over **28,000** transcriptions were automatically cleaned using **GPT-4o**, with a focus on preserving the original meaning while correcting orthographic issues. The audio files remain unchanged, and all metadata fields are preserved for compatibility.

## 🎯 Use cases

This corrected dataset is ideal for fine-tuning Whisper, Wav2Vec2, or other speech-to-text models in Persian, providing cleaner, more reliable supervision for training and evaluation.

## 🤝 Contribute or collaborate

Feel free to open an issue or submit pull requests if you'd like to contribute further improvements or validations.
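The card says that spelling errors in the reference transcriptions hurt transcription accuracy and evaluation. As a rough illustration of why, the sketch below computes word error rate (WER) in plain Python and scores the same model output against a reference with a typo and against its corrected version. The function name `word_error_rate` and the example strings are assumptions made for illustration. They do not come from the dataset or its tooling, and Latin placeholder tokens stand in for Persian words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts (placeholder tokens, not real dataset rows).
model_output = "w1 w2 w3 w4"
noisy_reference = "w1 w2x w3 w4"  # reference transcript containing a typo
clean_reference = "w1 w2 w3 w4"   # the same reference after correction

print(word_error_rate(noisy_reference, model_output))  # 0.25
print(word_error_rate(clean_reference, model_output))  # 0.0
```

With the noisy reference, a correct model output is still charged one substitution out of four words. This is the sort of measurement noise that corrected transcriptions are meant to reduce.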

Provider: maas
Created: 2025-08-23
