kiarashQ/farsi-asr-unified-cleaned
Source: Hugging Face. Updated 2025-11-03; indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/kiarashQ/farsi-asr-unified-cleaned
Overview:
The Farsi ASR Unified Dataset is a large-scale, high-quality, fully standardized collection of Persian (Farsi) speech-to-text data, designed for modern machine learning and automatic speech recognition (ASR) workflows. It consolidates audio-text pairs from multiple open sources, applies a rigorous cleaning and normalization pipeline, and stores the result efficiently in Parquet shards with embedded audio. The total duration is approximately 1392.8 hours across 1,278,937 samples.
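As a rough sanity check on the stated totals (about 1392.8 hours and 1,278,937 samples), the implied average clip length can be worked out directly. The sketch below uses only those two figures; the constant and function names are illustrative and not part of the dataset, and since the hour count is approximate, so is the result.

```python
# Totals as stated in the dataset description (the hour count is approximate).
TOTAL_HOURS = 1392.8
TOTAL_SAMPLES = 1_278_937


def mean_clip_seconds(total_hours: float, total_samples: int) -> float:
    """Average clip length in seconds implied by the stated totals."""
    return total_hours * 3600.0 / total_samples


avg_seconds = mean_clip_seconds(TOTAL_HOURS, TOTAL_SAMPLES)
print(f"Mean clip length: {avg_seconds:.2f} s")
```

This works out to a little under 4 seconds per sample on average, which suggests the collection is made up mostly of short, utterance-level clips. It says nothing about the spread of clip lengths, which the description does not report.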
Provided by:
kiarashQ



