Arabic Audio Text Dataset Repository
Source: Figshare. Updated: 2025-09-14. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Arabic_Audio_Text_Dataset_Repository/30121789
Description:
This dataset, from the IDEAL 2025 project, was created to evaluate how well commercial speech-to-text (STT) tools transcribe Arabic speech.

Dataset Components

The repository includes several key components:

- Audio Files: Original recordings of native Arabic speakers, categorized by length (Short, Medium, Long, Very Long).
- Human Transcripts: Manually created transcripts that serve as the "ground truth" for accuracy comparison.
- Tool Transcripts: Machine-generated transcripts from six commercial STT tools: Clipto, Maestra, Notta, Sonix, Turboscribe, and Veed.
- Metadata: An Excel file containing details such as recording duration, topic, word counts, speaker age, gender, and transcription accuracy scores.
- Ethical Documentation: Includes IRB approval, ensuring that the data was collected ethically and that personally identifiable information was removed.
Created:
2025-09-14



