Sam04/au30_tra_strict_clean_highconf_nopunct_absaudio
Hugging Face | Updated 2026-03-07 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Sam04/au30_tra_strict_clean_highconf_nopunct_absaudio
Description:
# Strict Cleaned Transcriptions
Source: `Sam04/au30_tra`
Applied:
- punctuation removed from transcription (including Ethiopic punctuation like ።)
- removed rows not marked high confidence
- removed empty rows
- removed rows containing Latin letters
- removed rows containing non-Ethiopic letters
- removed rows with fewer than 4 words
- removed rows with single-character first/last word
- removed exact duplicate-group rows (mode: strip)
- converted audio to absolute URLs:
https://huggingface.co/datasets/Sam04/au30_tra/resolve/main/<audio_path>
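The steps listed above could be sketched roughly as follows. This is a minimal illustration, not the script that produced the dataset: the column names (`transcription`, `confidence`, `audio`), the `"high"` confidence label, the choice to replace punctuation with a space, and the choice to keep the first row of each duplicate group are all assumptions. Only the base URL comes from the card.

```python
import unicodedata

# Base URL taken from the card; everything else below is illustrative.
BASE_URL = "https://huggingface.co/datasets/Sam04/au30_tra/resolve/main/"

# Ethiopic, Ethiopic Supplement, Ethiopic Extended, Ethiopic Extended-A.
ETHIOPIC_RANGES = [(0x1200, 0x137F), (0x1380, 0x139F), (0x2D80, 0x2DDF), (0xAB00, 0xAB2F)]


def strip_punctuation(text):
    """Replace every Unicode punctuation character (category P*) with a space."""
    chars = [" " if unicodedata.category(ch).startswith("P") else ch for ch in text]
    return " ".join("".join(chars).split())  # also collapses and trims whitespace


def is_ethiopic(ch):
    return any(lo <= ord(ch) <= hi for lo, hi in ETHIOPIC_RANGES)


def keep_row(text, confidence):
    """Apply the row-level filters to an already punctuation-free transcription."""
    if confidence != "high":          # assumed label for high-confidence rows
        return False
    words = text.split()
    if len(words) < 4:                # also rejects empty rows
        return False
    if any(ch.isalpha() and not is_ethiopic(ch) for ch in text):
        return False                  # Latin or any other non-Ethiopic letter
    return len(words[0]) > 1 and len(words[-1]) > 1


def clean_rows(rows):
    """Clean, filter, deduplicate, and rewrite audio paths as absolute URLs."""
    seen, out = set(), []
    for row in rows:
        text = strip_punctuation(row["transcription"])
        if not keep_row(text, row["confidence"]) or text in seen:
            continue
        seen.add(text)                # assumption: keep the first of each duplicate group
        out.append({"transcription": text, "audio": BASE_URL + row["audio"].lstrip("/")})
    return out
```

Calling `clean_rows` on a list of row dictionaries returns only the rows that pass every filter, each with a cleaned transcription and an absolute audio URL.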
Provider:
Sam04



