tachiwin/tachiwin_voice_raw
Source: Hugging Face | Updated: 2025-10-16 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/tachiwin/tachiwin_voice_raw
Description:
This dataset collects speech recordings in the indigenous languages of Mexico, captured from 23 public radio stations that broadcast to indigenous communities. The aim is to pretrain language models for text or speech recognition without language identification, or to annotate the recordings using the provided hints to support language identification and further model fine-tuning or training. The audio is distributed as .aac files sampled at 22 kHz, compressed into one tar.gz archive per day. The dataset is fully open source and aims to narrow the technological gap for indigenous languages in Mexico and worldwide by giving communities free access to high-end technology.
Provided by:
tachiwin



