MWirelabs/mizo-language-corpus-4M
Source: Hugging Face. Updated: 2025-10-23. Indexed: 2025-10-25.
Download link:
https://hf-mirror.com/datasets/MWirelabs/mizo-language-corpus-4M
Description:
The Mizo-Language-Corpus-4M is an open-source monolingual Mizo dataset containing 4 million sentences, curated by MWireLabs to support natural language processing research, promote linguistic equity, and foster open development in low-resource languages. It is suitable for training and evaluating Mizo language models for tasks such as masked language modeling, sentiment analysis, named entity recognition, text classification, machine translation, and more.
Provider:
MWirelabs



