mimir-project/mimir-core
Source: Hugging Face. Updated: 2025-03-13. Indexed: 2025-04-08.
Download link:
https://hf-mirror.com/datasets/mimir-project/mimir-core
Description:
This dataset contains text corpora in multiple languages, including Norwegian, Icelandic, and English. Its size falls in the 10B to 100B range. It is divided into four configurations (default, bad, medium, and good), each with a training set and a validation set, stored as JSON files.
Provided by:
mimir-project
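
The configuration and split layout described above can be sketched in Python as follows. This is a minimal illustration, not the maintainers' documented usage: the config identifiers ("default", "bad", "medium", "good") and split identifiers ("train", "validation") are assumed from the description and may differ on the Hub, and the helper names `load_args` and `stream_examples` are hypothetical. The third-party `datasets` library is imported only inside `stream_examples`, which needs network access; streaming is used so the full corpus is not downloaded.

```python
from itertools import islice

# Assumed identifiers, inferred from the description; verify against the dataset card.
REPO_ID = "mimir-project/mimir-core"
CONFIGS = ("default", "bad", "medium", "good")
SPLITS = ("train", "validation")


def load_args(config="default", split="train"):
    """Build keyword arguments for datasets.load_dataset (hypothetical helper)."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # streaming=True iterates over records lazily instead of downloading everything.
    return {"path": REPO_ID, "name": config, "split": split, "streaming": True}


def stream_examples(config="default", split="train", limit=3):
    """Return the first `limit` records of one config/split (requires network)."""
    from datasets import load_dataset  # third-party: pip install datasets

    dataset = load_dataset(**load_args(config, split))
    return list(islice(dataset, limit))
```

Calling `stream_examples("good", "validation")` would then fetch a few records from the validation set of the "good" configuration, provided the assumed identifiers match those published on the Hub.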



