MoLA-LLM/OpenHelix-NonThink-200k-v4
Source: Hugging Face. Updated: 2025-08-27. Indexed: 2025-10-18.
Download link:
https://hf-mirror.com/datasets/MoLA-LLM/OpenHelix-NonThink-200k-v4
Description:
OpenHelix NonThink 200k is a new version of the OpenHelix series: a diverse, general-purpose, balanced dataset intended for training high-quality general models. It is compiled from three other datasets and covers not only math, coding, and science but also roleplay, creative writing, and general QA. Samples with high n-gram overlap were filtered out to improve diversity and balance.
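The description does not say how the n-gram overlap filter was implemented. As a rough illustration of the general idea only, the sketch below greedily drops a sample when too many of its word-level n-grams have already appeared in kept samples. The function names, the trigram size (`n=3`), and the `max_overlap=0.5` threshold are all assumptions made for this example, not values taken from the dataset authors.

```python
def word_ngrams(text, n=3):
    """Return the set of word-level n-grams in `text` (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def filter_by_ngram_overlap(samples, n=3, max_overlap=0.5):
    """Greedily keep samples whose n-gram overlap with kept samples is at most `max_overlap`.

    Hypothetical helper: the threshold and n are illustrative assumptions.
    """
    kept = []
    seen = set()  # all n-grams from samples kept so far
    for text in samples:
        grams = word_ngrams(text, n)
        if grams:
            # Fraction of this sample's n-grams already present in kept samples.
            overlap = len(grams & seen) / len(grams)
            if overlap > max_overlap:
                continue  # too similar to earlier samples, drop it
        kept.append(text)
        seen |= grams
    return kept


samples = [
    "Solve the equation x squared minus four equals zero",
    "Solve the equation x squared minus four equals zero please",
    "Write a short poem about the sea at night",
]
filtered = filter_by_ngram_overlap(samples)
```

Here the second sample shares most of its trigrams with the first and is dropped, while the third is kept. A filter run over 200k samples would more likely use hashing or MinHash-style approximations; this set-based version only shows the principle.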
Provider:
MoLA-LLM



