openhermes
ModelScope community. Updated 2026-05-06; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/teknium/openhermes
Description:
# OpenHermes Dataset

The OpenHermes dataset is composed of 242,000 entries of primarily GPT-4 generated data, drawn from open datasets across the AI landscape. OpenHermes 13B, which was trained on it, is the first Hermes fine-tune with a fully open-source dataset. The sources include:
- GPTeacher - General Instruct, Roleplay v1, Roleplay v2, and Code Instruct Datasets, by Teknium
- WizardLM (v1, evol_instruct 70k), by WizardLM Team/nlpxucan
- Airoboros GPT-4 (v1.0), by JonDurbin
- Camel-AI's domain expert datasets, by the Camel-AI Team
- CodeAlpaca, by Sahil2801
- GPT4-LLM and Unnatural Instructions, by Microsoft

Filtering removed OpenAI refusals, disclaimers, and "As an AI"-type examples, among others.

The base dataset mix is identical to that of the original Nous-Hermes, minus the Nous-Instruct and PDACTL datasets, which were private.
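
The filtering step described above can be illustrated with a minimal sketch. The card does not publish the filter list that was actually used, so the patterns below, the `output` field name, and the helper names `is_clean` and `filter_entries` are all assumptions made for illustration only.

```python
import re

# Hypothetical refusal/disclaimer patterns. The card does not list the
# phrases that were actually filtered; these are illustrative guesses.
REFUSAL_PATTERNS = [
    r"\bas an ai\b",
    r"\bi'm sorry, but\b",
    r"\bi cannot (?:assist|help|comply)\b",
]
_REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)


def is_clean(entry: dict) -> bool:
    """Return True if the entry's response matches none of the patterns.

    Assumes each entry stores the model response under an "output" key.
    """
    return _REFUSAL_RE.search(entry.get("output", "")) is None


def filter_entries(entries: list[dict]) -> list[dict]:
    """Keep only the entries whose response looks free of refusals."""
    return [entry for entry in entries if is_clean(entry)]


sample = [
    {"instruction": "Tell me a secret.",
     "output": "As an AI language model, I cannot share secrets."},
    {"instruction": "Add 2 and 2.", "output": "2 + 2 = 4"},
]
kept = filter_entries(sample)
```

In this sketch, `kept` holds only the second sample entry. Simple phrase matching like this can also drop legitimate examples that merely mention such phrases, so a real pipeline would likely need manual review of what gets removed.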
Provider:
maas
Created:
2025-11-18



