
openhermes

ModelScope community · Updated 2026-05-06 · Indexed 2025-12-06
Download link:
https://modelscope.cn/datasets/teknium/openhermes
Resource description:
# OpenHermes Dataset

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6317aade83d8d2fd903192d9/XIiSwLP1Uu94IUucGypyl.png)

The OpenHermes dataset is composed of 242,000 entries of primarily GPT-4 generated data, drawn from open datasets across the AI landscape. OpenHermes 13B, which was trained on this data, is the first Hermes fine-tune with a fully open source dataset.

The sources include:

- GPTeacher: the General Instruct, Roleplay v1, Roleplay v2, and Code Instruct datasets, by Teknium
- WizardLM (v1, evol_instruct 70k), by WizardLM Team/nlpxucan
- Airoboros GPT-4 (v1.0), by JonDurbin
- Camel-AI's domain expert datasets, by the Camel-AI Team
- CodeAlpaca, by Sahil2801
- GPT4-LLM and Unnatural Instructions, by Microsoft

Filtering removed OpenAI refusals, disclaimers, "As an AI" type examples, and more.

The base dataset mix is identical to that of the original Nous-Hermes, minus the Nous-Instruct and PDACTL datasets, which were private.
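
The filtering step described above can be illustrated with a minimal sketch. This is not the authors' filter: the card does not publish the phrase list or the record schema, so the patterns in `REFUSAL_PATTERNS` and the `instruction`/`output` field names are assumptions chosen only for illustration.

```python
import re

# Hypothetical marker phrases; the dataset card does not list the real ones.
REFUSAL_PATTERNS = [
    r"\bas an ai\b",
    r"\bi'm sorry, but\b",
    r"\bi cannot fulfill\b",
]
_REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)


def is_clean(entry: dict) -> bool:
    """Return True when the response contains none of the marker phrases."""
    # "output" is an assumed field name for the response text.
    return _REFUSAL_RE.search(entry.get("output", "")) is None


def filter_entries(entries: list) -> list:
    """Keep only the entries whose response passes the phrase check."""
    return [entry for entry in entries if is_clean(entry)]


# Tiny made-up sample, not taken from the dataset.
sample = [
    {"instruction": "Add 2 and 3.", "output": "2 + 3 = 5."},
    {"instruction": "Write a poem.",
     "output": "As an AI language model, I cannot write poems."},
]
kept = filter_entries(sample)
```

A phrase-based filter like this is simple and fast, but it can drop legitimate answers that happen to quote one of the phrases, so a real pipeline would likely need manual review of the removed entries.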

Provider:
maas
Created:
2025-11-18