teknium/openhermes
Source: Hugging Face. Updated 2023-09-07; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/teknium/openhermes
Resource description:
---
language:
- eng
pretty_name: "OpenHermes-v1.0"
tags:
- distillation
- synthetic data
- gpt
task_categories:
- text-generation
---
# OpenHermes Dataset

The OpenHermes dataset is composed of 242,000 entries of primarily GPT-4 generated data, drawn from open datasets across the AI landscape.

OpenHermes 13B is the first fine-tune on the Hermes dataset mix whose training data is fully open source. It was trained on these 242,000 entries, which include:
- GPTeacher - General Instruct, Roleplay v1, Roleplay v2, and Code Instruct Datasets, by Teknium
- WizardLM (v1, evol_instruct 70k), by WizardLM Team/nlpxucan
- Airoboros GPT-4 (v1.0), by JonDurbin
- Camel-AI's domain expert datasets, by the Camel-AI Team
- CodeAlpaca, by Sahil2801
- GPT4-LLM and Unnatural Instructions, by Microsoft

Filtering removed OpenAI refusals, disclaimers, "As an AI" type examples, and more.

The base dataset mix is identical to that of the original Nous-Hermes, minus the Nous-Instruct and PDACTL datasets, which were private.
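
The filtering step described above can be illustrated with a minimal sketch. The card does not publish the actual filter list or code, so the marker phrases below, the `output` field name, and the helper names `is_clean` and `filter_examples` are all assumptions chosen for illustration, not the author's method.

```python
import re

# Hypothetical marker phrases. The real filter list used for OpenHermes
# is not given in the dataset card; these are illustrative only.
REFUSAL_PATTERNS = [
    r"\bas an ai\b",
    r"\bi'm sorry, but\b",
    r"\bi cannot (?:assist|help|comply)\b",
]
_REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)


def is_clean(example: dict) -> bool:
    """Return True when the response contains none of the marker phrases.

    Assumes the response text lives under an "output" key (an assumption).
    """
    return _REFUSAL_RE.search(example.get("output", "")) is None


def filter_examples(examples):
    """Keep only the examples whose response passes is_clean."""
    return [ex for ex in examples if is_clean(ex)]


# Tiny made-up sample, not taken from the dataset.
sample = [
    {"instruction": "Add 2 and 3.", "output": "2 + 3 = 5."},
    {"instruction": "Tell me a secret.",
     "output": "As an AI language model, I cannot do that."},
    {"instruction": "Write a haiku.",
     "output": "I'm sorry, but I cannot help with that request."},
]
kept = filter_examples(sample)
print(len(kept))
```

Simple phrase matching like this is easy to audit but can drop legitimate examples that merely mention one of the phrases, so a real pipeline would likely need manual review of what gets removed.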
Provider:
teknium
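Summary of original information
OpenHermes-v1.0 dataset overview
Basic dataset information
- Language: English (eng)
- Alias: OpenHermes
- Tags:
  - distillation
  - synthetic data
  - gpt
- Task category: text generation
Dataset contents
- Size: 242,000 entries
- Sources: primarily GPT-4 generated data, drawn from several open datasets, including:
  - GPTeacher datasets, by Teknium
  - WizardLM (v1, evol_instruct 70k), by WizardLM Team/nlpxucan
  - Airoboros GPT-4 (v1.0), by JonDurbin
  - Camel-AI's domain expert datasets, by the Camel-AI Team
  - CodeAlpaca, by Sahil2801
  - GPT4-LLM and Unnatural Instructions, by Microsoft
Data processing
- Filtering: removed OpenAI refusals, disclaimers, "As an AI" type examples, and more.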



