OpenHermes-2.5-Formatted
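Overview
OpenHermes 2.5 - Formatted is a reformatted version of the teknium/OpenHermes-2.5 dataset. It is meant to be easier to plug into training scripts that use the OpenAI chat format or that do not support system prompts.
Dataset Structure
The dataset contains four configurations:
- chat: uses the OpenAI chat API format.
- joinsys: the system prompt is merged into the first user prompt.
- nosys: the system prompt is removed.
- teknium: the original OpenHermes-2.5 dataset.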
chat
- File: openhermes_2.5_chat.jsonl
- Contains two columns: "index" and "messages".
- "index": the index of the instance in the original dataset.
- "messages": the chat messages, each with a "role" key and a "content" key.

Example:

```json
{
  "index": 0,
  "messages": [
    {"role": "system", "content": "You are an assistant and must provide concise responses."},
    {"role": "user", "content": "Which is correct? A. Humans are primates. B. Humans are fish."},
    {"role": "assistant", "content": "A"}
  ]
}
```
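As a minimal sketch of how records in this layout could be read and sanity-checked, the snippet below parses JSONL lines and checks the two columns described above. The helper names `iter_jsonl` and `is_chat_record` are hypothetical, and treating the three roles from the example as the complete set is an assumption.

```python
import json
from typing import Iterable, Iterator

# Roles that appear in the example record; assuming this set is exhaustive.
KNOWN_ROLES = {"system", "user", "assistant"}


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL source."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def is_chat_record(record: dict) -> bool:
    """Check that a record has an integer "index" and well-formed "messages"."""
    messages = record.get("messages")
    if not isinstance(record.get("index"), int) or not isinstance(messages, list):
        return False
    return all(
        isinstance(m, dict)
        and m.get("role") in KNOWN_ROLES
        and isinstance(m.get("content"), str)
        for m in messages
    )
```

`iter_jsonl` accepts any iterable of strings, so an open file handle for openhermes_2.5_chat.jsonl can be passed to it directly and the file is streamed one record at a time.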
joinsys
- File: openhermes_2.5_joinsys.jsonl
- Same as chat, but the system message is merged into the first user message.
- The system prompt and the user prompt are joined with a whitespace separator: line breaks or a single space (" ").

Example:

```json
{
  "index": 0,
  "messages": [
    {"role": "user", "content": "You are an assistant and must provide concise responses. Which is correct? A. Humans are primates. B. Humans are fish."},
    {"role": "assistant", "content": "A"}
  ]
}
```
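The merge described for joinsys can be sketched as below. This is an illustration, not the script that produced the dataset: the function name `join_system` is hypothetical, the default separator is an assumption, and so is the choice to leave a conversation untouched when it has no leading system message or no user message.

```python
from typing import Dict, List

Message = Dict[str, str]


def join_system(messages: List[Message], separator: str = " ") -> List[Message]:
    """Fold a leading system message into the first user message.

    Returns a new list; the input is not modified.
    """
    # Nothing to merge unless the conversation starts with a system message.
    if not messages or messages[0]["role"] != "system":
        return [dict(m) for m in messages]

    system_text = messages[0]["content"]
    merged: List[Message] = []
    joined = False
    for message in messages[1:]:
        if not joined and message["role"] == "user":
            merged.append(
                {"role": "user", "content": system_text + separator + message["content"]}
            )
            joined = True
        else:
            merged.append(dict(message))

    # Assumption: with no user turn to merge into, keep the conversation as it was.
    if not joined:
        return [dict(m) for m in messages]
    return merged
```

With a single space as the separator, applying `join_system` to the messages of the chat example yields the messages of the joinsys example.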
nosys
- File: openhermes_2.5_nosys.jsonl
- Same as chat, but the system message is removed entirely.

Example:

```json
{
  "index": 0,
  "messages": [
    {"role": "user", "content": "Which is correct? A. Humans are primates. B. Humans are fish."},
    {"role": "assistant", "content": "A"}
  ]
}
```
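Dropping the system prompt amounts to a filter over the message list. The sketch below shows one way to do it at the record level while keeping "index" intact; `strip_system` is a hypothetical helper name, and removing every system message rather than only a leading one is an assumption.

```python
from typing import Any, Dict


def strip_system(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a chat record with all system messages removed."""
    return {
        "index": record["index"],
        "messages": [dict(m) for m in record["messages"] if m["role"] != "system"],
    }
```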
teknium
- File: openhermes_2.5_teknium.jsonl
- The original OpenHermes-2.5 dataset, with an added "index" field.
- Messages are stored in the "conversations" column, in ShareGPT format.

Example:

```json
{
  "index": 0,
  "conversations": [
    {"from": "system", "value": "You are an assistant and must provide concise responses."},
    {"from": "human", "value": "Which is correct? A. Humans are primates. B. Humans are fish."},
    {"from": "gpt", "value": "A"}
  ]
}
```
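The relationship between the teknium and chat layouts can be illustrated with a small converter. This is a sketch under assumptions: the function name `sharegpt_to_chat` is hypothetical, and the role mapping covers only the three "from" values that appear in the examples here; any other value raises a `KeyError` rather than being guessed at.

```python
from typing import Any, Dict

# Mapping inferred from the examples: ShareGPT "from" -> OpenAI chat "role".
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def sharegpt_to_chat(record: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a ShareGPT-style record into the "messages" layout."""
    return {
        "index": record["index"],
        "messages": [
            {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
            for turn in record["conversations"]
        ],
    }
```

Failing on an unmapped "from" value is a deliberate choice here, so that unexpected speakers surface as errors instead of being silently mislabelled.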
Citation
When publishing results based on this dataset, please include a link to it.
For a formal citation, use the following entry:

```bibtex
@misc{OpenHermes 2.5,
  title = {OpenHermes 2.5: An Open Dataset of Synthetic Data for Generalist LLM Assistants},
  author = {Teknium},
  year = {2023},
  publisher = {HuggingFace},
  url = {https://huggingface.co/datasets/teknium/OpenHermes-2.5}
}
```




