OpenHermes-2.5-zh
ModelScope | Updated 2026-05-23 | Added 2024-06-08
Download link:
https://modelscope.cn/datasets/swift/OpenHermes-2.5-zh
# Dataset Card for OpenHermes-2.5-zh
This is a partial Chinese translation of the [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5) dataset as well as [glaiveai/glaive-function-calling](https://huggingface.co/datasets/glaiveai/glaive-function-calling). Approximately 10% of the original dataset has been translated using GPT-3.5, and low-quality translations have been filtered out.
OpenHermes is a diverse, high-quality instruction-tuning dataset consisting mainly of GPT-4-generated samples. This Chinese version can serve as supplementary data for fine-tuning LLMs so that they handle Chinese instructions better.
## Data Structure
The dataset contains 91,506 samples, each with the same structure as OpenHermes-2.5. Only the contents of the `conversations` field are translated; all other fields are kept as in the original dataset. An example sample is shown below:
```json
{
"system_prompt": str,
"id": str,
"origin_idx": int, // the original index of the sample in OpenHermes-2.5
"model_name": null,
"avatarUrl": null,
"topic": null,
"custom_instruction": null,
"views": null,
"hash": null,
"idx": null,
"source": "glaiveai/glaive-function-calling-v2", // the OpenHermes-2.5 source subset the sample comes from
"conversations": [
{
"from": "system",
"value": "您是一个乐于助人的助手...",
"weight": null
},
{
"from": "human",
"value": "使用Python编程语言编写一个函数...",
"weight": null
},
{
"from": "gpt",
"value": "这是用于函数的Python代码...",
"weight": null
},
//...
],
"title": null,
"category": null,
"skip_prompt_formatting": null,
"model": null,
"language": null
}
```
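The `conversations` list uses the ShareGPT-style `from`/`value` layout shown above. The following is a minimal Python sketch, assuming the samples have already been loaded as plain dicts (loading is not covered here). It converts one sample into `role`/`content` chat messages and tallies samples per `source`. The role mapping (`human` to `user`, `gpt` to `assistant`) and the helper names `ROLE_MAP`, `to_messages`, and `count_by_source` are hypothetical: a common convention, not something this card specifies.

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical role mapping from ShareGPT-style "from" values to chat roles.
# The card's example only shows "system", "human" and "gpt"; anything else is rejected.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def to_messages(sample: Dict[str, Any]) -> List[Dict[str, str]]:
    """Convert a sample's `conversations` list into [{"role", "content"}] dicts."""
    messages = []
    for turn in sample["conversations"]:
        role = ROLE_MAP.get(turn["from"])
        if role is None:
            raise ValueError(f"unexpected speaker: {turn['from']!r}")
        messages.append({"role": role, "content": turn["value"]})
    return messages


def count_by_source(samples: List[Dict[str, Any]]) -> Counter:
    """Count how many samples come from each `source` value."""
    return Counter(s.get("source") for s in samples)


if __name__ == "__main__":
    # A toy sample shaped like the example above (values shortened).
    sample = {
        "id": "example-0",
        "origin_idx": 0,
        "source": "glaiveai/glaive-function-calling-v2",
        "conversations": [
            {"from": "system", "value": "您是一个乐于助人的助手...", "weight": None},
            {"from": "human", "value": "使用Python编程语言编写一个函数...", "weight": None},
            {"from": "gpt", "value": "这是用于函数的Python代码...", "weight": None},
        ],
    }
    for message in to_messages(sample):
        print(message["role"], message["content"])
    print(count_by_source([sample]))
```

Raising on an unknown speaker, rather than silently dropping the turn, makes it obvious if some subset uses roles the example does not show.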
## Citation
```bibtex
@misc{OpenHermes-2.5-zh,
title = {OpenHermes 2.5-zh: A partial Chinese translation of OpenHermes-2.5},
author = {Wenbo Pan},
year = {2024},
publisher = {HuggingFace},
url = {https://huggingface.co/datasets/wenbopan/OpenHermes-2.5-zh}
}
```
Provided by: maas
Created: 2024-06-06