lodrick-the-lafted/Hermes-217K
Hugging Face | Updated 2024-02-21 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/lodrick-the-lafted/Hermes-217K
Description:
---
language:
- eng
pretty_name: Hermes-217K
tags:
- distillation
- synthetic data
- gpt
task_categories:
- text-generation
---
It's 217K rows sampled from teknium/openhermes (not the newer 2.5).
I filtered out some GPTisms I dislike and also removed rows with short outputs, to bias the set towards longer answers.
bad_phrases = ["couldn't help but", "can't resist", "random", "unethical", "I'm sorry, but", "I'm sorry but", "as an AI", "as a Language Model", "AI Language Model", "language model", "However, it is important to", "However, it's important", "ethical guidelines", "just an AI", "within my programming", "illegal", "cannot provide"]
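A minimal sketch of how the two filters described above might be applied. The card does not say which field is checked, whether matching is case-sensitive, or what counts as a "short" output, so the `output` key, the plain case-sensitive substring match, the 200-character threshold, and the helper names `keep_row` and `filter_rows` are all assumptions made for illustration, not the author's actual script.

```python
# Phrase list copied verbatim from the dataset card.
BAD_PHRASES = [
    "couldn't help but", "can't resist", "random", "unethical",
    "I'm sorry, but", "I'm sorry but", "as an AI", "as a Language Model",
    "AI Language Model", "language model", "However, it is important to",
    "However, it's important", "ethical guidelines", "just an AI",
    "within my programming", "illegal", "cannot provide",
]


def keep_row(row, min_output_chars=200, output_key="output"):
    """Return True if a row passes both filters.

    Assumptions (not stated in the card): only the output field is
    inspected, matching is case-sensitive, and 200 characters is an
    arbitrary stand-in for the unknown length cutoff.
    """
    text = row.get(output_key, "")
    # Drop short outputs to bias the sample towards longer answers.
    if len(text) < min_output_chars:
        return False
    # Drop rows whose output contains any of the disliked phrases.
    return not any(phrase in text for phrase in BAD_PHRASES)


def filter_rows(rows, **kwargs):
    """Apply keep_row to an iterable of dict-like rows."""
    return [row for row in rows if keep_row(row, **kwargs)]
```

Because the match is a plain substring test, broad entries such as "random" and "illegal" remove any row that mentions those words at all, not only refusals.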
Provider:
lodrick-the-lafted
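Summary of original information
Dataset overview
Basic information
- Language: English
- Name: Hermes-217K
- Tags:
  - distillation
  - synthetic data
  - gpt
- Task category: text-generation
Data source
- The dataset contains 217K rows sampled from teknium/openhermes (not the newer 2.5 version).
Data processing
- Some unwanted GPT-style phrasing (GPTisms) was filtered out.
- Rows with short outputs were removed to bias towards longer answers.
Filtered phrases
- The filtered phrases include: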
  - "couldn't help but"
  - "can't resist"
  - "random"
  - "unethical"
  - "I'm sorry, but"
  - "I'm sorry but"
  - "as an AI"
  - "as a Language Model"
  - "AI Language Model"
  - "language model"
  - "However, it is important to"
  - "However, it's important"
  - "ethical guidelines"
  - "just an AI"
  - "within my programming"
  - "illegal"
  - "cannot provide"



