lodrick-the-lafted/Hermes-100K
Hugging Face · Updated 2024-02-21 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/lodrick-the-lafted/Hermes-100K
Resource description:
---
language:
- eng
pretty_name: Hermes-100K
tags:
- distillation
- synthetic data
- gpt
task_categories:
- text-generation
---
It's 100K rows sampled from teknium/openhermes (not the newer 2.5).
I filtered out some GPTisms I dislike and also removed rows with short outputs, to bias the set towards longer answers.
bad_phrases = ["couldn't help but", "can't resist", "random", "unethical", "I'm sorry, but", "I'm sorry but", "as an AI", "as a Language Model", "AI Language Model", "language model", "However, it is important to", "However, it's important", "ethical guidelines", "just an AI", "within my programming", "illegal", "cannot provide"]
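The card does not show the filtering code itself, so the following is only a minimal sketch of how such a filter could look. The `output` field name, the case-sensitive substring match, and the `MIN_OUTPUT_CHARS` threshold are all assumptions for illustration; the card gives no cutoff for what counts as "short".

```python
# Phrase list copied verbatim from the card.
BAD_PHRASES = [
    "couldn't help but", "can't resist", "random", "unethical",
    "I'm sorry, but", "I'm sorry but", "as an AI", "as a Language Model",
    "AI Language Model", "language model", "However, it is important to",
    "However, it's important", "ethical guidelines", "just an AI",
    "within my programming", "illegal", "cannot provide",
]

# Hypothetical threshold: the card does not state the actual cutoff.
MIN_OUTPUT_CHARS = 200


def keep_row(row, bad_phrases=BAD_PHRASES, min_chars=MIN_OUTPUT_CHARS):
    """Return True if the row's output is long enough and contains no bad phrase."""
    # Assumption: the answer text lives in an "output" field.
    output = row.get("output", "")
    if len(output) < min_chars:
        return False
    # Assumption: plain case-sensitive substring matching.
    return not any(phrase in output for phrase in bad_phrases)


def filter_rows(rows, bad_phrases=BAD_PHRASES, min_chars=MIN_OUTPUT_CHARS):
    """Keep only the rows that pass keep_row, preserving order."""
    return [row for row in rows if keep_row(row, bad_phrases, min_chars)]
```

With substring matching, broad entries such as "random" and "illegal" drop any row that mentions those words at all, not just refusals, which is consistent with the card's stated goal of removing phrasing the author dislikes.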
Provided by:
lodrick-the-lafted
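Summary of original information
Dataset overview
Basic information
- Language: English
- Name: Hermes-100K
- Tags:
  - distillation
  - synthetic data
  - GPT
- Task category: text-generation
Data source
- The dataset is 100,000 rows sampled from teknium/openhermes (not the newer 2.5 version).
Data processing
- Some unwanted expressions (GPTisms) were filtered out.
- Rows with short outputs were removed to bias the set towards longer answers.
Filtered phrases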
- couldn't help but
- can't resist
- random
- unethical
- I'm sorry, but
- I'm sorry but
- as an AI
- as a Language Model
- AI Language Model
- language model
- However, it is important to
- However, it's important
- ethical guidelines
- just an AI
- within my programming
- illegal
- cannot provide



