
lodrick-the-lafted/Hermes-100K

Hugging Face · Updated 2024-02-21 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/lodrick-the-lafted/Hermes-100K
Resource description:
---
language:
- eng
pretty_name: Hermes-100K
tags:
- distillation
- synthetic data
- gpt
task_categories:
- text-generation
---

It's 100K rows sampled from teknium/openhermes (not the newer 2.5). Filtered some GPTisms I dislike out, and removed rows with short output as well to bias towards longer answers.

bad_phrases = ["couldn't help but", "can't resist", "random", "unethical", "I'm sorry, but", "I'm sorry but", "as an AI", "as a Language Model", "AI Language Model", "language model", "However, it is important to", "However, it's important", "ethical guidelines", "just an AI", "within my programming", "illegal", "cannot provide"]
Provided by:
lodrick-the-lafted
Summary of original information

Dataset overview

Basic information

  • Language: English
  • Name: Hermes-100K
  • Tags:
    • distillation
    • synthetic data
    • GPT
  • Task category: text-generation

Data source

  • The dataset consists of 100,000 rows sampled from teknium/openhermes (not the newer 2.5 release).
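
The sampling step can be illustrated with a minimal sketch. The source does not say how the rows were drawn, so uniform sampling without replacement, the seed, and the helper name `sample_rows` are all assumptions made for illustration; the toy pool below stands in for the real dataset.

```python
import random


def sample_rows(rows, k, seed=0):
    """Draw k rows without replacement (hypothetical helper).

    A fixed seed makes the draw reproducible; the seed value is an
    assumption, not something stated in the dataset card.
    """
    rng = random.Random(seed)
    return rng.sample(rows, k)


# Toy stand-in for the source dataset (the real one is far larger).
rows = [{"instruction": f"q{i}", "output": f"a{i}"} for i in range(1000)]
subset = sample_rows(rows, k=100)
print(len(subset))
```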

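Data processing

  • Rows containing certain unwanted expressions ("GPTisms" the author dislikes) were filtered out.
  • Rows with short outputs were removed to bias the dataset towards longer answers.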
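
As a rough sketch of the short-output removal: the card gives no length threshold and does not say whether length was measured in characters or tokens, so `MIN_OUTPUT_CHARS`, the character-based measure, and the `output` field name are assumptions.

```python
MIN_OUTPUT_CHARS = 200  # hypothetical threshold; the source does not state one


def has_long_output(row, min_chars=MIN_OUTPUT_CHARS):
    """True if the row's 'output' field has at least min_chars characters."""
    return len(row.get("output", "")) >= min_chars


def drop_short_outputs(rows, min_chars=MIN_OUTPUT_CHARS):
    """Keep only rows whose output meets the (assumed) length threshold."""
    return [row for row in rows if has_long_output(row, min_chars)]


demo_rows = [
    {"instruction": "short", "output": "x" * 10},
    {"instruction": "long", "output": "x" * 300},
    {"instruction": "missing"},  # no output field, treated as empty
]
long_rows = drop_short_outputs(demo_rows)
print([row["instruction"] for row in long_rows])
```

Raising the threshold biases the result further towards long answers at the cost of discarding more rows.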

Filtered phrases

  • couldn't help but
  • can't resist
  • random
  • unethical
  • I'm sorry, but
  • I'm sorry but
  • as an AI
  • as a Language Model
  • AI Language Model
  • language model
  • However, it is important to
  • However, it's important
  • ethical guidelines
  • just an AI
  • within my programming
  • illegal
  • cannot provide
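
A minimal sketch of how such a phrase list might be applied. The phrase list is taken from the dataset card; the matching rule (case-sensitive substring search), the checked field (`output`), and the helper names are assumptions, since the card does not describe the filtering code.

```python
BAD_PHRASES = [
    "couldn't help but", "can't resist", "random", "unethical",
    "I'm sorry, but", "I'm sorry but", "as an AI", "as a Language Model",
    "AI Language Model", "language model", "However, it is important to",
    "However, it's important", "ethical guidelines", "just an AI",
    "within my programming", "illegal", "cannot provide",
]


def contains_bad_phrase(text, phrases=BAD_PHRASES):
    """True if any phrase occurs in text (assumed: case-sensitive substring)."""
    return any(phrase in text for phrase in phrases)


def filter_rows(rows, field="output"):
    """Drop rows whose chosen field contains a listed phrase."""
    return [row for row in rows if not contains_bad_phrase(row.get(field, ""))]


example_rows = [
    {"instruction": "Explain recursion.",
     "output": "Recursion is when a function calls itself on a smaller input."},
    {"instruction": "Tell me a joke.",
     "output": "I'm sorry, but I cannot do that."},
    {"instruction": "Who are you?",
     "output": "Speaking as an AI, I have no name."},
]
kept = filter_rows(example_rows)
print([row["instruction"] for row in kept])
```

Under substring matching, short entries such as "random" and "illegal" also remove rows that use those words in ordinary contexts (for example "randomized"), so this filter is deliberately broad.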