
lodrick-the-lafted/Hermes-217K

Source: Hugging Face. Updated 2024-02-21. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/lodrick-the-lafted/Hermes-217K
Resource description:
---
language:
- eng
pretty_name: Hermes-217K
tags:
- distillation
- synthetic data
- gpt
task_categories:
- text-generation
---

It's 217K rows sampled from teknium/openhermes (not the newer 2.5). Filtered some GPTisms I dislike out, and removed rows with short output as well to bias towards longer answers.

bad_phrases = ["couldn't help but", "can't resist", "random", "unethical", "I'm sorry, but", "I'm sorry but", "as an AI", "as a Language Model", "AI Language Model", "language model", "However, it is important to", "However, it's important", "ethical guidelines", "just an AI", "within my programming", "illegal", "cannot provide"]
Provided by:
lodrick-the-lafted
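Summary of original information

Dataset overview

Basic information

  • Language: English
  • Name: Hermes-217K
  • Tags:
    • distillation
    • synthetic data
    • GPT
  • Task category: text-generation

Data source

  • The dataset contains about 217,000 rows sampled from teknium/openhermes (the original release, not the newer 2.5 version).

Data processing

  • Rows containing certain GPT-isms that the author dislikes were filtered out.
  • Rows with short outputs were removed to bias the dataset towards longer answers.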

Filtered phrases

  • The filtered phrases are:
    • "couldn't help but"
    • "can't resist"
    • "random"
    • "unethical"
    • "I'm sorry, but"
    • "I'm sorry but"
    • "as an AI"
    • "as a Language Model"
    • "AI Language Model"
    • "language model"
    • "However, it is important to"
    • "However, it's important"
    • "ethical guidelines"
    • "just an AI"
    • "within my programming"
    • "illegal"
    • "cannot provide"
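
The two processing steps described above (dropping rows that contain a filtered phrase and dropping rows with short outputs) could look roughly like the following. This is a minimal sketch, not the author's actual script: the `output` column name, the case-sensitive substring match, and the `MIN_OUTPUT_CHARS` threshold are all assumptions, since the dataset card states none of them.

```python
from typing import Dict, Iterable, List

# Phrase list copied from the dataset card.
BAD_PHRASES = [
    "couldn't help but", "can't resist", "random", "unethical",
    "I'm sorry, but", "I'm sorry but", "as an AI", "as a Language Model",
    "AI Language Model", "language model", "However, it is important to",
    "However, it's important", "ethical guidelines", "just an AI",
    "within my programming", "illegal", "cannot provide",
]

# Hypothetical threshold: the card only says "short output" was removed.
MIN_OUTPUT_CHARS = 200


def keep_row(row: Dict[str, str], min_chars: int = MIN_OUTPUT_CHARS) -> bool:
    """Return True if the row survives both filters.

    Assumes the answer text lives in an "output" field (an assumption)
    and that phrases are matched as case-sensitive substrings.
    """
    output = row.get("output", "")
    if len(output) < min_chars:
        return False  # too short: dropped to bias towards longer answers
    return not any(phrase in output for phrase in BAD_PHRASES)


def filter_rows(rows: Iterable[Dict[str, str]],
                min_chars: int = MIN_OUTPUT_CHARS) -> List[Dict[str, str]]:
    """Apply keep_row to every row and return the survivors."""
    return [row for row in rows if keep_row(row, min_chars)]
```

Note that a plain substring match on words such as "random" or "illegal" also removes ordinary, non-refusal answers that happen to use them, which is consistent with the card's description of the list as a matter of the author's taste.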