
vicgalle/OpenHermesPreferences-1k

Source: Hugging Face. Updated 2024-02-29. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/vicgalle/OpenHermesPreferences-1k
Resource description:
---
dataset_info:
  features:
  - name: source
    dtype: string
  - name: category
    dtype: string
  - name: prompt
    dtype: string
  - name: candidates_completions
    sequence: string
  - name: candidate_policies
    sequence: string
  - name: ranks
    sequence: int64
  - name: rank_str
    dtype: string
  - name: chosen_policy
    dtype: string
  - name: rejected_policy
    dtype: string
  - name: chosen
    dtype: string
  - name: rejected
    dtype: string
  - name: len_chosen_response
    dtype: int64
  - name: len_rejected_response
    dtype: int64
  - name: __index_level_0__
    dtype: int64
  splits:
  - name: train
    num_bytes: 22437232
    num_examples: 1111
  download_size: 10508529
  dataset_size: 22437232
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: other
source_datasets:
- argilla/OpenHermesPreferences
size_categories:
- 1K<n<10K
task_categories:
- text-generation
pretty_name: OpenHermesPreferences-1k
tags:
- synthetic
- rlaif
- dpo
language:
- en
---

## OpenHermesPreferences-1k ⚗️

![image/png](https://cdn-uploads.huggingface.co/production/uploads/5fad8602b8423e1d80b8a965/cGla4wyqURIuy40dxWldt.png)

OpenHermesPreferences-1k is a dataset of ~1,000 samples derived from [argilla/OpenHermesPreferences](https://huggingface.co/datasets/argilla/OpenHermesPreferences) using the [Long is More for Alignment](https://arxiv.org/abs/2402.04833) protocol. This protocol consists of selecting the ~1,000 longest responses (for the preferred/chosen ones) and provides a strong baseline to measure performance against.

Instead of uniform sampling across the dataset categories, we used stratified sampling to keep all the categories, leading to the following distribution of categories:

| category                  |   count |
|:--------------------------|--------:|
| None                      |     400 |
| orca                      |     221 |
| coding                    |     110 |
| general                   |      85 |
| trivia                    |      50 |
| roleplay                  |      42 |
| writing                   |      31 |
| wordgame                  |      20 |
| stylized_response         |      19 |
| joke                      |      17 |
| multiple_choice           |      17 |
| plan                      |      13 |
| riddle                    |      12 |
| rp                        |      10 |
| misconception             |       9 |
| gtkm                      |       8 |
| theory_of_mind            |       7 |
| awareness                 |       7 |
| summarization             |       5 |
| cot                       |       5 |
| counterfactual_contextual |       5 |
| editor                    |       4 |
| song                      |       4 |
| card                      |       2 |
| agent                     |       2 |
| experience                |       2 |
| greeting                  |       2 |
| quiz                      |       1 |
| detailed_writing          |       1 |

## Usage

The dataset already has the columns `prompt`, `chosen` and `rejected`, so it is trivially compatible with the [DPOTrainer](https://huggingface.co/docs/trl/en/dpo_trainer) from the trl library.

## License

`OpenHermesPreferences-1k` inherits the same license as the source dataset [`teknium/OpenHermes-2.5`](https://huggingface.co/datasets/teknium/OpenHermes-2.5) which is currently listed as `other` to account for the varying licenses in each source.
Provider:
vicgalle
Summary of original information

OpenHermesPreferences-1k dataset overview

Dataset information

  • Features

    • source: string
    • category: string
    • prompt: string
    • candidates_completions: sequence of strings
    • candidate_policies: sequence of strings
    • ranks: sequence of integers
    • rank_str: string
    • chosen_policy: string
    • rejected_policy: string
    • chosen: string
    • rejected: string
    • len_chosen_response: integer
    • len_rejected_response: integer
    • __index_level_0__: integer
  • Splits

    • train: 1111 examples, 22437232 bytes
  • Download size: 10508529 bytes

  • Dataset size: 22437232 bytes

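Configuration

  • Config name: default
    • Data files
      • train: path data/train-*

License

  • License type: other

Source datasets

  • Source dataset: argilla/OpenHermesPreferences

Dataset size category

  • Size category: 1K<n<10K

Task categories

  • Task type: text-generation

Dataset name

  • Name: OpenHermesPreferences-1k

Tags

  • Tags: synthetic, rlaif, dpo

Language

  • Language: English

Category distribution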

  • None: 400
  • orca: 221
  • coding: 110
  • general: 85
  • trivia: 50
  • roleplay: 42
  • writing: 31
  • wordgame: 20
  • stylized_response: 19
  • joke: 17
  • multiple_choice: 17
  • plan: 13
  • riddle: 12
  • rp: 10
  • misconception: 9
  • gtkm: 8
  • theory_of_mind: 7
  • awareness: 7
  • summarization: 5
  • cot: 5
  • counterfactual_contextual: 5
  • editor: 4
  • song: 4
  • card: 2
  • agent: 2
  • experience: 2
  • greeting: 2
  • quiz: 1
  • detailed_writing: 1
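
The card says the ~1,000 longest chosen responses were selected with stratified sampling so that every category is kept, but it does not spell out the exact procedure. The following is a minimal sketch of one way such a selection could work, not the author's actual script. The function name `select_longest_stratified`, the proportional quota with a floor of one row per category, and measuring length in characters are all assumptions made for illustration.

```python
from collections import defaultdict


def select_longest_stratified(rows, target_size):
    """Pick roughly `target_size` rows while keeping every category represented.

    Assumption (not from the dataset card): each category gets a quota
    proportional to its share of `rows`, with a floor of one row, and the
    quota is filled with the rows whose `chosen` response is longest.
    """
    # Group rows by their `category` value (None is a valid key).
    by_category = defaultdict(list)
    for row in rows:
        by_category[row["category"]].append(row)

    total = len(rows)
    selected = []
    for category, members in by_category.items():
        # Proportional quota, never below one so no category disappears.
        quota = max(1, round(target_size * len(members) / total))
        # Longest chosen responses first; length is counted in characters here.
        members.sort(key=lambda r: len(r["chosen"]), reverse=True)
        selected.extend(members[:quota])
    return selected
```

Because of rounding and the one-row floor, the result can differ slightly from `target_size`, which would be consistent with a nominal "1k" dataset ending up with 1111 rows, although the card does not confirm that this is the reason.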

Usage

  • The dataset contains the prompt, chosen, and rejected columns, so it is compatible with the DPOTrainer from the trl library.
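
As a rough illustration of the point above, the sketch below reduces each row to the three preference columns before training. The helper names `to_dpo_rows` and `load_dpo_rows` are hypothetical. `load_dpo_rows` assumes the `datasets` package is installed and that the Hugging Face Hub is reachable.

```python
DPO_COLUMNS = ("prompt", "chosen", "rejected")


def to_dpo_rows(rows):
    """Keep only the three preference columns, failing loudly if one is missing."""
    out = []
    for i, row in enumerate(rows):
        missing = [c for c in DPO_COLUMNS if c not in row]
        if missing:
            raise KeyError(f"row {i} is missing columns: {missing}")
        out.append({c: row[c] for c in DPO_COLUMNS})
    return out


def load_dpo_rows():
    """Download the train split and reduce it to preference triples.

    Assumption: requires the `datasets` package and network access.
    """
    from datasets import load_dataset

    ds = load_dataset("vicgalle/OpenHermesPreferences-1k", split="train")
    # Iterating a Dataset yields one dict per example.
    return to_dpo_rows(ds)
```

The resulting list can be wrapped with `datasets.Dataset.from_list` and passed to the trainer as its training dataset. The remaining trainer arguments depend on the installed trl version, so they are left out here.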

License notes

  • The dataset inherits the license of the source dataset teknium/OpenHermes-2.5, which is currently listed as other.