five

distilabel-internal-testing/dpo-mix-4k-criticurus-temperature0-v0.0

收藏
Hugging Face2024-04-18 更新2024-06-12 收录
下载链接:
https://hf-mirror.com/datasets/distilabel-internal-testing/dpo-mix-4k-criticurus-temperature0-v0.0
下载链接
链接失效反馈
官方服务:
资源简介:
--- size_categories: n<1K dataset_info: features: - name: instruction dtype: string - name: response dtype: string - name: rating dtype: float64 - name: dataset_name dtype: string - name: model_name dtype: string - name: score dtype: string - name: critique dtype: string - name: raw_output dtype: string splits: - name: train num_bytes: 9817993 num_examples: 3996 download_size: 4624133 dataset_size: 9817993 configs: - config_name: default data_files: - split: train path: data/train-* tags: - synthetic - distilabel - rlaif --- <p align="left"> <a href="https://github.com/argilla-io/distilabel"> <img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/> </a> </p> # Dataset Card for dpo-mix-4k-criticurus-temperature0-v0.0 This dataset has been created with [distilabel](https://distilabel.argilla.io/). ## Dataset Summary This dataset contains a `pipeline.yaml` which can be used to reproduce the pipeline that generated it in distilabel using the `distilabel` CLI: ```console distilabel pipeline run --config "https://huggingface.co/datasets/distilabel-internal-testing/dpo-mix-4k-criticurus-temperature0-v0.0/raw/main/pipeline.yaml" ``` or explore the configuration: ```console distilabel pipeline info --config "https://huggingface.co/datasets/distilabel-internal-testing/dpo-mix-4k-criticurus-temperature0-v0.0/raw/main/pipeline.yaml" ``` ## Dataset structure The examples have the following structure per configuration: <details><summary> Configuration: default </summary><hr> ```json { "critique": "You\u0027ve done a good job identifying the anagram of \" dirty room \" as \" Dormary \" which is a good start. However, the phrase \"dormary\" is not a commonly used term for a place where students live. It\u0027s important to ensure that the answer you provide is accurate and well-known. \n\nTo improve, you could have used a more common term like \"dormition\" which is a more common anagram of \" dirty room \" and is a place where students live. \n\nRemember, accuracy and understanding of the context are key in providing a helpful and truthful answer.", "dataset_name": "argilla/distilabel-capybara-dpo-7k-binarized", "instruction": "A phrase that\u0027s an anagram of \"dirty room\", it refers to a place where students live.", "model_name": "distilabel-internal-testing/criticurus-v0.0", "rating": 5.0, "raw_output": null, "response": "dormitory", "score": "6\u003c|im_end|\u003e" } ``` This subset can be loaded as: ```python from datasets import load_dataset ds = load_dataset("distilabel-internal-testing/dpo-mix-4k-criticurus-temperature0-v0.0", "default") ``` Or simply as it follows, since there's only one configuration and is named `default`: ```python from datasets import load_dataset ds = load_dataset("distilabel-internal-testing/dpo-mix-4k-criticurus-temperature0-v0.0") ``` </details>
提供机构:
distilabel-internal-testing
原始信息汇总

数据集概述

数据集基本信息

  • 数据集名称: dpo-mix-4k-criticurus-temperature0-v0.0
  • 数据集大小:
    • 下载大小: 4624133字节
    • 数据集大小: 9817993字节
  • 示例数量: 3996
  • 分类: 小于1K

数据集特征

  • 特征名称:instruction, response, rating, dataset_name, model_name, score, critique, raw_output
  • 数据类型
    • instruction: string
    • response: string
    • rating: float64
    • dataset_name: string
    • model_name: string
    • score: string
    • critique: string
    • raw_output: string

数据集结构

  • 配置名称: default
  • 数据文件路径: data/train-*
  • 示例结构: json { "critique": ..., "dataset_name": "argilla/distilabel-capybara-dpo-7k-binarized", "instruction": ..., "model_name": "distilabel-internal-testing/criticurus-v0.0", "rating": 5.0, "raw_output": null, "response": "dormitory", "score": "6u003c|im_end|u003e" }

数据集加载

  • 加载方式: python from datasets import load_dataset ds = load_dataset("distilabel-internal-testing/dpo-mix-4k-criticurus-temperature0-v0.0", "default")

    或 python from datasets import load_dataset ds = load_dataset("distilabel-internal-testing/dpo-mix-4k-criticurus-temperature0-v0.0")

5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作