
DIBT/aya_english_dpo_raw

Hugging Face · Updated 2024-05-03 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/DIBT/aya_english_dpo_raw
Resource description:
---
size_categories: n<1K
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: targets
    dtype: string
  - name: language
    dtype: string
  - name: language_code
    dtype: string
  - name: annotation_type
    dtype: string
  - name: user_id
    dtype: string
  - name: generation_model
    dtype: string
  - name: generation
    dtype: string
  - name: predicted_generation_language
    dtype: string
  - name: predicted_generation_language_score
    dtype: float64
  - name: generations
    sequence: string
  - name: generation_models
    sequence: string
  - name: model_name
    dtype: string
  - name: ratings
    sequence: int64
  - name: rationales
    sequence: string
  splits:
  - name: train
    num_bytes: 567777
    num_examples: 100
  download_size: 320089
  dataset_size: 567777
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
tags:
- synthetic
- distilabel
- rlaif
---

<p align="left">
  <a href="https://github.com/argilla-io/distilabel">
    <img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>
  </a>
</p>

# Dataset Card for aya_english_dpo_raw

This dataset has been created with [distilabel](https://distilabel.argilla.io/).

## Dataset Summary

This dataset contains a `pipeline.yaml` which can be used to reproduce the pipeline that generated it in distilabel using the `distilabel` CLI:

```console
distilabel pipeline run --config "https://huggingface.co/datasets/DIBT/aya_english_dpo_raw/raw/main/pipeline.yaml"
```

or explore the configuration:

```console
distilabel pipeline info --config "https://huggingface.co/datasets/DIBT/aya_english_dpo_raw/raw/main/pipeline.yaml"
```

## Dataset structure

The examples have the following structure per configuration:

<details><summary> Configuration: default </summary><hr>

```json
{
    "annotation_type": "original-annotations",
    "generation": "The largest organ in the human body is the skin. It covers the entire surface of the body and has a total area of about 22 square feet (2 square meters) in adults. The skin performs many important functions, such as:\n\n* Protecting the body from external damage, such as injury, infection, and UV radiation\n* Regulating body temperature\n* Aiding in the production of vitamin D\n* Assisting in the sensation of touch and feel\n* Helping to maintain fluid balance and hydration\n\nThe skin is composed of several layers, with the outermost layer being the epidermis, followed by the dermis, and then the hypodermis. It is estimated that the skin accounts for about 15% of an adult\u0027s total body weight.\n\nIt\u0027s worth noting that while the skin is the largest organ by surface area, the liver is often considered the largest organ by mass, weighing around 3 pounds (1.4 kilograms) in adults.",
    "generation_model": "meta-llama/Meta-Llama-3-70B-Instruct",
    "generation_models": [
        "aya",
        "meta-llama/Meta-Llama-3-70B-Instruct"
    ],
    "generations": [
        "The largest organ in the human body is the skin. It is the body\u0027s largest organ by surface area and weight. The skin serves many essential functions, including protecting the body from physical damage, pathogens, and dehydration, as well as helping to regulate body temperature and producing vitamin D in response to sunlight. It\u0027s a complex organ with multiple layers, cells, and structures, each serving a unique purpose in maintaining the body\u0027s overall health and well-being.",
        "The largest organ in the human body is the skin. It covers the entire surface of the body and has a total area of about 22 square feet (2 square meters) in adults. The skin performs many important functions, such as:\n\n* Protecting the body from external damage, such as injury, infection, and UV radiation\n* Regulating body temperature\n* Aiding in the production of vitamin D\n* Assisting in the sensation of touch and feel\n* Helping to maintain fluid balance and hydration\n\nThe skin is composed of several layers, with the outermost layer being the epidermis, followed by the dermis, and then the hypodermis. It is estimated that the skin accounts for about 15% of an adult\u0027s total body weight.\n\nIt\u0027s worth noting that while the skin is the largest organ by surface area, the liver is often considered the largest organ by mass, weighing around 3 pounds (1.4 kilograms) in adults."
    ],
    "instruction": "What is the largest organ in the human body?",
    "language": "English",
    "language_code": "eng",
    "model_name": "meta-llama/Meta-Llama-3-70B-Instruct",
    "predicted_generation_language": "eng_Latn",
    "predicted_generation_language_score": 0.9590950012207031,
    "ratings": [
        5,
        5
    ],
    "rationales": [
        "The text is accurate, informative, and comprehensive in describing the skin as the largest organ in the human body. It provides specific details about its functions, structure, and importance, aligning perfectly with the instruction.",
        "This text is equally excellent, providing a clear and concise answer to the question. It lists the skin\u0027s functions, describes its composition, and offers additional interesting facts, such as the comparison with the liver\u0027s mass. The text is well-structured, accurate, and confident in its information, making it an excellent response."
    ],
    "targets": "The largest organ in the human body is the skin. It is the body\u0027s largest organ by surface area and weight. The skin serves many essential functions, including protecting the body from physical damage, pathogens, and dehydration, as well as helping to regulate body temperature and producing vitamin D in response to sunlight. It\u0027s a complex organ with multiple layers, cells, and structures, each serving a unique purpose in maintaining the body\u0027s overall health and well-being.",
    "user_id": "29f22cf193a81e1a5c47d76af453a91b3cd19aa348995c7add1df15fe24e8801"
}
```

This subset can be loaded as:

```python
from datasets import load_dataset

ds = load_dataset("DIBT/aya_english_dpo_raw", "default")
```

Or simply as follows, since there is only one configuration and it is named `default`:

```python
from datasets import load_dataset

ds = load_dataset("DIBT/aya_english_dpo_raw")
```

</details>
Provider:
DIBT
Summary of original information

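Dataset overview

Basic dataset information

  • Dataset name: aya_english_dpo_raw
  • Dataset size:
    • Download size: 320089 bytes
    • Dataset size: 567777 bytes
  • Number of examples: 100 (train split)
  • Number of features: 15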

Dataset features

  • Feature name (data type):
    • instruction (string)
    • targets (string)
    • language (string)
    • language_code (string)
    • annotation_type (string)
    • user_id (string)
    • generation_model (string)
    • generation (string)
    • predicted_generation_language (string)
    • predicted_generation_language_score (float64)
    • generations (sequence: string)
    • generation_models (sequence: string)
    • model_name (string)
    • ratings (sequence: int64)
    • rationales (sequence: string)
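
As a quick sanity check on the feature list above, the following minimal sketch verifies that a single record (a plain dict) carries exactly these fields with compatible Python types. The helper name `schema_problems` and the mapping from the listed dtypes to Python types (`string` to `str`, `float64` to `float`, `sequence` to `list`) are assumptions made for illustration. They are not part of the dataset or of the `datasets` library, and real records may contain null values that this check would flag.

```python
from typing import Any

# Field names come from the feature list; the Python types are an assumed mapping.
EXPECTED_FIELDS: dict[str, type] = {
    "instruction": str,
    "targets": str,
    "language": str,
    "language_code": str,
    "annotation_type": str,
    "user_id": str,
    "generation_model": str,
    "generation": str,
    "predicted_generation_language": str,
    "predicted_generation_language_score": float,
    "generations": list,
    "generation_models": list,
    "model_name": str,
    "ratings": list,
    "rationales": list,
}


def schema_problems(record: dict[str, Any]) -> list[str]:
    """Return human-readable schema mismatches; an empty list means none found."""
    problems: list[str] = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    # Report fields that are present but not in the listed schema.
    for name in sorted(record.keys() - EXPECTED_FIELDS.keys()):
        problems.append(f"unexpected field: {name}")
    return problems
```

Calling `schema_problems(row)` on a loaded row should return an empty list if the row matches the listed features under these assumptions.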

Dataset structure

  • Configuration name: default
  • Data files:
    • Split: train
    • Path: data/train-*

Dataset loading example

```python
from datasets import load_dataset

ds = load_dataset("DIBT/aya_english_dpo_raw", "default")
```

Alternatively, since there is only one configuration and it is named `default`, it can be loaded directly:

```python
from datasets import load_dataset

ds = load_dataset("DIBT/aya_english_dpo_raw")
```
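
The dataset name suggests the records are raw material for preference optimization (DPO), with `generations` and `ratings` aligned by index. The sketch below shows one possible way to turn a record into a prompt/chosen/rejected triple. It is an illustration under assumptions, not the dataset authors' procedure: the helper name `to_preference_pair`, the output keys, and the choice to skip ties and missing ratings are all hypothetical.

```python
from typing import Any, Optional


def to_preference_pair(record: dict[str, Any]) -> Optional[dict[str, str]]:
    """Build a (prompt, chosen, rejected) triple from one record.

    Returns None when no preference can be derived: mismatched lengths,
    fewer than two rated generations, or all ratings equal.
    """
    generations = record["generations"]
    ratings = record["ratings"]
    if len(generations) != len(ratings):
        return None

    # Keep only generations that actually received a rating.
    scored = [(r, g) for r, g in zip(ratings, generations) if r is not None]
    if len(scored) < 2:
        return None

    best = max(scored, key=lambda pair: pair[0])
    worst = min(scored, key=lambda pair: pair[0])
    if best[0] == worst[0]:
        # A tie carries no preference signal, so the record is skipped.
        return None

    return {
        "prompt": record["instruction"],
        "chosen": best[1],
        "rejected": worst[1],
    }
```

For the sample record shown in the dataset card, both ratings are 5, so this function returns `None`. Whether to drop or otherwise handle such ties is a design decision left to the user.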
