
xDAN-Engine/size_test

Hugging Face, updated 2024-05-06, indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/xDAN-Engine/size_test
Resource description:
---
size_categories: n<1K
dataset_info:
- config_name: default
  features:
  - name: instruction
    dtype: string
  - name: completion
    dtype: string
  - name: meta
    struct:
    - name: category
      dtype: string
    - name: completion
      dtype: string
    - name: id
      dtype: int64
    - name: input
      dtype: 'null'
    - name: motivation_app
      dtype: 'null'
    - name: prompt
      dtype: string
    - name: source
      dtype: string
    - name: subcategory
      dtype: string
  - name: model_name
    dtype: string
  - name: generations
    dtype: string
  splits:
  - name: train
    num_bytes: 68386
    num_examples: 30
  - name: test
    num_bytes: 69606
    num_examples: 30
  download_size: 71198
  dataset_size: 137992
- config_name: test1
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: task_type
    struct:
    - name: major
      sequence: string
    - name: minor
      sequence: string
  - name: domain
    sequence: string
  - name: metadata
    dtype: string
  - name: answer_from
    dtype: string
  - name: human_verified
    dtype: bool
  - name: copyright
    dtype: string
  - name: subset
    dtype: string
  splits:
  - name: train
    num_bytes: 62864
    num_examples: 100
  download_size: 42246
  dataset_size: 62864
- config_name: test2
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: task_type
    struct:
    - name: major
      sequence: string
    - name: minor
      sequence: string
  - name: domain
    sequence: string
  - name: metadata
    dtype: string
  - name: answer_from
    dtype: string
  - name: human_verified
    dtype: bool
  - name: copyright
    dtype: string
  - name: subset
    dtype: string
  splits:
  - name: train
    num_bytes: 6286.4
    num_examples: 10
  download_size: 12402
  dataset_size: 6286.4
- config_name: test3
  features:
  - name: input
    dtype: string
  - name: generation_model
    sequence: string
  - name: generation_prompt
    list:
      list:
      - name: content
        dtype: string
      - name: role
        dtype: string
  - name: raw_generation_responses
    sequence: string
  - name: generations
    sequence: string
  - name: labelling_model
    dtype: string
  - name: labelling_prompt
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: raw_labelling_response
    dtype: string
  - name: rating
    sequence: float64
  - name: rationale
    sequence: string
  splits:
  - name: train
    num_bytes: 135709
    num_examples: 20
  download_size: 68862
  dataset_size: 135709
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
- config_name: test1
  data_files:
  - split: train
    path: test1/train-*
- config_name: test2
  data_files:
  - split: train
    path: test2/train-*
- config_name: test3
  data_files:
  - split: train
    path: test3/train-*
tags:
- synthetic
- distilabel
- rlaif
---

<p align="left">
  <a href="https://github.com/argilla-io/distilabel">
    <img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>
  </a>
</p>

# Dataset Card for size_test

This dataset has been created with [distilabel](https://distilabel.argilla.io/).

## Dataset Summary

This dataset contains a `pipeline.yaml` which can be used to reproduce the pipeline that generated it in distilabel using the `distilabel` CLI:

```console
distilabel pipeline run --config "https://huggingface.co/datasets/xDAN-Engine/size_test/raw/main/pipeline.yaml"
```

or explore the configuration:

```console
distilabel pipeline info --config "https://huggingface.co/datasets/xDAN-Engine/size_test/raw/main/pipeline.yaml"
```

## Dataset structure

The examples have the following structure per configuration:

<details><summary> Configuration: default </summary><hr>

```json
{
    "generation": "\u6839\u636e\u6570\u5b66\u4e2d\u7684\u8fd0\u7b97\u987a\u5e8f\uff08\u5148\u4e58\u9664\u540e\u52a0\u51cf\uff09\uff0c\u9996\u5148\u8fdb\u884c\u4e58\u6cd5\u8fd0\u7b97\uff1a\n\n2 * 1 = 2\n\n\u7136\u540e\u8fdb\u884c\u51cf\u6cd5\u8fd0\u7b97\uff1a\n\n8 - 2 = 6\n\n\u6240\u4ee5\uff0c8 - 2 * 1 \u7684\u7ed3\u679c\u662f 6\u3002",
    "instruction": "\u7b97\u4e00\u4e0b\u8fd9\u4e2a\u6570\u5b66\u9898\uff1a8 - 2 * 1\uff0c\u7ed3\u679c\u662f\uff1f",
    "model_name": "gpt-4-turbo",
    "response": 6.0
}
```

This subset can be loaded as:

```python
from datasets import load_dataset

ds = load_dataset("xDAN-Engine/size_test", "default")
```

Or simply as follows, since the configuration is named `default`:

```python
from datasets import load_dataset

ds = load_dataset("xDAN-Engine/size_test")
```

</details>
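The example record in the card stores its Chinese text as `\uXXXX` escapes. As a minimal sketch, assuming only Python's standard `json` module, the record can be decoded and its answer checked against ordinary operator precedence. The `generation` field is left out here for brevity, and the variable names are illustrative rather than part of the dataset.

```python
import json

# Example record copied from the card (the long "generation" text is omitted).
RAW_RECORD = r'''
{
    "instruction": "\u7b97\u4e00\u4e0b\u8fd9\u4e2a\u6570\u5b66\u9898\uff1a8 - 2 * 1\uff0c\u7ed3\u679c\u662f\uff1f",
    "model_name": "gpt-4-turbo",
    "response": 6.0
}
'''

# json.loads converts the \uXXXX escapes into the actual characters.
record = json.loads(RAW_RECORD)

# Multiplication binds tighter than subtraction: 8 - (2 * 1) = 6.
expected = 8 - 2 * 1

print(record["instruction"])
print(record["response"] == expected)
```

The decoded instruction asks for the result of `8 - 2 * 1`, and the stored `response` of `6.0` agrees with evaluating the multiplication first.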
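Provider:
xDAN-Engine
Summary of original information

Dataset overview

Basic dataset information

  • Size category: fewer than 1K examples
  • Dataset size (default configuration): 137992 bytes
  • Download size (default configuration): 71198 bytes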
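
The size figures can be cross-checked against the split sizes that the card metadata lists for the default configuration. The sketch below uses only numbers copied from the metadata; reading the download size as the size of the stored files and the dataset size as the size of the loaded data is an assumption, not something the page states.

```python
# Figures copied from the default configuration's metadata.
SPLIT_BYTES = {"train": 68386, "test": 69606}  # num_bytes per split
DATASET_SIZE = 137992                          # dataset_size
DOWNLOAD_SIZE = 71198                          # download_size

# The per-split byte counts should add up to the reported dataset size.
total = sum(SPLIT_BYTES.values())

# Fraction of the dataset size that has to be downloaded.
ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(total == DATASET_SIZE)
print(f"download/dataset ratio: {ratio:.2f}")
```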

Dataset configurations

  • Default configuration (config_name: default)

    • Features:
      • instruction: string
      • completion: string
      • meta: struct
        • category: string
        • completion: string
        • id: int64
        • input: null
        • motivation_app: null
        • prompt: string
        • source: string
        • subcategory: string
      • model_name: string
      • generations: string
    • Splits:
      • train: 68386 bytes, 30 examples
      • test: 69606 bytes, 30 examples
  • Test configuration 1 (config_name: test1)

    • Features:
      • instruction: string
      • input: string
      • output: string
      • task_type: struct
        • major: sequence of string
        • minor: sequence of string
      • domain: sequence of string
      • metadata: string
      • answer_from: string
      • human_verified: bool
      • copyright: string
      • subset: string
    • Splits:
      • train: 62864 bytes, 100 examples
  • Test configuration 2 (config_name: test2)

    • Features:
      • instruction: string
      • input: string
      • output: string
      • task_type: struct
        • major: sequence of string
        • minor: sequence of string
      • domain: sequence of string
      • metadata: string
      • answer_from: string
      • human_verified: bool
      • copyright: string
      • subset: string
    • Splits:
      • train: 6286.4 bytes, 10 examples
  • Test configuration 3 (config_name: test3)

    • Features:
      • input: string
      • generation_model: sequence of string
      • generation_prompt: list of lists
        • content: string
        • role: string
      • raw_generation_responses: sequence of string
      • generations: sequence of string
      • labelling_model: string
      • labelling_prompt: list
        • content: string
        • role: string
      • raw_labelling_response: string
      • rating: sequence of float64
      • rationale: sequence of string
    • Splits:
      • train: 135709 bytes, 20 examples
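
The card itself only shows how to load the `default` configuration. The sketch below extends that to every configuration and split listed above, using the `load_dataset(path, name, split=...)` call from the `datasets` library. The helper `load_all` and the two lookup tables are hypothetical names introduced for illustration; calling `load_all` needs network access and an installed `datasets` package, and the row-count check assumes the hosted data still matches the metadata.

```python
# Configuration names and splits as listed in the metadata.
CONFIG_SPLITS = {
    "default": ["train", "test"],
    "test1": ["train"],
    "test2": ["train"],
    "test3": ["train"],
}

# Example counts per (configuration, split), copied from the metadata.
EXPECTED_EXAMPLES = {
    ("default", "train"): 30,
    ("default", "test"): 30,
    ("test1", "train"): 100,
    ("test2", "train"): 10,
    ("test3", "train"): 20,
}

# test2's fractional byte count equals 10/100 of test1's 62864 bytes.
TEST2_BYTES = 62864 * 10 / 100


def load_all(repo_id="xDAN-Engine/size_test"):
    """Load every listed split and compare its row count with the metadata."""
    # Imported lazily so the tables above work without `datasets` installed.
    from datasets import load_dataset

    loaded = {}
    for config, splits in CONFIG_SPLITS.items():
        for split in splits:
            ds = load_dataset(repo_id, config, split=split)
            if len(ds) != EXPECTED_EXAMPLES[(config, split)]:
                raise ValueError(f"unexpected row count for {config}/{split}")
            loaded[(config, split)] = ds
    return loaded
```

The 6286.4-byte figure for `test2` is exactly one tenth of the `test1` size, which may mean it is a prorated estimate for a 10-example selection rather than a measured size. The metadata does not say so, so treat this as a guess.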

Dataset tags

  • synthetic
  • distilabel
  • rlaif