xDAN-Engine/size_test

Name: xDAN-Engine/size_test
Creator: xDAN-Engine
Published: 2024-05-06 03:50:58
License: No description available

Hugging Face | Updated 2024-05-06 | Indexed 2024-06-12

Download link:

https://hf-mirror.com/datasets/xDAN-Engine/size_test

Resource description:

---
size_categories: n<1K
dataset_info:
- config_name: default
  features:
  - name: instruction
    dtype: string
  - name: completion
    dtype: string
  - name: meta
    struct:
    - name: category
      dtype: string
    - name: completion
      dtype: string
    - name: id
      dtype: int64
    - name: input
      dtype: 'null'
    - name: motivation_app
      dtype: 'null'
    - name: prompt
      dtype: string
    - name: source
      dtype: string
    - name: subcategory
      dtype: string
  - name: model_name
    dtype: string
  - name: generations
    dtype: string
  splits:
  - name: train
    num_bytes: 68386
    num_examples: 30
  - name: test
    num_bytes: 69606
    num_examples: 30
  download_size: 71198
  dataset_size: 137992
- config_name: test1
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: task_type
    struct:
    - name: major
      sequence: string
    - name: minor
      sequence: string
  - name: domain
    sequence: string
  - name: metadata
    dtype: string
  - name: answer_from
    dtype: string
  - name: human_verified
    dtype: bool
  - name: copyright
    dtype: string
  - name: subset
    dtype: string
  splits:
  - name: train
    num_bytes: 62864
    num_examples: 100
  download_size: 42246
  dataset_size: 62864
- config_name: test2
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: task_type
    struct:
    - name: major
      sequence: string
    - name: minor
      sequence: string
  - name: domain
    sequence: string
  - name: metadata
    dtype: string
  - name: answer_from
    dtype: string
  - name: human_verified
    dtype: bool
  - name: copyright
    dtype: string
  - name: subset
    dtype: string
  splits:
  - name: train
    num_bytes: 6286.4
    num_examples: 10
  download_size: 12402
  dataset_size: 6286.4
- config_name: test3
  features:
  - name: input
    dtype: string
  - name: generation_model
    sequence: string
  - name: generation_prompt
    list:
      list:
      - name: content
        dtype: string
      - name: role
        dtype: string
  - name: raw_generation_responses
    sequence: string
  - name: generations
    sequence: string
  - name: labelling_model
    dtype: string
  - name: labelling_prompt
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: raw_labelling_response
    dtype: string
  - name: rating
    sequence: float64
  - name: rationale
    sequence: string
  splits:
  - name: train
    num_bytes: 135709
    num_examples: 20
  download_size: 68862
  dataset_size: 135709
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
- config_name: test1
  data_files:
  - split: train
    path: test1/train-*
- config_name: test2
  data_files:
  - split: train
    path: test2/train-*
- config_name: test3
  data_files:
  - split: train
    path: test3/train-*
tags:
- synthetic
- distilabel
- rlaif
---

<p align="left">
  <a href="https://github.com/argilla-io/distilabel">
    <img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>
  </a>
</p>

# Dataset Card for size_test

This dataset has been created with [distilabel](https://distilabel.argilla.io/).

## Dataset Summary

This dataset contains a `pipeline.yaml` which can be used to reproduce the pipeline that generated it in distilabel using the `distilabel` CLI:

```console
distilabel pipeline run --config "https://huggingface.co/datasets/xDAN-Engine/size_test/raw/main/pipeline.yaml"
```

or explore the configuration:

```console
distilabel pipeline info --config "https://huggingface.co/datasets/xDAN-Engine/size_test/raw/main/pipeline.yaml"
```

## Dataset structure

The examples have the following structure per configuration:

<details><summary> Configuration: default </summary><hr>

```json
{
    "generation": "\u6839\u636e\u6570\u5b66\u4e2d\u7684\u8fd0\u7b97\u987a\u5e8f\uff08\u5148\u4e58\u9664\u540e\u52a0\u51cf\uff09\uff0c\u9996\u5148\u8fdb\u884c\u4e58\u6cd5\u8fd0\u7b97\uff1a\n\n2 * 1 = 2\n\n\u7136\u540e\u8fdb\u884c\u51cf\u6cd5\u8fd0\u7b97\uff1a\n\n8 - 2 = 6\n\n\u6240\u4ee5\uff0c8 - 2 * 1 \u7684\u7ed3\u679c\u662f 6\u3002",
    "instruction": "\u7b97\u4e00\u4e0b\u8fd9\u4e2a\u6570\u5b66\u9898\uff1a8 - 2 * 1\uff0c\u7ed3\u679c\u662f\uff1f",
    "model_name": "gpt-4-turbo",
    "response": 6.0
}
```

This subset can be loaded as:

```python
from datasets import load_dataset

ds = load_dataset("xDAN-Engine/size_test", "default")
```

Or simply as follows, since the configuration is named `default`:

```python
from datasets import load_dataset

ds = load_dataset("xDAN-Engine/size_test")
```

</details>
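The card only shows how to load `default`, while the metadata also lists `test1`, `test2` and `test3`. A minimal sketch of loading any of them by name follows. The helper `load_config` and the `CONFIG_SPLITS` table are hypothetical names introduced here for illustration, with the split names copied from the metadata. It assumes the `datasets` library is installed and the Hub is reachable.

```python
# Split names per configuration, copied from the card's YAML metadata.
CONFIG_SPLITS = {
    "default": ("train", "test"),
    "test1": ("train",),
    "test2": ("train",),
    "test3": ("train",),
}


def load_config(name: str, split: str = "train"):
    """Load one split of one configuration (hypothetical helper)."""
    if name not in CONFIG_SPLITS:
        raise ValueError(f"unknown configuration: {name!r}")
    if split not in CONFIG_SPLITS[name]:
        raise ValueError(f"configuration {name!r} has no split {split!r}")
    # Imported lazily so the validation above works without the library.
    from datasets import load_dataset

    return load_dataset("xDAN-Engine/size_test", name, split=split)
```

For instance, `load_config("test1")` should return the `train` split of `test1`, provided the repository still matches the metadata shown above.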

Provider:

xDAN-Engine

Summary of original information

Dataset overview

Basic dataset information

Size category: fewer than 1K examples
Dataset size: 137992 bytes (default configuration)
Download size: 71198 bytes (default configuration)
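The two sizes above match the `default` configuration in the metadata, not the whole repository. As a quick consistency check, the sketch below sums the `default` split sizes and compares the download size with the dataset size. All figures are copied from the metadata; the variable names are illustrative.

```python
# Figures copied from the `default` configuration in the card's metadata.
split_bytes = {"train": 68386, "test": 69606}
dataset_size = 137992
download_size = 71198

# The dataset size should equal the sum of the per-split byte counts.
total = sum(split_bytes.values())
print(total == dataset_size)  # True

# Fraction of the in-memory size that is actually downloaded.
ratio = download_size / dataset_size
print(f"{ratio:.3f}")  # 0.516
```

The other configurations (`test1`, `test2`, `test3`) report their own `dataset_size` and `download_size` separately.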

Dataset configurations

Default configuration (config_name: default)
- Features:
  - instruction: string
  - completion: string
  - meta: struct
    - category: string
    - completion: string
    - id: int64
    - input: null
    - motivation_app: null
    - prompt: string
    - source: string
    - subcategory: string
  - model_name: string
  - generations: string
- Splits:
  - train: 68386 bytes, 30 examples
  - test: 69606 bytes, 30 examples
Test configuration 1 (config_name: test1)
- Features:
  - instruction: string
  - input: string
  - output: string
  - task_type: struct
    - major: sequence of string
    - minor: sequence of string
  - domain: sequence of string
  - metadata: string
  - answer_from: string
  - human_verified: bool
  - copyright: string
  - subset: string
- Splits:
  - train: 62864 bytes, 100 examples
Test configuration 2 (config_name: test2)
- Features:
  - instruction: string
  - input: string
  - output: string
  - task_type: struct
    - major: sequence of string
    - minor: sequence of string
  - domain: sequence of string
  - metadata: string
  - answer_from: string
  - human_verified: bool
  - copyright: string
  - subset: string
- Splits:
  - train: 6286.4 bytes, 10 examples
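The fractional byte count for `test2` is unusual. One possible reading, which the card does not confirm, is that `test2` is a 10-example subset of `test1` (the two share the same schema) whose size was estimated proportionally rather than measured. The arithmetic is at least consistent with that reading:

```python
import math

# Figures copied from the card's metadata.
test1_bytes, test1_examples = 62864, 100
test2_bytes, test2_examples = 6286.4, 10

# Scale the test1 byte count by the fraction of examples kept.
estimated = test1_bytes * test2_examples / test1_examples
print(math.isclose(estimated, test2_bytes))  # True
```

This only shows that the numbers agree; it does not establish how `test2` was actually produced.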
Test configuration 3 (config_name: test3)
- Features:
  - input: string
  - generation_model: sequence of string
  - generation_prompt: list of lists
    - content: string
    - role: string
  - raw_generation_responses: sequence of string
  - generations: sequence of string
  - labelling_model: string
  - labelling_prompt: list
    - content: string
    - role: string
  - raw_labelling_response: string
  - rating: sequence of float64
  - rationale: sequence of string
- Splits:
  - train: 135709 bytes, 20 examples
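In `test3`, `generations`, `rating` and `rationale` are all sequences, which suggests one rating and one rationale per generation. The card does not state this alignment, so treat it as an assumption. Under that assumption, picking the top-rated generation from one record could be sketched as below. The helper `best_generation` is hypothetical and the record is made up for illustration, not taken from the dataset.

```python
from typing import Optional


def best_generation(record: dict) -> Optional[str]:
    """Return the highest-rated generation, or None if nothing is rated."""
    pairs = [
        (rating, text)
        for text, rating in zip(record["generations"], record["rating"])
        if rating is not None
    ]
    if not pairs:
        return None
    # Compare on the rating only, so ties keep the earliest generation.
    return max(pairs, key=lambda pair: pair[0])[1]


# Made-up record that follows the test3 field names (assumed alignment).
example = {
    "input": "What is 8 - 2 * 1?",
    "generations": ["6", "8"],
    "rating": [5.0, 1.0],
    "rationale": ["Applies multiplication first.", "Ignores operator precedence."],
}
print(best_generation(example))  # 6
```

Pairing each generation with its rating in this way is a common first step when building preference pairs from AI feedback, which fits the `rlaif` tag, though the card does not describe the intended use.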

Dataset tags

synthetic
distilabel
rlaif
