
gabrielmbmb/testing-vllm

Hugging Face · Updated 2024-06-12 · Indexed 2024-06-29
Download link:
https://hf-mirror.com/datasets/gabrielmbmb/testing-vllm
Resource description:
---
size_categories: n<1K
dataset_info:
- config_name: text_generation_0
  features:
  - name: instruction
    dtype: string
  - name: completion
    dtype: string
  - name: generation
    dtype: string
  - name: distilabel_metadata
    struct:
    - name: raw_output_text_generation_0
      dtype: string
  - name: model_name
    dtype: string
  splits:
  - name: train
    num_bytes: 515517
    num_examples: 327
  download_size: 338101
  dataset_size: 515517
- config_name: text_generation_1
  features:
  - name: instruction
    dtype: string
  - name: completion
    dtype: string
  - name: generation
    dtype: string
  - name: distilabel_metadata
    struct:
    - name: raw_output_text_generation_1
      dtype: string
  - name: model_name
    dtype: string
  splits:
  - name: train
    num_bytes: 515517
    num_examples: 327
  download_size: 338101
  dataset_size: 515517
configs:
- config_name: text_generation_0
  data_files:
  - split: train
    path: text_generation_0/train-*
- config_name: text_generation_1
  data_files:
  - split: train
    path: text_generation_1/train-*
tags:
- synthetic
- distilabel
- rlaif
---

<p align="left">
  <a href="https://github.com/argilla-io/distilabel">
    <img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>
  </a>
</p>

# Dataset Card for testing-vllm

This dataset has been created with [distilabel](https://distilabel.argilla.io/).

## Dataset Summary

This dataset contains a `pipeline.yaml` which can be used to reproduce the pipeline that generated it in distilabel using the `distilabel` CLI:

```console
distilabel pipeline run --config "https://huggingface.co/datasets/gabrielmbmb/testing-vllm/raw/main/pipeline.yaml"
```

or explore the configuration:

```console
distilabel pipeline info --config "https://huggingface.co/datasets/gabrielmbmb/testing-vllm/raw/main/pipeline.yaml"
```

## Dataset structure

The examples have the following structure per configuration:

<details><summary> Configuration: text_generation_1 </summary><hr>

```json
{
    "completion": "Denote the number of chocolates each person has by the letter of their first name. We know that\nA = D + 12\nD = R + 6\nA = 2 * R\n\nThus, A = (R + 6) + 12 = R + 18\nSince also A = 2 * R, this means 2 * R = R + 18\nHence R = 18\nHence D = 18 + 6 = 24",
    "distilabel_metadata": {
        "raw_output_text_generation_1": "Arianna has 12 more chocolates than Danny, so DANNY + 12 = ARIANNA. Arabic does not have twice as many chocolates as Robbie, so if ARIANNA = 2 * ROBBIE. Since ARIANNA = DANNY + 12, we can plug in the values and get 2 * ROBBIE = DANNY + 12. We know also that DANNY = ROBBIE + 6, so 2 * ROBBIE = ROBBIE + 6 + 12. Simplifying this equation, we get ROBBIE ="
    },
    "generation": "Arianna has 12 more chocolates than Danny, so DANNY + 12 = ARIANNA. Arabic does not have twice as many chocolates as Robbie, so if ARIANNA = 2 * ROBBIE. Since ARIANNA = DANNY + 12, we can plug in the values and get 2 * ROBBIE = DANNY + 12. We know also that DANNY = ROBBIE + 6, so 2 * ROBBIE = ROBBIE + 6 + 12. Simplifying this equation, we get ROBBIE =",
    "instruction": "Arianna has 12 chocolates more than Danny. Danny has 6 chocolates more than Robbie. Arianna has twice as many chocolates as Robbie has. How many chocolates does Danny have?",
    "model_name": "meta-llama/Meta-Llama-3-8B-Instruct"
}
```

This subset can be loaded as:

```python
from datasets import load_dataset

ds = load_dataset("gabrielmbmb/testing-vllm", "text_generation_1")
```

</details>

<details><summary> Configuration: text_generation_0 </summary><hr>

```json
{
    "completion": "Denote the number of chocolates each person has by the letter of their first name. We know that\nA = D + 12\nD = R + 6\nA = 2 * R\n\nThus, A = (R + 6) + 12 = R + 18\nSince also A = 2 * R, this means 2 * R = R + 18\nHence R = 18\nHence D = 18 + 6 = 24",
    "distilabel_metadata": {
        "raw_output_text_generation_0": "Arianna has 12 more chocolates than Danny, so DANNY + 12 = ARIANNA. Arabic does not have twice as many chocolates as Robbie, so if ARIANNA = 2 * ROBBIE. Since ARIANNA = DANNY + 12, we can plug in the values and get 2 * ROBBIE = DANNY + 12. We know also that DANNY = ROBBIE + 6, so 2 * ROBBIE = ROBBIE + 6 + 12. Simplifying this equation, we get ROBBIE ="
    },
    "generation": "Arianna has 12 more chocolates than Danny, so DANNY + 12 = ARIANNA. Arabic does not have twice as many chocolates as Robbie, so if ARIANNA = 2 * ROBBIE. Since ARIANNA = DANNY + 12, we can plug in the values and get 2 * ROBBIE = DANNY + 12. We know also that DANNY = ROBBIE + 6, so 2 * ROBBIE = ROBBIE + 6 + 12. Simplifying this equation, we get ROBBIE =",
    "instruction": "Arianna has 12 chocolates more than Danny. Danny has 6 chocolates more than Robbie. Arianna has twice as many chocolates as Robbie has. How many chocolates does Danny have?",
    "model_name": "meta-llama/Meta-Llama-3-8B-Instruct"
}
```

This subset can be loaded as:

```python
from datasets import load_dataset

ds = load_dataset("gabrielmbmb/testing-vllm", "text_generation_0")
```

</details>
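
In both example records, the `generation` field is identical to the raw output stored under `distilabel_metadata`. The sketch below shows one way to check that relationship for a row. The helper name `generation_matches_raw` is hypothetical, and the metadata key pattern (`raw_output_` followed by the configuration name) is an assumption drawn from the two examples in this card rather than a documented guarantee. The sample row is abbreviated to the first sentence of the real generation so the snippet runs without downloading anything.

```python
from typing import Any, Dict


def generation_matches_raw(record: Dict[str, Any], config_name: str) -> bool:
    """Return True if `generation` equals the raw output kept in the metadata.

    Assumption: the metadata key is "raw_output_" + config name, as in the
    examples shown in the card (e.g. "raw_output_text_generation_1").
    """
    raw_key = f"raw_output_{config_name}"
    raw_output = record["distilabel_metadata"].get(raw_key)
    return raw_output is not None and raw_output == record["generation"]


# Abbreviated stand-in for one row: only the first sentence of the real
# generation is kept here for brevity.
first_sentence = "Arianna has 12 more chocolates than Danny, so DANNY + 12 = ARIANNA."
sample = {
    "generation": first_sentence,
    "distilabel_metadata": {"raw_output_text_generation_1": first_sentence},
    "model_name": "meta-llama/Meta-Llama-3-8B-Instruct",
}

print(generation_matches_raw(sample, "text_generation_1"))  # True
print(generation_matches_raw(sample, "text_generation_0"))  # False: key is absent
```

The same function could be applied to each row of a subset returned by `load_dataset`, since rows are exposed as dictionaries with these field names.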
Provider:
gabrielmbmb
Summary of original information

Dataset overview

Dataset structure

Configuration: text_generation_0

  • Features:
    • instruction: type string
    • completion: type string
    • generation: type string
    • distilabel_metadata: contains raw_output_text_generation_0, type string
    • model_name: type string
  • Splits:
    • train: 327 examples, 515517 bytes
  • Download size: 338101 bytes
  • Dataset size: 515517 bytes
  • Data file path: text_generation_0/train-*
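
To put the size figures for this configuration in perspective, the short calculation below derives the average size per example and the ratio of download size to dataset size. The constant names are illustrative, and the numbers are copied from the metadata listed here.

```python
# Figures copied from the dataset metadata for the train split.
NUM_BYTES = 515517       # dataset size in bytes
NUM_EXAMPLES = 327       # number of examples
DOWNLOAD_SIZE = 338101   # download size in bytes

avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES
download_ratio = DOWNLOAD_SIZE / NUM_BYTES

print(f"{avg_bytes_per_example:.1f} bytes per example")         # 1576.5 bytes per example
print(f"download is {download_ratio:.1%} of the dataset size")  # download is 65.6% of the dataset size
```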

Configuration: text_generation_1

  • Features:
    • instruction: type string
    • completion: type string
    • generation: type string
    • distilabel_metadata: contains raw_output_text_generation_1, type string
    • model_name: type string
  • Splits:
    • train: 327 examples, 515517 bytes
  • Download size: 338101 bytes
  • Dataset size: 515517 bytes
  • Data file path: text_generation_1/train-*
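
The feature lists for the two configurations differ only in the name of the nested metadata field. As a minimal sketch of that schema, the function below checks a row (as a plain dictionary) against it using only the standard library. The name `schema_problems` and the placeholder row values are hypothetical and chosen for illustration; they are not taken from the dataset.

```python
from typing import Any, Dict, List

# Top-level features that both configurations declare as strings.
TOP_LEVEL_STRING_FIELDS = ("instruction", "completion", "generation", "model_name")


def schema_problems(record: Dict[str, Any], config_name: str) -> List[str]:
    """List the ways `record` deviates from the schema of `config_name`."""
    problems = []
    for field in TOP_LEVEL_STRING_FIELDS:
        if not isinstance(record.get(field), str):
            problems.append(f"{field}: expected a string")

    # The nested struct holds one string whose name depends on the config.
    raw_key = f"raw_output_{config_name}"
    metadata = record.get("distilabel_metadata")
    if not isinstance(metadata, dict):
        problems.append("distilabel_metadata: expected a struct")
    elif not isinstance(metadata.get(raw_key), str):
        problems.append(f"distilabel_metadata.{raw_key}: expected a string")
    return problems


# Placeholder row with made-up values, used only to exercise the check.
row = {
    "instruction": "placeholder instruction",
    "completion": "placeholder completion",
    "generation": "placeholder generation",
    "distilabel_metadata": {"raw_output_text_generation_0": "placeholder generation"},
    "model_name": "placeholder-model",
}

print(schema_problems(row, "text_generation_0"))  # []
print(schema_problems(row, "text_generation_1"))  # one problem: the nested key differs
```

Returning a list of problems instead of raising on the first mismatch makes it easier to report every deviation in a row at once.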

Tags

  • synthetic
  • distilabel
  • rlaif