gabrielmbmb/testing-vllm

Name: gabrielmbmb/testing-vllm
Creator: gabrielmbmb
Published: 2024-06-12 12:47:30
License: No description available

Source: Hugging Face. Updated 2024-06-12; indexed 2024-06-29.

Download link:

https://hf-mirror.com/datasets/gabrielmbmb/testing-vllm

Resource description:

---
size_categories: n<1K
dataset_info:
- config_name: text_generation_0
  features:
  - name: instruction
    dtype: string
  - name: completion
    dtype: string
  - name: generation
    dtype: string
  - name: distilabel_metadata
    struct:
    - name: raw_output_text_generation_0
      dtype: string
  - name: model_name
    dtype: string
  splits:
  - name: train
    num_bytes: 515517
    num_examples: 327
  download_size: 338101
  dataset_size: 515517
- config_name: text_generation_1
  features:
  - name: instruction
    dtype: string
  - name: completion
    dtype: string
  - name: generation
    dtype: string
  - name: distilabel_metadata
    struct:
    - name: raw_output_text_generation_1
      dtype: string
  - name: model_name
    dtype: string
  splits:
  - name: train
    num_bytes: 515517
    num_examples: 327
  download_size: 338101
  dataset_size: 515517
configs:
- config_name: text_generation_0
  data_files:
  - split: train
    path: text_generation_0/train-*
- config_name: text_generation_1
  data_files:
  - split: train
    path: text_generation_1/train-*
tags:
- synthetic
- distilabel
- rlaif
---

<p align="left">
  <a href="https://github.com/argilla-io/distilabel">
    <img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>
  </a>
</p>

# Dataset Card for testing-vllm

This dataset has been created with [distilabel](https://distilabel.argilla.io/).

## Dataset Summary

This dataset contains a `pipeline.yaml` which can be used to reproduce the pipeline that generated it in distilabel using the `distilabel` CLI:

```console
distilabel pipeline run --config "https://huggingface.co/datasets/gabrielmbmb/testing-vllm/raw/main/pipeline.yaml"
```

or explore the configuration:

```console
distilabel pipeline info --config "https://huggingface.co/datasets/gabrielmbmb/testing-vllm/raw/main/pipeline.yaml"
```

## Dataset structure

The examples have the following structure per configuration:

<details><summary> Configuration: text_generation_1 </summary><hr>

```json
{
    "completion": "Denote the number of chocolates each person has by the letter of their first name. We know that\nA = D + 12\nD = R + 6\nA = 2 * R\n\nThus, A = (R + 6) + 12 = R + 18\nSince also A = 2 * R, this means 2 * R = R + 18\nHence R = 18\nHence D = 18 + 6 = 24",
    "distilabel_metadata": {
        "raw_output_text_generation_1": "Arianna has 12 more chocolates than Danny, so DANNY + 12 = ARIANNA. Arabic does not have twice as many chocolates as Robbie, so if ARIANNA = 2 * ROBBIE. Since ARIANNA = DANNY + 12, we can plug in the values and get 2 * ROBBIE = DANNY + 12. We know also that DANNY = ROBBIE + 6, so 2 * ROBBIE = ROBBIE + 6 + 12. Simplifying this equation, we get ROBBIE ="
    },
    "generation": "Arianna has 12 more chocolates than Danny, so DANNY + 12 = ARIANNA. Arabic does not have twice as many chocolates as Robbie, so if ARIANNA = 2 * ROBBIE. Since ARIANNA = DANNY + 12, we can plug in the values and get 2 * ROBBIE = DANNY + 12. We know also that DANNY = ROBBIE + 6, so 2 * ROBBIE = ROBBIE + 6 + 12. Simplifying this equation, we get ROBBIE =",
    "instruction": "Arianna has 12 chocolates more than Danny. Danny has 6 chocolates more than Robbie. Arianna has twice as many chocolates as Robbie has. How many chocolates does Danny have?",
    "model_name": "meta-llama/Meta-Llama-3-8B-Instruct"
}
```

This subset can be loaded as:

```python
from datasets import load_dataset

ds = load_dataset("gabrielmbmb/testing-vllm", "text_generation_1")
```

</details>

<details><summary> Configuration: text_generation_0 </summary><hr>

```json
{
    "completion": "Denote the number of chocolates each person has by the letter of their first name. We know that\nA = D + 12\nD = R + 6\nA = 2 * R\n\nThus, A = (R + 6) + 12 = R + 18\nSince also A = 2 * R, this means 2 * R = R + 18\nHence R = 18\nHence D = 18 + 6 = 24",
    "distilabel_metadata": {
        "raw_output_text_generation_0": "Arianna has 12 more chocolates than Danny, so DANNY + 12 = ARIANNA. Arabic does not have twice as many chocolates as Robbie, so if ARIANNA = 2 * ROBBIE. Since ARIANNA = DANNY + 12, we can plug in the values and get 2 * ROBBIE = DANNY + 12. We know also that DANNY = ROBBIE + 6, so 2 * ROBBIE = ROBBIE + 6 + 12. Simplifying this equation, we get ROBBIE ="
    },
    "generation": "Arianna has 12 more chocolates than Danny, so DANNY + 12 = ARIANNA. Arabic does not have twice as many chocolates as Robbie, so if ARIANNA = 2 * ROBBIE. Since ARIANNA = DANNY + 12, we can plug in the values and get 2 * ROBBIE = DANNY + 12. We know also that DANNY = ROBBIE + 6, so 2 * ROBBIE = ROBBIE + 6 + 12. Simplifying this equation, we get ROBBIE =",
    "instruction": "Arianna has 12 chocolates more than Danny. Danny has 6 chocolates more than Robbie. Arianna has twice as many chocolates as Robbie has. How many chocolates does Danny have?",
    "model_name": "meta-llama/Meta-Llama-3-8B-Instruct"
}
```

This subset can be loaded as:

```python
from datasets import load_dataset

ds = load_dataset("gabrielmbmb/testing-vllm", "text_generation_0")
```

</details>
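
The record layout shown in the two configurations can be checked with a few lines of plain Python. The sketch below is illustrative only: `raw_output_key`, `check_record`, and the abbreviated `sample` record are hypothetical names and data, not part of distilabel or the dataset. It also assumes that `generation` mirrors the raw output stored under `distilabel_metadata`, which holds in the two examples shown on the card but is not guaranteed for every row.

```python
from typing import Any, Dict, List

# Top-level string fields listed in the card's `dataset_info` block.
BASE_FIELDS = ("instruction", "completion", "generation", "model_name")


def raw_output_key(config_name: str) -> str:
    """Name of the metadata field for a configuration, e.g. raw_output_text_generation_0."""
    return f"raw_output_{config_name}"


def check_record(record: Dict[str, Any], config_name: str) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field in BASE_FIELDS:
        if not isinstance(record.get(field), str):
            problems.append(f"{field}: expected a string")

    metadata = record.get("distilabel_metadata")
    key = raw_output_key(config_name)
    if not isinstance(metadata, dict) or not isinstance(metadata.get(key), str):
        problems.append(f"distilabel_metadata.{key}: expected a string")
    elif metadata[key] != record.get("generation"):
        # Assumption based on the card's examples, where both fields are identical.
        problems.append("generation differs from the raw model output")
    return problems


# Abbreviated, hand-written stand-in for a text_generation_0 row (not real data).
sample = {
    "instruction": "How many chocolates does Danny have?",
    "completion": "Hence D = 18 + 6 = 24",
    "generation": "Simplifying this equation, we get ROBBIE =",
    "distilabel_metadata": {
        "raw_output_text_generation_0": "Simplifying this equation, we get ROBBIE ="
    },
    "model_name": "meta-llama/Meta-Llama-3-8B-Instruct",
}

print(check_record(sample, "text_generation_0"))
```

The same function could be mapped over rows returned by `load_dataset`, passing the matching configuration name so that the right metadata key is checked.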

Provider:

gabrielmbmb

Summary of original information

Dataset overview

Dataset structure

Configuration: text_generation_0

Features:
- instruction: string
- completion: string
- generation: string
- distilabel_metadata: struct containing raw_output_text_generation_0 (string)
- model_name: string
Splits:
- train: 327 examples, 515517 bytes
Download size: 338101 bytes
Dataset size: 515517 bytes
Data file path: text_generation_0/train-*

Configuration: text_generation_1

Features:
- instruction: string
- completion: string
- generation: string
- distilabel_metadata: struct containing raw_output_text_generation_1 (string)
- model_name: string
Splits:
- train: 327 examples, 515517 bytes
Download size: 338101 bytes
Dataset size: 515517 bytes
Data file path: text_generation_1/train-*
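
To put the reported sizes in perspective, the short calculation below derives the average record size and the ratio of download size to dataset size from the figures listed for each configuration (both configurations report the same numbers). The constant names are illustrative. The gap between the two sizes presumably reflects compression of the stored files, but the page does not state this.

```python
# Figures reported for each configuration's train split.
NUM_EXAMPLES = 327
DATASET_SIZE_BYTES = 515_517
DOWNLOAD_SIZE_BYTES = 338_101

# Average size of one example in the loaded dataset.
avg_bytes_per_example = DATASET_SIZE_BYTES / NUM_EXAMPLES

# Fraction of the dataset size that has to be downloaded.
download_ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES

print(f"average bytes per example: {avg_bytes_per_example:.1f}")
print(f"download size / dataset size: {download_ratio:.3f}")
```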
