alvarobartt/example-distilabel
Source: Hugging Face. Updated 2024-05-10; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/alvarobartt/example-distilabel
Resource description:
---
size_categories: n<1K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: completion
    dtype: string
  - name: meta
    struct:
    - name: category
      dtype: string
    - name: completion
      dtype: string
    - name: id
      dtype: int64
    - name: input
      dtype: 'null'
    - name: motivation_app
      dtype: 'null'
    - name: prompt
      dtype: string
    - name: source
      dtype: string
    - name: subcategory
      dtype: string
  - name: generation_model
    dtype: string
  - name: generation
    dtype: string
  splits:
  - name: train
    num_bytes: 4257
    num_examples: 2
  download_size: 21358
  dataset_size: 4257
tags:
- synthetic
- distilabel
- rlaif
---
<p align="left">
<a href="https://github.com/argilla-io/distilabel">
<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>
</a>
</p>
# Dataset Card for example-distilabel
This dataset has been created with [distilabel](https://distilabel.argilla.io/).
## Dataset Summary
This dataset contains a `pipeline.yaml` which can be used to reproduce the pipeline that generated it in distilabel using the `distilabel` CLI:
```console
distilabel pipeline run --config "https://huggingface.co/datasets/alvarobartt/example-distilabel/raw/main/pipeline.yaml"
```
or explore the configuration:
```console
distilabel pipeline info --config "https://huggingface.co/datasets/alvarobartt/example-distilabel/raw/main/pipeline.yaml"
```
## Dataset structure
The examples have the following structure per configuration:
<details><summary> Configuration: default </summary><hr>
```json
{
    "completion": "Denote the number of chocolates each person has by the letter of their first name. We know that\nA = D + 12\nD = R + 6\nA = 2 * R\n\nThus, A = (R + 6) + 12 = R + 18\nSince also A = 2 * R, this means 2 * R = R + 18\nHence R = 18\nHence D = 18 + 6 = 24",
    "generation": "Let\u0027s denote the number of chocolates Robbie has as R. According to the information given:\n\n1. Arianna has 2R (twice as many) chocolates.\n2. Danny has R + 6 chocolates (6 more than Robbie).\n3. Arianna also has 12 more chocolates than Danny, so we can write this as:\n 2R = (R + 6) + 12\n\nNow, let\u0027s solve for R:\n\n2R = R + 18\n2R - R = 18\nR = 18\n\nSo, Robbie has 18 chocolates. Now we can find out how many Danny has:\n\nDanny = R + 6\nDanny = 18 + 6\nDanny = 24\n\nTherefore, Danny has 24 chocolates.",
    "generation_model": "examples/models/Phi-3-mini-4k-instruct-q4.gguf",
    "instruction": "Arianna has 12 chocolates more than Danny. Danny has 6 chocolates more than Robbie. Arianna has twice as many chocolates as Robbie has. How many chocolates does Danny have?",
    "meta": {
        "category": "Question Answering",
        "completion": "Denote the number of chocolates each person has by the letter of their first name. We know that\nA = D + 12\nD = R + 6\nA = 2 * R\n\nThus, A = (R + 6) + 12 = R + 18\nSince also A = 2 * R, this means 2 * R = R + 18\nHence R = 18\nHence D = 18 + 6 = 24",
        "id": 0,
        "input": null,
        "motivation_app": null,
        "prompt": "Arianna has 12 chocolates more than Danny. Danny has 6 chocolates more than Robbie. Arianna has twice as many chocolates as Robbie has. How many chocolates does Danny have?",
        "source": "surge",
        "subcategory": "Math"
    }
}
```
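In the example record, both the reference `completion` and the model `generation` conclude that Danny has 24 chocolates. As a quick sanity check, the three constraints in the `instruction` can be searched by brute force. This is only an illustrative sketch: `solve_chocolates` and its `max_robbie` bound are hypothetical names introduced here, not part of the dataset or of distilabel.

```python
def solve_chocolates(max_robbie=1000):
    """Brute-force search over Robbie's count.

    Returns every (arianna, danny, robbie) tuple that satisfies the three
    constraints stated in the example instruction.
    """
    solutions = []
    for robbie in range(max_robbie + 1):
        danny = robbie + 6          # Danny has 6 chocolates more than Robbie
        arianna = 2 * robbie        # Arianna has twice as many as Robbie
        if arianna == danny + 12:   # Arianna has 12 chocolates more than Danny
            solutions.append((arianna, danny, robbie))
    return solutions


print(solve_chocolates())  # [(36, 24, 18)]
```

The single solution (Arianna 36, Danny 24, Robbie 18) agrees with both text fields of the record.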
This subset can be loaded as:
```python
from datasets import load_dataset
ds = load_dataset("alvarobartt/example-distilabel", "default")
```
Or simply as follows, since there is only one configuration and it is named `default`:
```python
from datasets import load_dataset
ds = load_dataset("alvarobartt/example-distilabel")
```
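Assuming each row is available as a plain Python dictionary shaped like the example above, a record can be checked against the features declared in the card's metadata. The sketch below uses only the standard library; `SCHEMA`, `validate_record`, and the abridged `sample` record are hypothetical helpers written for illustration and are not part of `datasets` or distilabel.

```python
NoneType = type(None)

# Expected Python type per feature, transcribed from `dataset_info.features`.
# The 'null' dtypes are represented as NoneType; `meta` is a nested struct.
SCHEMA = {
    "instruction": str,
    "completion": str,
    "meta": {
        "category": str,
        "completion": str,
        "id": int,
        "input": NoneType,
        "motivation_app": NoneType,
        "prompt": str,
        "source": str,
        "subcategory": str,
    },
    "generation_model": str,
    "generation": str,
}


def validate_record(record, schema=SCHEMA, prefix=""):
    """Return a list of problems; an empty list means the record matches."""
    problems = []
    for name, expected in schema.items():
        path = f"{prefix}{name}"
        if name not in record:
            problems.append(f"missing field: {path}")
        elif isinstance(expected, dict):
            if isinstance(record[name], dict):
                # Recurse into the nested struct, extending the dotted path.
                problems.extend(validate_record(record[name], expected, f"{path}."))
            else:
                problems.append(f"{path}: expected struct, got {type(record[name]).__name__}")
        elif not isinstance(record[name], expected):
            problems.append(
                f"{path}: expected {expected.__name__}, got {type(record[name]).__name__}"
            )
    return problems


# Abridged copy of the example record: the long text fields are replaced by
# placeholders, the short values are taken from the card.
sample = {
    "instruction": "<instruction text>",
    "completion": "<completion text>",
    "meta": {
        "category": "Question Answering",
        "completion": "<completion text>",
        "id": 0,
        "input": None,
        "motivation_app": None,
        "prompt": "<instruction text>",
        "source": "surge",
        "subcategory": "Math",
    },
    "generation_model": "examples/models/Phi-3-mini-4k-instruct-q4.gguf",
    "generation": "<generation text>",
}

print(validate_record(sample))  # []
```

Returning a list of problems rather than raising on the first mismatch makes it easier to report every issue in a record at once.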
</details>
Provided by:
alvarobartt
Summary of original information
Dataset overview
Basic information
- Size category: n<1K
- Configurations:
  - Default configuration (`config_name: default`)
    - Data files:
      - Training split (`split: train`)
        - Path: `data/train-*`
Dataset information
- Features:
  - instruction: string
  - completion: string
  - meta: struct
    - category: string
    - completion: string
    - id: int64
    - input: null
    - motivation_app: null
    - prompt: string
    - source: string
    - subcategory: string
  - generation_model: string
  - generation: string
Dataset splits
- Training split (`name: train`)
  - Size: 4257 bytes
  - Number of examples: 2
Download and dataset size
- Download size: 21358 bytes
- Dataset size: 4257 bytes
Tags
- synthetic
- distilabel
- rlaif



