TheHorme/my-distiset-84833152
Source: Hugging Face. Updated 2025-04-11. Indexed 2025-11-29.
Download link:
https://hf-mirror.com/datasets/TheHorme/my-distiset-84833152
Resource description:
---
size_categories: n<1K
task_categories:
- text-generation
- text2text-generation
- question-answering
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: completion
    dtype: 'null'
  - name: system_prompt
    dtype: string
  splits:
  - name: train
    num_bytes: 98493
    num_examples: 60
  download_size: 31370
  dataset_size: 98493
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
tags:
- synthetic
- distilabel
- rlaif
- datacraft
---
<p align="left">
<a href="https://github.com/argilla-io/distilabel">
<img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>
</a>
</p>
# Dataset Card for my-distiset-84833152
This dataset has been created with [distilabel](https://distilabel.argilla.io/).
## Dataset Summary
This dataset contains a `pipeline.yaml` which can be used to reproduce the pipeline that generated it in distilabel using the `distilabel` CLI:
```console
distilabel pipeline run --config "https://huggingface.co/datasets/TheHorme/my-distiset-84833152/raw/main/pipeline.yaml"
```
or explore the configuration:
```console
distilabel pipeline info --config "https://huggingface.co/datasets/TheHorme/my-distiset-84833152/raw/main/pipeline.yaml"
```
## Dataset structure
The examples have the following structure per configuration:
<details><summary> Configuration: default </summary><hr>
```json
{
"completion": null,
"prompt": "Create a dataset of linear equations in the form of (a*x + b*y = c) where a, b, and c are integers and are chosen randomly from the set of integers between 1 and 100. The dataset should contain 1000 equations.\n\nHere is the dataset:\n\n1. 43x + 91y = 13\n2. 11x + 19y = 7\n3. 85x + 31y = 21\n...\nI will be adding more equations to this dataset as we go, so I can query for a specific amount.\n\nSo, here is the tool. What is the 100th equation of this dataset?\n\nTo find the 100th equation of the dataset, I will not have to look at the first 99 equations, I can just generate the 100th equation. \n\n",
"system_prompt": "You are an AI assistant designed to generate high-quality datasets for training machine learning models, particularly in the field of algebraic equations. Your purpose is to create a comprehensive and diverse dataset that caters to various algebraic concepts, including but not limited to, linear equations, quadratic equations, polynomial equations, rational equations, and systems of equations. The dataset should include a wide range of variables, coefficients, and constants to cover different types of algebraic expressions. Ensure that the dataset is well-structured, correctly formatted, and free of errors. User questions are direct and concise."
}
```
This subset can be loaded as:
```python
from datasets import load_dataset
ds = load_dataset("TheHorme/my-distiset-84833152", "default")
```
Or simply as follows, since there is only one configuration and it is named `default`:
```python
from datasets import load_dataset
ds = load_dataset("TheHorme/my-distiset-84833152")
```
</details>
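The record structure above can be checked mechanically. The following is a minimal sketch, not part of the original card: `EXPECTED_FIELDS` and `check_record` are hypothetical names introduced here, and the sample row is a shortened stand-in for a real one. It assumes only what the card's `dataset_info.features` block states: `prompt` and `system_prompt` are strings, and `completion` has dtype `'null'`, so it is always `None`.

```python
import json

# Expected field types, taken from the `dataset_info.features` block of the card.
# `completion` has dtype 'null', so every value is expected to be None.
# (EXPECTED_FIELDS and check_record are illustrative names, not part of distilabel.)
EXPECTED_FIELDS = {
    "prompt": str,
    "completion": type(None),
    "system_prompt": str,
}


def check_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record matches the schema."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            actual = type(record[name]).__name__
            problems.append(f"{name}: expected {expected_type.__name__}, got {actual}")
    for name in record:
        if name not in EXPECTED_FIELDS:
            problems.append(f"unexpected field: {name}")
    return problems


# A shortened stand-in for one row; the real prompt and system prompt are much longer.
sample = json.loads(
    '{"completion": null, '
    '"prompt": "What is the 100th equation of this dataset?", '
    '"system_prompt": "You are an AI assistant designed to generate high-quality datasets."}'
)
print(check_record(sample))  # prints []
```

With the dataset loaded as shown earlier, the same function could presumably be applied to each row of `ds["train"]` to confirm that no row carries a completion before using the data for training.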
Provider:
TheHorme



