maomao88/TinyStoriesInstruct-Formatted
Source: Hugging Face. Updated 2026-03-23, indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/maomao88/TinyStoriesInstruct-Formatted
Resource description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 1103636977
    num_examples: 861992
  - name: validation
    num_bytes: 11037684
    num_examples: 8544
  download_size: 555101925
  dataset_size: 1114674661
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
license: mit
task_categories:
- text-generation
language:
- en
---
## Dataset Description
This dataset is a cleaned and reformatted version of **roneneldan/TinyStoriesInstruct**.
The following changes were made to improve data quality and make the data more suitable for instruction tuning:
* **Filtered repetitive openings:** Samples with repeated phrases like "once upon a time" at the beginning were removed.
* **Removed empty or invalid entries:** Any stories with missing or empty content were excluded.
* **Standardized instruction-response format:** Each example is converted into the following structure:
```
### Instruction:
{instruction}
### Response:
{response}{tokenizer.eos_token}
```
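
The cleaning and formatting steps above can be sketched in plain Python. This is a minimal illustration, not the script used to build the dataset: the function names, the 200-character window, the reading of "repeated" as "the phrase occurs more than once near the start", and the single newline after each header are all assumptions, and `"</s>"` stands in for whatever `tokenizer.eos_token` is for your tokenizer.

```python
from typing import Optional

# Assumed layout: one newline after each header line; the card does not
# specify the exact whitespace of the template.
TEMPLATE = "### Instruction:\n{instruction}\n### Response:\n{response}{eos}"


def has_repeated_opening(story: str,
                         phrase: str = "once upon a time",
                         window: int = 200) -> bool:
    """Hypothetical check: the phrase occurs more than once near the start."""
    head = story[:window].lower()
    return head.count(phrase) > 1


def format_example(instruction: str,
                   response: str,
                   eos_token: str) -> Optional[str]:
    """Return the formatted text, or None if the sample should be dropped."""
    instruction = instruction.strip()
    response = response.strip()
    # Drop empty or invalid entries.
    if not instruction or not response:
        return None
    # Drop stories with a repetitive opening (assumed interpretation).
    if has_repeated_opening(response):
        return None
    return TEMPLATE.format(instruction=instruction,
                           response=response,
                           eos=eos_token)


if __name__ == "__main__":
    # "</s>" is a placeholder; in practice pass tokenizer.eos_token.
    print(format_example("Write a short story about a cat.",
                         "A cat sat on a mat.",
                         "</s>"))
```

Returning `None` for rejected samples keeps filtering and formatting in one pass, so the caller can simply skip `None` results when building the `text` column.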
Provider:
maomao88



