AmanPriyanshu/reasoning-sft-PleIAs-SYNTH-1M
收藏Hugging Face2026-02-26 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/AmanPriyanshu/reasoning-sft-PleIAs-SYNTH-1M
下载链接
链接失效反馈官方服务:
资源简介:
---
license: cdla-permissive-2.0
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- reasoning
- sft
- chain-of-thought
- synthetic
- wikipedia
pretty_name: reasoning-sft-PleIAs-SYNTH-1M
size_categories:
- 1M<n<10M
---
# reasoning-sft-PleIAs-SYNTH-1M
1M English samples subsampled from [PleIAs/SYNTH](https://huggingface.co/datasets/PleIAs/SYNTH) and converted into a clean messages format for SFT/reasoning model training.
Each row has two relevant columns:
- `input` — list of dicts (conversation turns with `role` and `content`), ending on the last user turn
- `response` — string formatted as `<think>\n{chain-of-thought}\n</think>\n{final answer}`
Sampled to maximise distribution across `query_seed_url`, `model`, and `exercise` axes via round-robin across ~2M candidate rows pulled from all 500 source files.
## License
[CDLA-Permissive-2.0](https://cdla.dev/permissive-2-0/) — free to use, modify, and republish commercially.
Seed data sourced from Wikipedia under [CC-BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).
## Credits
Original dataset: [PleIAs/SYNTH](https://huggingface.co/datasets/PleIAs/SYNTH) by [Pleias](https://huggingface.co/PleIAs)
Seed data: [Wikimedia / Structured Wikipedia](https://enterprise.wikimedia.com)
提供机构:
AmanPriyanshu



