oNo-1/difficult_problem_dataset_v5_500
Source: Hugging Face. Updated 2025-10-23; indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/oNo-1/difficult_problem_dataset_v5_500
Resource description:
---
license: odc-by
task_categories:
- text-generation
language:
- en
size_categories:
- n<1K
---
# Overview
This is a synthetic dataset created using the Scalable Data Generation (SDG) framework.
It is structured for use with thinking models: each record pairs a question (`input`) with its reasoning and answer (`output`).
# Pipeline of Data Generation
1. Process-based Question Generation
   - A mechanism for automatically generating questions.
2. Curation + Diversity Filter
   - A step that enforces quality and diversity instead of relying on raw generation alone.
3. Expansion via Evolutionary Methods
   - Improvement of questions and answers through an evolutionary strategy (a genetic-algorithm-like refinement cycle).
4. Automatic Generation of Reasoning Process
   - Supplementing each item with reasoning, explanation, and grounding.
5. Final Storage as a Dataset
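
The five stages above can be illustrated with a toy sketch. This is not the SDG framework's code: every function name, the arithmetic question template, and the single-mutation "evolution" step are assumptions made up for illustration, and a real evolutionary cycle would also include scoring and selection.

```python
import json
import random

def generate(n, rng):
    # Step 1 (toy): question seeds are pairs of two-digit operands.
    return [(rng.randint(10, 99), rng.randint(10, 99)) for _ in range(n)]

def curate(seeds):
    # Step 2 (toy): diversity filter that drops duplicates, ignoring operand order.
    seen, kept = set(), []
    for a, b in seeds:
        key = (min(a, b), max(a, b))
        if key not in seen:
            seen.add(key)
            kept.append((a, b))
    return kept

def evolve(seed, offset=7):
    # Step 3 (toy): a single mutation that adds one more step to the problem.
    a, b = seed
    return (a, b, offset)

def to_record(problem):
    # Steps 4 and 5 (toy): attach a reasoning trace and emit the stored record.
    a, b, c = problem
    product = a * b
    total = product + c
    return {
        "input": f"Compute {a} * {b} + {c}.",
        "output": f"First, {a} * {b} = {product}. Then {product} + {c} = {total}. Answer: {total}",
    }

def run_pipeline(n=5, seed=0):
    # Hypothetical driver that chains the stages in order.
    rng = random.Random(seed)
    problems = [evolve(s) for s in curate(generate(n, rng))]
    return [to_record(p) for p in problems]

records = run_pipeline()
print(json.dumps(records[0], indent=2))
```

The sketch only shows how the stages feed into one another and how the final records end up with `input` and `output` fields.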
# Dataset Structure
```
{
  "input": "question",
  "output": "reasoning and answer"
}
```
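
As a minimal sketch of how records with this structure might be checked before use, the snippet below validates that both fields are present and non-empty. The JSON Lines layout, the sample records, and the `validate_record` helper are assumptions for illustration; the dataset card does not state the on-disk format.

```python
import json

REQUIRED_FIELDS = ("input", "output")

def validate_record(record: dict) -> bool:
    """Return True if both required fields are non-empty strings."""
    return all(
        isinstance(record.get(field), str) and record[field].strip()
        for field in REQUIRED_FIELDS
    )

# Hypothetical sample data in JSON Lines form (one record per line).
sample_jsonl = "\n".join([
    json.dumps({"input": "What is 2 + 2?", "output": "Adding 2 and 2 gives 4. Answer: 4"}),
    json.dumps({"input": "", "output": "This record has no question."}),
])

records = [json.loads(line) for line in sample_jsonl.splitlines()]
valid = [r for r in records if validate_record(r)]
print(f"{len(valid)} of {len(records)} records are valid")
```

The actual records can presumably be fetched with `load_dataset` from the Hugging Face `datasets` library using the repository ID `oNo-1/difficult_problem_dataset_v5_500`; that call is left out here because it needs network access.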
# Licence
ODC-BY 1.0. This work contains data from Microsoft Academic Graph, available under the ODC-BY 1.0 license. Source: https://aka.ms/msracad
Provider:
oNo-1



