nphearum/gsm8k-thinking
Source: Hugging Face. Updated 2026-03-31; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/nphearum/gsm8k-thinking
Description:
---
license: mit
task_categories:
- table-question-answering
- question-answering
- summarization
language:
- en
tags:
- thinking
- gsm
- math
pretty_name: nphearum/gsm8k-thinking
size_categories:
- 1K<n<10K
---
# gsm8k-thinking
A math reasoning dataset built on [openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k), augmented with chain-of-thought thinking traces generated via local inference.
Each record pairs the original GSM8K question and answer with a thinking trace captured from the model's native reasoning mode, making the data suitable for GRPO, DPO, or other preference and reasoning fine-tuning pipelines.
## Dataset Structure
```json
{
  "question": "Natalia sold clips to 48 of her friends in April...",
  "answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\nNatalia sold 48+24 = <<48+24=72>>72 clips altogether.\n#### 72",
  "thinking": "Step 1: In April she sold 48 clips.\nStep 2: In May she sold 48/2 = 24 clips.\nStep 3: Total = 48 + 24 = 72."
}
```
| Field | Description |
|---|---|
| `question` | Original question from GSM8K |
| `answer` | Original GSM8K answer with step-by-step solution and `####` final answer |
| `thinking` | Chain-of-thought reasoning trace |
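The `answer` field mixes a worked solution, calculator annotations such as `<<48/2=24>>`, and the final answer after `####`. A minimal sketch of how it might be split is shown here; the helper name `split_answer` is hypothetical and not part of the dataset or of any library.

```python
import re


def split_answer(answer: str) -> tuple[str, str]:
    """Split a GSM8K-style answer into (rationale, final_answer)."""
    rationale, sep, final = answer.rpartition("####")
    if not sep:
        raise ValueError("no '####' marker found in answer")
    # Drop calculator annotations such as <<48/2=24>>.
    rationale = re.sub(r"<<[^>]*>>", "", rationale).strip()
    # Remove thousands separators so "1,000" compares equal to "1000".
    final = final.strip().replace(",", "")
    return rationale, final


sample = (
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether.\n"
    "#### 72"
)
rationale, final = split_answer(sample)
print(final)
```

Keeping the final answer as a string avoids assumptions about its numeric format; convert it to a number only where a comparison needs it.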
## Dataset Creation
- **Source dataset:** [openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k) (train split)
- **Thinking:** Native thinking mode (`think=True`)
- **Temperature:** 0.2 (low-temperature sampling; near-deterministic, but not strictly greedy)
## Usage
```python
from datasets import load_dataset
ds = load_dataset("nphearum/gsm8k-thinking")
# Example record
print(ds["train"][0])
```
### GRPO Training with TRL
```python
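from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

ds = load_dataset("nphearum/gsm8k-thinking")
# GRPOTrainer expects a "prompt" column, so rename "question".
train_ds = ds["train"].rename_column("question", "prompt")

# `model` and `reward_fn` are placeholders that you must define first.
trainer = GRPOTrainer(
    model=model,
    reward_funcs=[reward_fn],
    args=GRPOConfig(...),  # fill in your training arguments
    train_dataset=train_ds,
)
trainer.train()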
```
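The `reward_fn` passed to `GRPOTrainer` above is not defined in this card. One possible shape is sketched here, assuming TRL's custom reward function convention in which the function receives `completions` plus the remaining dataset columns (here `answer`) as keyword arguments and returns one float per completion. The names `correctness_reward`, `gold_answer`, and `last_number` are hypothetical, and taking the last number in the completion as the model's answer is a heuristic, not the dataset author's method.

```python
import re

# Matches integers and decimals, with optional sign and thousands separators.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def to_float(text):
    """Parse a number string, returning None when it is not numeric."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None


def gold_answer(answer):
    """Return the value after the '####' marker in a GSM8K answer."""
    return to_float(answer.split("####")[-1].strip())


def last_number(text):
    """Return the last number that appears in the text, or None."""
    matches = NUMBER.findall(text)
    return to_float(matches[-1]) if matches else None


def correctness_reward(completions, answer, **kwargs):
    """Give 1.0 when the last number in a completion equals the gold answer."""
    rewards = []
    for completion, reference in zip(completions, answer):
        # Conversational data yields a list of message dicts; plain text yields a str.
        text = completion if isinstance(completion, str) else completion[0]["content"]
        predicted = last_number(text)
        gold = gold_answer(reference)
        rewards.append(1.0 if predicted is not None and predicted == gold else 0.0)
    return rewards


rewards = correctness_reward(
    ["Step 3: Total = 48 + 24 = 72.", "The total is 70 clips."],
    answer=["...\n#### 72", "...\n#### 72"],
)
print(rewards)
```

A binary reward like this is easy to reason about but sparse; a format reward (for example, checking that the completion contains a clearly marked final answer) is often added alongside it.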
## License
MIT. The thinking traces are freely usable. The source GSM8K dataset is also licensed under [MIT](https://huggingface.co/datasets/openai/gsm8k).
Provider:
nphearum



