reasoning-degeneration-dev/wmc-sft-baseline-v2
收藏Hugging Face2026-03-23 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/reasoning-degeneration-dev/wmc-sft-baseline-v2
下载链接
链接失效反馈官方服务:
资源简介:
---
license: mit
tags:
- world-model-curiosity
- sft-warmup
- baseline
- countdown
---
# wmc-sft-baseline-v2
SFT warmup dataset for baseline GRPO condition. 1200 Countdown problems (5 numbers, +/-/*) solved by Qwen3-1.7B with 32k token generation. Contains full untruncated reasoning traces. No confidence tags.
## Dataset Info
- **Rows**: 1200
- **Columns**: 3
## Columns
| Column | Type | Description |
|--------|------|-------------|
| messages | List({'content': Value('string'), 'role': Value('string')}) | Chat-format conversation (system, user, assistant). System sets task format, user provides numbers/target, assistant contains full reasoning trace in <think> tags followed by answer. |
| correct | Value('bool') | Boolean indicating whether the model's final expression evaluates to the target number. |
| difficulty | Value('string') | Problem difficulty tier: easy, medium, or hard (based on number count and operator complexity). |
## Generation Parameters
```json
{
"script_name": "generate_sft_data.py",
"model": "Qwen/Qwen3-1.7B",
"description": "SFT warmup dataset for baseline GRPO condition. 1200 Countdown problems (5 numbers, +/-/*) solved by Qwen3-1.7B with 32k token generation. Contains full untruncated reasoning traces. No confidence tags.",
"hyperparameters": {
"temperature": 0.7,
"max_tokens": 32768,
"top_p": 0.9
},
"input_datasets": []
}
```
## Experiment Documentation
For complete experiment details, see [https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/world-model-curiosity](https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/world-model-curiosity)
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("reasoning-degeneration-dev/wmc-sft-baseline-v2", split="train")
print(f"Loaded {len(dataset)} rows")
```
---
*This dataset is tracked in [reasoning-degeneration-dev/PROJECT-MANIFEST](https://huggingface.co/datasets/reasoning-degeneration-dev/PROJECT-MANIFEST)*
许可证:MIT许可证
标签:
- 世界模型好奇心(world-model-curiosity)
- 监督微调预热(Supervised Fine-Tuning, SFT)
- 基线
- 倒计时
# wmc-sft-baseline-v2
本数据集为基线GRPO条件下的监督微调预热数据集,包含1200个倒计时问题(使用5个数字与加减乘运算符),由通义千问3-1.7B(Qwen3-1.7B)模型生成,生成长度上限为32768词元(Token),包含完整未截断的推理轨迹,无置信度标签。
## 数据集信息
- **行数**: 1200
- **列数**: 3
## 列信息
| 列名 | 数据类型 | 描述 |
|------|----------|------|
| messages | List({'content': Value('string'), 'role': Value('string')}) | 聊天格式对话(包含系统、用户、助手三类角色):系统提示用于设定任务格式,用户输入数字与目标数值,助手回复则包含包裹在<think>标签内的完整推理轨迹,其后跟随最终答案。 |
| correct | Value('bool') | 布尔值,用于指示模型生成的最终表达式计算结果是否与目标数值一致。 |
| difficulty | Value('string') | 字符串类型的问题难度层级,分为简单(easy)、中等(medium)、困难(hard)三类,划分依据为数字数量与运算符复杂度。 |
## 生成参数
json
{
"script_name": "generate_sft_data.py",
"model": "Qwen/Qwen3-1.7B",
"description": "本数据集为基线GRPO条件下的监督微调预热数据集,包含1200个倒计时问题(使用5个数字与加减乘运算符),由通义千问3-1.7B(Qwen3-1.7B)模型生成,生成长度上限为32768词元(Token),包含完整未截断的推理轨迹,无置信度标签。",
"hyperparameters": {
"temperature": 0.7,
"max_tokens": 32768,
"top_p": 0.9
},
"input_datasets": []
}
## 实验文档
如需查看完整实验细节,请访问:[https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/world-model-curiosity](https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/world-model-curiosity)
## 使用方法
python
from datasets import load_dataset
dataset = load_dataset("reasoning-degeneration-dev/wmc-sft-baseline-v2", split="train")
print(f"已加载 {len(dataset)} 条数据")
*本数据集已在[reasoning-degeneration-dev/PROJECT-MANIFEST](https://huggingface.co/datasets/reasoning-degeneration-dev/PROJECT-MANIFEST)中进行追踪*
提供机构:
reasoning-degeneration-dev



