reasoning-degeneration-dev/wmc-sft-cpb-v2
Source: Hugging Face. Updated 2026-03-23; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/reasoning-degeneration-dev/wmc-sft-cpb-v2
Resource description:
---
license: mit
tags:
- world-model-curiosity
- sft-warmup
- cpb
- countdown
- calibrated-prediction-bonus
---
# wmc-sft-cpb-v2
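SFT warmup dataset for the CPB GRPO condition. It contains 1200 Countdown problems (5 numbers; operators +, -, *) solved by Qwen3-1.7B with a 32k-token generation budget. Each response carries a confidence annotation in a `<c>X.X</c>` tag. The confidence is sampled from a Beta distribution conditioned on correctness: Beta(8,2) (~0.80) for correct answers and Beta(2,5) (~0.28) for wrong ones. Reasoning traces are full and untruncated.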
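The confidence annotation described above can be sketched as follows. This is a minimal illustration, not the code in `generate_sft_data.py`: the function names are hypothetical, and the two-decimal formatting is an assumption based on the `0.XX` pattern mentioned in the column description. Since the mean of Beta(a, b) is a / (a + b), the two distributions are centered at 8/10 = 0.80 and 2/7 ≈ 0.286, which matches the approximate values quoted in the card.

```python
import random
import re

# Beta parameters as stated in the dataset description.
BETA_PARAMS = {True: (8.0, 2.0), False: (2.0, 5.0)}

# Matches a confidence tag such as <c>0.82</c>.
CONF_TAG = re.compile(r"<c>([0-9]*\.?[0-9]+)</c>")


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def sample_confidence(correct, rng):
    """Draw a confidence in [0, 1] conditioned on correctness."""
    alpha, beta = BETA_PARAMS[correct]
    return rng.betavariate(alpha, beta)


def inject_confidence(trace, confidence):
    """Prefix a reasoning trace with <think><c>0.XX</c> (two decimals assumed)."""
    return f"<think><c>{confidence:.2f}</c>{trace}"


def parse_confidence(text):
    """Read the first <c>...</c> value back out of a response, or None."""
    match = CONF_TAG.search(text)
    return float(match.group(1)) if match else None


rng = random.Random(0)  # seeded only to make the sketch reproducible
tagged = inject_confidence("\nLet me try 25 * 4 ...", sample_confidence(True, rng))
```

Parsing the tag back out, as `parse_confidence` does, is one way to check that the value in the assistant text agrees with the `confidence` column.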
## Dataset Info
- **Rows**: 1200
- **Columns**: 4
## Columns
| Column | Type | Description |
|--------|------|-------------|
| messages | List({'content': Value('string'), 'role': Value('string')}) | Chat-format conversation (system, user, assistant). The system message instructs the model to emit a `<c>X.X</c>` confidence at the start of its thinking. The assistant content begins with `<think><c>0.XX</c>`, followed by the full reasoning trace and the answer. |
| correct | Value('bool') | Boolean indicating whether the model's final expression evaluates to the target number. |
| confidence | Value('float64') | Confidence value in the range 0-1, sampled from a Beta distribution: Beta(8,2)~0.80 for correct, Beta(2,5)~0.28 for incorrect. Injected into the `<c>` tag in the assistant response. |
| difficulty | Value('string') | Problem difficulty tier: easy, medium, or hard (based on number count and operator complexity). |
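For the `correct` column, the check "the final expression evaluates to the target number" might look like the sketch below. It is an assumption-laden illustration rather than the dataset's actual grader: the card does not say how the final expression is extracted from the response, nor whether every number must be used, so `is_correct` takes the expression as a plain string and exposes a hypothetical `require_all` flag. Only integer literals and the three operators named in the card (+, -, *) are accepted.

```python
import ast
import operator
from collections import Counter

# Only the operators listed in the dataset description are allowed.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def _evaluate(node, used):
    """Evaluate a restricted arithmetic AST, recording each literal in `used`."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported syntax in expression")


def is_correct(expression, numbers, target, require_all=False):
    """True if `expression` uses only the given numbers and equals `target`."""
    try:
        tree = ast.parse(expression, mode="eval")
        used = []
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError):
        return False
    available = Counter(numbers)
    spent = Counter(used)
    # Each provided number may be used at most as often as it was given.
    if any(spent[n] > available[n] for n in spent):
        return False
    # Assumption: using every number is optional unless require_all is set.
    if require_all and spent != available:
        return False
    return value == target
```

Walking the AST instead of calling `eval` keeps the check safe on arbitrary model output, since anything other than integer literals and the three binary operators is rejected.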
## Generation Parameters
```json
{
  "script_name": "generate_sft_data.py",
  "model": "Qwen/Qwen3-1.7B",
  "description": "SFT warmup dataset for CPB GRPO condition. 1200 Countdown problems (5 numbers, +/-/*) solved by Qwen3-1.7B with 32k token generation. Contains confidence annotations via <c>X.X</c> tags (Beta distribution: correct->Beta(8,2)~0.80, wrong->Beta(2,5)~0.28). Full untruncated reasoning traces.",
  "hyperparameters": {
    "temperature": 0.7,
    "max_tokens": 32768,
    "top_p": 0.9
  },
  "input_datasets": []
}
```
## Experiment Documentation
For complete experiment details, see [https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/world-model-curiosity](https://github.com/Zayne-sprague/SC-Research-Notes/tree/main/experiments/world-model-curiosity)
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("reasoning-degeneration-dev/wmc-sft-cpb-v2", split="train")
print(f"Loaded {len(dataset)} rows")
```
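Once the rows are loaded, a quick sanity check is to compare mean confidence for correct and incorrect rows, which should land near the two Beta means. The helper below is a hypothetical sketch that works on any iterable of row dictionaries with `correct` and `confidence` keys; the rows shown are hand-written stand-ins, not actual dataset entries.

```python
from collections import defaultdict


def confidence_by_correctness(rows):
    """Return {correct: (count, mean_confidence)} for an iterable of row dicts."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        bucket = totals[bool(row["correct"])]
        bucket[0] += 1
        bucket[1] += float(row["confidence"])
    return {key: (n, total / n) for key, (n, total) in totals.items()}


# Stand-in rows for illustration; with the real data, pass `dataset` instead.
example_rows = [
    {"correct": True, "confidence": 0.9, "difficulty": "easy"},
    {"correct": True, "confidence": 0.7, "difficulty": "hard"},
    {"correct": False, "confidence": 0.3, "difficulty": "hard"},
]
summary = confidence_by_correctness(example_rows)
```

Calling `confidence_by_correctness(dataset)` on the loaded split should work the same way, since iterating a `datasets.Dataset` yields one dictionary per row.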
---
*This dataset is tracked in [reasoning-degeneration-dev/PROJECT-MANIFEST](https://huggingface.co/datasets/reasoning-degeneration-dev/PROJECT-MANIFEST)*
Provided by:
reasoning-degeneration-dev



