reasoning-degeneration-dev/ttt-discover-viz-test-cp24
Source: Hugging Face. Updated 2026-03-23; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/reasoning-degeneration-dev/ttt-discover-viz-test-cp24
Resource description:
---
license: mit
tags:
- ttt-discover
- test-time-training
- qwen3-8b
- you-are-an-expert-mathematicia
---
# ttt-discover-viz-test-cp24
TTT-Discover training trace: Qwen/Qwen3-8B on 'You are an expert mathematician specializing in circle packing problems and comp'
## Dataset Info
- **Rows**: 5
- **Columns**: 18
## Columns
| Column | Type | Description |
|--------|------|-------------|
| run_id | Value('string') | Unique run identifier |
| model | Value('string') | Full model name |
| question | Value('string') | Prompt template (contains the `__STATE_CTX__` placeholder for PUCT injection) |
| answer | Value('string') | Target value string |
| step | Value('int64') | Training step number (0-indexed) |
| num_groups | Value('int64') | Number of PUCT groups per step (each group expands one parent state) |
| group_size | Value('int64') | Rollouts per group |
| total_rollouts | Value('int64') | Total rollouts this step (num_groups * group_size) |
| avg_reward | Value('float64') | Mean reward across all rollouts |
| best_reward | Value('float64') | Max reward across all rollouts |
| nonzero_frac | Value('float64') | Fraction of rollouts with reward > 0 |
| best_code | Value('string') | Extracted Python code from the highest-reward rollout this step |
| loss | Value('float64') | Policy gradient training loss |
| reward_delta | Value('float64') | Change in avg_reward from previous step |
| groups | Value('string') | JSON list of groups: `[{parent_state_id, parent_value, state_context, rollouts: [{text, reward, advantage, rank, code}]}]` |
| config | Value('string') | JSON hyperparameters |
| timestamp | Value('string') | ISO timestamp |
| puct_tree | Value('string') | JSON PUCT tree snapshot: `{nodes: [{id, value, visits, timestep, is_root, selected, code_preview}], edges: [{source, target}]}` |
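
Because `groups` and `puct_tree` are stored as JSON strings, they need to be decoded before use. The following is a minimal sketch, assuming the key names listed in the table above; `parse_row`, `best_rollout`, and every value in `sample_row` (including the string-typed ids) are hypothetical and only illustrate the shape.

```python
import json

def parse_row(row):
    """Decode the two JSON-encoded string columns of one row."""
    groups = json.loads(row["groups"])
    tree = json.loads(row["puct_tree"])
    return groups, tree

def best_rollout(groups):
    """Return the highest-reward rollout across all groups (None if empty)."""
    rollouts = [r for g in groups for r in g["rollouts"]]
    return max(rollouts, key=lambda r: r["reward"], default=None)

# Hypothetical row with made-up values; real ids may not be strings.
sample_row = {
    "groups": json.dumps([
        {"parent_state_id": "s0", "parent_value": 0.0, "state_context": "",
         "rollouts": [
             {"text": "...", "reward": 0.0, "advantage": -0.5, "rank": 1, "code": ""},
             {"text": "...", "reward": 1.0, "advantage": 0.5, "rank": 0, "code": "pass"},
         ]},
    ]),
    "puct_tree": json.dumps({
        "nodes": [{"id": "s0", "value": 0.0, "visits": 1, "timestep": 0,
                   "is_root": True, "selected": True, "code_preview": ""}],
        "edges": [],
    }),
}

groups, tree = parse_row(sample_row)
top = best_rollout(groups)
print(top["reward"], len(tree["nodes"]), len(tree["edges"]))
```

The same two helpers should work on a row returned by `load_dataset`, provided the stored JSON matches the documented schema.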
## Generation Parameters
```json
{
  "script_name": "scripts/run_ttt_discover.py",
  "model": "Qwen/Qwen3-8B",
  "description": "TTT-Discover training trace: Qwen/Qwen3-8B on 'You are an expert mathematician specializing in circle packing problems and comp'",
  "hyperparameters": {
    "task_id": "circle_packing_24",
    "num_steps": 5,
    "group_size": 16,
    "num_groups": 2,
    "total_rollouts": 32,
    "lr": 4e-05,
    "lora_rank": 32,
    "lora_alpha": 64,
    "temperature": 1.0,
    "max_tokens": 15000,
    "seed": 42,
    "start_step": 0,
    "resume_from": null
  },
  "input_datasets": []
}
```
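
The rollout budget in these hyperparameters can be checked against the column definition `total_rollouts = num_groups * group_size`. The sketch below copies the four relevant values from the JSON above; `check_rollout_budget` is a hypothetical helper, and the run-wide total assumes every step uses the full per-step budget, which the card does not state.

```python
import json

# Subset of the hyperparameters shown in the Generation Parameters block.
config_json = '{"num_steps": 5, "group_size": 16, "num_groups": 2, "total_rollouts": 32}'

def check_rollout_budget(cfg):
    """Compare the stored per-step budget with num_groups * group_size."""
    expected = cfg["num_groups"] * cfg["group_size"]
    return expected == cfg["total_rollouts"], expected

cfg = json.loads(config_json)
ok, per_step = check_rollout_budget(cfg)

# Assumption: all num_steps steps run with the full per-step budget.
run_total = cfg["num_steps"] * per_step
print(ok, per_step, run_total)  # True 32 160
```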
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("reasoning-degeneration-dev/ttt-discover-viz-test-cp24", split="train")
print(f"Loaded {len(dataset)} rows")
```
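
Once rows are loaded, the per-step metrics can be cross-checked. As a minimal sketch, the helper below recomputes the change in `avg_reward` between consecutive steps, which is how the `reward_delta` column is described. `reward_deltas` and the sample rows are hypothetical, and reporting `None` for the first step is an assumption, since the card does not say what the dataset stores there.

```python
def reward_deltas(rows):
    """Map each step to avg_reward minus the previous step's avg_reward."""
    ordered = sorted(rows, key=lambda r: r["step"])
    deltas = {}
    prev = None
    for r in ordered:
        # Assumption: the first step has no predecessor, so report None.
        deltas[r["step"]] = None if prev is None else r["avg_reward"] - prev
        prev = r["avg_reward"]
    return deltas

# Hypothetical rows with made-up rewards, deliberately out of order.
rows = [
    {"step": 1, "avg_reward": 0.5},
    {"step": 0, "avg_reward": 0.25},
    {"step": 2, "avg_reward": 0.375},
]
deltas = reward_deltas(rows)
print(deltas)  # {0: None, 1: 0.25, 2: -0.125}
```

Since iterating over a `datasets.Dataset` yields one dict per row, `reward_deltas(dataset)` should accept the loaded split directly.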
---
*This dataset is tracked in [reasoning-degeneration-dev/PROJECT-MANIFEST](https://huggingface.co/datasets/reasoning-degeneration-dev/PROJECT-MANIFEST)*
Provided by:
reasoning-degeneration-dev



