reasoning-degeneration-dev/ttt-discover-circle_packing_32-qwen3-8b
Source: Hugging Face. Updated 2026-03-26; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/reasoning-degeneration-dev/ttt-discover-circle_packing_32-qwen3-8b
Resource description:
---
license: mit
tags:
- ttt-discover
- test-time-training
- qwen3-8b
- you-are-an-expert-mathematicia
---
# ttt-discover-circle_packing_32-qwen3-8b
TTT-Discover training trace: Qwen/Qwen3-8B on 'You are an expert mathematician specializing in circle packing problems and comp'
## Dataset Info
- **Rows**: 16
- **Columns**: 13
## Columns
| Column | Type | Description |
|--------|------|-------------|
| run_id | Value('string') | Unique run identifier |
| model | Value('string') | Full model name |
| question | Value('string') | Prompt template (contains __STATE_CTX__ placeholder for PUCT injection) |
| answer | Value('string') | Target value string |
| epoch | Value('int64') | *No description provided* |
| group_size | Value('int64') | Rollouts per group |
| avg_reward | Value('float64') | Mean reward across all rollouts |
| best_reward | Value('float64') | Max reward across all rollouts |
| loss | Value('float64') | Policy gradient training loss |
| reward_delta | Value('float64') | Change in avg_reward from previous step |
| rollouts | Value('string') | *No description provided* |
| config | Value('string') | JSON hyperparameters |
| timestamp | Value('string') | ISO timestamp |
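Because `config` is stored as a JSON string and `question` carries a `__STATE_CTX__` placeholder, a row usually needs a little decoding before use. The sketch below shows one possible way to do that. `decode_row` and the sample row are hypothetical, and the `state_ctx` text is only a stand-in: the card does not document what the PUCT step actually injects.

```python
import json

PLACEHOLDER = "__STATE_CTX__"  # placeholder name taken from the column description


def decode_row(row, state_ctx=""):
    """Return a copy of `row` with `config` parsed and the prompt placeholder filled.

    Hypothetical helper: how the placeholder is filled during training is
    not documented in the card, so `state_ctx` is an assumption.
    """
    decoded = dict(row)
    # `config` is a JSON string of hyperparameters, so parse it into a dict.
    decoded["config"] = json.loads(row["config"])
    # Substitute the placeholder in the prompt template.
    decoded["question"] = row["question"].replace(PLACEHOLDER, state_ctx)
    return decoded


# Made-up row for illustration only; real rows come from the dataset.
sample_row = {
    "question": "Improve this packing:\n__STATE_CTX__",
    "config": '{"group_size": 64, "lr": 4e-05}',
}
decoded = decode_row(sample_row, state_ctx="<previous best solution>")
print(decoded["config"]["group_size"])
```

The `rollouts` column is also a string with no description; it may need similar decoding, but its format should be inspected first rather than assumed.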
## Generation Parameters
```json
{
"script_name": "scripts/run_ttt_discover.py",
"model": "Qwen/Qwen3-8B",
"description": "TTT-Discover training trace: Qwen/Qwen3-8B on 'You are an expert mathematician specializing in circle packing problems and comp'",
"hyperparameters": {
"task_id": "circle_packing_32",
"num_steps": 50,
"group_size": 64,
"num_groups": 8,
"total_rollouts": 512,
"lr": 4e-05,
"lora_rank": 32,
"lora_alpha": 64,
"temperature": 1.0,
"max_tokens": 15000,
"seed": 42,
"start_step": 13,
"resume_from": "/mnt/home/zsprague/code/JobToolKit/discover_output/circle_packing_32/lora_step_12"
},
"input_datasets": []
}
```
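Several of these hyperparameters are related by simple arithmetic, which is worth checking when reusing the configuration. The sketch below confirms that `total_rollouts` equals `group_size * num_groups`, computes the `lora_alpha / lora_rank` ratio (the usual LoRA scaling factor; whether the training script applies exactly this scaling is an assumption), and reads the step number from the `resume_from` path to compare it with `start_step`. The variable names are illustrative.

```python
# Values copied from the "hyperparameters" object above.
hp = {
    "num_steps": 50,
    "group_size": 64,
    "num_groups": 8,
    "total_rollouts": 512,
    "lora_rank": 32,
    "lora_alpha": 64,
    "start_step": 13,
    "resume_from": "/mnt/home/zsprague/code/JobToolKit/discover_output/circle_packing_32/lora_step_12",
}

# 64 rollouts per group times 8 groups should match the stated total.
expected_total = hp["group_size"] * hp["num_groups"]

# Conventional LoRA scaling factor (alpha / rank); assumed, not confirmed by the card.
lora_scale = hp["lora_alpha"] / hp["lora_rank"]

# The checkpoint name ends in the step it was saved at.
resume_step = int(hp["resume_from"].rsplit("_", 1)[-1])

print(expected_total == hp["total_rollouts"])
print(lora_scale)
print(resume_step == hp["start_step"] - 1)
```

The last check reflects that this trace resumes from the step-12 adapter and starts at step 13.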
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("reasoning-degeneration-dev/ttt-discover-circle_packing_32-qwen3-8b", split="train")
print(f"Loaded {len(dataset)} rows")
```
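Once loaded, the per-row metrics can be summarized without extra dependencies. The sketch below sorts rows by `epoch` (assuming that column orders the training steps, which the card does not state), picks the row with the highest `best_reward`, and recomputes the step-to-step change in `avg_reward`. `summarize` and the sample rows are hypothetical; with the real dataset, `list(dataset)` should yield dicts of the same shape.

```python
def summarize(rows):
    """Summarize reward metrics from an iterable of row dicts."""
    # Assumption: `epoch` gives the training-step order.
    ordered = sorted(rows, key=lambda r: r["epoch"])
    # Row with the highest single-rollout reward.
    best = max(ordered, key=lambda r: r["best_reward"])
    # Change in mean reward between consecutive rows.
    deltas = [
        later["avg_reward"] - earlier["avg_reward"]
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return {
        "best_epoch": best["epoch"],
        "best_reward": best["best_reward"],
        "avg_reward_deltas": deltas,
    }


# Made-up values for illustration only; they are not taken from the dataset.
sample_rows = [
    {"epoch": 14, "avg_reward": 0.5, "best_reward": 1.0},
    {"epoch": 13, "avg_reward": 0.25, "best_reward": 0.75},
    {"epoch": 15, "avg_reward": 1.0, "best_reward": 1.5},
]
summary = summarize(sample_rows)
print(summary)
```

Comparing the recomputed deltas with the stored `reward_delta` column is a quick sanity check on row ordering.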
---
*This dataset is tracked in [reasoning-degeneration-dev/PROJECT-MANIFEST](https://huggingface.co/datasets/reasoning-degeneration-dev/PROJECT-MANIFEST)*
Provider:
reasoning-degeneration-dev



