aditijc/snooker-testbed-phase4f2-longer-v1
Source: Hugging Face. Updated 2026-04-27; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/aditijc/snooker-testbed-phase4f2-longer-v1
---
license: mit
tags:
- snooker-testbed
- phase-4f2
- ppo
- multidiscrete
- BIG-WIN
- sustained-above-random
---
# snooker-testbed-phase4f2-longer-v1
Phase 4F2 extends Phase 4F's 500k-step breakthrough run to 1.5M steps, using the same recipe as 4F: PPO + MultiDiscrete([36,8,5,5]) + 79-D augmented obs + curriculum [0,1,2,3,4]. The run was job 7316111 on the torch cluster (h200, gh112), took 52m18s of wall time, and ran 2026-04-27 14:14-15:06 UTC.

Results:
- The score improved throughout training. The trajectory does not plateau and keeps climbing past 4F's 500k-step endpoint.
- Mean score over the 15 evals: 3.19 (4F: 1.25, 4G: 0.76, random: 2.10).
- Peak mean score: 5.69 at step 1.5M, about 2.7x random.
- Single-episode max: 23 points, the highest of any run.
- The foul rate dropped below 95% at several evals, with a low of 93.9% at step 1.2M (4F's best: 96.4%).

Taken together, the results indicate that the gain from action discretization scales with training time.
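The "x random" comparisons can be checked with a few lines of arithmetic. The sketch below only uses the scores quoted in the description above; the dictionary labels and the helper name `ratio_to_random` are illustrative and not part of the experiment code.

```python
# Scores quoted in the description above; the labels are illustrative.
RANDOM_BASELINE = 2.10

scores = {
    "4F2 mean over 15 evals": 3.19,
    "4F2 peak (step 1.5M)": 5.69,
    "4F (as quoted)": 1.25,
    "4G (as quoted)": 0.76,
}


def ratio_to_random(score, baseline=RANDOM_BASELINE):
    """Express a score as a multiple of the random-policy baseline."""
    return score / baseline


ratios = {name: round(ratio_to_random(s), 2) for name, s in scores.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x random")
```

The peak works out to roughly 2.7x random, matching the figure quoted above, and the 15-eval mean to roughly 1.5x random.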
## Dataset Info
- **Rows**: 15
- **Columns**: 8
## Columns
| Column | Type | Description |
|--------|------|-------------|
| step | Value('int64') | Global PPO timestep (100k-1.5M) |
| curriculum_stage | Value('int64') | *No description provided* |
| mean_score | Value('float64') | Mean score over 16 episodes. Climbed throughout training; peak 5.69 at step 1.5M. |
| max_score | Value('float64') | Best single-episode score (peak 23 at step 1.5M, the highest of any run) |
| mean_shots | Value('float64') | *No description provided* |
| mean_efficiency | Value('float64') | *No description provided* |
| mean_foul_rate | Value('float64') | Mean foul rate. Fell from 99% to a low of 93.9% over training (4F's best: 96.4%) |
| episodes | Value('int64') | *No description provided* |
## Generation Parameters
```json
{
"script_name": "sbatch/train_sim.sbatch (Phase 4F2 \u2014 discrete actions, 1.5M steps)",
"model": "stable-baselines3 PPO, MlpPolicy net_arch=[256,256], MultiDiscrete([36,8,5,5])",
"description": "Phase 4F2: extends 4F's 500k-step breakthrough to 1.5M steps. Job 7316111 on torch h200 gh112, 52m18s wall, ran 2026-04-27 14:14-15:06 UTC. SUSTAINED IMPROVEMENT THROUGHOUT TRAINING \u2014 trajectory does NOT plateau, continues climbing past 4F's 500k endpoint. Mean over 15 evals: 3.19 (vs 4F's 1.25, 4G's 0.76, random 2.10). Peak score 5.69 at step 1.5M (2.7x random!). Single-ep max 23 points (highest of any run). Foul rate broke 95% multiple times: low 93.9% at step 1.2M (vs 4F's 96.4%). Same recipe as 4F: PPO + MultiDiscrete([36,8,5,5]) + 79-D augmented obs + curriculum [0,1,2,3,4]. The action discretization breakthrough scales with training time.",
"hyperparameters": {
"algorithm": "PPO",
"total_timesteps": 1500000,
"n_envs": 8,
"n_steps": 512,
"batch_size": 2048,
"ent_coef": 0.01,
"learning_rate": 0.0003,
"gamma": 0.99,
"discrete_actions": true,
"discrete_phi_bins": 36,
"discrete_force_bins": 8,
"discrete_spin_bins": 5,
"obs_dim": 79,
"code_version": "2026-04-27-phase4f-discrete",
"commit": "62b5567"
},
"input_datasets": [],
"experiment_name": "snooker-testbed",
"job_id": "torch:7316111",
"cluster": "torch",
"artifact_status": "final",
"canary": false
}
```
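As a rough sanity check, the hyperparameters above imply a few derived quantities: the rollout size, the number of minibatches per epoch, and the size of the joint action space. The sketch below is plain arithmetic on the listed values. It assumes the usual on-policy PPO convention that one rollout collects `n_steps` transitions from each of `n_envs` environments, and that the aim angle phi spans a full 360 degrees; neither assumption is confirmed by this card.

```python
import math

# Values copied from the "hyperparameters" block above.
total_timesteps = 1_500_000
n_envs = 8
n_steps = 512
batch_size = 2048
action_nvec = [36, 8, 5, 5]  # MultiDiscrete([36,8,5,5])

# ASSUMPTION: one rollout = n_steps transitions from each of n_envs envs.
rollout_size = n_envs * n_steps
minibatches_per_epoch = rollout_size // batch_size

# Number of rollouts needed to reach at least total_timesteps.
rollouts_needed = math.ceil(total_timesteps / rollout_size)

# Number of distinct joint actions in the MultiDiscrete space.
joint_actions = math.prod(action_nvec)

# ASSUMPTION: the 36 phi bins evenly cover a full circle.
phi_step_deg = 360 / action_nvec[0]

print(f"rollout size:          {rollout_size}")
print(f"minibatches per epoch: {minibatches_per_epoch}")
print(f"rollouts needed:       {rollouts_needed}")
print(f"joint actions:         {joint_actions}")
print(f"phi resolution:        {phi_step_deg:.1f} degrees")
```

Under these assumptions each rollout holds 4096 transitions split into two minibatches, and the policy chooses among 7200 joint actions per shot.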
## Experiment Documentation
For complete experiment details, see [https://github.com/aditijc/snooker-testbed](https://github.com/aditijc/snooker-testbed)
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("aditijc/snooker-testbed-phase4f2-longer-v1", split="train")
print(f"Loaded {len(dataset)} rows")
```
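Once loaded, the 15 eval rows can be summarized without extra dependencies. The helper below is a minimal sketch: `summarize` is a hypothetical name, and it relies only on the column names listed in the Columns table. The `demo_rows` values are placeholders for illustration and are not taken from the dataset.

```python
def summarize(rows):
    """Return the peak-score eval and the lowest-foul-rate eval.

    `rows` is any iterable of dicts keyed by the dataset's column names.
    """
    rows = list(rows)
    peak = max(rows, key=lambda r: r["mean_score"])
    cleanest = min(rows, key=lambda r: r["mean_foul_rate"])
    return {
        "peak_step": peak["step"],
        "peak_mean_score": peak["mean_score"],
        "lowest_foul_step": cleanest["step"],
        "lowest_foul_rate": cleanest["mean_foul_rate"],
    }


# Placeholder rows for illustration only; NOT values from the dataset.
demo_rows = [
    {"step": 100, "mean_score": 1.0, "mean_foul_rate": 0.9},
    {"step": 200, "mean_score": 3.0, "mean_foul_rate": 0.7},
    {"step": 300, "mean_score": 2.0, "mean_foul_rate": 0.8},
]

summary = summarize(demo_rows)
print(summary)
```

Iterating over a loaded `datasets.Dataset` yields one dict per row, so `summarize(dataset)` should work on the object from the snippet above.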
---
Provider: aditijc



