
aditijc/snooker-testbed-phase4f2-longer-v1

Source: Hugging Face. Updated 2026-04-27; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/aditijc/snooker-testbed-phase4f2-longer-v1
Description:
---
license: mit
tags:
- snooker-testbed
- phase-4f2
- ppo
- multidiscrete
- BIG-WIN
- sustained-above-random
---

# snooker-testbed-phase4f2-longer-v1

Phase 4F2 extends Phase 4F's 500k-step breakthrough to 1.5M steps. Job 7316111 ran on the torch cluster (h200, gh112) on 2026-04-27 from 14:14 to 15:06 UTC, with a wall time of 52m18s.

SUSTAINED IMPROVEMENT THROUGHOUT TRAINING: the trajectory does NOT plateau and continues climbing past 4F's 500k endpoint.

- Mean score averaged over all 15 evaluations: 3.19 (vs 4F's 1.25, 4G's 0.76, and random's 2.10).
- Peak score: 5.69 at step 1.5M, which is 2.7x random.
- Best single-episode score: 23 points, the highest of any run.
- Foul rate fell below 95% multiple times, reaching a low of 93.9% at step 1.2M (vs 4F's best of 96.4%).

This run uses the same recipe as 4F: PPO with a MultiDiscrete([36,8,5,5]) action space, a 79-D augmented observation, and curriculum stages [0,1,2,3,4]. The action-discretization breakthrough scales with training time.

## Dataset Info

- **Rows**: 15
- **Columns**: 8

## Columns

| Column | Type | Description |
|--------|------|-------------|
| step | Value('int64') | Global PPO timestep (100k-1.5M) |
| curriculum_stage | Value('int64') | *No description provided* |
| mean_score | Value('float64') | Mean over 16 episodes. Climbed throughout training; peak 5.69 at step 1.5M. |
| max_score | Value('float64') | Best single-episode score (peak 23 at step 1.5M, the highest of any run) |
| mean_shots | Value('float64') | *No description provided* |
| mean_efficiency | Value('float64') | *No description provided* |
| mean_foul_rate | Value('float64') | Sustained drop from 99% to 93.9% across training (vs 4F's best of 96.4%) |
| episodes | Value('int64') | *No description provided* |

## Generation Parameters

```json
{
  "script_name": "sbatch/train_sim.sbatch (Phase 4F2 \u2014 discrete actions, 1.5M steps)",
  "model": "stable-baselines3 PPO, MlpPolicy net_arch=[256,256], MultiDiscrete([36,8,5,5])",
  "description": "Phase 4F2: extends 4F's 500k-step breakthrough to 1.5M steps. Job 7316111 on torch h200 gh112, 52m18s wall, ran 2026-04-27 14:14-15:06 UTC. SUSTAINED IMPROVEMENT THROUGHOUT TRAINING \u2014 trajectory does NOT plateau, continues climbing past 4F's 500k endpoint. Mean over 15 evals: 3.19 (vs 4F's 1.25, 4G's 0.76, random 2.10). Peak score 5.69 at step 1.5M (2.7x random!). Single-ep max 23 points (highest of any run). Foul rate broke 95% multiple times: low 93.9% at step 1.2M (vs 4F's 96.4%). Same recipe as 4F: PPO + MultiDiscrete([36,8,5,5]) + 79-D augmented obs + curriculum [0,1,2,3,4]. The action discretization breakthrough scales with training time.",
  "hyperparameters": {
    "algorithm": "PPO",
    "total_timesteps": 1500000,
    "n_envs": 8,
    "n_steps": 512,
    "batch_size": 2048,
    "ent_coef": 0.01,
    "learning_rate": 0.0003,
    "gamma": 0.99,
    "discrete_actions": true,
    "discrete_phi_bins": 36,
    "discrete_force_bins": 8,
    "discrete_spin_bins": 5,
    "obs_dim": 79,
    "code_version": "2026-04-27-phase4f-discrete",
    "commit": "62b5567"
  },
  "input_datasets": [],
  "experiment_name": "snooker-testbed",
  "job_id": "torch:7316111",
  "cluster": "torch",
  "artifact_status": "final",
  "canary": false
}
```

## Experiment Documentation

For complete experiment details, see [https://github.com/aditijc/snooker-testbed](https://github.com/aditijc/snooker-testbed)

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("aditijc/snooker-testbed-phase4f2-longer-v1", split="train")
print(f"Loaded {len(dataset)} rows")
```
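Each row is one evaluation checkpoint. That means the card's headline numbers can be recomputed directly from the 15 rows: the mean over all evaluations, the peak score and its multiple of the 2.10 random baseline, the best single episode, and the lowest foul rate. The following is a minimal sketch and is not part of the original card. `summarize` and `RANDOM_BASELINE` are hypothetical names. The column names come from the card's column table. The card does not state whether `mean_foul_rate` is stored as a fraction or a percentage, so the function reports the minimum exactly as stored.

```python
from typing import Iterable, Mapping

# Hypothetical constant: the random-policy mean score quoted on this card.
RANDOM_BASELINE = 2.10


def summarize(rows: Iterable[Mapping]) -> dict:
    """Recompute headline statistics from per-evaluation rows (hypothetical helper).

    Assumes each row carries the card's columns: step, mean_score,
    max_score, and mean_foul_rate.
    """
    # Order by training step so ties in max()/min() resolve to the earliest step.
    rows = sorted(rows, key=lambda r: r["step"])
    if not rows:
        raise ValueError("no evaluation rows")

    scores = [r["mean_score"] for r in rows]
    peak = max(rows, key=lambda r: r["mean_score"])
    best_foul = min(rows, key=lambda r: r["mean_foul_rate"])

    return {
        "n_evals": len(rows),
        "mean_of_means": sum(scores) / len(scores),
        "peak_score": peak["mean_score"],
        "peak_step": peak["step"],
        "peak_vs_random": peak["mean_score"] / RANDOM_BASELINE,
        "best_single_episode": max(r["max_score"] for r in rows),
        # Units are whatever the dataset stores (fraction or percent).
        "lowest_foul_rate": best_foul["mean_foul_rate"],
        "lowest_foul_step": best_foul["step"],
    }
```

Iterating a `datasets.Dataset` yields one dict per row, so the `dataset` object from the Usage snippet can be passed straight to `summarize(dataset)`. If the card's figures hold, the result should report 15 evaluations, a mean of about 3.19, and a peak of 5.69 at the final step (roughly 2.7x random).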
Provided by: aditijc