8Planetterraforming/solutions-training-v4
Source: Hugging Face. Updated 2026-04-15; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/8Planetterraforming/solutions-training-v4
Resource description:
---
license: mit
task_categories:
- text-generation
- question-answering
- text-classification
language:
- en
tags:
- parameter-golf
- auxiliary-training
- uncertainty-calibration
- context-management
- exact-reasoning
- hallucination-reduction
- long-context
- structured-output
size_categories:
- 10K<n<100K
pretty_name: "Parameter Golf Auxiliary Dataset V4: Calibration, Canonical State, and Exactness"
---
# Parameter Golf Auxiliary Dataset V4 (20,000 examples)
This dataset is a synthetic auxiliary training corpus designed around three concrete model failure modes observed during iterative work on OpenAI Parameter Golf submissions.
The dataset is built mainly from the following recurring failure patterns:
1. **Hallucination / premature guessing instead of calibrated uncertainty**
   - The model answers too early from weak context.
   - It should ask for high-impact missing variables first.
   - It should separate confirmed facts from assumptions and from unverified sources.
   - It should avoid acting certain when the state is incomplete.
2. **Weak long-context handling and poor canonical state tracking**
   - The model answers from old conversation fragments instead of the newest verified state.
   - It gives long responses with too many commands at once, so the later steps are wasted when an early step (for example, step 2) fails.
   - It should prefer a single canonical project state:
     - current best run
     - current best BPB
     - current submission status
   - It should answer stepwise and briefly when the user is debugging live.
3. **Fragility on exact discrete objects**
   - exact numbers
   - exact shell commands
   - exact filenames and paths
   - exact delimiter-sensitive outputs
   - exact cube-volume sequences using the x8 pattern
   - exact arithmetic where "plausible" is not enough
## Intended use
This is **not** a replacement for the official main training corpus in Parameter Golf.
It is intended as a **small auxiliary dataset** to mix into a main run in order to:
- reduce entropy on wrong-but-plausible continuations,
- improve calibration,
- improve exactness on structured text,
- improve state tracking during multi-step technical conversations.
## Suggested use
Start conservatively:
- **97% main corpus**
- **3% this auxiliary corpus**
Do not assume that a larger mixing ratio is better.
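The 97/3 split above could be implemented in many ways; the following is a minimal sketch that samples per document. The function name `mix_corpora` and its parameters are hypothetical, and measuring the ratio by document count rather than by tokens is an assumption, since the card does not say which is meant.

```python
import random
from typing import Iterator, Sequence


def mix_corpora(
    main: Sequence[str],
    aux: Sequence[str],
    aux_fraction: float = 0.03,
    n_samples: int = 1000,
    seed: int = 0,
) -> Iterator[str]:
    """Yield n_samples documents, drawing from aux with probability aux_fraction.

    Hypothetical helper: the ratio is applied per document, not per token.
    """
    if not 0.0 <= aux_fraction <= 1.0:
        raise ValueError("aux_fraction must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    for _ in range(n_samples):
        source = aux if rng.random() < aux_fraction else main
        yield rng.choice(source)
```

Keeping `aux_fraction` as an explicit parameter makes it easy to compare a few small values instead of committing to one.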
## Dataset structure
Each record has:
- `task`
- `category`
- `input`
- `target`
- `difficulty`
- optional `source_theme`
- optional `notes`
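As an illustration of the field list above, here is a small sketch that checks a record for missing or unexpected keys. The helper `validate_record` and the example values are hypothetical; field types are not checked because the card does not state them (the authoritative definition is presumably `schema.json`).

```python
from typing import Any

REQUIRED_FIELDS = ("task", "category", "input", "target", "difficulty")
OPTIONAL_FIELDS = ("source_theme", "notes")


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the keys match the field list."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    allowed = set(REQUIRED_FIELDS) | set(OPTIONAL_FIELDS)
    problems += [f"unexpected field: {name}" for name in sorted(record) if name not in allowed]
    return problems


# Hypothetical record: every value below is invented for illustration.
example = {
    "task": "cube_volume",
    "category": "exact_discrete_reasoning",
    "input": "A cube has side length 3. What is its volume?",
    "target": "27",
    "difficulty": "easy",
}
print(validate_record(example))
```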
## Main categories
### 1. uncertainty_calibration
Examples focus on:
- asking clarifying questions before a recommendation,
- distinguishing verified from unverified evidence,
- refusing to guess when key variables are missing,
- turning uncertainty into a structured next question rather than a fabricated answer.
### 2. context_state_management
Examples focus on:
- selecting the newest verified project state,
- not drifting to obsolete BPB references,
- short stepwise debugging assistance,
- minimizing wasted tokens during technical help,
- maintaining a single canonical "source of truth".
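The idea of answering from the newest verified state can be sketched as a small selection rule. The `StateReport` type, its fields, and the run names and BPB values in the usage lines are all hypothetical; this only illustrates the behavior the examples aim for, not how the dataset was built.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StateReport:
    turn: int        # position in the conversation (higher means newer)
    run_name: str
    bpb: float
    verified: bool   # True only if the value was confirmed, e.g. by a log


def canonical_state(reports: list[StateReport]) -> Optional[StateReport]:
    """Return the newest verified report, or None if nothing is verified."""
    verified = [r for r in reports if r.verified]
    if not verified:
        return None  # nothing confirmed: ask rather than guess
    return max(verified, key=lambda r: r.turn)


# Hypothetical conversation history.
history = [
    StateReport(turn=1, run_name="run_a", bpb=1.30, verified=True),
    StateReport(turn=5, run_name="run_b", bpb=1.25, verified=True),
    StateReport(turn=9, run_name="run_c", bpb=1.10, verified=False),
]
print(canonical_state(history))
```

Here the unverified turn 9 claim is ignored and `run_b` is treated as the source of truth until `run_c` is confirmed.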
### 3. exact_discrete_reasoning
Examples focus on:
- exact cube volume calculations,
- x8 sequence continuation,
- arithmetic shortcuts that preserve exactness,
- exact shell commands and exact file paths.
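A short sketch of the cube-volume arithmetic, assuming the "x8 pattern" refers to the fact that doubling a cube's side multiplies its volume by 2^3 = 8 (the card does not define the term). The function names are hypothetical, and integer arithmetic is used so the results stay exact.

```python
def cube_volume(side: int) -> int:
    """Exact volume of a cube with an integer side length."""
    return side ** 3


def doubling_volumes(start_side: int, steps: int) -> list[int]:
    """Volumes of cubes whose side doubles at each step."""
    return [cube_volume(start_side * 2 ** k) for k in range(steps)]


volumes = doubling_volumes(1, 5)
print(volumes)  # [1, 8, 64, 512, 4096]

# Each term is exactly 8 times the previous one.
print(all(b == 8 * a for a, b in zip(volumes, volumes[1:])))  # True
```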
## Why this may help BPB
In Parameter Golf, BPB (bits per byte) is reduced when the model places higher probability on the correct next continuation and lower probability on plausible-but-wrong continuations.
This auxiliary dataset is designed to help with exactly that:
- less blind guessing,
- better source separation,
- stronger state consistency,
- more reliable exact outputs.
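To make the link between probability and BPB concrete, the sketch below uses the standard definition of bits per byte: the summed negative log2-probability assigned to the correct tokens, divided by the number of bytes in the text. The helper `bits_per_byte` and the probabilities are hypothetical, and the exact evaluation procedure used in Parameter Golf is not described in this card.

```python
import math


def bits_per_byte(correct_token_probs: list[float], n_bytes: int) -> float:
    """Negative log2-likelihood of the correct tokens, divided by the byte count."""
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    total_bits = -sum(math.log2(p) for p in correct_token_probs)
    return total_bits / n_bytes


# Hypothetical 8-byte text split into two tokens.
# A model that leaks probability to a plausible-but-wrong second token:
print(bits_per_byte([0.5, 0.25], n_bytes=8))  # 0.375
# The same model after shifting probability toward the correct token:
print(bits_per_byte([0.5, 0.5], n_bytes=8))   # 0.25
```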
## Notes
This dataset is synthetic and hand-shaped around observed failure patterns.
It should be treated as an **auxiliary training signal**, not as a benchmark of broad model quality.
## Files
- `train.jsonl`
- `validation.jsonl`
- `test.jsonl`
- `schema.json`
- `dataset_info.json`
- `prepare_aux_text.py`
- `RUNPOD_NOTES.md`
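A minimal sketch for inspecting one of the `.jsonl` splits, assuming each non-empty line is a single JSON object with the fields listed under "Dataset structure". The helpers `parse_jsonl` and `category_counts` are hypothetical and are not taken from `prepare_aux_text.py`.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Iterable


def parse_jsonl(lines: Iterable[str]) -> list[dict[str, Any]]:
    """Parse JSON Lines text, skipping blank lines and reporting the bad line number."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON") from exc
    return records


def category_counts(records: list[dict[str, Any]]) -> Counter:
    """Count records per `category`, grouping records without one under '<missing>'."""
    return Counter(record.get("category", "<missing>") for record in records)


# Only runs if the file has been downloaded next to this script.
train_path = Path("train.jsonl")
if train_path.exists():
    with train_path.open(encoding="utf-8") as handle:
        print(category_counts(parse_jsonl(handle)))
```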
Provider:
8Planetterraforming



