8Planetterraforming/Parameter-Golf-V10-Critical-Memory-FineWeb-MicroMix
Source: Hugging Face. Updated 2026-04-21. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/8Planetterraforming/Parameter-Golf-V10-Critical-Memory-FineWeb-MicroMix
Resource description:
Parameter-Golf-V10 Critical Memory FineWeb MicroMix is a compact research dataset with two connected goals:
1) Compression-oriented signal extraction: teach the training loop to prefer compact, evidence-bearing, FineWeb-like text over boilerplate, ads, cookie banners, navigation text, and instruction-heavy filler.
2) Critical memory and claim discipline: teach the assistant/workflow to preserve the latest verified project state, check BPB-to-nats deltas, verify statistical significance, and avoid claiming a new record before the formal rules are satisfied.
The dataset does not guarantee 0.8 BPB (bits per byte). The 0.8 BPB figure is written into the dataset as a research target and evaluation objective.
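The BPB-to-nats check mentioned in the description rests on a simple unit conversion: one bit equals ln 2 nats, so a bits-per-byte figure is multiplied by ln 2 to get nats per byte. The following is a minimal sketch of that arithmetic only, not the dataset's or any leaderboard's actual tooling. The function names, the `bytes_per_token` parameter, and the baseline value of 1.0 BPB are hypothetical and chosen for illustration; only the 0.8 BPB target comes from the description.

```python
import math

# One bit of information equals ln(2) nats.
LN2 = math.log(2.0)


def bpb_to_nats_per_byte(bpb: float) -> float:
    """Convert a bits-per-byte value to nats per byte."""
    return bpb * LN2


def bpb_delta_in_nats(baseline_bpb: float,
                      candidate_bpb: float,
                      bytes_per_token: float = 1.0) -> float:
    """Improvement of the candidate over the baseline, in nats.

    A positive result means the candidate compresses better.
    With bytes_per_token=1.0 the result is nats per byte; passing the
    tokenizer's average bytes per token (an assumption the caller must
    measure) rescales it to nats per token.
    """
    return (baseline_bpb - candidate_bpb) * LN2 * bytes_per_token


if __name__ == "__main__":
    # Hypothetical baseline of 1.0 BPB against the 0.8 BPB research target.
    delta = bpb_delta_in_nats(baseline_bpb=1.0, candidate_bpb=0.8)
    print(f"delta = {delta:.6f} nats per byte")
```

Whether a given delta counts as a record still depends on the formal rules and on a significance test over repeated runs, neither of which this sketch covers.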
Provider:
8Planetterraforming



