neogenesislab/whylab-gemini-2-5-docker-validation
Source: Hugging Face | Updated: 2026-04-29 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/neogenesislab/whylab-gemini-2-5-docker-validation
Resource summary:
The WhyLab Gemini 2.5 Flash Docker Validation (Honest Null Result) dataset is an open collection of experimental evidence from Neo Genesis's NeurIPS 2026 paper "WhyLab: Causal Decision Intelligence Engine". It holds the full episode-level evidence from a Docker ground-truth validation of WhyLab's adaptive C2 method on SWE-bench, with Gemini 2.5 Flash as the underlying language model: 402 episodes in total (67 problems × 3 seeds × 2 conditions). The dataset is released to support reproducibility and to document an honest null result: on this slice, the adaptive C2 follow-up did NOT beat fixed C2. This finding led to a recalibration of the paper, which now frames its contribution around phase-aware deployment and selective intervention rather than universal gain. The dataset is licensed under CC-BY-4.0 and is free for research and commercial use with attribution to Neo Genesis and the underlying paper.
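The 402-episode count follows from a full factorial design: every one of the 67 problems is run under each of 3 seeds and each of 2 conditions (67 × 3 × 2 = 402). The minimal Python sketch below enumerates that grid and shows one way to tabulate per-condition resolve rates and paired adaptive-minus-fixed differences for each (problem, seed) cell. All identifiers are hypothetical placeholders, not the dataset's actual schema: the problem IDs, the condition labels "fixed_c2" and "adaptive_c2", and the record keys "problem", "seed", "condition" and "resolved". The paired tabulation is an illustrative analysis, not the paper's evaluation procedure.

```python
from collections import defaultdict
from itertools import product

# Hypothetical labels: the card states 67 problems, 3 seeds and 2 conditions,
# but not the identifiers used inside the dataset files.
PROBLEMS = [f"problem_{i:02d}" for i in range(67)]
SEEDS = [0, 1, 2]
CONDITIONS = ["fixed_c2", "adaptive_c2"]


def episode_grid():
    """Enumerate every (problem, seed, condition) cell of the factorial design."""
    return list(product(PROBLEMS, SEEDS, CONDITIONS))


def resolve_rates(episodes):
    """Fraction of episodes per condition whose Docker run resolved the issue.

    Each episode is a dict with hypothetical keys 'condition' and 'resolved'.
    """
    totals = defaultdict(int)
    resolved = defaultdict(int)
    for ep in episodes:
        totals[ep["condition"]] += 1
        resolved[ep["condition"]] += int(bool(ep["resolved"]))
    return {cond: resolved[cond] / totals[cond] for cond in totals}


def paired_differences(episodes):
    """Adaptive-minus-fixed outcome (-1, 0 or +1) for each (problem, seed) pair.

    Pairs missing either condition are skipped.
    """
    by_pair = defaultdict(dict)
    for ep in episodes:
        key = (ep["problem"], ep["seed"])
        by_pair[key][ep["condition"]] = int(bool(ep["resolved"]))
    return {
        pair: outcomes["adaptive_c2"] - outcomes["fixed_c2"]
        for pair, outcomes in by_pair.items()
        if "adaptive_c2" in outcomes and "fixed_c2" in outcomes
    }


grid = episode_grid()
print(len(grid))  # 402 = 67 * 3 * 2
```

Pairing by (problem, seed) compares the two conditions on identical inputs, so a null result shows up as paired differences that sum to roughly zero rather than as a gap between two independently drawn samples.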
Provided by:
neogenesislab



