FailureScope Multi-Turn Corpus (NeurIPS 2026 E&D)
收藏DataCite Commons2026-05-05 更新2026-05-07 收录
下载链接:
https://zenodo.org/doi/10.5281/zenodo.20034373
下载链接
链接失效反馈官方服务:
资源简介:
FailureScope multi-turn corpus: 363 frontier-hard tasks across a 110-axis by 7-depth taxonomy with binary-search frontier detection. Each task is evaluated at multiple turn depths against five frontier models (Claude Sonnet 4.6, GPT-5.4, GPT-5.4-nano, Claude Haiku 4.5, DeepSeek-v3.2). Records contain dialog turns, per-depth per-model pass/fail labels, axis/depth coordinates, and LOMO cluster assignments. Surfaces depth-conditioned failure patterns including a 162-task universal-collapse cluster (failure on all five models at depth 6) and smaller model-selective signals. This dataset is one of three components of the FailureScope release; see related identifiers on the umbrella record DOI 10.5281/zenodo.20037167. Companion Croissant 1.0 metadata file is bundled with the FailureScope Croissant package on the umbrella record.
提供机构:
Zenodo
创建时间:
2026-05-05



