FailureScope Single-Turn Corpus (NeurIPS 2026 E&D)
Source: DataCite Commons. Updated 2026-05-05; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20034012
Description:
FailureScope single-turn corpus: 2,664 tasks pooled from GSM8K, ARC-Challenge, MMLU, HumanEval, MBPP, and IFEval, evaluated across 18 language models (1.3B to 70B+ parameters). Each record contains the task prompt, source benchmark, gold answer, per-model pass/fail labels, taxonomy assignment, and the cluster ID from leave-one-model-out (LOMO) clustering. The corpus is released to support reproducible cross-model behavioral diagnosis. It is one of three components of the FailureScope release (single-turn, multi-turn, adversarial); see the related identifiers on the umbrella record, DOI 10.5281/zenodo.20037167. A companion Croissant 1.0 metadata file is bundled with the FailureScope Croissant package on the umbrella record. Usage and the field schema are documented in the included README and the paper appendix.
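The record layout described above (source benchmark plus per-model pass/fail labels) lends itself to simple cross-model summaries. The following is a minimal sketch only: the field names `task_id`, `source`, and `results`, the model names, and the three sample records are all hypothetical placeholders invented for illustration, not the released schema or data. The README remains the authority on actual field names.

```python
from collections import defaultdict

# Hypothetical records: field names and values are illustrative assumptions,
# not the schema or contents of the released corpus.
records = [
    {"task_id": "t1", "source": "GSM8K",
     "results": {"model_a": True, "model_b": False}},
    {"task_id": "t2", "source": "GSM8K",
     "results": {"model_a": False, "model_b": False}},
    {"task_id": "t3", "source": "HumanEval",
     "results": {"model_a": True, "model_b": True}},
]


def pass_rates(records):
    """Return {model: fraction of tasks that model passed}."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        for model, ok in rec["results"].items():
            total[model] += 1
            passed[model] += int(ok)
    return {model: passed[model] / total[model] for model in total}


def failed_by_all(records):
    """Return the IDs of tasks that every evaluated model failed."""
    return [rec["task_id"] for rec in records
            if not any(rec["results"].values())]


rates = pass_rates(records)
shared_failures = failed_by_all(records)
```

Tasks failed by every model are a natural starting point for cross-model diagnosis, since they separate failures shared across models from failures specific to one model.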
Provider:
Zenodo
Created:
2026-05-05



