MMJBDS/reflexbench-eval
Source: Hugging Face. Updated: 2026-04-22. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/MMJBDS/reflexbench-eval
Description:
Evaluation results from ReflexBench v1.0, described as the first benchmark for measuring reflexive reasoning (Observer Depth) in large language models. The dataset contains evaluation results for multiple frontier LLMs at different observer depths (OD-0 to OD-n) and shows systematic degradation as tasks move from surface reasoning to recursive equilibrium reasoning. The degradation is reported to be independent of model scale and general reasoning capability, suggesting that reflexive intelligence is a distinct, under-trained cognitive dimension.
Provider:
MMJBDS



