MMJBDS/reflexbench-eval

Name: MMJBDS/reflexbench-eval
Creator: MMJBDS
Published: 2026-04-22 05:26:36
License: Not specified

Source: Hugging Face. Updated 2026-04-22; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/MMJBDS/reflexbench-eval
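
One way to fetch the files programmatically is sketched below. It is a minimal sketch, not an official procedure: it assumes the `huggingface_hub` package is installed in a version whose `snapshot_download` accepts the `endpoint` argument, and that the mirror serves the same repository layout as the Hub. The helper names `split_dataset_url` and `download_snapshot` are hypothetical and exist only for this illustration.

```python
from urllib.parse import urlparse

# Download link as listed on this page.
MIRROR_URL = "https://hf-mirror.com/datasets/MMJBDS/reflexbench-eval"


def split_dataset_url(url):
    """Split a Hub-style dataset URL into (endpoint, repo_id)."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    # Expected path shape: /datasets/<owner>/<name>
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a dataset URL: {url}")
    endpoint = f"{parsed.scheme}://{parsed.netloc}"
    repo_id = "/".join(parts[1:])
    return endpoint, repo_id


def download_snapshot(url=MIRROR_URL, local_dir=None):
    """Download every file in the dataset repo and return the local path.

    Assumption: huggingface_hub is installed and the mirror is reachable.
    The import is deferred so the URL helper works without the package.
    """
    from huggingface_hub import snapshot_download

    endpoint, repo_id = split_dataset_url(url)
    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        local_dir=local_dir,
        endpoint=endpoint,
    )
```

Calling `download_snapshot()` performs the network download. The file names and schema inside the repository are not described on this page, so inspect the returned directory before writing any loading code.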

Description:

This dataset contains evaluation results from ReflexBench v1.0, described by its authors as the first benchmark for measuring reflexive reasoning (Observer Depth) in large language models. It covers multiple frontier LLMs evaluated at different observer depths (OD-0 to OD-n) and reports a systematic degradation in performance as tasks move from surface reasoning to recursive equilibrium reasoning. According to the authors, this degradation is independent of model scale and of general reasoning capability, which suggests that reflexive intelligence is a distinct, under-trained cognitive dimension.

Provider:

MMJBDS
