maga666/reflexbench

Name: maga666/reflexbench
Creator: maga666
Published: 2026-04-24 15:58:09
License: No description available

Platform: Hugging Face
Updated: 2026-04-24
Indexed: 2026-04-26

Download link:

https://hf-mirror.com/datasets/maga666/reflexbench

Description:

ReflexBench is presented as the first benchmark designed to evaluate reflexive reasoning in large language models: the capacity to reason about one's own causal impact on the environment being analyzed. The dataset consists of 20 scenarios across 6 domains (Financial Markets, Policy & Governance, Social Technology, Healthcare, Autonomous Systems, Education & Labor), each probing 4 levels of Observer Depth (OD-0 to OD-n), for a total of 80 evaluation points (20 × 4). Model responses are scored with a two-stage protocol (automated pre-scoring followed by human calibration), and the dataset reports results for 9 LLMs across the OD levels.
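
The evaluation grid (20 scenarios × 4 observer-depth levels = 80 points) and the two-stage scoring can be pictured with a minimal Python sketch. This is not the benchmark's own code: the `EvalPoint` class and its field names, the 0-based level indices (the card only names OD-0 and OD-n), and the rule that a human-calibrated score overrides the automated pre-score are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

NUM_SCENARIOS = 20   # scenarios, per the dataset card
NUM_OD_LEVELS = 4    # observer-depth levels OD-0 .. OD-n, per the dataset card


@dataclass
class EvalPoint:
    """One (scenario, observer-depth) evaluation point. Field names are hypothetical."""
    scenario_id: int                      # 0-based scenario index (hypothetical)
    od_level: int                         # 0 = OD-0, NUM_OD_LEVELS - 1 = OD-n (hypothetical indexing)
    auto_score: Optional[float] = None    # stage 1: automated pre-score
    human_score: Optional[float] = None   # stage 2: human-calibrated score

    def final_score(self) -> Optional[float]:
        # Assumption: a human-calibrated score, when present, replaces the automated one.
        if self.human_score is not None:
            return self.human_score
        return self.auto_score


def build_grid() -> list[EvalPoint]:
    """Enumerate every scenario x OD-level pair: 20 x 4 = 80 evaluation points."""
    return [
        EvalPoint(scenario_id=s, od_level=od)
        for s, od in product(range(NUM_SCENARIOS), range(NUM_OD_LEVELS))
    ]


def mean_by_level(points: list[EvalPoint]) -> dict[int, float]:
    """Average final score per OD level, skipping points that have no score yet."""
    buckets: dict[int, list[float]] = {}
    for p in points:
        score = p.final_score()
        if score is not None:
            buckets.setdefault(p.od_level, []).append(score)
    return {level: sum(v) / len(v) for level, v in sorted(buckets.items())}


grid = build_grid()
print(len(grid))  # 80
```

Per-level averages such as those from `mean_by_level` are one way to compare models across OD levels; how ReflexBench actually aggregates its scores is not stated on this page.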

Provided by:

maga666