shadow-bench/ShadowBench

Name: shadow-bench/ShadowBench
Creator: shadow-bench
Published: 2026-04-28 18:18:30
License: No description available

Source: Hugging Face. Updated: 2026-04-28. Indexed: 2026-05-03.

Download link:

https://hf-mirror.com/datasets/shadow-bench/ShadowBench

Overview:

ShadowBench is a diagnostic framework for evaluating the "shadow knowledge" of large language models (LLMs). Traditional benchmarks measure factual recall using explicit entity names (e.g., Elon Musk); ShadowBench instead tests whether a model can navigate its internal knowledge graph once these lexical anchors are removed. The core task is Dual-Trait Association (DTA): given an anonymized shadow description of an entity (Trait A), the model must pick the second, independent fact about that entity (Trait B) from a set that also contains three hard-negative distractors. The dataset covers three main domains, Technology, Sports (Tennis), and Entertainment (Actors), and is divided into multiple splits such as upper_shadow and lower_shadow. Each sample contains the entity name, a question description, the choices, the correct answer, and metadata. The dataset has been adversarially hardened over multiple iterations so that success depends strictly on latent semantic reasoning.
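The sample layout and the DTA scoring described above can be illustrated with a minimal Python sketch. The class name `DTAItem`, its field names, the helpers `format_prompt` and `accuracy`, and the toy item are all hypothetical and chosen for illustration only; the actual column names and evaluation code of the dataset may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DTAItem:
    """One Dual-Trait Association sample (field names are hypothetical)."""
    entity_name: str      # kept for analysis only; never shown to the model
    question: str         # anonymized shadow description (Trait A)
    choices: List[str]    # one correct Trait B plus three hard negatives
    answer_index: int     # position of the correct choice in `choices`
    metadata: Dict[str, str] = field(default_factory=dict)


def format_prompt(item: DTAItem) -> str:
    """Render the item as a lettered multiple-choice prompt without the entity name."""
    letters = "ABCD"
    lines = [item.question]
    lines += [f"{letters[i]}. {choice}" for i, choice in enumerate(item.choices)]
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(items: List[DTAItem], predict: Callable[[str], int]) -> float:
    """Fraction of items for which predict(prompt) returns the correct choice index."""
    if not items:
        return 0.0
    correct = sum(predict(format_prompt(it)) == it.answer_index for it in items)
    return correct / len(items)


# A made-up placeholder item, not taken from the dataset.
toy_items = [
    DTAItem(
        entity_name="PLACEHOLDER_ENTITY",
        question="Which fact is also true of the person in <shadow description>?",
        choices=["fact 1", "fact 2", "fact 3", "fact 4"],
        answer_index=2,
        metadata={"domain": "technology", "split": "upper_shadow"},
    )
]

# `predict` stands in for a model call that maps a prompt to a choice index.
print(accuracy(toy_items, lambda prompt: 2))
```

With one correct option and three distractors per item, uniform random guessing would score about 25%, which is a useful baseline when reading accuracy numbers from a sketch like this.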

Provider:

shadow-bench
