omnitom/omnitom-benchmark-review

Name: omnitom/omnitom-benchmark-review
Creator: omnitom
Published: 2026-04-22 00:33:50
License: No description provided

Hugging Face. Updated 2026-04-22, indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/omnitom/omnitom-benchmark-review

Description:

OmniToM is a benchmark for evaluating Theory of Mind in language models through explicit belief-structure modeling. Instead of scoring only endpoint answers to social reasoning questions, OmniToM exposes the intermediate belief structure that a model must build in order to reason coherently about what different actors know, believe, infer, intend, or misunderstand. Each example is a short story paired with a set of actor-centered belief propositions, a reserved `world` actor for narrator/world facts, and a seven-dimensional schema label vector for every belief.

The benchmark supports two linked tasks:

1. Belief Extraction: Given a story, extract the relevant belief structure as `(Actor, Belief, Order)` tuples.
2. Belief Labeling: Given the story and belief tuples, label each belief along seven closed-set schema dimensions.
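
To make the `(Actor, Belief, Order)` structure concrete, the following is a minimal Python sketch of how one example's beliefs might be represented and how extracted tuples could be compared against gold tuples. Everything beyond the tuple shape, the reserved `world` actor, and the count of seven schema dimensions is an assumption: the `Belief` class, the integer encoding of `order`, the `extraction_overlap` helper, and the toy story content are hypothetical, and the exact-match comparison is only an illustration, not the benchmark's scoring method. The seven dimension names are not listed in this description, so the label vector is left as an opaque tuple of strings.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

WORLD_ACTOR = "world"   # reserved actor for narrator/world facts (from the description)
NUM_SCHEMA_DIMS = 7     # seven closed-set schema dimensions; names are not given here


@dataclass(frozen=True)
class Belief:
    """One actor-centered belief proposition (hypothetical container)."""
    actor: str
    belief: str
    order: int  # assumption: encoded as an integer; the description does not define it
    labels: Optional[Tuple[str, ...]] = None  # filled in by the labeling task

    def __post_init__(self) -> None:
        # A label vector, when present, must have one entry per schema dimension.
        if self.labels is not None and len(self.labels) != NUM_SCHEMA_DIMS:
            raise ValueError(f"expected {NUM_SCHEMA_DIMS} labels, got {len(self.labels)}")

    def as_tuple(self) -> Tuple[str, str, int]:
        return (self.actor, self.belief, self.order)


def extraction_overlap(predicted: Iterable[Belief], gold: Iterable[Belief]) -> Tuple[float, float]:
    """Exact-match precision and recall over (actor, belief, order) tuples.

    Illustrative only: a real evaluation would likely need softer matching
    of the free-text belief field.
    """
    pred_set = {b.as_tuple() for b in predicted}
    gold_set = {b.as_tuple() for b in gold}
    hits = len(pred_set & gold_set)
    precision = hits / len(pred_set) if pred_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    return precision, recall


# Toy data, invented for illustration (not taken from the dataset).
gold = [
    Belief(WORLD_ACTOR, "The key is in the drawer", 0),
    Belief("Anna", "The key is on the shelf", 1),
]
predicted = [
    Belief("Anna", "The key is on the shelf", 1),
    Belief("Ben", "Anna saw the key being moved", 2),
]
print(extraction_overlap(predicted, gold))  # (0.5, 0.5)
```

Keeping `labels` optional mirrors the two-task split: extraction produces the bare tuples, and labeling later attaches the seven-entry vector to each one.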

Provider:

omnitom
