
omnitom/omnitom-benchmark-review

Source: Hugging Face | Updated 2026-04-22 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/omnitom/omnitom-benchmark-review
Description:

OmniToM is a benchmark for evaluating Theory of Mind in language models through explicit belief-structure modeling. Instead of scoring only endpoint answers to social reasoning questions, OmniToM exposes the intermediate belief structure that a model must build in order to reason coherently about what different actors know, believe, infer, intend, or misunderstand.

Each example is a short story paired with:
- a set of actor-centered belief propositions,
- a reserved `world` actor for narrator/world facts, and
- a seven-dimensional schema label vector for every belief.

The benchmark supports two linked tasks:
1. Belief Extraction: Given a story, extract the relevant belief structure as `(Actor, Belief, Order)` tuples.
2. Belief Labeling: Given the story and belief tuples, label each belief along seven closed-set schema dimensions.
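
To make the example structure concrete, here is a minimal Python sketch of how one story's beliefs might be represented. Everything in it is an assumption made for illustration: the class name `Belief`, the field names, the story, the placeholder label values, and the reading of `Order` as a non-negative nesting depth are not taken from the dataset, whose actual column names and label vocabularies are not given above. Only the `(Actor, Belief, Order)` tuple shape, the reserved `world` actor, and the count of seven label dimensions come from the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

WORLD_ACTOR = "world"   # reserved actor for narrator/world facts (from the description)
NUM_SCHEMA_DIMS = 7     # every belief carries a seven-dimensional label vector


@dataclass(frozen=True)
class Belief:
    """Hypothetical container for one (Actor, Belief, Order) tuple."""
    actor: str
    belief: str
    order: int  # assumed here to be a non-negative nesting depth
    labels: Optional[Tuple[str, ...]] = None  # filled in by the labeling task

    def __post_init__(self) -> None:
        if self.order < 0:
            raise ValueError("order must be non-negative")
        if self.labels is not None and len(self.labels) != NUM_SCHEMA_DIMS:
            raise ValueError(f"expected {NUM_SCHEMA_DIMS} labels, got {len(self.labels)}")

    def as_tuple(self) -> Tuple[str, str, int]:
        """Return the extraction-task view of this belief."""
        return (self.actor, self.belief, self.order)


# Invented story, not drawn from the benchmark.
story = "Sally puts the ball in the basket and leaves. Anne moves the ball to the box."

# Task 1 (Belief Extraction): story -> list of (Actor, Belief, Order) tuples.
beliefs = [
    Belief(WORLD_ACTOR, "the ball is in the box", 0),
    Belief("Sally", "the ball is in the basket", 1),
    Belief("Anne", "Sally believes the ball is in the basket", 2),
]

# Separate narrator/world facts from actor-centered beliefs.
world_facts = [b for b in beliefs if b.actor == WORLD_ACTOR]
actor_beliefs = [b for b in beliefs if b.actor != WORLD_ACTOR]

# Task 2 (Belief Labeling): attach one value per schema dimension.
# The values below are placeholders; the real closed-set labels are not listed above.
placeholder_labels = tuple(f"label_{i}" for i in range(1, NUM_SCHEMA_DIMS + 1))
labeled = Belief("Sally", "the ball is in the basket", 1, labels=placeholder_labels)
```

Keeping `labels` optional lets the same record serve both tasks: extraction output leaves it unset, and labeling output fills in exactly seven values.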
Provider:
omnitom