Dataset for Testing embodied AI with accessibility for referent Identification and Theory of Mind
Source: DataONE. Updated: 2026-02-20. Indexed: 2026-02-28.
Download link:
https://search.dataone.org/view/sha256:e4d2da4488d65c95d052bb0308941d5a26a73334dc318a50abb7d19cd63a02dc
Description:
This dataset was developed to evaluate how well Multimodal Large Language Models (MLLMs) identify referents based on accessibility, that is, the ease with which a speaker can acquire an object, as conveyed through linguistic cues such as the demonstratives “this” and “that.” The task reflects a key challenge in embodied AI: resolving referential ambiguity in environments that contain multiple similar objects, using only natural linguistic input. Unlike traditional NLP benchmarks, the dataset requires models not only to understand text but also to interpret spatial proximity, viewpoint shifts, and self-other distinctions in visual scenes. It uses pair-to-pair question formats to minimize confounding variables and isolate accessibility-based reasoning. Preliminary results show that MLLMs perform well on basic spatial reasoning but struggle significantly with accessibility-based referent identification and perspective shifts, achieving only 0–5% accuracy against human baselines of over 70%. The resource is intended for researchers in embodied AI, cognitive modeling, multimodal NLP, and Theory of Mind studies. It supports benchmarking, error analysis, and architectural evaluation of current MLLMs, and it highlights their limitations in simulating human-like referent resolution and perspective-taking.
Created:
2026-02-22



