Benchmark data and code for evaluating and enhancing spatial cognition abilities of large language models (LLMs)

Name: Benchmark data and code for evaluating and enhancing spatial cognition abilities of large language models (LLMs)
Creator: figshare
Published: 2025-06-01 06:12:46
License: No description available

Source: DataCite Commons (updated 2025-06-01, indexed 2025-05-07)

Download link:

https://figshare.com/articles/dataset/Benchmark_data_and_code_for_evaluating_and_enhancing_spatial_cognition_abilities_of_large_language_models_LLMs_/26526529/1

Description:

Large Language Models (LLMs) exhibit various capabilities that were previously exclusive to humans. However, existing evidence is insufficient to determine whether LLMs have developed spatial cognition, a core aspect of human cognition that supports logical-mathematical reasoning and various other skills. Previous studies on this topic primarily focused on small-scale perceptions, leaving spatial cognition in the context of GIScience unexplored. This paper adheres to the established framework of spatial cognition research, encompassing three types of spatial knowledge: landmark knowledge, route knowledge, and survey knowledge. We present a benchmark assessing the spatial cognition abilities across seven categories to systematically evaluate how well LLMs process and generate the three types of spatial knowledge. Additionally, we propose Hybrid Mind, a tool-augmented approach that integrates LLMs with deterministic GIS algorithms to enhance their performance in spatial cognitive tasks. The core idea involves implementing a mental map builder that generates a quantitative map based on segmented qualitative constraints, effectively overcoming LLMs' fallacies in synthesizing spatial information. Our experimental results indicated that although LLMs showed potential for spatial cognition, their performance was poor in most spatial cognitive tasks, particularly in building route knowledge and survey knowledge. Larger models performed better, yet even the leading model correctly answered fewer than one-fourth of the questions. An examination of samples revealed that LLMs tended to make simple mistakes and often failed when transitioning to egocentric views, deciding turns, and synthesizing information. The Hybrid Mind system significantly improved performance, correctly solving 70.48% of the questions. 
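The abstract describes the mental map builder only at a high level. The following is a minimal sketch of the general idea, not the Hybrid Mind code shipped in run_benchmark. It assumes an LLM has already segmented a route description into qualitative constraints of the hypothetical form (landmark, direction, distance, reference landmark), and it then places the landmarks deterministically, so that a survey-level question such as "in which direction is X from Y?" is answered by geometry rather than by the model. The constraint format, the eight-direction vocabulary, and the function names build_mental_map and bearing are illustrative assumptions.

```python
import math
from collections import deque

# Hypothetical direction vocabulary: unit vectors with x pointing east and y pointing north.
_D = math.sqrt(0.5)
DIRECTIONS = {
    "N": (0.0, 1.0), "S": (0.0, -1.0), "E": (1.0, 0.0), "W": (-1.0, 0.0),
    "NE": (_D, _D), "NW": (-_D, _D), "SE": (_D, -_D), "SW": (-_D, -_D),
}


def build_mental_map(constraints, origin):
    """Turn qualitative constraints into coordinates.

    Each constraint (a, direction, distance, b) uses a hypothetical format read as
    "a lies `distance` units `direction` of b". Landmarks are placed by a
    breadth-first traversal starting from `origin` at (0, 0); landmarks not
    connected to `origin` are left out.
    """
    # Adjacency list: node -> [(neighbour, dx, dy)] where neighbour = node + (dx, dy).
    adj = {}
    for a, direction, dist, b in constraints:
        ux, uy = DIRECTIONS[direction]
        adj.setdefault(b, []).append((a, ux * dist, uy * dist))
        adj.setdefault(a, []).append((b, -ux * dist, -uy * dist))

    coords = {origin: (0.0, 0.0)}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        x, y = coords[node]
        for nbr, dx, dy in adj.get(node, []):
            # First placement wins; conflicting constraints are not reconciled here.
            if nbr not in coords:
                coords[nbr] = (x + dx, y + dy)
                queue.append(nbr)
    return coords


def bearing(coords, src, dst):
    """Compass bearing in degrees from src to dst (0 = north, increasing clockwise)."""
    dx = coords[dst][0] - coords[src][0]
    dy = coords[dst][1] - coords[src][1]
    return math.degrees(math.atan2(dx, dy)) % 360
```

For example, with the constraints [("Library", "N", 100, "Station"), ("Park", "E", 100, "Library")] and origin "Station", the Park is placed at (100, 100), and bearing(coords, "Station", "Park") is 45 degrees, i.e. northeast. A practical builder would also have to reconcile conflicting or cyclic constraints, which this sketch sidesteps by keeping the first placement.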
The materials contain the following:
- questions.db: a SQLite database of the questions in the benchmark
- results.db: a SQLite database of the evaluations of the LLMs and Hybrid Mind
- run_benchmark: source code of the experiment runner and Hybrid Mind
- analyze: source code to analyze the results and plot charts
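The listing does not document the table layouts of questions.db and results.db. Rather than guess table or column names, a minimal Python sketch can read SQLite's own catalog (the sqlite_master table and PRAGMA table_info) to show what each file contains. describe_database is a hypothetical helper name, and the files are assumed to have been downloaded from the link above into the working directory.

```python
import sqlite3
from contextlib import closing


def describe_database(db_path):
    """Map each table name in a SQLite file to its list of column names (hypothetical helper)."""
    with closing(sqlite3.connect(db_path)) as conn:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        schema = {}
        for table in tables:
            # Quote the identifier, doubling any embedded double quotes.
            quoted = '"' + table.replace('"', '""') + '"'
            # PRAGMA table_info returns one row per column; index 1 is the column name.
            cols = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
            schema[table] = [col[1] for col in cols]
    return schema
```

Calling describe_database("questions.db") returns a dict from each table name to its column names. Note that sqlite3.connect creates an empty database file when the path does not exist, so a misspelled path yields {} rather than an error.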

Provider:

figshare

Created:

2025-04-03