Kakezh/MemoryAgentBench
Source: Hugging Face. Updated: 2026-04-27. Indexed: 2026-05-03.
Download link:
https://hf-mirror.com/datasets/Kakezh/MemoryAgentBench
Description:
MemoryAgentBench is a unified benchmark dataset for evaluating the memory capabilities of large language model (LLM) agents. It tests memory systems on four core competencies (Accurate Retrieval, Test-Time Learning, Long-Range Understanding, and Conflict Resolution) using an incremental, multi-turn interaction design. The benchmark combines newly constructed datasets (such as EventQA and FactConsolidation) with adapted versions of existing datasets. All data is split into chunks that are presented to the agent one at a time to simulate realistic multi-turn conversations. The benchmark aims to expose the limitations of current memory agents and to provide a standardized evaluation framework for building AI agents with genuine memory capabilities.
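The chunked, incremental design described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `split_into_chunks`, `ToyMemoryAgent`, the word-based chunk size, and the sample text are hypothetical and are not taken from the dataset or its evaluation code. The sketch only shows the general pattern of feeding a long context to an agent piece by piece and querying it afterwards.

```python
from typing import Iterator, List


def split_into_chunks(text: str, chunk_size: int) -> Iterator[str]:
    """Yield consecutive chunks of at most chunk_size whitespace-separated words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    for start in range(0, len(words), chunk_size):
        yield " ".join(words[start:start + chunk_size])


class ToyMemoryAgent:
    """Hypothetical agent: stores every chunk, retrieves by keyword overlap."""

    def __init__(self) -> None:
        self.memory: List[str] = []

    def observe(self, chunk: str) -> None:
        # One call per conversational turn; the agent never sees the full context at once.
        self.memory.append(chunk)

    def answer(self, question: str) -> str:
        # Return the stored chunk sharing the most lowercase tokens with the question.
        query = set(question.lower().split())
        return max(
            self.memory,
            key=lambda chunk: len(query & set(chunk.lower().split())),
            default="",
        )


# Made-up sample context, not benchmark data.
context = (
    "Alice moved to Paris in 2019 . "
    "Bob adopted a cat named Miso . "
    "Alice later moved to Berlin ."
)

agent = ToyMemoryAgent()
for chunk in split_into_chunks(context, chunk_size=7):
    agent.observe(chunk)  # incremental, turn-by-turn ingestion

print(agent.answer("What cat did Bob adopt ?"))
```

A real memory agent would replace the keyword-overlap lookup with its own storage and retrieval strategy. The point of the sketch is only that questions are asked after the context has been ingested chunk by chunk.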
Provider:
Kakezh



