LTM Benchmark

Name: LTM Benchmark
Creator: GoodAI
License: No description available

arXiv, indexed 2025-09-30

Download link:

https://github.com/GoodAI/goodai-ltm


Resource summary:


This dataset is a dynamic benchmark system for evaluating conversational AI agents by simulating long-running user-agent interactions that involve multiple interleaved tasks. The findings indicate that large language models (LLMs) perform well on short-term memory tasks but struggle once the required memory span exceeds their maximum context window. The dataset comprises 33 tests (11 scenarios, each repeated 3 times) and covers memory spans of 2,000, 3,200, 120,000, 200,000, and 500,000 tokens. Its goal is to assess language models' long-term memory, continual learning, and information integration capabilities.
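As an illustration only, the sketch below reproduces two points from the summary: the test count (11 scenarios x 3 repetitions) and the failure mode where a model without external memory loses information once the memory span exceeds its context window. It is not the benchmark's own code. The 128,000-token window, the one-list-element-per-token simplification, and the names `enumerate_tests`, `spans_beyond_context`, and `run_probe` are hypothetical assumptions.

```python
from itertools import product

# Figures taken from the benchmark description.
NUM_SCENARIOS = 11
REPETITIONS = 3
MEMORY_SPANS = [2_000, 3_200, 120_000, 200_000, 500_000]  # tokens


def enumerate_tests(num_scenarios: int, repetitions: int) -> list[tuple[int, int]]:
    """Enumerate (scenario, repetition) pairs; each pair counts as one test."""
    return list(product(range(num_scenarios), range(repetitions)))


def spans_beyond_context(spans: list[int], context_window: int) -> list[int]:
    """Return the memory spans that cannot fit inside a single context window."""
    return [span for span in spans if span > context_window]


def run_probe(memory_span: int, context_window: int) -> bool:
    """Hypothetical probe: plant a fact, add `memory_span` filler tokens, then
    ask about the fact. The simulated agent has no external memory and sees
    only the last `context_window` tokens (one list element = one token)."""
    fact = "FACT: the user's favourite colour is blue"
    transcript = [fact] + ["filler"] * memory_span + ["QUESTION: favourite colour?"]
    visible = transcript[-context_window:]
    return fact in visible


if __name__ == "__main__":
    print(len(enumerate_tests(NUM_SCENARIOS, REPETITIONS)))  # 33
    window = 128_000  # hypothetical context-window size
    print(spans_beyond_context(MEMORY_SPANS, window))  # [200000, 500000]
    for span in MEMORY_SPANS:
        print(span, run_probe(span, window))
```

Under these assumptions the probe succeeds for the 2,000, 3,200, and 120,000-token spans and fails for the two longer ones, which mirrors the pattern the summary reports: short spans are easy, while spans beyond the context window require some form of long-term memory.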

Provided by:

GoodAI


Dataset Introduction

Background and Challenges

Background Overview

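GoodAI-LTM is a Python library focused on enhancing the long-term memory of language models. It provides text embeddings, vector storage, and conversational agents, and is particularly suited to social-agent scenarios. The library supports knowledge storage and conversation memory management, and can save and restore memories and configuration through state persistence.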
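To make the general idea of embedding-based memory with state persistence concrete, here is a minimal, self-contained sketch. It is NOT the goodai-ltm API: the class `ToyMemory`, its methods, and the JSON file layout are hypothetical, and a toy bag-of-words vector with cosine similarity stands in for a real text-embedding model and vector store.

```python
import json
import math
import re
import tempfile
from collections import Counter
from pathlib import Path


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real text-embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[tok] * b[tok] for tok in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyMemory:
    """Hypothetical vector store: keeps (text, vector) pairs and can persist state."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

    def save(self, path: Path) -> None:
        # Only the raw texts are persisted; vectors are recomputed on load.
        path.write_text(json.dumps({"texts": [text for text, _ in self.items]}))

    @classmethod
    def load(cls, path: Path) -> "ToyMemory":
        memory = cls()
        for text in json.loads(path.read_text())["texts"]:
            memory.add(text)
        return memory


if __name__ == "__main__":
    memory = ToyMemory()
    memory.add("The user's cat is called Tom.")
    memory.add("The meeting was moved to Friday.")
    with tempfile.TemporaryDirectory() as tmp:
        state_file = Path(tmp) / "memory.json"
        memory.save(state_file)
        restored = ToyMemory.load(state_file)
    print(restored.retrieve("What is the cat called?"))  # ["The user's cat is called Tom."]
```

Persisting only the texts keeps the saved state small and independent of the embedding function; a real system backed by an expensive embedding model would more likely store the vectors as well, to avoid recomputing them on every restore.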

The above content was collected and summarized by 遇见数据集 ("Meet Datasets").
