meme-benchmark/MEME

Source: Hugging Face
Updated: 2026-04-29
Indexed: 2026-05-03
Download link: https://hf-mirror.com/datasets/meme-benchmark/MEME
Description:
MEME (Multi-Entity and Evolving Memory Evaluation) is a benchmark for evaluating LLM memory systems along two orthogonal dimensions: entity scope (single-entity vs. multi-entity) and temporal dynamics (static vs. evolving). The dataset contains 100 evaluation episodes (50 Personal Life and 50 Software Project). Each episode is a chronological sequence of conversational sessions with associated test questions.

It covers six task types: Exact Recall (ER), Aggregation (Agg), Tracking (Tr), Deletion (Del), Cascade (Cas), and Absence (Abs). Cascade and Absence are introduced by this benchmark: Cascade tests the logical consistency of dependency rules over time, and Absence tests uncertainty recognition. All entity values are fictitious to prevent parametric-knowledge contamination.

Three configurations are provided: filler32k (the default benchmark setting), filler128k (a stress test under heavy noise), and nofiller (evidence-only sessions).
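As a minimal sketch of how the three configurations might be selected, the snippet below wraps the Hugging Face `datasets` library's `load_dataset(path, name)` call. The repository id and configuration names are taken from this listing. The helper name `load_meme` is hypothetical, and the listing does not state the split names or column schema, so inspect the returned object before relying on any field.

```python
# Configuration names exactly as given in the dataset description.
MEME_CONFIGS = ("filler32k", "filler128k", "nofiller")
DEFAULT_CONFIG = "filler32k"  # described as the default benchmark setting


def load_meme(config: str = DEFAULT_CONFIG):
    """Load one MEME configuration (hypothetical helper, not part of the dataset).

    Assumes the repository id "meme-benchmark/MEME" from the listing and that
    each configuration is exposed as a `datasets` config name.
    """
    if config not in MEME_CONFIGS:
        raise ValueError(
            f"Unknown config {config!r}; expected one of {MEME_CONFIGS}"
        )
    # Imported lazily so the name check above works without the library installed.
    from datasets import load_dataset

    # Splits and columns are not documented in the listing; print the result
    # to see what is actually available.
    return load_dataset("meme-benchmark/MEME", config)
```

Calling `load_meme()` would request the filler32k setting, while `load_meme("nofiller")` would request the evidence-only sessions. Both calls need network access and the `datasets` package.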
Provider: meme-benchmark