Mohammadta/BEAM-10M
收藏Hugging Face2025-11-11 更新2025-10-25 收录
下载链接:
https://hf-mirror.com/datasets/Mohammadta/BEAM-10M
下载链接
链接失效反馈官方服务:
资源简介:
BEAM是一个用于评估语言模型长期记忆的综合数据集。它包含不同规模的对话(128K、500K、1M和10M令牌),涵盖了包括一般、编码和数学在内的广泛领域。每个对话都包括种子信息、叙述、对话计划、用户问题、聊天数据、探索性问题、用户配置文件和计划。10M对话具有独特的结构,包括多个计划,每个计划都有特定的主题信息、叙述、用户配置文件、对话计划和聊天数据。数据集还包含10种不同类型的探索性问题,用于评估模型的能力。
BEAM is a comprehensive dataset for evaluating long-term memory in language models. It contains multi-scale conversations (128K, 500K, 1M, and 10M tokens) with diverse domains including general, coding, and math topics. Each conversation includes seed information, narratives, conversation plan, user questions, chat data, probing questions, and user profile. The 10M conversations have a unique structure with multiple plans, each with specific topic information, narratives, user profile, conversation plan, and chat data. The dataset also includes 10 different types of probing questions for evaluation.
提供机构:
Mohammadta



