Counting-Stars
Source: arXiv · Updated: 2024-05-18 · Indexed: 2024-06-21
Download link:
https://github.com/nick7nlp/Counting-Stars
Description:
Counting-Stars is a multi-evidence, position-aware, and scalable benchmark for evaluating long-context large language models (LLMs). It comprises two tasks: multi-evidence acquisition and multi-evidence reasoning. The data uses randomly generated numbers, framed as a little penguin counting stars, to test how well a model locates and processes pieces of information scattered through a long text. The benchmark is intended for studying how LLMs handle complex and varied applications, particularly multi-document question answering and code understanding.
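As a rough illustration of the multi-evidence acquisition idea described above, the sketch below spreads several randomly generated counts through a long filler text and scores how many of them a model reports back. It is not the benchmark's own code: the sentence template, the function names `build_context` and `acquisition_score`, the filler text, and the scoring rule are all assumptions made for this example. See the repository linked above for the actual data and prompts.

```python
import random

# Hypothetical evidence template; the benchmark's exact wording may differ.
TEMPLATE = "The little penguin counted {n} stars."


def build_context(filler: str, num_evidence: int, seed: int = 0):
    """Spread num_evidence random counts evenly through the filler text."""
    rng = random.Random(seed)
    # Distinct random counts, so each piece of evidence is identifiable.
    counts = rng.sample(range(1, 100), num_evidence)
    words = filler.split()
    chunk = max(1, len(words) // num_evidence)
    parts = []
    for i, n in enumerate(counts):
        start = i * chunk
        # The last chunk absorbs any leftover words.
        end = (i + 1) * chunk if i < num_evidence - 1 else len(words)
        parts.append(" ".join(words[start:end]))
        parts.append(TEMPLATE.format(n=n))
    return " ".join(parts), counts


def acquisition_score(expected, predicted):
    """Fraction of inserted counts that appear in the model's answer."""
    if not expected:
        return 0.0
    found = set(predicted)
    return sum(1 for n in expected if n in found) / len(expected)


# Placeholder filler; a real test would use long natural-language text.
filler = "lorem ipsum dolor sit amet " * 200
context, counts = build_context(filler, num_evidence=4, seed=42)
```

Here `context` would be sent to the model with a request to list every count it finds, and the parsed answer would be passed to `acquisition_score` together with `counts`. Because the evidence sits at known, evenly spaced positions, accuracy can also be broken down by position, and the same construction scales to longer contexts or more pieces of evidence.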
Provided by:
Tencent MLPD
Created:
2024-03-18



