HELMET
Source: arXiv (indexed 2025-09-30)
Download link:
https://osf.io/4pwj8/
Overview:
HELMET is a dataset for evaluating long-context models on tasks drawn from diverse application scenarios, with context lengths of up to 128,000 tokens. It also provides model-based evaluation and prioritizes complex tasks, so that its results more accurately predict how models will perform in real-world applications.



