HELMET

Indexed from arXiv: 2025-09-30

Download link:

https://osf.io/4pwj8/

Resource description:

HELMET is a dataset for evaluating long-context models on tasks drawn from diverse application scenarios, with context lengths of up to 128,000 tokens. It uses model-based evaluation and prioritizes complex tasks, so that its scores more accurately predict how models will perform in real-world applications.