chenzizhao/HELMET
Source: Hugging Face. Updated 2026-04-13; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/chenzizhao/HELMET
Description:
---
configs:
- config_name: alce_asqa
  data_files:
  - split: test
    path: "alce/asqa_eval_gtr_top2000.json"
- config_name: alce_qampari
  data_files:
  - split: test
    path: "alce/qampari_eval_gtr_top2000.json"
- config_name: infbench_longbook
  data_files:
  - split: test
    path: "infbench/longbook_sum_eng_keypoints.jsonl"
- config_name: json_kv
  data_files:
  - split: test
    path: "json_kv/test_*"
- config_name: kilt_hotpotqa
  data_files:
  - split: dev
    path: "kilt/hotpotqa-dev-*"
  - split: train
    path: "kilt/hotpotqa-train-*"
- config_name: kilt_nq
  data_files:
  - split: dev
    path: "kilt/nq-dev-*"
  - split: train
    path: "kilt/nq-train-*"
- config_name: kilt_popqa
  data_files:
  - split: test
    path: "kilt/popqa_test_*"
- config_name: kilt_triviaqa
  data_files:
  - split: dev
    path: "kilt/triviaqa-dev-*"
  - split: train
    path: "kilt/triviaqa-train-*"
- config_name: msmarco
  data_files:
  - split: test
    path: "msmarco/test_reranking_data_*"
- config_name: multi_lexsum
  data_files:
  - split: test
    path: "multi_lexsum/multi_lexsum_val.jsonl"
- config_name: ruler_cwe
  data_files:
  - split: test
    path: "ruler/cwe/*"
- config_name: ruler_fwe
  data_files:
  - split: test
    path: "ruler/fwe/*"
- config_name: ruler_multikey_1
  data_files:
  - split: test
    path: "ruler/niah_multikey_1/validation_*"
- config_name: ruler_multikey_2
  data_files:
  - split: test
    path: "ruler/niah_multikey_2/validation_*"
- config_name: ruler_multikey_3
  data_files:
  - split: test
    path: "ruler/niah_multikey_3/validation_*"
- config_name: ruler_multiquery
  data_files:
  - split: test
    path: "ruler/niah_multiquery/validation_*"
- config_name: ruler_multivalue
  data_files:
  - split: test
    path: "ruler/niah_multivalue/validation_*"
- config_name: ruler_single_1
  data_files:
  - split: test
    path: "ruler/niah_single_1/validation_*"
- config_name: ruler_single_2
  data_files:
  - split: test
    path: "ruler/niah_single_2/validation_*"
- config_name: ruler_single_3
  data_files:
  - split: test
    path: "ruler/niah_single_3/validation_*"
- config_name: ruler_qa1
  data_files:
  - split: test
    path: "ruler/qa_1/validation_*"
- config_name: ruler_qa2
  data_files:
  - split: test
    path: "ruler/qa_2/validation_*"
- config_name: ruler_vt
  data_files:
  - split: test
    path: "ruler/vt/validation_*"
license: mit
language:
- en
---
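Each `data_files` entry in the header above pairs a split with a path pattern relative to the repository root, and a `(config_name, split)` pair selects the files whose paths match that pattern. The sketch below approximates that matching in plain Python for a handful of configs copied from the header. It assumes that `*` does not cross `/` boundaries, which is how the Hugging Face `datasets` library treats these globs as far as we know. `glob_match` and `resolve` are hypothetical helpers, and `example_files` is a made-up listing, not the repository's real contents.

```python
from fnmatch import fnmatchcase

# A few (config_name, split) -> path-pattern entries copied from the YAML header.
DATA_FILES = {
    ("json_kv", "test"): "json_kv/test_*",
    ("kilt_nq", "dev"): "kilt/nq-dev-*",
    ("kilt_nq", "train"): "kilt/nq-train-*",
    ("ruler_cwe", "test"): "ruler/cwe/*",
    ("multi_lexsum", "test"): "multi_lexsum/multi_lexsum_val.jsonl",
}


def glob_match(path: str, pattern: str) -> bool:
    """Hypothetical helper: match a repo-relative path one segment at a time.

    Comparing segment by segment keeps "*" from matching across "/",
    which plain fnmatch on the whole string would allow.
    """
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(p, g) for p, g in zip(path_parts, pattern_parts))


def resolve(config: str, split: str, repo_files: list[str]) -> list[str]:
    """Hypothetical helper: return, sorted, the files a (config, split) pair selects."""
    pattern = DATA_FILES[(config, split)]
    return sorted(f for f in repo_files if glob_match(f, pattern))


# Made-up file listing for illustration only; NOT the repository's real contents.
example_files = [
    "json_kv/test_part0.jsonl",
    "kilt/nq-dev-part0.jsonl",
    "kilt/nq-train-part0.jsonl",
    "ruler/cwe/part0.jsonl",
    "ruler/cwe/extra/part1.jsonl",
    "multi_lexsum/multi_lexsum_val.jsonl",
]

print(resolve("kilt_nq", "dev", example_files))     # ['kilt/nq-dev-part0.jsonl']
print(resolve("ruler_cwe", "test", example_files))  # ['ruler/cwe/part0.jsonl']
```

Because of the segment-wise comparison, the nested `ruler/cwe/extra/part1.jsonl` is not picked up by `ruler/cwe/*`. Patterns without wildcards, such as the `multi_lexsum` path, reduce to an exact match.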
# HELMET: How to Evaluate Long-context Language Models Effectively and Thoroughly
[[Paper](https://arxiv.org/abs/2410.02694)][[Code](https://github.com/princeton-nlp/HELMET)]
HELMET is a comprehensive benchmark for long-context language models covering seven diverse categories of tasks.
The datasets are application-centric and are designed to evaluate models at different lengths and levels of complexity.
Please check out the paper for more details, and the code repo for instructions on processing the data and running the evaluations.
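As a starting point, each configuration declared in the YAML header should be loadable with the Hugging Face `datasets` library by passing the config name and split to `load_dataset`. The sketch below is a convenience wrapper under that assumption, not code from the HELMET repository. `HELMET_SPLITS` is transcribed from the card's `configs` list. `check_config` and `load_helmet` are hypothetical helpers that validate the pair before anything is downloaded. For data processing and the evaluations themselves, follow the official code repo.

```python
# Splits declared for each config in this card's YAML header.
HELMET_SPLITS = {
    "alce_asqa": ["test"],
    "alce_qampari": ["test"],
    "infbench_longbook": ["test"],
    "json_kv": ["test"],
    "kilt_hotpotqa": ["dev", "train"],
    "kilt_nq": ["dev", "train"],
    "kilt_popqa": ["test"],
    "kilt_triviaqa": ["dev", "train"],
    "msmarco": ["test"],
    "multi_lexsum": ["test"],
    "ruler_cwe": ["test"],
    "ruler_fwe": ["test"],
    "ruler_multikey_1": ["test"],
    "ruler_multikey_2": ["test"],
    "ruler_multikey_3": ["test"],
    "ruler_multiquery": ["test"],
    "ruler_multivalue": ["test"],
    "ruler_single_1": ["test"],
    "ruler_single_2": ["test"],
    "ruler_single_3": ["test"],
    "ruler_qa1": ["test"],
    "ruler_qa2": ["test"],
    "ruler_vt": ["test"],
}


def check_config(name: str, split: str) -> None:
    """Hypothetical helper: raise ValueError if the card declares no such config/split."""
    if name not in HELMET_SPLITS:
        raise ValueError(f"unknown config {name!r}")
    if split not in HELMET_SPLITS[name]:
        raise ValueError(
            f"config {name!r} has no split {split!r}; available: {HELMET_SPLITS[name]}"
        )


def load_helmet(name: str, split: str = "test"):
    """Hypothetical helper: validate the pair, then download it with `datasets`."""
    check_config(name, split)
    # Imported lazily so validation works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("chenzizhao/HELMET", name, split=split)
```

For example, `load_helmet("kilt_nq", split="dev")` would fetch the KILT NQ dev files. The default split `"test"` fits most configs, but `kilt_hotpotqa`, `kilt_nq`, and `kilt_triviaqa` declare only `dev` and `train`, so they need an explicit split.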
Provided by:
chenzizhao



