
chenzizhao/HELMET

Source: Hugging Face. Updated 2026-04-13; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/chenzizhao/HELMET
Resource description:
---
configs:
- config_name: alce_asqa
  data_files:
  - split: test
    path: "alce/asqa_eval_gtr_top2000.json"
- config_name: alce_qampari
  data_files:
  - split: test
    path: "alce/qampari_eval_gtr_top2000.json"
- config_name: infbench_longbook
  data_files:
  - split: test
    path: "infbench/longbook_sum_eng_keypoints.jsonl"
- config_name: json_kv
  data_files:
  - split: test
    path: "json_kv/test_*"
- config_name: kilt_hotpotqa
  data_files:
  - split: dev
    path: "kilt/hotpotqa-dev-*"
  - split: train
    path: "kilt/hotpotqa-train-*"
- config_name: kilt_nq
  data_files:
  - split: dev
    path: "kilt/nq-dev-*"
  - split: train
    path: "kilt/nq-train-*"
- config_name: kilt_popqa
  data_files:
  - split: test
    path: "kilt/popqa_test_*"
- config_name: kilt_triviaqa
  data_files:
  - split: dev
    path: "kilt/triviaqa-dev-*"
  - split: train
    path: "kilt/triviaqa-train-*"
- config_name: msmarco
  data_files:
  - split: test
    path: "msmarco/test_reranking_data_*"
- config_name: multi_lexsum
  data_files:
  - split: test
    path: "multi_lexsum/multi_lexsum_val.jsonl"
- config_name: ruler_cwe
  data_files:
  - split: test
    path: "ruler/cwe/*"
- config_name: ruler_fwe
  data_files:
  - split: test
    path: "ruler/fwe/*"
- config_name: ruler_multikey_1
  data_files:
  - split: test
    path: "ruler/niah_multikey_1/validation_*"
- config_name: ruler_multikey_2
  data_files:
  - split: test
    path: "ruler/niah_multikey_2/validation_*"
- config_name: ruler_multikey_3
  data_files:
  - split: test
    path: "ruler/niah_multikey_3/validation_*"
- config_name: ruler_multiquery
  data_files:
  - split: test
    path: "ruler/niah_multiquery/validation_*"
- config_name: ruler_multivalue
  data_files:
  - split: test
    path: "ruler/niah_multivalue/validation_*"
- config_name: ruler_single_1
  data_files:
  - split: test
    path: "ruler/niah_single_1/validation_*"
- config_name: ruler_single_2
  data_files:
  - split: test
    path: "ruler/niah_single_2/validation_*"
- config_name: ruler_single_3
  data_files:
  - split: test
    path: "ruler/niah_single_3/validation_*"
- config_name: ruler_qa1
  data_files:
  - split: test
    path: "ruler/qa_1/validation_*"
- config_name: ruler_qa2
  data_files:
  - split: test
    path: "ruler/qa_2/validation_*"
- config_name: ruler_vt
  data_files:
  - split: test
    path: "ruler/vt/validation_*"
license: mit
language:
- en
---

# HELMET: How to Evaluate Long-context Language Models Effectively and Thoroughly [[Paper](https://arxiv.org/abs/2410.02694)][[Code](https://github.com/princeton-nlp/HELMET)]

HELMET is a comprehensive benchmark for long-context language models covering seven diverse categories of tasks. The datasets are application-centric and are designed to evaluate models at different lengths and levels of complexity. Please check out the paper for more details, and the code repo for how to process the data and run the evaluations.
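Each `config_name` in the front matter maps one or more splits to glob patterns over the repository's files. The sketch below illustrates that mapping for a few configs copied from the card. It approximates glob matching with Python's `fnmatch`, which is an assumption: the Hub's actual resolver may treat edge cases differently (for example, `fnmatch`'s `*` also matches `/`). The file listing is hypothetical and does not reflect the repository's real contents. In practice you would load a config directly, e.g. `datasets.load_dataset("chenzizhao/HELMET", "kilt_nq", split="dev")`, and let the library resolve the files.

```python
from fnmatch import fnmatchcase

# A subset of the `configs` block from the dataset card: config -> split -> globs.
CONFIGS: dict[str, dict[str, list[str]]] = {
    "json_kv": {"test": ["json_kv/test_*"]},
    "kilt_nq": {"dev": ["kilt/nq-dev-*"], "train": ["kilt/nq-train-*"]},
    "kilt_popqa": {"test": ["kilt/popqa_test_*"]},
    "ruler_cwe": {"test": ["ruler/cwe/*"]},
    "ruler_qa1": {"test": ["ruler/qa_1/validation_*"]},
}


def resolve(config: str, split: str, repo_files: list[str]) -> list[str]:
    """Return the files selected by a config/split's globs (fnmatch approximation).

    Raises KeyError if the config or split is not in CONFIGS.
    """
    patterns = CONFIGS[config][split]
    return sorted(f for f in repo_files if any(fnmatchcase(f, p) for p in patterns))


# Hypothetical file listing, for illustration only (not the real repo contents).
files = [
    "kilt/nq-dev-example.jsonl",
    "kilt/nq-train-example.jsonl",
    "kilt/popqa_test_example.jsonl",
    "ruler/qa_1/validation_example.jsonl",
]

print(resolve("kilt_nq", "dev", files))  # ['kilt/nq-dev-example.jsonl']
```

Keeping the split-to-glob table as plain data mirrors the card's design: new shards that follow a config's naming pattern are picked up without editing the configuration.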
Provided by:
chenzizhao