
lmms-lab/HLE-Verified

Source: Hugging Face. Updated: 2026-02-28. Indexed: 2026-04-05.
Download link:
https://hf-mirror.com/datasets/lmms-lab/HLE-Verified
Description:
---
license: cc-by-4.0
source_datasets:
- skylenage/HLE-Verified
tags:
- benchmark
- evaluation
- HLE
---

# HLE-Verified (HF-native JSONL)

This dataset is a **lightweight, evaluation-ready reformatting** of the [HLE-Verified](https://huggingface.co/datasets/skylenage/HLE-Verified) benchmark created by the [Skylenage Team](https://huggingface.co/skylenage).

> **Original work:** Weiqi Zhai et al., *"HLE-Verified: A Systematic Verification and Structured Revision of Humanity's Last Exam"* ([arXiv:2602.13964](https://arxiv.org/abs/2602.13964))
>
> **Original dataset:** [`skylenage/HLE-Verified`](https://huggingface.co/datasets/skylenage/HLE-Verified)
>
> **Original repository:** [`SKYLENAGE-AI/HLE-Verified`](https://github.com/SKYLENAGE-AI/HLE-Verified)

## Source & Snapshot

Converted from [`skylenage/HLE-Verified`](https://huggingface.co/datasets/skylenage/HLE-Verified) snapshot `becad9f339dfce27df0ebb38e55dabef12ca5735`.

## Modifications

The following changes were made from the original dataset:

- **Format conversion:** Raw data converted to HF-native JSONL splits
- **Image payloads removed:** Heavy raw image data dropped; text-eval fields retained for `hle_verified` task
- **Split restructuring:** Data organized into the following splits:

| Split | Count | Description |
|-------|-------|-------------|
| `full` | 2,500 | All items |
| `test` | 1,811 | Gold + Revision (for evaluation) |
| `gold` | 668 | Fully validated items |
| `revision` | 1,143 | Revised and re-verified items |
| `uncertain` | 689 | Items with inconclusive validity |

No questions, answers, rationales, or verification metadata were altered.

## License

This dataset is distributed under [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/), consistent with the original release.

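As a quick sanity check on the split table in the Modifications section, the counts can be verified arithmetically. The following is a minimal sketch, assuming that `test` is exactly the union of `gold` and `revision` (as the table states) and that `full` is `test` plus `uncertain` with no overlap (an assumption inferred from the counts, not stated in the card). The names `SPLIT_COUNTS` and `check_split_counts` are hypothetical and not part of the dataset or any library.

```python
# Split sizes copied from the table in the Modifications section.
SPLIT_COUNTS = {
    "full": 2500,
    "test": 1811,
    "gold": 668,
    "revision": 1143,
    "uncertain": 689,
}


def check_split_counts(counts):
    """Return a list of inconsistencies; an empty list means the counts agree.

    Assumes the splits are disjoint partitions:
    test = gold + revision, and full = test + uncertain.
    """
    problems = []
    if counts["gold"] + counts["revision"] != counts["test"]:
        problems.append("gold + revision != test")
    if counts["test"] + counts["uncertain"] != counts["full"]:
        problems.append("test + uncertain != full")
    return problems


problems = check_split_counts(SPLIT_COUNTS)
print(problems or "split counts are consistent")
```

The same check could be rerun against the row counts of downloaded JSONL files to confirm that a local copy matches the published table.
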
## Citation

If you use this dataset, please cite the original HLE-Verified paper and the original HLE benchmark:

```bibtex
@misc{zhai2026hleverified,
  title={HLE-Verified: A Systematic Verification and Structured Revision of Humanity's Last Exam},
  author={Weiqi Zhai and Zhihai Wang and Jinghang Wang and Boyu Yang and Xiaogang Li and Xiang Xu and Bohan Wang and Peng Wang and Xingzhe Wu and Anfeng Li and Qiyuan Feng and Yuhao Zhou and Shoulin Han and Wenjie Luo and Yiyuan Li and Yaxuan Wang and Ruixian Luo and Guojie Lin and Peiyao Xiao and Chengliang Xu and Ben Wang and Zeyu Wang and Zichao Chen and Jianan Ye and Yijie Hu and Jialong Chen and Zongwen Shen and Yuliang Xu and An Yang and Bowen Yu and Dayiheng Liu and Junyang Lin and Hu Wei and Que Shen and Bing Zhao},
  year={2026},
  eprint={2602.13964},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.13964},
}
```

```bibtex
@article{phan2025humanitysexam,
  title={A benchmark of expert-level academic questions to assess {AI} capabilities},
  author={{Center for AI Safety} and {Scale AI} and {HLE Contributors Consortium}},
  journal={Nature},
  volume={649},
  pages={1139--1146},
  year={2026},
  doi={10.1038/s41586-025-09962-4},
}
```
Provider:
lmms-lab