hle-gold-bio-chem
Source: ModelScope community. Updated 2026-01-06; indexed 2025-07-26.
Download link:
https://modelscope.cn/datasets/futurehouse/hle-gold-bio-chem
Resource description:
# Humanity's Last Exam (HLE) Bio/Chem Gold
[Humanity’s Last Exam (HLE)](https://lastexam.ai/paper) is a challenging question-answering AI benchmark covering advanced academic fields including Math, Physics, Chemistry, Biology, Engineering, and Computer Science.
At [FutureHouse](https://www.futurehouse.org/), we audited the biology and chemistry subsets of HLE using a combination of expert human evaluators and our in-house research agent, and found that around 30% of the questions contain answers directly contradicted by peer-reviewed literature.
To address this, we created HLE Bio/Chem Gold, a validated subset of scientifically sound HLE biology and chemistry questions, to support more reliable benchmarking in these disciplines.
For more information about our analysis, see our 📄 [blog](https://www.futurehouse.org/research-announcements/hle-exam).
#### Load the dataset:
```python
from datasets import load_dataset
dataset = load_dataset("futurehouse/hle-gold-bio-chem", split="train")
```
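Before writing evaluation code, it can help to check which fields the loaded split actually exposes and how examples are distributed across them. The sketch below is a hypothetical helper, not part of the dataset or of the `datasets` library. It accepts any iterable of dict-like rows (iterating a `datasets.Dataset` yields one dict per example), and the column name `category` used in the toy example is an assumption to verify against `dataset.column_names`.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def summarize_column(rows: Iterable[Mapping[str, Any]], column: str) -> Counter:
    """Count how often each value of `column` appears across `rows`.

    Hypothetical helper: `rows` may be a list of dicts or a loaded
    `datasets.Dataset`. Rows that lack `column` are counted under None.
    """
    counts: Counter = Counter()
    for row in rows:
        counts[row.get(column)] += 1
    return counts


if __name__ == "__main__":
    # Toy rows standing in for real examples; these field names are assumptions.
    toy_rows = [
        {"question": "q1", "category": "Biology/Medicine"},
        {"question": "q2", "category": "Chemistry"},
        {"question": "q3", "category": "Chemistry"},
    ]
    print(summarize_column(toy_rows, "category"))
```

With the real split, `print(dataset.column_names)` shows which fields exist; a call such as `summarize_column(dataset, "category")` only makes sense if that column is actually present.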
Provider: maas
Created: 2025-07-24
The above content was collected and summarized by 遇见数据集.



