ether0-benchmark
ModelScope community · updated 2025-12-05 · indexed 2025-06-14
Download link:
https://modelscope.cn/datasets/futurehouse/ether0-benchmark

# ether0-benchmark
QA benchmark (test set) for the ether0 reasoning language model:
https://huggingface.co/futurehouse/ether0
This benchmark is built from commonly used tasks, such as reaction prediction from USPTO/ORD,
molecular captioning from PubChem, and GHS classification prediction.
Unlike other benchmarks, every answer is a molecule.
It is balanced so that each task has about 25 questions,
a reasonable size for frontier model evaluations.
Scores on these tasks generally track previously reported numbers:
for example, a reaction prediction accuracy of 80% here would be approximately equivalent
to performance on a held-out split of the USPTO-50k dataset.
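
Because each task contributes roughly the same number of questions, per-task accuracy and its unweighted mean are natural summary numbers. The sketch below shows one way to compute them from already graded answers. The task names, the `(task, is_correct)` record shape, and the example results are assumptions made for illustration; they are not the dataset's schema or real benchmark scores.

```python
from collections import defaultdict


def per_task_accuracy(results):
    """Return {task: accuracy} from an iterable of (task, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for task, ok in results:
        totals[task] += 1
        correct[task] += int(bool(ok))
    return {task: correct[task] / totals[task] for task in totals}


def macro_average(acc_by_task):
    """Unweighted mean over tasks, so every task counts equally."""
    return sum(acc_by_task.values()) / len(acc_by_task)


# Hypothetical graded results (25 questions per task), not real benchmark data.
example_results = (
    [("reaction-prediction", True)] * 20
    + [("reaction-prediction", False)] * 5
    + [("iupac-to-smiles", True)] * 10
    + [("iupac-to-smiles", False)] * 15
)

acc = per_task_accuracy(example_results)
overall = macro_average(acc)
```

With about 25 questions per task, a single question moves a task's accuracy by roughly 4 percentage points, which is worth keeping in mind when comparing models on one task.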
See our preprint [here](https://arxiv.org/abs/2506.17238) for more details on dataset construction and reward functions.
The tasks in this test set include:
- Completing SMILES fragments
- Designing molecules adhering to molecular formula and functional group constraints
- Predicting reaction outcomes
- Proposing one-step synthesis pathways
- Editing the solubility of a molecule
- Converting IUPAC name to SMILES
- Answering multiple-choice questions about safety, ADME properties, BBB permeability, toxicity, scent, and pKa
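
To make the molecular formula constraint concrete, here is a minimal sketch of comparing two formula strings by element counts. It is not taken from the benchmark's reward functions (those are in the repository linked under Code); `parse_formula` and `same_formula` are hypothetical helpers, and the parser assumes flat formulas such as `C6H12O6` with no parentheses, charges, or isotope labels.

```python
import re
from collections import Counter

# One element symbol (capital letter, optional lowercase letter) and an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Parse a flat molecular formula like 'C2H6O' into element counts."""
    counts = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != pos:
            # A gap means an unsupported character (e.g. '(' or '+').
            raise ValueError(f"unsupported formula: {formula!r}")
        pos = match.end()
        counts[match.group(1)] += int(match.group(2) or 1)
    if not formula or pos != len(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    return counts


def same_formula(a: str, b: str) -> bool:
    """True when both strings describe the same element counts, in any order."""
    return parse_formula(a) == parse_formula(b)
```

For example, `same_formula("C2H6O", "C2H5OH")` is true because both strings contain two carbons, six hydrogens, and one oxygen. In practice the formula of a proposed molecule would be derived from its SMILES string with a cheminformatics toolkit before a comparison like this.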
We have measured the performance of ether0 and several frontier LLMs (at the time of writing) on this benchmark.

## Licensing
This dataset repository is licensed under CC BY 4.0, copyright 2025 FutureHouse.
## Code
Reward functions and problem templates used for this dataset can be found at
https://github.com/Future-House/ether0.
Provider: maas
Created: 2025-06-11



