ether0-benchmark

Name: ether0-benchmark
Creator: maas
Published: 2025-12-05 16:38:04
License: No description provided

ModelScope Community: updated 2025-12-05, indexed 2025-06-14

Download link:

https://modelscope.cn/datasets/futurehouse/ether0-benchmark


Resource description:

[![Model on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-md-dark.svg)](https://huggingface.co/futurehouse/ether0)

![ether0 logo](images/ether0_logo.svg)

# ether0-benchmark

QA benchmark (test set) for the ether0 reasoning language model: https://huggingface.co/futurehouse/ether0

This benchmark is built from commonly used tasks, such as reaction prediction in USPTO/ORD, molecular captioning from PubChem, and predicting GHS classification. It differs from other benchmarks in that every answer is a molecule. It is balanced so that each task has about 25 questions, a reasonable number for frontier model evaluations.

Scores on these tasks generally track previously reported numbers: for example, a reaction prediction accuracy of 80% here would be approximately equivalent to performance on a withheld split of the USPTO-50k dataset. See our preprint [here](https://arxiv.org/abs/2506.17238) for more details on dataset construction and reward functions.

The tasks in this test set include:

- Completing SMILES fragments
- Designing molecules adhering to molecular formula and functional group constraints
- Predicting reaction outcomes
- Proposing one-step synthesis pathways
- Editing the solubility of a molecule
- Converting IUPAC name to SMILES
- Answering multiple-choice questions about safety, ADME properties, BBB permeability, toxicity, scent, and pKa

We have measured the performance of ether0 and several frontier LLMs (at time of writing) on this benchmark:

![ether0 benchmarking](images/ether0_benchmark.png)

## Licensing

This dataset repository is CC BY 4.0, copyright 2025 FutureHouse.

## Code

Reward functions and problem templates used for this dataset can be found at https://github.com/Future-House/ether0.
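Because each task has only about 25 questions, per-task accuracies carry noticeable sampling noise, which matters when comparing models. The sketch below is one possible way to summarize already-graded results per task and estimate that noise. It is not the authors' evaluation code (the actual reward functions live in the GitHub repository above), and the record keys `task` and `correct`, the task labels, and the sample data are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from math import sqrt


def per_task_accuracy(records):
    """Return {task: (accuracy, n_questions)} from graded records.

    Each record is a dict with the hypothetical keys "task" (str) and
    "correct" (bool); these names are illustrative, not the dataset schema.
    """
    totals = defaultdict(lambda: [0, 0])  # task -> [n_correct, n_total]
    for rec in records:
        counts = totals[rec["task"]]
        counts[0] += int(rec["correct"])
        counts[1] += 1
    return {task: (c / n, n) for task, (c, n) in totals.items()}


def binomial_standard_error(accuracy, n):
    """Standard error of an accuracy estimated from n independent questions."""
    return sqrt(accuracy * (1.0 - accuracy) / n)


def macro_average(task_scores):
    """Unweighted mean of per-task accuracies (each task counts equally)."""
    return sum(acc for acc, _ in task_scores.values()) / len(task_scores)


# Made-up graded outputs: 25 questions for each of two illustrative tasks.
records = (
    [{"task": "reaction-prediction", "correct": i < 20} for i in range(25)]
    + [{"task": "iupac-to-smiles", "correct": i < 10} for i in range(25)]
)

scores = per_task_accuracy(records)
for task, (acc, n) in sorted(scores.items()):
    se = binomial_standard_error(acc, n)
    print(f"{task}: accuracy={acc:.2f} (n={n}, standard error ~{se:.2f})")
print(f"macro average: {macro_average(scores):.2f}")
```

With 25 questions, an observed accuracy of 80% has a binomial standard error of about 0.08, so differences of a few points on a single task are within noise; aggregating across tasks gives a steadier comparison.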


Provider:

maas

Created:

2025-06-11
