
ether0-benchmark

ModelScope community. Updated: 2025-12-05. Indexed: 2025-06-14.
Download link:
https://modelscope.cn/datasets/futurehouse/ether0-benchmark
Resource description:
[![Model on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-md-dark.svg)](https://huggingface.co/futurehouse/ether0)

![ether0 logo](images/ether0_logo.svg)

# ether0-benchmark

QA benchmark (test set) for the ether0 reasoning language model: https://huggingface.co/futurehouse/ether0

This benchmark is built from commonly used tasks, such as reaction prediction in USPTO/ORD, molecular captioning from PubChem, and predicting GHS classification. It differs from other benchmarks in that every answer is a molecule. It is balanced so that each task has about 25 questions, a reasonable amount for frontier model evaluations.

Results on these tasks generally track previously reported numbers. For example, a reaction prediction accuracy of 80% here would be approximately equivalent to performance on a withheld split of the USPTO-50k dataset.

See our preprint [here](https://arxiv.org/abs/2506.17238) for more details on dataset construction and reward functions.

The tasks in this test set include:

- Completing SMILES fragments
- Designing molecules adhering to molecular formula and functional group constraints
- Predicting reaction outcomes
- Proposing one-step synthesis pathways
- Editing the solubility of a molecule
- Converting IUPAC name to SMILES
- Answering multiple-choice questions about safety, ADME properties, BBB permeability, toxicity, scent, and pKa

We have measured the performance of ether0 and several frontier LLMs (at time of writing) on this benchmark:

![ether0 benchmarking](images/ether0_benchmark.png)

## Licensing

This dataset repository is CC BY 4.0, copyright 2025 FutureHouse.

## Code

Reward functions and problem templates used for this dataset can be found at https://github.com/Future-House/ether0.
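
As a note on the roughly 25 questions per task mentioned above: with so few questions, a per-task accuracy carries noticeable sampling noise. The sketch below is not taken from the ether0 repository. It assumes each graded answer is a dict with hypothetical `task` and `correct` fields (these are not the dataset's actual column names) and reports accuracy together with a simple binomial standard error.

```python
import math
from collections import defaultdict


def per_task_accuracy(records):
    """Return {task: (accuracy, standard_error, n)} for graded records.

    Each record is assumed to look like {"task": str, "correct": bool}.
    These field names are illustrative, not the dataset's real schema.
    """
    totals = defaultdict(lambda: [0, 0])  # task -> [num_correct, num_questions]
    for rec in records:
        totals[rec["task"]][0] += int(rec["correct"])
        totals[rec["task"]][1] += 1

    summary = {}
    for task, (num_correct, n) in totals.items():
        acc = num_correct / n
        # Binomial standard error of a proportion estimated from n questions.
        se = math.sqrt(acc * (1 - acc) / n)
        summary[task] = (acc, se, n)
    return summary


# Hypothetical graded results: 20 of 25 reaction-prediction answers correct.
records = [{"task": "reaction-prediction", "correct": i < 20} for i in range(25)]
summary = per_task_accuracy(records)
acc, se, n = summary["reaction-prediction"]
print(f"accuracy = {acc:.2f} +/- {se:.2f} (n = {n})")
```

Under these assumptions, an 80% score on a 25-question task has a standard error of about 8 percentage points, so small per-task differences between models should be read with caution.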

Provider: maas
Created: 2025-06-11