ITMO-NSS/ChemPaperBench
Source: Hugging Face. Updated: 2025-09-27. Indexed: 2025-11-01.
Download link:
https://hf-mirror.com/datasets/ITMO-NSS/ChemPaperBench
Description:
ChemPaperBench is an AI-ready benchmark designed to evaluate how well large language models (LLMs) and multi-agent systems perform literature-grounded, multi-step reasoning in chemistry. It contains 376 expert-validated questions drawn from real scientific publications across 9 chemical sub-disciplines, each annotated with contextual evidence (text, images, tables) and a complexity level. The dataset also includes evaluation results for GPT-5, Gemini 2.5 Pro, DeepSeek-V3.1-Terminus, Llama 4 Maverick, ChemToolAgent, and a custom MAS-RAG system.
Provider:
ITMO-NSS
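
For readers who want to pull the data programmatically, the following is a minimal sketch rather than an official loader. It assumes the Hugging Face `datasets` package is installed, that the mirror host in the download link above honors the standard `HF_ENDPOINT` environment variable, and that the repository loads with its default configuration. The helper names `dataset_page_url` and `load_benchmark` are hypothetical, and no split or column names are assumed.

```python
import os

REPO_ID = "ITMO-NSS/ChemPaperBench"   # dataset identifier from this listing
MIRROR = "https://hf-mirror.com"      # mirror host from the download link


def dataset_page_url(repo_id: str = REPO_ID, endpoint: str = MIRROR) -> str:
    """Build the dataset page URL for a given Hub endpoint (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_benchmark(repo_id: str = REPO_ID, endpoint: str = MIRROR):
    """Load the dataset through the mirror; needs network access and `datasets`."""
    # HF_ENDPOINT is read when huggingface_hub is imported, so set it first.
    os.environ.setdefault("HF_ENDPOINT", endpoint)
    # Lazy import keeps dataset_page_url usable without the dependency.
    from datasets import load_dataset
    # Assumption: the repository loads with its default configuration.
    return load_dataset(repo_id)
```

Calling `load_benchmark()` and printing the result shows the available splits and columns, which is the safest way to discover the schema before writing any evaluation code.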



