introvoyz041/SUPERChem
Source: Hugging Face. Updated 2025-12-14; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/introvoyz041/SUPERChem
Resource description:
SUPERChem is a challenging, expert-curated multimodal benchmark for rigorously evaluating the chemical reasoning capabilities of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs). It consists of 500 reasoning-intensive problems, each available in both a multimodal and a text-only format, which enables a controlled analysis of a model's ability to integrate visual information. The dataset introduces Reasoning Path Fidelity (RPF), a metric that assesses how closely a model's reasoning aligns with expert-authored solution paths, distinguishing genuine understanding from lucky guesses. It also provides a systematic, fine-grained categorization of chemical knowledge and reasoning skills, supporting detailed diagnosis of model strengths and weaknesses across sub-domains. The dataset goes through a rigorous human-in-the-loop curation process to ensure quality and to reduce the risk of data leakage from web-scraped training sets.
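Because every problem exists in both a multimodal and a text-only format, the two formats can be scored on the same questions and compared directly. The following is a minimal sketch of that paired comparison, not the benchmark's own evaluation code. The record layout (`PairedResult`), the function names, and the demo outcomes are all hypothetical and are not taken from the dataset; RPF is not implemented here because its definition is not given in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedResult:
    """One question answered in both formats (hypothetical record layout)."""
    question_id: str
    correct_multimodal: bool
    correct_text_only: bool


def accuracy(flags):
    """Fraction of True values in an iterable of booleans."""
    flags = list(flags)
    if not flags:
        raise ValueError("no results to score")
    return sum(flags) / len(flags)


def modality_gap(results):
    """Accuracy per format and their difference on the same questions."""
    results = list(results)  # allow any iterable; it is read twice below
    mm = accuracy(r.correct_multimodal for r in results)
    txt = accuracy(r.correct_text_only for r in results)
    return {"multimodal": mm, "text_only": txt, "gap": mm - txt}


# Made-up outcomes for illustration only; these are NOT SUPERChem results.
demo = [
    PairedResult("q1", True, True),
    PairedResult("q2", True, False),
    PairedResult("q3", False, False),
    PairedResult("q4", True, False),
]

summary = modality_gap(demo)
print(summary)
```

A positive `gap` would suggest the model benefits from the visual input on these questions, while a gap near zero would suggest it gains little from it. With only accuracy in hand, such a reading stays tentative, which is the kind of ambiguity a reasoning-path metric such as RPF is meant to reduce.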
Provider:
introvoyz041



