introvoyz041/SUPERChem

Name: introvoyz041/SUPERChem
Creator: introvoyz041
Published: 2025-12-14 08:42:16
License: No description available

Hugging Face | Updated 2025-12-14 | Indexed 2025-12-20

Download link:

https://hf-mirror.com/datasets/introvoyz041/SUPERChem

Description:

SUPERChem is a challenging, expert-curated multimodal benchmark for evaluating the chemical reasoning capabilities of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs). It consists of 500 reasoning-intensive problems, each available in both multimodal and text-only formats, which enables a controlled analysis of a model's ability to integrate visual information. The dataset introduces Reasoning Path Fidelity (RPF), a metric that assesses how closely a model's reasoning aligns with expert-authored solution paths, distinguishing genuine understanding from lucky guesses. It also provides a fine-grained categorization of chemical knowledge and reasoning skills, supporting detailed diagnosis of model strengths and weaknesses across sub-domains. The dataset was built through a rigorous human-in-the-loop curation process to ensure quality and reduce the risk of data leakage from web-scraped training sets.
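
Because every problem comes in both a multimodal and a text-only version, results can be compared pairwise per problem. The sketch below is one possible way to summarize such a comparison and is not part of the dataset or its official evaluation code. The `PairedResult` record, its field names, the `modality_gap` helper, and the sample values are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class PairedResult:
    """Hypothetical record: one model's outcome on both versions of a problem."""
    problem_id: str
    correct_multimodal: bool
    correct_text_only: bool


def modality_gap(results):
    """Return accuracy per format and their difference (multimodal minus text-only)."""
    if not results:
        raise ValueError("results must not be empty")
    n = len(results)
    acc_mm = sum(r.correct_multimodal for r in results) / n
    acc_txt = sum(r.correct_text_only for r in results) / n
    return {"multimodal": acc_mm, "text_only": acc_txt, "gap": acc_mm - acc_txt}


# Made-up outcomes for four problems, for illustration only.
sample = [
    PairedResult("p1", True, True),
    PairedResult("p2", True, False),
    PairedResult("p3", True, True),
    PairedResult("p4", False, False),
]
summary = modality_gap(sample)
print(summary)
```

A positive gap would suggest the model benefits from the visual input, while a gap near zero or below would suggest it is not using the images effectively. This accuracy comparison says nothing about RPF, whose exact definition is not given here.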

Provided by:

introvoyz041
