ChemVQA-2K: A Visual Question Answering Dataset for Molecular Understanding
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/8bspthn5fb
Description:
📘 Overview
ChemVQA-2K is a novel Visual Question Answering (VQA) dataset designed to bridge chemistry and multimodal AI.
It contains approximately 2,000 high-resolution molecular images (512×512) generated from valid SMILES strings, accompanied by 10 structured Q&A pairs per molecule, resulting in ~20,000 image-question-answer triplets.
Each image represents a 2D chemical structure rendered using RDKit, while each question tests the model’s ability to reason over molecular features such as formula, atom counts, bonds, functional groups, and polarity.
________________________________________
🧬 Dataset Structure
Component | Description
ChemVQA_2K_images.zip | Approximately 2,000 molecule renderings (mol_0.png, mol_1.png, …)
ChemVQA_2K_full.csv | Complete dataset with columns: id, image_name, question, answer
Each record has the following form:
{
  "id": "mol_123",
  "image_name": "mol_123.png",
  "question": "What is the molecular formula of this molecule?",
  "answer": "C6H6O2"
}
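As a minimal sketch of how records of this shape might be read and grouped per image, the snippet below uses only Python's standard library. The helper names `load_records` and `group_by_image` are hypothetical, and the in-memory sample stands in for ChemVQA_2K_full.csv; its second row is illustrative and not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def load_records(csv_file):
    """Read rows with the columns id, image_name, question, answer."""
    return [dict(row) for row in csv.DictReader(csv_file)]


def group_by_image(records):
    """Map each image_name to its list of (question, answer) pairs."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["image_name"]].append((rec["question"], rec["answer"]))
    return dict(grouped)


# Tiny stand-in for ChemVQA_2K_full.csv. The first row mirrors the record
# shown above; the second row is made up purely for illustration.
SAMPLE_CSV = """id,image_name,question,answer
mol_123,mol_123.png,What is the molecular formula of this molecule?,C6H6O2
mol_123,mol_123.png,Is the molecule polar or non-polar?,Polar
"""

records = load_records(io.StringIO(SAMPLE_CSV))
by_image = group_by_image(records)
print(len(records), sorted(by_image))
```

To read the real file instead, pass a handle opened with `open("ChemVQA_2K_full.csv", newline="", encoding="utf-8")` to `load_records`.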
________________________________________
🔍 Example Questions
Each molecule has multiple Q&A pairs, e.g.:
Question | Example Answer
What is the molecular formula of this molecule? | C₂H₅OH
What is the molecular weight? | 46.07 g/mol
How many total atoms are present? | 9
Which functional groups are present? | Alcohol
Is the molecule polar or non-polar? | Polar
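The formula, atom-count, and weight answers in these examples are related by simple arithmetic. The sketch below is a way to sanity-check them, not the dataset's own generation code. It parses a flat formula with no parentheses and sums standard atomic weights. The names `parse_formula`, `total_atoms`, and `molecular_weight` are hypothetical, and the weight table covers only a few common elements.

```python
import re

# Standard atomic weights (g/mol) for a few common elements; extend as needed.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "S": 32.06, "Cl": 35.45, "Br": 79.904,
}

# Map Unicode subscript digits (as in "C₂H₅OH") to ASCII digits.
SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")


def parse_formula(formula):
    """Return {element: count} for a flat formula without parentheses."""
    counts = {}
    ascii_formula = formula.translate(SUBSCRIPTS)
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", ascii_formula):
        # A repeated element (the two H groups in C2H5OH) is summed.
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def total_atoms(counts):
    """Total number of atoms, hydrogens included."""
    return sum(counts.values())


def molecular_weight(counts):
    """Sum of atomic weights in g/mol."""
    return sum(ATOMIC_WEIGHTS[element] * n for element, n in counts.items())


counts = parse_formula("C₂H₅OH")
print(counts, total_atoms(counts), round(molecular_weight(counts), 2))
```

For the ethanol example this gives 2 carbon, 6 hydrogen, and 1 oxygen atoms, 9 atoms in total, and 2(12.011) + 6(1.008) + 15.999 ≈ 46.07 g/mol, which agrees with the example answers.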
________________________________________
⚙️ Data Generation Process
• Molecules were generated by concatenating random organic fragments and validated using RDKit.
• Each molecule’s image was created with Draw.MolToFile() at 512×512 px resolution.
• Functional groups were detected via SMARTS pattern matching.
• Q&A pairs were auto-generated from chemical descriptors (MolWt, CalcMolFormula, substructure matches).
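The last step, turning descriptor values into question-answer rows, could look roughly like the sketch below. This is an assumption about the templating, not the authors' code. The function `build_qa_pairs`, the keys of the `descriptors` dict, and the "None" fallback answer are all hypothetical. The descriptor values are assumed to be precomputed, for example with RDKit's MolWt and CalcMolFormula. Only four of the ten question types are shown.

```python
def build_qa_pairs(mol_id, descriptors):
    """Turn precomputed descriptor values into rows shaped like the CSV.

    `descriptors` is a hypothetical dict; in the described pipeline these
    values would come from RDKit (MolWt, CalcMolFormula, SMARTS matches).
    """
    groups = descriptors["functional_groups"]
    qa = [
        ("What is the molecular formula of this molecule?",
         descriptors["formula"]),
        ("What is the molecular weight?",
         f"{descriptors['mol_wt']:.2f} g/mol"),
        ("How many total atoms are present?",
         str(descriptors["num_atoms"])),
        ("Which functional groups are present?",
         ", ".join(groups) if groups else "None"),  # fallback is an assumption
    ]
    return [
        {"id": mol_id, "image_name": f"{mol_id}.png", "question": q, "answer": a}
        for q, a in qa
    ]


# Illustrative values for ethanol; "mol_example" is not a real dataset id.
rows = build_qa_pairs(
    "mol_example",
    {"formula": "C2H6O", "mol_wt": 46.069, "num_atoms": 9,
     "functional_groups": ["Alcohol"]},
)
for row in rows:
    print(row["question"], "->", row["answer"])
```

Keeping the templates in one list makes it easy to add the remaining question types without touching the row-building logic.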
________________________________________
🚀 Intended Use
ChemVQA-2K is intended for:
• Fine-tuning Vision-Language Models (VLMs) for scientific visual reasoning.
• Developing chemistry-aware question answering systems.
• Training vision encoders on molecular visual patterns.
• Exploring RL-based visual understanding of chemical structures.
________________________________________
📊 Dataset Statistics
Property | Value
Images | 1,924
Image resolution | 512×512 px
Q&A pairs | 19,240
Functional groups detected | 16
File size (approx.) | ~25 MB (images + CSVs)
🧠 Potential Research Directions
• Multimodal Chemistry Understanding: connecting visual structure with symbolic reasoning.
• Scientific Vision-Language Pretraining: use as a domain-specific VQA benchmark.
• Explainable Chemistry AI: models that describe functional features and molecular properties.
📂 File Organization
ChemVQA-2K/
├── images/
│   ├── mol_0.png
│   ├── mol_1.png
│   └── ...
└── ChemVQA_2K_full.csv
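Given this layout, a quick integrity check is to confirm that every image_name listed in the CSV has a matching file under images/. The sketch below assumes the archive has been extracted into the directory structure shown above. The function name `find_missing_images` is hypothetical, and the demo builds a throwaway directory rather than touching real data.

```python
import csv
import tempfile
from pathlib import Path


def find_missing_images(root):
    """Return sorted image_name values from the CSV with no file in images/."""
    root = Path(root)
    with open(root / "ChemVQA_2K_full.csv", newline="", encoding="utf-8") as f:
        names = {row["image_name"] for row in csv.DictReader(f)}
    return sorted(n for n in names if not (root / "images" / n).is_file())


# Self-contained demo on a temporary directory that mimics the layout.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "images").mkdir()
    (demo_root / "images" / "mol_0.png").write_bytes(b"")  # empty placeholder
    (demo_root / "ChemVQA_2K_full.csv").write_text(
        "id,image_name,question,answer\n"
        "mol_0,mol_0.png,q,a\n"
        "mol_1,mol_1.png,q,a\n",
        encoding="utf-8",
    )
    missing = find_missing_images(demo_root)

print(missing)
```

On the extracted dataset, `find_missing_images("ChemVQA-2K")` should return an empty list if the CSV and the images are in sync.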
________________________________________
✅ ChemVQA-2K Dataset Benefits the Chemistry Community
Bridges Chemistry and AI Literacy
• Helps chemistry students and researchers learn to interact with AI systems
Created: 2025-10-27



