Eternity-gaga/SymbolBench
Source: Hugging Face. Updated: 2026-04-08. Indexed: 2026-04-12.
Download link: https://hf-mirror.com/datasets/Eternity-gaga/SymbolBench
---
license: apache-2.0
task_categories:
- image-to-text
language:
- en
- zh
tags:
- multimodal
- STEM
- symbol
size_categories:
- 10K<n<100K
---
# Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding
<div align="center">
[Paper (arXiv)](https://arxiv.org/abs/2603.18472)
[Dataset (Hugging Face)](https://huggingface.co/datasets/Eternity-gaga/SymbolBench)
[Code (GitHub)](https://github.com/THUKElab/SymbolBench)
</div>
This directory contains all benchmark data for the paper: *Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding*. The dataset spans **five domains**, each with a dedicated subdirectory.
---
## Directory Structure
```
data/
├── language/  # Language domain
│   ├── multimodal_dataset_new.json
│   └── images/  # Handwritten essay image files (*.jpg), not included
│
├── chemistry/  # Chemistry domain
│   ├── symbol_chemistry.json
│   └── chemistry_images/  # Molecular structure images (*.png), included
│
├── physics/  # Physics domain
│   ├── filtered_physics_final.json
│   └── images/  # Diagram images (*.png), not included
│
├── mathematics/  # Mathematics domain
│   └── math_data/
│       ├── adjusted_multimath_symbol.json
│       ├── filter_mathvista.json
│       ├── images_multimath.json
│       └── images_mathvista/  # MathVista images (*.jpg), included
│
└── culture/  # Emoji / Cultural domain
    └── data/
        ├── Chinese_idiom_4/
        │   ├── chengyu.xlsx  # Answer metadata
        │   └── images/  # Emoji images (*.jpg), included
        ├── Chinese_idiom_multi/
        │   ├── chengyu_fei4.xlsx
        │   └── images/  # Emoji images (*.jpg)
        ├── English_idiom/
        │   ├── idiom.xlsx
        │   └── images/  # Emoji images (*.jpg)
        └── English_word/
            ├── word.xlsx
            └── images/  # Emoji images (*.jpg)
```
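The layout above can be turned into a small loader. This is a minimal sketch, assuming that the `data/` root has been downloaded locally and that each JSON file holds a top-level array of records (the card does not state this). `DOMAIN_FILES` and `load_records` are hypothetical names introduced for illustration; `images_multimath.json` is left out because its role is not described here.

```python
import json
from pathlib import Path

# Relative paths copied from the directory tree; the mapping itself is illustrative.
DOMAIN_FILES = {
    "language": ["language/multimodal_dataset_new.json"],
    "chemistry": ["chemistry/symbol_chemistry.json"],
    "physics": ["physics/filtered_physics_final.json"],
    "mathematics": [
        "mathematics/math_data/adjusted_multimath_symbol.json",
        "mathematics/math_data/filter_mathvista.json",
    ],
}


def load_records(root, domain):
    """Load and concatenate the JSON record lists of one domain.

    Assumes every file is a UTF-8 encoded JSON array of objects.
    """
    records = []
    for rel_path in DOMAIN_FILES[domain]:
        with open(Path(root) / rel_path, encoding="utf-8") as f:
            data = json.load(f)
        if not isinstance(data, list):
            raise ValueError(f"{rel_path}: expected a JSON array")
        records.extend(data)
    return records
```

A call such as `load_records("data", "chemistry")` would then return the chemistry records as a list of dictionaries, provided the assumptions above hold.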
---
## Domain 1 — Language
**Source:** Scanned Chinese elementary-school handwritten compositions. Each sample is a single sentence extracted from an essay image, annotated at three task levels of increasing difficulty.
**Sample distribution by Task / Level:**
| Task Type | Level | Count | Description |
|-----------|-------|-------|-------------|
| 1 | 1 | 526 | **Unrecognizable character detection** — Output the sentence verbatim; replace unrecognizable characters with `X`, keep genuinely unknown characters as `U` |
| 2 | 2 | 488 | **Miswritten character detection** — Identify characters that are recognizable but incorrectly written; output a structured diff list `{"n": N, "diffs": [{"idx": i, "src": "原字", "tgt": "正字"}, ...]}` |
| 3 | 3 | 824 | **Sentence correction** — Output the fully corrected sentence with all errors fixed |
**JSON schema:**
```json
{
  "image_id": "181_0001.jpg",
  "level": "1",
  "task_type": "1",
  "question_zh": "任务1:错字检测。...",
  "question_en": "Task 1: Wrong-character detection. ...",
  "answer": "一条蚯蚓X好地帮XUUX感X地说:..."
}
```
**Evaluation metrics:**
| Task | Metric |
|------|--------|
| Task 1 | Character-level F1 on `X` positions (Precision / Recall / F1) |
| Task 2 | Token-pair F1 on `(idx, src)` sets |
| Task 3 | Exact Match (EM) after punctuation normalization; Edit Distance |
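
The table does not specify how these metrics are computed. The sketch below shows one plausible reading, under assumptions that the card does not confirm: Task 1 compares `X` marks by string index, Task 2 compares sets of `(idx, src)` pairs taken from the `diffs` list, and punctuation normalization for Task 3 means dropping Unicode punctuation and whitespace. All function names are hypothetical.

```python
import unicodedata


def prf(tp, n_pred, n_gold):
    """Precision, recall and F1 from a true-positive count and two set sizes."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def task1_x_f1(pred, gold):
    """Task 1: compare the string positions marked `X` (index alignment is assumed)."""
    pred_pos = {i for i, ch in enumerate(pred) if ch == "X"}
    gold_pos = {i for i, ch in enumerate(gold) if ch == "X"}
    return prf(len(pred_pos & gold_pos), len(pred_pos), len(gold_pos))


def task2_pair_f1(pred_diffs, gold_diffs):
    """Task 2: compare sets of (idx, src) pairs; `tgt` is ignored, as in the table."""
    pred_pairs = {(d["idx"], d["src"]) for d in pred_diffs}
    gold_pairs = {(d["idx"], d["src"]) for d in gold_diffs}
    return prf(len(pred_pairs & gold_pairs), len(pred_pairs), len(gold_pairs))


def normalize(text):
    """Drop Unicode punctuation and whitespace (an assumed normalization)."""
    return "".join(
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def task3_scores(pred, gold):
    """Task 3: exact match and edit distance, both on normalized text."""
    p, g = normalize(pred), normalize(gold)
    return p == g, edit_distance(p, g)
```

Index-based alignment for Task 1 is the simplest choice, but it breaks down when a prediction inserts or drops characters; an alignment step before comparison may be needed in practice.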
---
## Domain 2 — Chemistry
**Source:** Chemical molecule structures rendered from SMILES strings using RDKit. The dataset tests whether models can parse structural chemical diagrams at increasing levels of complexity.
**Sample distribution by Level and Task Type:**
| Level | Task Type | Count | Task Description |
|-------|-----------|-------|-----------------|
| 1 | 1 | 481 | Atom identification and counting |
| 1 | 2 | 200 | Chemical bond type identification and counting |
| 2 | 3 | 1,000 | Chemical reaction type classification |
| 2 | 4 | 1,000 | Chemical reaction coefficient verification / correction |
| 3 | 5 | 191 | Chemical symbol / equation error detection |
| 3 | 6 | 300 | Reaction condition reading (multiple-choice, from literature figures) |
| 3 | 7 | 202 | Reaction duration and temperature reading (multiple-choice) |
| 3 | 8 | 300 | Reaction yield estimation (multiple-choice) |
**JSON schema:**
```json
{
  "image_id": "che_l1_t1_1040.png",
  "smiles": "CC(=O)Oc1cccc(C)c1",
  "question_zh": "图中的化学分子有哪些原子,对应的数量是多少",
  "question_en": "What atoms are present in the chemical molecule and how many of each?",
  "answer_zh": {"C": 9, "O": 2, "H": 10},
  "answer_en": {"C": 9, "O": 2, "H": 10},
  "level": "1",
  "task_type": "1"
}
```
**Notes:**
- The `smiles` field provides the canonical SMILES string for ground-truth verification.
- Image naming convention: `che_l{level}_t{task_type}_{id}.png`
- Task types 6–8 use multiple-choice answers (single letter A/B/C/D); the remaining task types use structured or free-form answers.
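
The naming convention and the answer-format rule in these notes can be checked mechanically. The following is a minimal sketch, assuming that `level` and `task_type` are stored as strings (as in the schema example) and that the trailing `{id}` contains no further structure; the helper names are hypothetical.

```python
import re

# Pattern for che_l{level}_t{task_type}_{id}.png
IMAGE_ID_RE = re.compile(r"^che_l(?P<level>\d+)_t(?P<task_type>\d+)_(?P<id>.+)\.png$")

# Task types 6 to 8 are answered with a single letter A/B/C/D.
MULTIPLE_CHOICE_TASKS = {"6", "7", "8"}


def parse_chem_image_id(image_id):
    """Split an image filename into its level, task type and id parts."""
    match = IMAGE_ID_RE.match(image_id)
    if match is None:
        raise ValueError(f"unexpected image name: {image_id}")
    return match.groupdict()


def is_consistent(record):
    """Check that the filename agrees with the record's level and task_type."""
    parts = parse_chem_image_id(record["image_id"])
    return (
        parts["level"] == str(record["level"])
        and parts["task_type"] == str(record["task_type"])
    )


def answer_kind(record):
    """Classify the expected answer format from the task type."""
    if str(record["task_type"]) in MULTIPLE_CHOICE_TASKS:
        return "multiple_choice"
    return "structured_or_free_form"
```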
---
## Domain 3 — Physics
**Source:** Filtered and re-annotated from the MMMU-Pro, OlympiadBench, and GAOKAO-Bench benchmarks, retaining samples that require reading and interpreting physics / engineering **symbolic diagrams** (circuit diagrams, force diagrams, thermodynamic charts, etc.).
**Task type descriptions:**
| Task Type | Level | Description |
|-----------|-------|-------------|
| 1 | level1 | Chinese physics calculation / derivation problems with diagrams |
| 2 | level1 | English free-response problems requiring diagram reading |
| 3 | level2 | Chinese problems requiring multi-step reasoning from diagrams |
| 4 | level2 | Multiple-choice (A/B/C/D) from MMMU |
| 5 | level3 | Physics / circuit diagram error detection |
| 6 | level3 | Multiple-choice requiring symbolic diagram interpretation |
**JSON schema:**
```json
{
  "id": "MMMU_1_test_Physics_103",
  "question_en": "[Question] <image 1> A wire moves with velocity v ... \n[A] into the page\n[B] ...",
  "subject": "Physics",
  "answer": "A",
  "task_type": "6",
  "level": "level3",
  "image_1": "MMMU_1_test_Physics_103_image_1.png"
}
```
```
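For multiple-choice records, the stem and options are packed into a single `question_en` string. The following is a minimal sketch for separating them, assuming that every option starts a new line with a bracketed letter such as `[A]` and that additional images, if any, follow the `image_N` key pattern seen in the example. Both assumptions are inferred from this one record, and the function names are hypothetical.

```python
import re

OPTION_RE = re.compile(r"^\[([A-D])\]\s*(.*)$")
IMAGE_KEY_RE = re.compile(r"image_(\d+)")


def split_question(question_en):
    """Return (stem, options), where options maps a letter to its text."""
    stem_lines = []
    options = {}
    current = None
    for line in question_en.splitlines():
        match = OPTION_RE.match(line.strip())
        if match:
            current = match.group(1)
            options[current] = match.group(2).strip()
        elif current is None:
            stem_lines.append(line)
        else:
            # Continuation of the most recent option.
            options[current] += "\n" + line
    stem = "\n".join(stem_lines).strip()
    if stem.startswith("[Question]"):
        stem = stem[len("[Question]"):].strip()
    return stem, options


def image_fields(record):
    """List image filenames in numeric order of their image_N keys."""
    numbered = []
    for key, value in record.items():
        match = IMAGE_KEY_RE.fullmatch(key)
        if match:
            numbered.append((int(match.group(1)), value))
    return [value for _, value in sorted(numbered)]
```

Free-response records contain no bracketed options, in which case `split_question` returns the whole text as the stem and an empty dictionary.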
---
## Domain 4 — Mathematics
**Source:** Curated from Chinese middle / high school math exercises spanning geometry and function topics, with multi-step solution annotations. All answers are formatted in LaTeX `\boxed{}` notation.
**Level distribution:**
| Level | Count | Description |
|-------|-------|-------------|
| 1 | 639 | Basic symbolic reading (graph/table reading, simple computation) |
| 2 | 1,524 | Multi-step reasoning from symbolic diagrams |
| 3 | 436 | Error detection, definition verification, advanced proof |
**JSON schema:**
```json
{
  "image_id": "6fb1abf7f9c72c67be68625a0e7d19a0.png",
  "data_type": "geometry",
  "question_type": "填空",
  "level": "2",
  "task_type": "9",
  "QA_pair": [
    {
      "question_zh": "则根据题意可列出方程为________ _.",
      "condition_zh": "如图,在一块长为22米、宽为17米的矩形地面上,...",
      "answer_zh": "(22-x)(17-x)=300",
      "question_en": "According to the meaning of the question, the equation can be set up as ____.",
      "condition_en": "As shown in the figure, on a rectangular ground with length 22 m ...",
      "solution_zh": "Step 1 (设变量): ...\nAnswer: \\boxed{(22-x)(17-x)=300}",
      "solution_en": "Step 1 (Set the variable): ...\nAnswer: \\boxed{(22-x)(17-x)=300}"
    }
  ]
}
```
**Key fields:**
| Field | Description |
|-------|-------------|
| `data_type` | `geometry` or `function` |
| `question_type` | `填空` (fill-in) / `解答` (open-ended) / `选择` (multiple-choice) / `证明` (proof) / `判断` (true-false) |
| `condition_zh/en` | Background information or diagram description for the question |
| `answer_zh` | Ground-truth answer, typically wrapped in `\boxed{}` |
| `solution_zh/en` | Step-by-step solution with final `\boxed{}` answer |
**Evaluation notes:**
- Rule-based evaluation extracts `\boxed{...}` content from both prediction and answer for exact-match comparison.
- LLM-as-Judge evaluation checks mathematical equivalence (ignoring formatting and units).
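
The rule-based step can be sketched as follows. This is an illustration rather than the benchmark's own evaluator: it assumes that the last `\boxed{...}` in a string carries the final answer, that an unboxed ground truth (as in the `answer_zh` example above) is compared as-is, and that whitespace differences are ignored. The function names are hypothetical.

```python
def extract_boxed(text):
    """Return the content of the last boxed expression in text, or None.

    Braces are counted so that nested groups such as a fraction stay intact.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    content = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(content)
        content.append(ch)
    return None  # unbalanced braces


def boxed_exact_match(prediction, answer):
    """Exact match on boxed content, ignoring whitespace."""
    pred = extract_boxed(prediction)
    if pred is None:
        return False
    gold = extract_boxed(answer)
    if gold is None:
        gold = answer  # assumed fallback for answers stored without a box
    return "".join(pred.split()) == "".join(gold.split())
```

A simple regular expression such as `\\boxed\{(.*?)\}` would stop at the first closing brace and truncate answers like `\boxed{\frac{1}{2}}`, which is why the sketch counts braces instead.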
---
## Domain 5 — Cultural
**Source:** Purpose-built dataset testing cross-lingual, cross-cultural symbolic understanding. Each image is a composite of emoji characters that together encode a Chinese idiom or English idiom/word. The task requires recognizing the symbolic meaning of the emoji sequence.
**Sub-dataset overview:**
| Sub-dataset | Metadata file | Image dir | Task description |
|-------------|---------------|-----------|------------------|
| `Chinese_idiom_4` | `chengyu.xlsx` | `chengyu_4/images` | Predict 4-character Chinese idiom (成语) from emoji composite image |
| `Chinese_idiom_multi` | `chengyu_fei4.xlsx` | `chengyu_fei4/images` | Predict variable-length Chinese idiom from emoji image |
| `English_idiom` | `idiom.xlsx` | `idiom/images` | Predict English idiom/phrase from emoji image |
| `English_word` | `word.xlsx` | `word/images` | Predict English word from emoji image |
Each row of a metadata file corresponds to one image and contains at minimum:
- Image filename (matching the corresponding `.jpg` in the image directory)
- Ground-truth idiom / word answer

**Example record (shown as JSON):**
```json
{
  "image_id": "1.png",
  "emoji_sequence": "🐢🥐🙅🏻🦌",
  "level": "3",
  "task_type": "3",
  "question_zh": "图中所示的emoji组合表示什么成语?",
  "question_en": "What idiom does the emoji combination represent?",
  "answer_zh": "圭角不露"
}
```
Provider: Eternity-gaga



