
Eternity-gaga/SymbolBench

Hugging Face · Updated 2026-04-08 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/Eternity-gaga/SymbolBench
Description:
---
license: apache-2.0
task_categories:
- image-to-text
language:
- en
- zh
tags:
- multimodal
- STEM
- symbol
size_categories:
- 10K<n<100K
---

# Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding

<div align="center">

[![Paper](https://img.shields.io/badge/arXiv-2603.18472-b31b1b.svg?style=for-the-badge&logo=arxiv)](https://arxiv.org/abs/2603.18472)
[![Dataset](https://img.shields.io/badge/🤗%20Hugging%20Face-SymbolBench-yellow.svg?style=for-the-badge)](https://huggingface.co/datasets/Eternity-gaga/SymbolBench)
[![GitHub](https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/THUKElab/SymbolBench)

</div>

This directory contains all benchmark data for the paper *Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding*. The dataset spans **five domains**, and each domain has its own subdirectory.

---

## Directory Structure

```
data/
├── language/                          # Language domain
│   ├── multimodal_dataset_new.json
│   └── images/                        # Handwritten essay images (*.jpg), not included
│
├── chemistry/                         # Chemistry domain
│   ├── symbol_chemistry.json
│   └── chemistry_images/              # Molecular structure images (*.png), included
│
├── physics/                           # Physics domain
│   ├── filtered_physics_final.json
│   └── images/                        # Diagram images (*.png), not included
│
├── mathematics/                       # Mathematics domain
│   └── math_data/
│       ├── adjusted_multimath_symbol.json
│       ├── filter_mathvista.json
│       ├── images_multimath.json
│       └── images_mathvista/          # MathVista images (*.jpg), included
│
└── culture/                           # Emoji / cultural domain
    └── data/
        ├── Chinese_idiom_4/
        │   ├── chengyu.xlsx           # Answer metadata
        │   └── images/                # Emoji images (*.jpg), included
        ├── Chinese_idiom_multi/
        │   ├── chengyu_fei4.xlsx
        │   └── images/                # Emoji images (*.jpg)
        ├── English_idiom/
        │   ├── idiom.xlsx
        │   └── images/                # Emoji images (*.jpg)
        └── English_word/
            ├── word.xlsx
            └── images/                # Emoji images (*.jpg)
```

---

## Domain 1: Language

**Source:**
Scanned handwritten compositions by Chinese elementary-school students. Each sample is a single sentence extracted from an essay image. Samples are annotated at three task levels of increasing difficulty.

**Sample distribution by task type and level:**

| Task Type | Level | Count | Description |
|-----------|-------|-------|-------------|
| 1 | 1 | 526 | **Unrecognizable character detection**: output the sentence verbatim, replacing unrecognizable characters with `X` and keeping genuinely unknown characters as `U` |
| 2 | 2 | 488 | **Miswritten character detection**: identify characters that are recognizable but incorrectly written, and output a structured diff list `{"n": N, "diffs": [{"idx": i, "src": "原字", "tgt": "正字"}, ...]}`, where `src` is the character as written and `tgt` is the correct character |
| 3 | 3 | 824 | **Sentence correction**: output the fully corrected sentence with all errors fixed |

**JSON schema:**

```json
{
  "image_id": "181_0001.jpg",
  "level": "1",
  "task_type": "1",
  "question_zh": "任务1:错字检测。...",
  "question_en": "Task 1: Wrong-character detection. ...",
  "answer": "一条蚯蚓X好地帮XUUX感X地说:..."
}
```

**Evaluation metrics:**

| Task | Metric |
|------|--------|
| Task 1 | Character-level precision, recall, and F1 on `X` positions |
| Task 2 | Token-pair F1 on the set of `(idx, src)` pairs |
| Task 3 | Exact match (EM) after punctuation normalization; edit distance |

---

## Domain 2: Chemistry

**Source:** Chemical molecule structures rendered from SMILES strings with RDKit. The dataset tests whether models can parse structural chemical diagrams at increasing levels of complexity.
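Chemistry image files follow the naming convention `che_l{level}_t{task_type}_{id}.png` listed in the notes for this domain. Level 1 / Task 1 answers are element-to-count maps such as `{"C": 9, "O": 2, "H": 10}`. The sketch below parses a file name into its three fields and scores an atom-count prediction. The helper names and the all-or-nothing scoring rule are hypothetical and illustrative only. They are not the benchmark's official evaluator.

```python
import re
from typing import Dict, NamedTuple

# Naming convention from the dataset card: che_l{level}_t{task_type}_{id}.png
_CHEM_NAME = re.compile(r"^che_l(\d+)_t(\d+)_(\d+)\.png$")


class ChemImageKey(NamedTuple):
    level: int
    task_type: int
    sample_id: int


def parse_chem_image_id(image_id: str) -> ChemImageKey:
    """Split a chemistry image file name into (level, task_type, id)."""
    match = _CHEM_NAME.match(image_id)
    if match is None:
        raise ValueError(f"unexpected chemistry image name: {image_id!r}")
    level, task_type, sample_id = (int(g) for g in match.groups())
    return ChemImageKey(level, task_type, sample_id)


def atom_count_correct(pred: Dict[str, int], gold: Dict[str, int]) -> bool:
    """Hypothetical all-or-nothing check for Level 1 / Task 1 answers.

    Elements with a count of 0 are dropped, so {"N": 0} and a missing
    "N" key are treated the same. Key order does not matter.
    """
    def clean(counts: Dict[str, int]) -> Dict[str, int]:
        return {elem.strip(): int(n) for elem, n in counts.items() if int(n) != 0}

    return clean(pred) == clean(gold)


if __name__ == "__main__":
    print(parse_chem_image_id("che_l1_t1_1040.png"))
    # ChemImageKey(level=1, task_type=1, sample_id=1040)
    print(atom_count_correct({"C": 9, "H": 10, "O": 2}, {"C": 9, "O": 2, "H": 10}))
    # True
```

A per-element partial-credit score would be an equally plausible choice. The card does not say which rule the authors used.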
**Sample distribution by level and task type:**

| Level | Task Type | Count | Task Description |
|-------|-----------|-------|------------------|
| 1 | 1 | 481 | Atom identification and counting |
| 1 | 2 | 200 | Chemical bond type identification and counting |
| 2 | 3 | 1,000 | Chemical reaction type classification |
| 2 | 4 | 1,000 | Chemical reaction coefficient verification / correction |
| 3 | 5 | 191 | Chemical symbol / equation error detection |
| 3 | 6 | 300 | Reaction condition reading (multiple-choice, from literature figures) |
| 3 | 7 | 202 | Reaction duration and temperature reading (multiple-choice) |
| 3 | 8 | 300 | Reaction yield estimation (multiple-choice) |

**JSON schema:**

```json
{
  "image_id": "che_l1_t1_1040.png",
  "smiles": "CC(=O)Oc1cccc(C)c1",
  "question_zh": "图中的化学分子有哪些原子,对应的数量是多少",
  "question_en": "What atoms are present in the chemical molecule and how many of each?",
  "answer_zh": {"C": 9, "O": 2, "H": 10},
  "answer_en": {"C": 9, "O": 2, "H": 10},
  "level": "1",
  "task_type": "1"
}
```

**Notes:**

- The `smiles` field provides the canonical SMILES string for ground-truth verification.
- Image naming convention: `che_l{level}_t{task_type}_{id}.png`
- Task types 6–8 use multiple-choice answers (a single letter A/B/C/D). The remaining task types use structured or free-form answers.

---

## Domain 3: Physics

**Source:** Filtered and re-annotated from the MMMU-Pro, OlympiadBench, and GAOKAO-Bench benchmarks. Only samples that require reading and interpreting physics or engineering **symbolic diagrams** (circuit diagrams, force diagrams, thermodynamic charts, etc.) were retained.
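Physics records point to their diagrams through `<image N>` placeholders in `question_en`, with matching `image_N` fields (see the JSON schema for this domain). Task types 4 and 6 are multiple-choice.

The sketch below does two things:

- It resolves placeholders to image file names.
- It pulls an option letter out of a free-text model reply.

The helper names are hypothetical. The letter-extraction rule ("last standalone A to D") is a rough heuristic of our own, not a documented part of the benchmark.

```python
import re
from typing import Dict, List, Optional

# "<image 1>", "<image 2>", ... in question_en refer to the image_1, image_2, ... fields.
_PLACEHOLDER = re.compile(r"<image (\d+)>")
# A standalone option letter A-D, optionally wrapped as (B) or [C].
_CHOICE = re.compile(r"(?<![A-Za-z])[\[(]?([A-D])[\])]?(?![A-Za-z])")


def referenced_images(record: Dict[str, str]) -> List[str]:
    """Map each <image N> placeholder in question_en to the file named in field image_N."""
    names = []
    for index in _PLACEHOLDER.findall(record.get("question_en", "")):
        key = f"image_{index}"
        if key not in record:
            raise KeyError(f"{record.get('id', '?')}: missing field {key}")
        names.append(record[key])
    return names


def extract_choice(reply: str) -> Optional[str]:
    """Heuristic (an assumption): treat the last standalone A-D letter as the choice."""
    letters = _CHOICE.findall(reply)
    return letters[-1] if letters else None


if __name__ == "__main__":
    record = {
        "id": "MMMU_1_test_Physics_103",
        "question_en": "[Question] <image 1> A wire moves with velocity v ...",
        "image_1": "MMMU_1_test_Physics_103_image_1.png",
    }
    print(referenced_images(record))  # ['MMMU_1_test_Physics_103_image_1.png']
    print(extract_choice("The field points into the page, so the answer is (A)."))  # A
```

The heuristic can misfire on text such as "2D diagram". Asking models to answer with a bare letter avoids most of that ambiguity.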
**Task type descriptions:**

| Task Type | Level | Description |
|-----------|-------|-------------|
| 1 | level1 | Chinese physics calculation / derivation problems with diagrams |
| 2 | level1 | English free-response problems requiring diagram reading |
| 3 | level2 | Chinese problems requiring multi-step reasoning from diagrams |
| 4 | level2 | Multiple-choice (A/B/C/D) from MMMU |
| 5 | level3 | Physics / circuit diagram error detection |
| 6 | level3 | Multiple-choice requiring symbolic diagram interpretation |

**JSON schema:**

```json
{
  "id": "MMMU_1_test_Physics_103",
  "question_en": "[Question] <image 1> A wire moves with velocity v ... \n[A] into the page\n[B] ...",
  "subject": "Physics",
  "answer": "A",
  "task_type": "6",
  "level": "level3",
  "image_1": "MMMU_1_test_Physics_103_image_1.png"
}
```

---

## Domain 4: Mathematics

**Source:** Curated from Chinese middle- and high-school math exercises covering geometry and functions, with multi-step solution annotations. All answers are formatted in LaTeX `\boxed{}` notation.
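According to this domain's evaluation notes, rule-based scoring extracts `\boxed{...}` content from both the prediction and the answer and compares the two for exact match.

Below is a minimal sketch of that idea. It rests on three assumptions:

- The last `\boxed{}` in a text is the final answer.
- Whitespace is the only thing normalized.
- A gold answer without `\boxed{}` (like the `answer_zh` example) is compared as-is.

The function names are hypothetical. Nested braces such as `\frac{1}{2}` are handled by counting brace depth, because a simple regular expression cannot match balanced braces.

```python
from typing import Optional


def last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, honoring nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    content_start = start + len(marker)
    depth = 1
    for i in range(content_start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[content_start:i]
    return None  # unbalanced braces: no complete \boxed{} found


def normalize(expr: str) -> str:
    """Assumed normalization: drop all whitespace before comparing."""
    return "".join(expr.split())


def boxed_exact_match(prediction: str, answer: str) -> bool:
    """Exact match on \\boxed{} contents; a gold answer without \\boxed{} is used as-is."""
    pred = last_boxed(prediction)
    gold = last_boxed(answer)
    if gold is None:
        gold = answer
    return pred is not None and normalize(pred) == normalize(gold)


if __name__ == "__main__":
    solution = "Step 1 (Set the variable): ...\nAnswer: \\boxed{(22-x)(17-x)=300}"
    print(last_boxed(solution))  # (22-x)(17-x)=300
    print(boxed_exact_match(solution, "(22-x)(17-x)=300"))  # True
```

This covers only the rule-based path. The LLM-as-Judge check for mathematical equivalence cannot be reproduced by string comparison.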
**Level distribution:**

| Level | Count | Description |
|-------|-------|-------------|
| 1 | 639 | Basic symbolic reading (graph/table reading, simple computation) |
| 2 | 1,524 | Multi-step reasoning from symbolic diagrams |
| 3 | 436 | Error detection, definition verification, advanced proof |

**JSON schema:**

```json
{
  "image_id": "6fb1abf7f9c72c67be68625a0e7d19a0.png",
  "data_type": "geometry",
  "question_type": "填空",
  "level": "2",
  "task_type": "9",
  "QA_pair": [
    {
      "question_zh": "则根据题意可列出方程为________ _.",
      "condition_zh": "如图,在一块长为22米、宽为17米的矩形地面上,...",
      "answer_zh": "(22-x)(17-x)=300",
      "question_en": "According to the meaning of the question, the equation can be set up as ____.",
      "condition_en": "As shown in the figure, on a rectangular ground with length 22 m ...",
      "solution_zh": "Step 1 (设变量): ...\nAnswer: \\boxed{(22-x)(17-x)=300}",
      "solution_en": "Step 1 (Set the variable): ...\nAnswer: \\boxed{(22-x)(17-x)=300}"
    }
  ]
}
```

**Key fields:**

| Field | Description |
|-------|-------------|
| `data_type` | `geometry` or `function` |
| `question_type` | `填空` (fill-in-the-blank) / `解答` (open-ended) / `选择` (multiple-choice) / `证明` (proof) / `判断` (true/false) |
| `condition_zh/en` | Background information or diagram description for the question |
| `answer_zh` | Ground-truth answer, typically wrapped in `\boxed{}` |
| `solution_zh/en` | Step-by-step solution ending in a `\boxed{}` answer |

**Evaluation notes:**

- Rule-based evaluation extracts the `\boxed{...}` content from both the prediction and the answer and compares them for exact match.
- LLM-as-Judge evaluation checks mathematical equivalence, ignoring formatting and units.

---

## Domain 5: Cultural

**Source:** A purpose-built dataset that tests cross-lingual, cross-cultural symbolic understanding. Each image is a composite of emoji characters that together encode a Chinese idiom or an English idiom or word. The task requires recognizing the symbolic meaning of the emoji sequence.
**Sub-dataset overview:**

| Sub-dataset | Metadata file | Image dir | Task description |
|-------------|---------------|-----------|------------------|
| `Chinese_idiom_4` | `chengyu.xlsx` | `chengyu_4/images` | Predict a 4-character Chinese idiom (成语) from an emoji composite image |
| `Chinese_idiom_multi` | `chengyu_fei4.xlsx` | `chengyu_fei4/images` | Predict a variable-length Chinese idiom from an emoji image |
| `English_idiom` | `idiom.xlsx` | `idiom/images` | Predict an English idiom or phrase from an emoji image |
| `English_word` | `word.xlsx` | `word/images` | Predict an English word from an emoji image |

Each row of a metadata spreadsheet corresponds to one image and contains at least:

- Image filename (matching the corresponding `.jpg` in the image directory)
- Ground-truth idiom or word

**JSON schema:**

```json
{
  "image_id": "1.png",
  "emoji_sequence": "🐢🥐🙅🏻‍🦌",
  "level": "3",
  "task_type": "3",
  "question_zh": "图中所示的emoji组合表示什么成语?",
  "question_en": "What idiom does the emoji combination represent?",
  "answer_zh": "圭角不露"
}
```
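The card does not state how Domain 5 predictions are scored. One minimal sketch assumes exact match after three normalization steps:

- Unicode NFKC normalization
- case-folding
- removal of whitespace and punctuation

With these steps, a trailing "。" or "!" or a capital letter does not count as an error. `normalize_answer` and `idiom_match` are hypothetical names, and this rule is our assumption rather than the authors' metric.

```python
import unicodedata


def normalize_answer(text: str) -> str:
    """Hypothetical normalization: NFKC, case-fold, then drop whitespace and punctuation."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return "".join(
        ch for ch in text
        # Unicode punctuation categories all start with "P" (Pc, Pd, Ps, Pe, Pi, Pf, Po).
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def idiom_match(prediction: str, answer: str) -> bool:
    """Exact match after normalization (an assumed rule, not the official metric)."""
    return normalize_answer(prediction) == normalize_answer(answer)


if __name__ == "__main__":
    print(idiom_match("圭角不露。", "圭角不露"))  # True
    print(idiom_match("Break the ice!", "break the ice"))  # True
```

Exact match is strict for idioms that have common variants. A synonym list or an LLM judge could be layered on top if needed.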
Provider: Eternity-gaga