DigitConfuse-23k: A Synthetic Dataset of Digit Confusion Patterns
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/vb7br8ym46
Description:
📊 DigitConfuse-23k: A Synthetic Dataset of Digit Confusion Patterns
DigitConfuse-23k is a synthetic dataset containing 23,000 images of digit pairs designed to capture visual anomalies and confusion cases commonly encountered in OCR, CAPTCHA recognition, optical illusions and human digit interpretation tasks.
Each image contains two-digit numbers rendered in the Humor-Sans font (font_size=32, cell_w=60, cell_h=40). There are ~1,000 images for each of the 23 confusion pairs listed below.
🔢 Categories of Digit Anomalies
🔸 Digit shape confusion (similar glyphs) → 11 ↔ 17, 21 ↔ 27, 71 ↔ 77
🔄 Mirror / rotation confusion → 69 ↔ 96, 68 ↔ 86, 89 ↔ 98, 26 ↔ 62
🎯 One-pixel stroke differences → 33 ↔ 38, 35 ↔ 36, 53 ↔ 58, 39 ↔ 89
🌀 Closed vs. open loop confusion → 38 ↔ 88, 98 ↔ 99, 18 ↔ 19, 56 ↔ 58, 28 ↔ 88
➿ Nearly identical when repeated → 88 ↔ 89, 11 ↔ 12, 55 ↔ 56
👀 Human OCR-like errors (CAPTCHA/OCR cases) → 47 ↔ 17, 57 ↔ 37, 12 ↔ 72, 14 ↔ 74
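The 23 pairs above fit in a small lookup table. A table like this makes it easy to check that the counts add up (23 pairs × ~1,000 images ≈ 23,000) and to find which numbers a given number is paired with. This is a minimal Python sketch. The category keys such as `loop_closure` are hypothetical labels chosen here, not identifiers taken from the dataset files.

```python
from collections import defaultdict

# The 23 confusion pairs listed above, grouped by anomaly category.
# NOTE: the category keys are hypothetical labels, not names used in the dataset files.
CONFUSION_PAIRS = {
    "shape": [(11, 17), (21, 27), (71, 77)],
    "mirror_rotation": [(69, 96), (68, 86), (89, 98), (26, 62)],
    "one_pixel_stroke": [(33, 38), (35, 36), (53, 58), (39, 89)],
    "loop_closure": [(38, 88), (98, 99), (18, 19), (56, 58), (28, 88)],
    "repeated": [(88, 89), (11, 12), (55, 56)],
    "human_ocr": [(47, 17), (57, 37), (12, 72), (14, 74)],
}


def confusable_with(n: int) -> dict[int, list[str]]:
    """Map every number paired with n to the categories in which that pair appears."""
    partners = defaultdict(list)
    for category, pairs in CONFUSION_PAIRS.items():
        for a, b in pairs:
            if n == a:
                partners[b].append(category)
            elif n == b:
                partners[a].append(category)
    return dict(partners)


n_pairs = sum(len(pairs) for pairs in CONFUSION_PAIRS.values())
print(n_pairs, n_pairs * 1000)  # 23 23000
print(confusable_with(88))  # {38: ['loop_closure'], 28: ['loop_closure'], 89: ['repeated']}
```

Some numbers (88, 89, 58, 11, 12, 17, 38, 98) appear in more than one pair. A per-number lookup therefore shows every partner a model might confuse that number with.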
🎯 Applications
🧪 Benchmarking OCR systems
🛡 Studying digit recognition robustness
🔑 Training models for noisy / CAPTCHA-like digits
🚨 Anomaly detection in digit datasets
⚙️ Technical Details
📂 Total images: 23,000
📑 Categories: 23 confusion pairs in 6 anomaly categories
✍️ Font: Humor-Sans.ttf
🔠 Font size: 32
📏 Image cell size: 60 × 40 pixels (width × height)
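The per-cell size and the (row, col) anomaly locations suggest that each image is laid out as a grid of 60 × 40 pixel cells. The sketch below converts a cell index into a pixel box and infers the grid shape from an image size. It relies on two assumptions the description does not state: the cells tile the image with no padding or margins, and row and col count from 0 at the top-left. Check both against the actual files. The returned box uses exclusive right/bottom edges, which matches the (left, upper, right, lower) convention of Pillow's `Image.crop`.

```python
CELL_W, CELL_H = 60, 40  # from the description: cell_w=60, cell_h=40


def cell_box(row: int, col: int, cell_w: int = CELL_W, cell_h: int = CELL_H) -> tuple[int, int, int, int]:
    """Pixel box (left, top, right, bottom) of cell (row, col).

    Assumes 0-based indices and cells packed edge to edge with no margins.
    right/bottom are exclusive, so the box is exactly cell_w x cell_h pixels.
    """
    if row < 0 or col < 0:
        raise ValueError("row and col must be non-negative")
    left, top = col * cell_w, row * cell_h
    return (left, top, left + cell_w, top + cell_h)


def grid_shape(width: int, height: int, cell_w: int = CELL_W, cell_h: int = CELL_H) -> tuple[int, int]:
    """Infer (rows, cols) from an image size, assuming the cells tile it exactly."""
    if width % cell_w or height % cell_h:
        raise ValueError(f"{width}x{height} is not a whole number of {cell_w}x{cell_h} cells")
    return height // cell_h, width // cell_w


print(cell_box(2, 3))        # (180, 80, 240, 120)
print(grid_shape(600, 400))  # (10, 10) for a hypothetical 600x400 image
```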
👉 This dataset provides a controlled testbed for studying digit misclassification under visually ambiguous conditions.
📦 How to Use
1️⃣ JSONL format (VQA-style for VLM testing)
Each entry includes:
🖼 image → file path to the digit image
❓ question → natural language query
✅ answer → ground-truth numbers
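Below is a minimal Python sketch for scoring a model on the JSONL file. It uses only the three fields listed above. It assumes one JSON object per line and that `answer` holds the expected number(s) as either a string or a list. `predict` is a hypothetical stand-in for whatever VLM call you use. Comparing the digit runs in the reply with those in the answer is one simple exact-match rule chosen here, not an official metric for this dataset.

```python
import json
import re
from typing import Callable, Iterable


def load_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL lines (one JSON object per line) into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def digits_in(value: object) -> list[str]:
    """Extract digit runs in order, e.g. 'I see 38 and 88' -> ['38', '88'].

    Works for strings and for lists such as [38, 88], via str().
    """
    return re.findall(r"\d+", str(value))


def exact_match_accuracy(records: list[dict], predict: Callable[[str, str], str]) -> float:
    """Fraction of records whose reply contains exactly the answer's digit runs, in order."""
    if not records:
        return 0.0
    correct = sum(
        digits_in(predict(rec["image"], rec["question"])) == digits_in(rec["answer"])
        for rec in records
    )
    return correct / len(records)


# Usage with in-memory sample lines (paths and wording are made up for illustration).
sample = [
    '{"image": "images/000001.png", "question": "Which number is shown?", "answer": "38"}',
    '{"image": "images/000002.png", "question": "Which number is shown?", "answer": "88"}',
]
records = load_jsonl(sample)


def fake_model(image: str, question: str) -> str:
    """Hypothetical model that always reads '38'."""
    return "The number is 38."


print(exact_match_accuracy(records, fake_model))  # 0.5
```

Extracting digit runs makes the score tolerant of chatty VLM replies. However, it also accepts replies that happen to contain the right digits inside other text, so a stricter parser may suit some benchmarks better.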
2️⃣ CSV format (digit confusion localization)
The .csv file provides metadata about anomaly location:
🖼 image → file path
📌 location → anomaly position (row, col)
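For the localization task, Python's standard `csv` module can read the CSV. This sketch makes two assumptions the description does not specify: the columns are literally named `image` and `location`, and a location is written like `(2, 3)`. Adjust the parser to match the real file. The accuracy helper counts an image as correct only when the predicted (row, col) matches exactly. A missing prediction counts as wrong.

```python
import csv
import io
import re

# Accepts "(2, 3)", "2,3", etc. The exact on-disk format is an assumption.
LOCATION_RE = re.compile(r"\(?\s*(\d+)\s*,\s*(\d+)\s*\)?")


def parse_location(text: str) -> tuple[int, int]:
    """Parse an anomaly position string into an integer (row, col) tuple."""
    m = LOCATION_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognized location: {text!r}")
    return int(m.group(1)), int(m.group(2))


def load_locations(f) -> dict[str, tuple[int, int]]:
    """Map image path -> (row, col), assuming columns named 'image' and 'location'."""
    return {row["image"]: parse_location(row["location"]) for row in csv.DictReader(f)}


def localization_accuracy(pred: dict, gold: dict) -> float:
    """Fraction of gold images whose predicted (row, col) matches exactly."""
    if not gold:
        return 0.0
    hits = sum(pred.get(image) == loc for image, loc in gold.items())
    return hits / len(gold)


# Usage with an in-memory sample (paths are made up for illustration).
sample_csv = io.StringIO(
    'image,location\n'
    'images/000001.png,"(2, 3)"\n'
    'images/000002.png,"(0, 5)"\n'
)
gold = load_locations(sample_csv)
print(gold["images/000001.png"])  # (2, 3)
pred = {"images/000001.png": (2, 3), "images/000002.png": (1, 5)}
print(localization_accuracy(pred, gold))  # 0.5
```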
🚀 Suggested Use Cases
🤖 VLM evaluation → Test VLMs such as Qwen-VL, InternVL, and LLaVA on fine-grained OCR tasks
📊 OCR benchmarking → Compare CNN-based OCR with multimodal LLMs
🔄 Data augmentation research → Train models to handle ambiguity
🕵️ Anomaly detection → Use confusion pairs as “hard negatives” for OCR
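For VLM evaluation and OCR benchmarking, a single overall accuracy hides which confusions a model struggles with. The sketch below breaks results down by anomaly category. It also reports how many errors land exactly on the confusion partner (e.g. answering 38 when the image shows 88), which separates genuine confusions from unrelated misreads. The `Result` record and its fields are hypothetical. Fill them from your own parsed predictions and the pair list above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """One scored prediction (hypothetical record layout, not a dataset format)."""
    category: str             # anomaly category label, e.g. "loop_closure"
    gold: int                 # ground-truth number
    partner: int              # the other number in its confusion pair
    predicted: Optional[int]  # parsed model answer; None if unparseable


def per_category_report(results: list[Result]) -> dict[str, tuple[float, float]]:
    """Return {category: (accuracy, fraction of errors that equal the confusion partner)}."""
    total, correct, partner_hits = Counter(), Counter(), Counter()
    for r in results:
        total[r.category] += 1
        if r.predicted == r.gold:
            correct[r.category] += 1
        elif r.predicted == r.partner:
            partner_hits[r.category] += 1
    report = {}
    for category, n in total.items():
        errors = n - correct[category]
        partner_rate = partner_hits[category] / errors if errors else 0.0
        report[category] = (correct[category] / n, partner_rate)
    return report


# Usage with made-up results for illustration.
results = [
    Result("loop_closure", gold=38, partner=88, predicted=38),
    Result("loop_closure", gold=88, partner=38, predicted=38),
    Result("loop_closure", gold=98, partner=99, predicted=None),
    Result("mirror_rotation", gold=69, partner=96, predicted=69),
]
for category, (acc, partner_rate) in per_category_report(results).items():
    print(f"{category}: accuracy={acc:.2f}, errors hitting partner={partner_rate:.2f}")
```

A high partner rate means the model's mistakes are concentrated on the intended look-alike. This is the behaviour the "hard negatives" use case targets.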
🧪 Real-World Testing with Ovis 2.5-9B (Latest Release)
We evaluated a subset of images using Ovis 2.5-9B (released Aug 2025).
🖼 Native-resolution ViT (NaViT) → preserves fine details for loop/stroke differences
🔎 Reflective inference mode → improves reasoning under ambiguous digit confusions
🏆 Benchmark leader → achieves a 78.3 average score on OpenCompass (best among open-source models with fewer than 40B parameters)
📌 Observation: on the evaluated subset, Ovis 2.5-9B performed robustly on one-pixel stroke, mirror/rotation, and loop-closure confusions. This suggests the dataset is useful for fine-grained OCR evaluation of VLMs.
Created: 2025-08-21



