DigitConfuse-23k: A Synthetic Dataset of Digit Confusion Patterns

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://data.mendeley.com/datasets/vb7br8ym46

Description:

DigitConfuse-23k is a synthetic dataset of 23,000 images of digit pairs, designed to capture the visual anomalies and confusion cases commonly encountered in OCR, CAPTCHA recognition, optical illusions, and human digit interpretation tasks.

Each image contains two-digit numbers rendered in the Humor-Sans font (font_size=32, cell_w=60, cell_h=40). About 1,000 images are included for each of the 23 confusion pairs.

🔢 Categories of Digit Anomalies

🔸 Digit shape confusion (similar glyphs) → 11 ↔ 17, 21 ↔ 27, 71 ↔ 77
🔄 Mirror / rotation confusion → 69 ↔ 96, 68 ↔ 86, 89 ↔ 98, 26 ↔ 62
🎯 One-pixel stroke differences → 33 ↔ 38, 35 ↔ 36, 53 ↔ 58, 39 ↔ 89
🌀 Closed vs. open loop confusion → 38 ↔ 88, 98 ↔ 99, 18 ↔ 19, 56 ↔ 58, 28 ↔ 88
➿ Nearly identical when repeated → 88 ↔ 89, 11 ↔ 12, 55 ↔ 56
👀 Human OCR-like errors (CAPTCHA/OCR cases) → 47 ↔ 17, 57 ↔ 37, 12 ↔ 72, 14 ↔ 74

🎯 Applications

🧪 Benchmarking OCR systems
🛡 Studying digit recognition robustness
🔑 Training models for noisy / CAPTCHA-like digits
🚨 Anomaly detection in digit datasets

⚙️ Technical Details

📂 Total images: 23,000
📑 Categories: 23 confusion pairs
✍️ Font: Humor-Sans.ttf
🔠 Font size: 32
📏 Image cell size: 60 × 40 pixels

👉 This dataset provides a controlled testbed for studying digit misclassification under visually ambiguous conditions.

📦 How to Use

1️⃣ JSONL format (VQA-style, for VLM testing). Each entry includes:
🖼 image → file path to the digit image
❓ question → natural language query
✅ answer → ground truth numbers

2️⃣ CSV format (digit confusion localization). The .csv file provides metadata about the anomaly location:
🖼 image → file path
📌 location → anomaly position (row, col)

🚀 Suggested Use Cases

🤖 VLM evaluation → Test Qwen-VL, InternVL, LLaVA on fine-grained OCR tasks
📊 OCR benchmarking → Compare CNN-based OCR vs. multimodal LLMs
🔄 Data augmentation research → Train models to handle ambiguity
🕵️ Anomaly detection → Use confusion pairs as “hard negatives” for OCR

🧪 Real-World Testing with Ovis 2.5-9B (Latest Release)

We evaluated a subset of images using Ovis 2.5-9B (released Aug 2025).

🖼 Native-resolution ViT (NaViT) → preserves fine details for loop/stroke differences
🔎 Reflective inference mode → improves reasoning under ambiguous digit confusions
🏆 Benchmark leader → achieves a 78.3 avg. score on OpenCompass (best among open-source models with <40B parameters)

📌 Observation: Ovis 2.5-9B performed robustly across one-pixel stroke, mirror/rotation, and loop-closure confusions, which supports this dataset’s value for fine-grained OCR evaluation with VLMs.
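
The two annotation formats described under "How to Use" can be read with the Python standard library alone. The following is a minimal sketch, assuming the JSONL keys are literally `image`, `question`, and `answer`, and that the CSV has `image` and `location` columns with the location stored as text such as `(2, 3)`. The file paths, sample records, answer formatting, and helper names are all made up for illustration, so check them against the downloaded files before relying on them.

```python
import csv
import io
import json

# Hypothetical sample records. Field names follow the dataset description;
# the values, paths, and answer formatting are assumptions for illustration.
SAMPLE_JSONL = """\
{"image": "images/0001.png", "question": "Which numbers appear in the image?", "answer": "38, 88"}
{"image": "images/0002.png", "question": "Which numbers appear in the image?", "answer": "69, 96"}
"""

SAMPLE_CSV = """\
image,location
images/0001.png,"(2, 3)"
images/0002.png,"(0, 1)"
"""


def read_vqa_entries(stream):
    """Parse one JSON object per non-empty line (JSONL)."""
    return [json.loads(line) for line in stream if line.strip()]


def parse_location(text):
    """Turn a string such as "(2, 3)" or "2,3" into a (row, col) tuple of ints."""
    row, col = text.strip().strip("()").split(",")
    return int(row), int(col)


def read_locations(stream):
    """Map each image path to its (row, col) anomaly position."""
    return {
        rec["image"]: parse_location(rec["location"])
        for rec in csv.DictReader(stream)
    }


def exact_match_accuracy(entries, predictions):
    """Fraction of entries whose prediction equals the answer, ignoring whitespace."""
    def norm(s):
        return "".join(s.split())

    hits = sum(
        norm(predictions.get(e["image"], "")) == norm(e["answer"])
        for e in entries
    )
    return hits / len(entries) if entries else 0.0


entries = read_vqa_entries(io.StringIO(SAMPLE_JSONL))
locations = read_locations(io.StringIO(SAMPLE_CSV))

# Stand-in for model output; a real run would query a VLM once per image.
predictions = {"images/0001.png": "38,88", "images/0002.png": "96, 69"}
accuracy = exact_match_accuracy(entries, predictions)
```

To read the real files, replace the `io.StringIO(...)` wrappers with `open(path, encoding="utf-8")`. Exact match is a deliberately strict scoring choice here: for confusion pairs such as 69 ↔ 96, a swapped answer should count as an error.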

Created:

2025-08-21
