
DigitConfuse-23k: A Synthetic Dataset of Digit Confusion Patterns

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://data.mendeley.com/datasets/vb7br8ym46
Description:
DigitConfuse-23k is a synthetic dataset of 23,000 images of digit pairs, designed to capture the visual anomalies and confusion cases commonly encountered in OCR, CAPTCHA recognition, optical illusions, and human digit interpretation tasks.

Each image contains two-digit numbers rendered in the Humor-Sans font (font_size=32, cell_w=60, cell_h=40). Each confusion category (one of the 23 pairs listed below) contributes roughly 1,000 images.

🔢 Categories of Digit Anomalies
🔸 Digit shape confusion (similar glyphs) → 11 ↔ 17, 21 ↔ 27, 71 ↔ 77
🔄 Mirror / rotation confusion → 69 ↔ 96, 68 ↔ 86, 89 ↔ 98, 26 ↔ 62
🎯 One-pixel stroke differences → 33 ↔ 38, 35 ↔ 36, 53 ↔ 58, 39 ↔ 89
🌀 Closed vs. open loop confusion → 38 ↔ 88, 98 ↔ 99, 18 ↔ 19, 56 ↔ 58, 28 ↔ 88
➿ Nearly identical when repeated → 88 ↔ 89, 11 ↔ 12, 55 ↔ 56
👀 Human OCR-like errors (CAPTCHA/OCR cases) → 47 ↔ 17, 57 ↔ 37, 12 ↔ 72, 14 ↔ 74

🎯 Applications
🧪 Benchmarking OCR systems
🛡 Studying digit recognition robustness
🔑 Training models for noisy / CAPTCHA-like digits
🚨 Anomaly detection in digit datasets

⚙️ Technical Details
📂 Total images: 23,000
📑 Categories: 23 confusion pairs
✍️ Font: Humor-Sans.ttf
🔠 Font size: 32
📏 Image cell size: 60 × 40 pixels
👉 This dataset provides a controlled testbed for studying digit misclassification under visually ambiguous conditions.

📦 How to Use
1️⃣ JSONL format (VQA-style, for VLM testing). Each entry includes:
🖼 image → file path to the digit image
❓ question → natural language query
✅ answer → ground truth numbers
2️⃣ CSV format (digit confusion localization). The .csv file provides metadata about the anomaly location:
🖼 image → file path
📌 location → anomaly position (row, col)

🚀 Suggested Use Cases
🤖 VLM evaluation → Test Qwen-VL, InternVL, LLaVA on fine-grained OCR tasks
📊 OCR benchmarking → Compare CNN-based OCR vs. multimodal LLMs
🔄 Data augmentation research → Train models to handle ambiguity
🕵️ Anomaly detection → Use confusion pairs as "hard negatives" for OCR

🧪 Real-World Testing with Ovis 2.5-9B (Latest Release)
We evaluated a subset of images using Ovis 2.5-9B (released Aug 2025).
🖼 Native-resolution ViT (NaViT) → preserves fine details for loop / stroke differences
🔎 Reflective inference mode → improves reasoning under ambiguous digit confusions
🏆 Benchmark leader → achieves a 78.3 average score on OpenCompass (best among open-source models under 40B parameters)
📌 Observation: Ovis 2.5-9B performed robustly across one-pixel stroke, mirror/rotation, and loop closure confusions, which supports this dataset's value for fine-grained OCR evaluation with VLMs.
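
The JSONL and CSV formats described above could be consumed with a short script along the following lines. This is only a sketch under stated assumptions: the field names (image, question, answer, location) come from the description, but the sample records, file paths, question wording, the string form of the location cell, and the helper names (load_jsonl, load_locations, exact_match_accuracy) are all hypothetical. The category keys in CONFUSION_PAIRS are illustrative labels, not identifiers from the dataset.

```python
import csv
import io
import json
import re

# Confusion pairs exactly as listed in the description, grouped by category.
# The dictionary keys are illustrative names, not identifiers from the dataset.
CONFUSION_PAIRS = {
    "shape": [(11, 17), (21, 27), (71, 77)],
    "mirror_rotation": [(69, 96), (68, 86), (89, 98), (26, 62)],
    "one_pixel_stroke": [(33, 38), (35, 36), (53, 58), (39, 89)],
    "loop": [(38, 88), (98, 99), (18, 19), (56, 58), (28, 88)],
    "repeated": [(88, 89), (11, 12), (55, 56)],
    "human_ocr": [(47, 17), (57, 37), (12, 72), (14, 74)],
}


def count_pairs(pairs_by_category):
    """Total number of listed pairs across all categories."""
    return sum(len(pairs) for pairs in pairs_by_category.values())


def load_jsonl(stream):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]


def load_locations(stream):
    """Map image path -> (row, col) from the CSV metadata.

    Assumes the 'location' cell holds two integers, e.g. "(2, 3)" or "2,3".
    """
    locations = {}
    for record in csv.DictReader(stream):
        row, col = (int(n) for n in re.findall(r"\d+", record["location"]))
        locations[record["image"]] = (row, col)
    return locations


def exact_match_accuracy(entries, predictions):
    """Fraction of entries whose prediction equals the ground-truth answer.

    `predictions` maps image path -> model output; missing images count as wrong.
    """
    if not entries:
        return 0.0
    hits = 0
    for entry in entries:
        predicted = predictions.get(entry["image"])
        if predicted is not None and str(predicted).strip() == str(entry["answer"]).strip():
            hits += 1
    return hits / len(entries)


# Hypothetical sample records; real file names and value formats may differ.
SAMPLE_JSONL = io.StringIO(
    '{"image": "img/0001.png", "question": "Which number is different?", "answer": "96"}\n'
    '{"image": "img/0002.png", "question": "Which number is different?", "answer": "38"}\n'
)
SAMPLE_CSV = io.StringIO('image,location\nimg/0001.png,"(2, 3)"\nimg/0002.png,"(0, 1)"\n')

entries = load_jsonl(SAMPLE_JSONL)
locations = load_locations(SAMPLE_CSV)
predictions = {"img/0001.png": "96", "img/0002.png": "33"}  # stand-in for VLM outputs
accuracy = exact_match_accuracy(entries, predictions)
total_pairs = count_pairs(CONFUSION_PAIRS)
```

With real data, the io.StringIO stand-ins would be replaced by open(...) calls on the downloaded files, and predictions would be filled from whichever VLM or OCR system is under test. Exact string match is a deliberately strict metric; if the answer field turns out to hold several numbers, a per-number comparison may be more informative.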
Created:
2025-08-21