
EACS Dataset: A Real-World Academic Admit Card Dataset from Bangladeshi Educational Institutions for OCR and Document Verification Research

DataCite Commons | Updated 2026-04-17 | Indexed 2026-05-04
Download link:
https://data.mendeley.com/datasets/mvzdksjzvb/1
Resource description:
The EACS Dataset (Exam Admit Card Dataset) is a specialized image-based dataset developed to address challenges in document verification and automated data entry within educational administrative systems. The dataset was collected over a four-year period (2022–2025) from the operational ERP platform (NationalSchoolIntra) of a Bangladeshi educational institution located in Chittagong. It comprises 4,407 unique admit card images, each standardized to a resolution of 800 × 460 pixels and stored in JPG format to balance visual clarity against computational cost in machine learning applications.
A key characteristic of the dataset is its structural complexity. Each admit card contains nine distinct textual fields: Student Name, Father’s Name, Mother’s Name, Roll Number, Student ID, Class, Session, Group, and Semester. The dataset captures real-world variability in font styles (predominantly serif fonts such as Times New Roman), alignment inconsistencies, and layout variations, making it particularly suitable for evaluating the robustness of Optical Character Recognition (OCR) systems and Document AI models.
To facilitate supervised learning and benchmarking, a corresponding CSV metadata file is provided, containing manually verified ground-truth annotations for each field. This enables precise evaluation of both character-level and word-level accuracy.
To ensure privacy protection and ethical compliance, all sensitive personal identifiers (such as phone numbers, email addresses, photographs, and other confidential attributes) have been removed or anonymized. Additionally, personal names and identifiers (student, father’s, and mother’s names, student IDs, and roll numbers) have been systematically randomized while preserving their structural and linguistic characteristics, so that no real individual can be identified while the dataset remains useful for OCR research.
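The character-level and word-level evaluation against the CSV ground truth mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: the listing does not document the CSV schema, so the column names (`image`, `student_name`, `roll_number`), the sample row, and the `evaluate` helper are all hypothetical, and the scoring shown (edit-distance-based character and word error rates, with accuracy taken as one minus the error rate) is one common convention rather than a protocol prescribed by the dataset authors.

```python
import csv
import io
from typing import Dict, Iterable, Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance (insert, delete, substitute) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def evaluate(rows: Iterable[Dict[str, str]],
             predictions: Dict[str, Dict[str, str]],
             id_column: str = "image") -> Dict[str, float]:
    """Aggregate character and word error rates over all annotated fields.

    `rows` are ground-truth records (e.g. from csv.DictReader);
    `predictions` maps an image identifier to {field name: OCR output}.
    A missing prediction is scored as an empty string.
    """
    char_edits = char_total = word_edits = word_total = 0
    for row in rows:
        predicted_fields = predictions.get(row[id_column], {})
        for field, truth in row.items():
            if field == id_column:
                continue
            pred = predicted_fields.get(field, "")
            char_edits += levenshtein(pred, truth)
            char_total += len(truth)
            word_edits += levenshtein(pred.split(), truth.split())
            word_total += len(truth.split())
    return {
        "cer": char_edits / max(char_total, 1),
        "wer": word_edits / max(word_total, 1),
    }


# Made-up ground-truth row and OCR output; the real column names may differ.
SAMPLE_CSV = "image,student_name,roll_number\na.jpg,Rahim Uddin,1024\n"
ground_truth = csv.DictReader(io.StringIO(SAMPLE_CSV))
ocr_output = {"a.jpg": {"student_name": "Rahim Udin", "roll_number": "1024"}}

scores = evaluate(ground_truth, ocr_output)
print(scores)
```

In this made-up example the OCR output drops one character from a name, giving one character edit over 15 ground-truth characters and one word edit over three ground-truth words. Pooling edits across all fields before dividing weights long fields more heavily; per-field averages are an equally reasonable choice if each field should count the same.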
The dataset is intended for academic and non-commercial research, particularly in the areas of document analysis, OCR benchmarking, and automated verification systems. By making this dataset publicly available, we aim to support the development of secure, AI-driven solutions for educational document processing and fraud detection.
Provider:
Mendeley Data
Created:
2026-04-17