Nigeria-Health-data-OCR-pipeline/African-Medical-Records
Source: Hugging Face | Updated: 2026-03-26 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/Nigeria-Health-data-OCR-pipeline/African-Medical-Records
Resource description:
---
license: cc-by-4.0
task_categories:
- feature-extraction
language:
- en
tags:
- medical
---
# African Medical Records (AMR): Nigerian Handwritten Medical Records Dataset
**A Benchmark Dataset for OCR, Clinical Text Extraction, and Healthcare Insight Modeling**
---
## Overview
African Medical Records (AMR) is a large-scale, growing dataset designed to support **optical character recognition (OCR)**, **medical text extraction**, and **healthcare data analysis** from handwritten clinical records.
The dataset consists of **paired handwritten medical notes and their corresponding digital ground truth**, enabling robust benchmarking and training of AI systems for real-world healthcare environments.
This initiative is driven by a distributed network of **107+ contributors (and growing)** across **40 universities in Nigeria**, ensuring diversity in handwriting styles, formats, and regional medical practices.
> The current batch features **13 contributors** from this volunteer pool, with additional contributors submitting datasets on a **rolling basis**.
Our long-term vision is to build **the largest African medical handwriting dataset**, expanding across regions in Nigeria and eventually across Africa.
---
## Project Goal
The primary goals of AMR are to:
* Build a **realistic and reproducible OCR benchmark** for handwritten medical notes
* Enable **accurate digitization of healthcare records** in low-resource settings
* Support development of **edge-deployable AI systems** for healthcare environments
* Provide a dataset that reflects **clinical usefulness, not just text accuracy**
---
## Dataset Structure
Each contribution follows a standardized format:
### Document Types
Volunteers generate **4 types of medical records**, each with **2 versions**:
| Specialty | Record Type |
| ---------------- | ------------------ |
| General Medicine | Patient Visit Note |
| Nursing | Vital Signs Chart |
| Pharmacy | Prescription Note |
| Laboratory | Lab Request Form |
Each record includes:
* **Synthetic Handwritten Version**
* **Digital Ground Truth Version**
This results in:
> **8 documents per contributor**
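
The count above is simply record types multiplied by versions. A minimal sketch of that arithmetic follows; the record types come from the table above, while the version labels and the `expected_documents` helper are illustrative names, not part of any AMR tooling.

```python
# Record types listed in the Document Types table.
RECORD_TYPES = [
    "Patient Visit Note",
    "Vital Signs Chart",
    "Prescription Note",
    "Lab Request Form",
]

# Each record exists in two versions (labels are illustrative assumptions).
VERSIONS = ["synthetic_handwritten", "digital_ground_truth"]


def expected_documents(num_contributors: int) -> int:
    """Documents expected if every contributor submits a complete set."""
    return num_contributors * len(RECORD_TYPES) * len(VERSIONS)


print(expected_documents(1))  # 8
```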
---
## Dataset Features
* Diverse **handwriting styles across regions and institutions**
* Realistic **clinical abbreviations and formatting**
* Paired **image-to-text ground truth alignment**
* Structured to support:
  * OCR benchmarking (CER, WER)
  * Field-level extraction (drug names, dosage, vitals)
  * Abbreviation recognition
  * Clinical reasoning pipelines
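
For the OCR benchmarking use case, character error rate (CER) and word error rate (WER) are commonly defined as the Levenshtein edit distance between the OCR output and the ground truth, divided by the length of the ground truth. The sketch below is one plain-Python way to compute them. It is not the project's official scoring script, and the example strings are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete an item from the reference
                curr[j - 1] + 1,     # insert an item from the hypothesis
                prev[j - 1] + cost,  # substitute (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


truth = "Tab Paracetamol 500mg tds"  # made-up ground-truth line
ocr = "Tab Paracetamol 500mg bd"     # made-up OCR output
print(f"CER={cer(truth, ocr):.2f} WER={wer(truth, ocr):.2f}")  # CER=0.08 WER=0.25
```

Note that a single misread abbreviation barely moves CER but changes the clinical meaning, which is why field-level and abbreviation-level checks are listed alongside CER and WER.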
---
## File Naming Convention
To ensure traceability and structure, files follow standardized naming.
### Volunteer-Based Naming
```
[VolunteerNumber]-[DocumentNumber]-[TYPE]
```
Example:
```
1-001-syn
1-001-truth
```
---
### Patient-Based Dataset Naming (Recommended)
To improve dataset organization and scalability:
```
[DatasetCode]-PATIENT-[Number]
```
Examples:
* **SUNN-PATIENT-001** → Synthetic dataset from University of Nigeria Nsukka
* **SUIL-PATIENT-001** → Synthetic dataset from University of Ilorin
* **TUNN-PATIENT-001** → Ground truth dataset from University of Nigeria Nsukka
Example sequence:
```
SUNN-PATIENT-001
TUNN-PATIENT-001
SUNN-PATIENT-002
TUNN-PATIENT-002
```
Each synthetic record must have a **corresponding ground truth pair**.
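
The pairing rule can be checked mechanically. The sketch below assumes, based only on the examples above, that the first letter of the dataset code is `S` for synthetic or `T` for ground truth and that the remaining letters identify the institution. `find_unpaired` is a hypothetical helper, not part of the dataset's tooling.

```python
import re

# Assumption inferred from the examples: dataset code = kind letter (S or T)
# followed by an institution code, e.g. "S" + "UNN".
PATIENT_RE = re.compile(r"^(?P<kind>[ST])(?P<site>[A-Z]+)-PATIENT-(?P<number>\d+)$")


def find_unpaired(names):
    """Return synthetic record names that lack a matching ground-truth name."""
    synthetic, truth = set(), set()
    for name in names:
        match = PATIENT_RE.match(name)
        if match is None:
            raise ValueError(f"unrecognized patient record name: {name!r}")
        key = (match.group("site"), match.group("number"))
        if match.group("kind") == "S":
            synthetic.add(key)
        else:
            truth.add(key)
    missing = synthetic - truth
    return sorted(f"S{site}-PATIENT-{number}" for site, number in missing)


names = ["SUNN-PATIENT-001", "TUNN-PATIENT-001", "SUNN-PATIENT-002"]
print(find_unpaired(names))  # ['SUNN-PATIENT-002']
```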
---
## Data Collection Standards
### Document Requirements
* A4 plain white paper
* Blue or black ink
* Natural handwriting (no printed text)
* Proper margins and full-page visibility
### Scan Requirements
**Acceptable:**
* Well-lit images
* Camera directly above the paper
* Fully visible and cropped page
* Clear and readable text
**Not Acceptable:**
* Blurry or dark images
* Tilted or angled captures
* Shadows or obstructions
* Incomplete or poorly cropped scans
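
Some of these criteria, such as dark or blurry captures, can be pre-screened automatically before manual review. The sketch below is one possible heuristic, not the project's acceptance procedure. It takes a grayscale image as a list of pixel rows (values 0 to 255, at least 3x3), uses mean intensity as a darkness check and the variance of a 4-neighbour Laplacian as a sharpness check. Both thresholds are arbitrary assumptions that would need tuning on real scans.

```python
from statistics import mean, pvariance


def brightness(gray):
    """Mean pixel intensity of a grayscale image given as rows of 0-255 values."""
    return mean(pixel for row in gray for pixel in row)


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian; low values suggest blur."""
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def passes_prescreen(gray, min_brightness=100, min_sharpness=50):
    """Heuristic check; both thresholds are illustrative assumptions."""
    return (brightness(gray) >= min_brightness
            and laplacian_variance(gray) >= min_sharpness)


# Tiny synthetic inputs: a uniformly dark page and a high-contrast pattern.
dark_page = [[10] * 5 for _ in range(5)]
sharp_page = [[255 if (x + y) % 2 == 0 else 0 for x in range(5)] for y in range(5)]
print(passes_prescreen(dark_page), passes_prescreen(sharp_page))  # False True
```

A check like this cannot detect tilt, shadows, or cropping problems, so it complements rather than replaces a manual review against the list above.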
---
## Use Cases
This dataset is designed for:
### AI Researchers
* Benchmarking OCR and VLM models
* Evaluating handwriting recognition in low-resource settings
* Training models for structured medical extraction
### Health-Tech Developers
* Building EHR digitization systems
* Developing edge AI for clinics and rural hospitals
### Policy Makers & Public Health Analysts
* Understanding **patterns between diseases and regions**
* Informing **health infrastructure planning and prioritization**
* Supporting data-driven healthcare decisions
---
## Dataset Growth Vision
AMR is a **living dataset**.
* New batches will be added continuously
* Coverage will expand across **regions in Nigeria**
* Long-term goal: scale to **pan-African healthcare datasets**
---
## Ethical Statement
> The datasets are synthetic. Any similarity to the medical conditions of members of the public is unintended, and no record was obtained from any medical institution.
* No real patient data is included
* All records are **fictional and generated for research purposes**
* Contributors are instructed to avoid any identifiable information
---
## Intended Use
This dataset is for **research purposes only**.
> It should **not be used for clinical diagnosis, treatment, or real-world medical decision-making**.
---
## License
This dataset is released under **Creative Commons Attribution 4.0 (CC BY 4.0)**.
This allows broad usage while ensuring proper attribution to the AMR project and its contributors.
---
## Citation
```bibtex
@dataset{amr_2026,
title={African Medical Records (AMR): Nigerian Handwritten Medical Records Dataset},
author={AMR Contributors},
year={2026},
publisher={Hugging Face},
url={https://huggingface.co/datasets/Nigeria-Health-data-OCR-pipeline/African-Medical-Records}
}
```
---
## Contributors
This dataset is made possible by a distributed network of contributors.
* **107+ contributors (and growing)**
* **40 universities across Nigeria**
### Volunteers
See `CONTRIBUTORS.md` for the full list.
---
## Acknowledgements
We acknowledge all student contributors, institutions, and collaborators supporting the AMR initiative.
---
## Final Note
AMR is not just a dataset; it is **infrastructure for African AI in healthcare**.
Provider:
Nigeria-Health-data-OCR-pipeline



