Kofi24/adaption-uganda-malaria-clinical-notes
Hugging Face · Updated 2026-04-27 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/Kofi24/adaption-uganda-malaria-clinical-notes
Resource description:
---
annotations_creators: []
language:
- en
language_creators: []
license: []
multilinguality:
- monolingual
pretty_name: 'uganda_malaria_clinical_notes'
size_categories:
- n<1K
source_datasets:
- 'extended|https://huggingface.co/datasets/Kofi24/afrihealth-malaria-reasoning'
tags:
- adaption
- instruction-tuning
- medical
task_categories: []
task_ids: []
---

This dataset is a remastered version of the [afrihealth-malaria-reasoning dataset](https://huggingface.co/datasets/Kofi24/afrihealth-malaria-reasoning), prepared using [Adaption's](https://adaptionlabs.ai/app/auth) Adaptive Data platform.
### Overview
This dataset is a clinician-informed, multilingual clinical reasoning dataset designed to support the development of adaptive AI systems for healthcare in low-resource, multilingual settings.
It captures realistic clinical workflows across Uganda, where clinician–patient interactions often occur in local languages, but clinical documentation must be in English.
### Motivation
During a two-week field study, we engaged more than 20 clinicians across multiple regions of Uganda to understand the challenges in their daily workflows.
#### Key Insights
1. Uganda is highly multilingual, yet:
   - Clinical conversations occur in local languages
   - Clinical notes must be written in English
2. Language barriers often require human translators, which introduce:
   - Delays
   - Loss of clinical nuance
   - Potential misinterpretation
3. Clinicians expressed the need for AI systems that can:
   - Automatically generate clinical notes in English, even when the encounter takes place in a local language
   - Provide suggested diagnoses
   - Recommend confirmatory investigations
   - Suggest treatment plans
### Adaptive Data Framework
We designed a multilingual, structured dataset that enables:
- Cross-lingual understanding (local language → English)
- Clinical reasoning
- Decision support
- Documentation automation
### Dataset Description
This dataset contains synthetic clinical notes documenting acute febrile illnesses, primarily suspected malaria cases, across various patient demographics in Uganda. The notes feature rich code-switching between English and local Ugandan languages (such as Runyankore, Luganda, and Luo) to reflect authentic patient-provider interactions in endemic regions. Each entry includes detailed patient history, symptom progression, clinical assessments, differential diagnoses, and treatment plans tailored to specific age groups and severity levels.
#### Primary Languages Covered
- Luganda
- Acholi
- Lugbara
- Runyankore
- Swahili
- Ugandan English
- Expanded to 18+ additional languages using the Adaption platform
#### Clinical Scenarios
The dataset covers five conditions commonly seen in the outpatient departments of most African health facilities:
- Malaria
- Pneumonia (work in progress)
- Diarrhea (work in progress)
- Common Cold (work in progress)
- Malnutrition (work in progress)
#### Age Groups
Cases are stratified across the following age groups:
- 0–5 years
- 6–13 years
- 14–17 years
- 18–35 years
- 36–55 years
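The age strata above can be expressed as a simple lookup. This is a minimal sketch assuming ages are recorded in whole years; the labels mirror the list above, while the helper name `age_group` and returning `None` for ages outside the listed ranges are hypothetical choices, not part of the dataset.

```python
from typing import Optional

# Age strata from the dataset card, as inclusive bounds in whole years.
AGE_GROUPS = [
    (0, 5, "0–5 years"),
    (6, 13, "6–13 years"),
    (14, 17, "14–17 years"),
    (18, 35, "18–35 years"),
    (36, 55, "36–55 years"),
]


def age_group(age_years: int) -> Optional[str]:
    """Map a whole-year age to its stratum label (hypothetical helper)."""
    for low, high, label in AGE_GROUPS:
        if low <= age_years <= high:
            return label
    # Ages outside every listed stratum (for example, over 55) are not covered.
    return None


print(age_group(3))   # 0–5 years
print(age_group(60))  # None
```

Because the listed strata stop at 55 years, a lookup like this makes the coverage gap explicit instead of silently assigning older patients to the last bucket.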
#### African Health Facility Settings
- Rural health facilities (in progress)
- Urban hospitals
### Data Pipeline
Each sample follows a 3-stage clinical pipeline:
1. Conversation
   - Multilingual (local language + English code-switching)
   - Simulates a real outpatient department (OPD) interaction
   - Designed to be compatible with:
     - TTS (Text-to-Speech)
     - Voice cloning
     - ASR training
2. Structured Clinical Reasoning
   - Extracted symptoms
   - Risk factors
   - Differential diagnoses
   - Clinical interpretation
3. Clinical Notes (English)
   - Final standardized documentation
   - Reflects the real-world requirement in Ugandan healthcare that clinical notes be written in English
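To make the three stages concrete, one sample could be modeled as the record below. This is a hypothetical sketch: the class and field names are illustrative assumptions rather than the dataset's actual column names, and the example values are placeholders, not data from the dataset.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClinicalReasoning:
    """Stage 2: structured reasoning extracted from the conversation (hypothetical fields)."""
    symptoms: List[str] = field(default_factory=list)
    risk_factors: List[str] = field(default_factory=list)
    differential_diagnoses: List[str] = field(default_factory=list)
    interpretation: str = ""


@dataclass
class ClinicalSample:
    """One sample following the three-stage pipeline (hypothetical fields)."""
    conversation: str              # Stage 1: code-switched OPD dialogue
    reasoning: ClinicalReasoning   # Stage 2: structured clinical reasoning
    clinical_note: str             # Stage 3: final English documentation


# Placeholder values for illustration only.
sample = ClinicalSample(
    conversation="Clinician: <question>\nPatient: <answer in a local language>",
    reasoning=ClinicalReasoning(
        symptoms=["fever", "headache"],
        risk_factors=["<placeholder risk factor>"],
        differential_diagnoses=["uncomplicated malaria", "<placeholder alternative>"],
        interpretation="<placeholder interpretation>",
    ),
    clinical_note="<placeholder English clinical note>",
)
print(sample.reasoning.differential_diagnoses[0])
```

Keeping stage 2 as its own nested structure lets the same record serve both the reasoning and the note-generation use cases without re-parsing free text.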
### Use Cases
This dataset is multi-purpose and supports:
1. Synthetic audio generation for medical ASR (Automatic Speech Recognition)
   - Train models on African multilingual speech patterns
2. Clinical Reasoning Models
   - Predict diagnoses from conversations
3. Decision Support Systems
   - Suggest:
     - Diagnoses
     - Investigations
     - Treatment plans
4. Clinical Note Generation
   - Convert conversations → structured English notes
5. Cross-lingual NLP
   - Map local language → standardized medical English
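For the clinical note generation use case, a conversation/note pair can be turned into an instruction-tuning example. This minimal sketch assumes a simple prompt/response format; the instruction wording, the `to_instruction_pair` helper, and the `prompt`/`response` keys are hypothetical and not taken from the dataset.

```python
from typing import Dict

# Hypothetical instruction wording; the dataset's own prompt template is not documented on this card.
INSTRUCTION = (
    "Read the following clinician-patient conversation, which may mix English "
    "with Ugandan languages, and write a structured clinical note in English."
)


def to_instruction_pair(conversation: str, clinical_note: str) -> Dict[str, str]:
    """Build one prompt/response pair for note generation (hypothetical format)."""
    prompt = f"{INSTRUCTION}\n\nConversation:\n{conversation.strip()}\n\nClinical note:"
    return {"prompt": prompt, "response": clinical_note.strip()}


pair = to_instruction_pair(
    "Clinician: <question>\nPatient: <answer in a local language>",
    "  Presenting complaint: fever for 3 days.  ",
)
print(pair["response"])  # Presenting complaint: fever for 3 days.
```

Ending the prompt with a fixed cue such as `Clinical note:` gives a fine-tuned model a consistent point at which to start generating the English note.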
### Dataset size
The dataset contains 80 data points and is intended for instruction tuning.
### Quality of Remastered Dataset
The remastered dataset's final quality grade is A, with a relative quality improvement of 10.0%.
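Assuming "relative quality improvement" follows the usual definition (the card does not state Adaption's exact scoring formula, so $q$ here is a hypothetical numeric quality score), a 10.0% gain means the remastered score is 1.1 times the original score:

```latex
\Delta_{\text{rel}} = \frac{q_{\text{remastered}} - q_{\text{original}}}{q_{\text{original}}} = 0.10
\quad\Longrightarrow\quad
q_{\text{remastered}} = 1.10\, q_{\text{original}}
```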
### Domain
- Medical (100%)
### Language
- English (100%)
### Tone
- Professional (77%)
- Technical (23%)
### Evaluation Results
- **Quality Gains:**
<img src="https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/46fa29ff-a614-4835-8e4f-9dedb0253961.png" alt="QualityGains" style="max-width: 50%; display: block; margin-left: auto; margin-right: auto;" />
- **Grade Improvement:**
<img src="https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/d9916ecf-96ce-40d3-8497-fb37c392bf60.png" alt="Grade" style="max-width: 50%; display: block; margin-left: auto; margin-right: auto;" />
- **Percentile Chart:**
<img src="https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/fc43687d-4a32-427e-811d-007209178034.png" alt="Percentile Chart" style="max-width: 50%; display: block; margin-left: auto; margin-right: auto;" />
Provided by:
Kofi24



