
Kofi24/adaption-uganda-malaria-clinical-notes

Hugging Face | Updated 2026-04-27 | Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/Kofi24/adaption-uganda-malaria-clinical-notes
Resource description:
---
annotations_creators: []
language:
- en
language_creators: []
license: []
multilinguality:
- monolingual
pretty_name: 'uganda_malaria_clinical_notes'
size_categories:
- n<1K
source_datasets:
- 'extended|https://huggingface.co/datasets/Kofi24/afrihealth-malaria-reasoning'
tags:
- adaption
- instruction-tuning
- medical
task_categories: []
task_ids: []
---

![banner](https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/9cd883de-e14c-4e05-8e51-ba6ff951bdf1.png)

This dataset is a remastered version of this [dataset](https://huggingface.co/datasets/Kofi24/afrihealth-malaria-reasoning) prepared using [Adaption's](https://adaptionlabs.ai/app/auth) Adaptive Data platform.

### Overview

This dataset is a clinician-informed, multilingual clinical reasoning dataset designed to support the development of adaptive AI systems for healthcare in low-resource, multilingual settings. It captures realistic clinical workflows across Uganda, where clinician–patient interactions often occur in local languages but clinical documentation must be in English.

### Motivation

Over a two-week field study, we engaged 20+ clinicians across multiple regions in Uganda to understand the challenges in their daily workflows.

#### Key Insights

1. Uganda is highly multilingual, yet:
   - Clinical conversations occur in local languages
   - Clinical notes must be written in English
2. Language barriers often require human translators, which introduce:
   - Delays
   - Loss of clinical nuance
   - Potential misinterpretation
3. Clinicians expressed the need for AI systems that can:
   - Generate clinical notes automatically in English even though the encounter happens in local languages
   - Provide suggested diagnoses
   - Recommend confirmatory investigations
   - Suggest treatment plans

### Adaptive Data Framework

We designed a multilingual, structured dataset that enables:

- Cross-lingual understanding (local language → English)
- Clinical reasoning
- Decision support
- Documentation automation

### Dataset Description

This dataset contains synthetic clinical notes documenting acute febrile illnesses, primarily suspected malaria cases, across various patient demographics in Uganda. The notes feature rich code-switching between English and local Ugandan languages (such as Runyankore, Luganda, and Luo) to reflect authentic patient-provider interactions in endemic regions. Each entry includes detailed patient history, symptom progression, clinical assessments, differential diagnoses, and treatment plans tailored to specific age groups and severity levels.

#### Primary Languages Covered

- Luganda
- Acholi
- Lugbara
- Runyankore
- Swahili
- Ugandan English
- Expanded to 18+ other languages with the Adaption platform

#### Clinical Scenarios

The dataset targets 5 outpatient conditions that are common in most African health facilities. Only malaria is currently covered; the others are work in progress:

- Malaria
- Pneumonia (work in progress)
- Diarrhea (work in progress)
- Common Cold (work in progress)
- Malnutrition (work in progress)

#### Age Groups

Each case is stratified across:

- 0–5 years
- 6–13 years
- 14–17 years
- 18–35 years
- 36–55 years

#### African Health Facility Settings

- Rural health facilities (in progress)
- Urban hospitals

### Data Pipeline

Each sample follows a 3-stage clinical pipeline:

1. Conversation
   - Multilingual (local language + English code-switching)
   - Simulates a real outpatient department (OPD) interaction
   - Designed to be compatible with:
     - TTS (Text-to-Speech)
     - Voice cloning
     - ASR training
2. Structured Clinical Reasoning
   - Extracted symptoms
   - Risk factors
   - Differential diagnoses
   - Clinical interpretation
3. Clinical Notes (English)
   - Final standardized documentation
   - Reflects the real-world documentation requirement in Ugandan healthcare

### Use Cases

This dataset is multi-purpose and supports:

1. Synthetic audio generation for Medical ASR (Automatic Speech Recognition)
   - Train models on African multilingual speech patterns
2. Clinical Reasoning Models
   - Predict diagnoses from conversations
3. Decision Support Systems
   - Suggest:
     - Diagnoses
     - Investigations
     - Treatment plans
4. Clinical Note Generation
   - Convert conversations → structured English notes
5. Cross-lingual NLP
   - Map local language → standardized medical English

### Dataset size

There are 80 data points in this dataset. This is an instruction tuning dataset.

### Quality of Remastered Dataset

The final quality is A, with a relative quality improvement of 10.0%.

### Domain

- Medical (100%)

### Language

- English (100%)

### Tone

- Professional (77%)
- Technical (23%)

### Evaluation Results

- **Quality Gains:**

<img src="https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/46fa29ff-a614-4835-8e4f-9dedb0253961.png" alt="QualityGains" style="max-width: 50%; display: block; margin-left: auto; margin-right: auto;" />

- **Grade Improvement:**

<img src="https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/d9916ecf-96ce-40d3-8497-fb37c392bf60.png" alt="Grade" style="max-width: 50%; display: block; margin-left: auto; margin-right: auto;" />

- **Percentile Chart:**

<img src="https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/fc43687d-4a32-427e-811d-007209178034.png" alt="Percentile Chart" style="max-width: 50%; display: block; margin-left: auto; margin-right: auto;" />
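
### Example: representing one sample (illustrative sketch)

The 3-stage pipeline and the age stratification described in this card can be sketched in Python as follows. This is only a minimal illustration, not the dataset's actual schema: the card does not list column names, so `Sample`, `ClinicalReasoning`, `age_band`, `to_instruction_example`, and every field name below are hypothetical, and the placeholder strings are not taken from the data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Age bands as listed in the card (inclusive bounds, in years).
AGE_BANDS = [(0, 5), (6, 13), (14, 17), (18, 35), (36, 55)]


def age_band(age_years: int) -> Optional[str]:
    """Return the card's age-band label, or None if the age is outside 0-55."""
    for low, high in AGE_BANDS:
        if low <= age_years <= high:
            return f"{low}-{high} years"
    return None


@dataclass
class ClinicalReasoning:
    """Stage 2: structured clinical reasoning (hypothetical field names)."""
    symptoms: List[str] = field(default_factory=list)
    risk_factors: List[str] = field(default_factory=list)
    differential_diagnoses: List[str] = field(default_factory=list)
    interpretation: str = ""


@dataclass
class Sample:
    """One record covering all three pipeline stages (hypothetical layout)."""
    conversation: str              # Stage 1: code-switched OPD conversation
    reasoning: ClinicalReasoning   # Stage 2: structured clinical reasoning
    clinical_note: str             # Stage 3: final note in English
    age_years: int


def to_instruction_example(sample: Sample) -> Dict[str, str]:
    """Turn a sample into an instruction/input/output triple for tuning."""
    return {
        "instruction": "Write an English clinical note for this encounter.",
        "input": sample.conversation,
        "output": sample.clinical_note,
        "age_band": age_band(sample.age_years) or "unspecified",
    }


# Placeholder content only; these strings are not drawn from the dataset.
example = Sample(
    conversation="<code-switched conversation>",
    reasoning=ClinicalReasoning(symptoms=["fever"]),
    clinical_note="<English clinical note>",
    age_years=4,
)
record = to_instruction_example(example)
```

Keeping the three stages as separate fields lets the same record serve several of the use cases above, for example conversation to note for note generation, or conversation to `differential_diagnoses` for reasoning models.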
Provider:
Kofi24