ELJAOUHARY/YeMedQA_Mutilangual

Name: ELJAOUHARY/YeMedQA_Mutilangual
Creator: ELJAOUHARY
Published: 2026-04-21 14:07:28
License: No description available

Source: Hugging Face. Updated 2026-04-21. Indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/ELJAOUHARY/YeMedQA_Mutilangual

Description:

---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: context_question
    dtype: string
  - name: answer
    dtype: string
  - name: language
    dtype: string
  - name: urgency
    dtype: string
  - name: speciality
    dtype: string
  - name: article_title
    dtype: string
  - name: entities
    struct:
    - name: age
      list: string
    - name: medicament
      list: string
    - name: sympt
      list: string
    - name: medical_field
      list: string
    - name: disease
      list: string
    - name: Test
      list: string
    - name: Result
      list: string
  splits:
  - name: train
    num_bytes: 6948163.361080951
    num_examples: 7460
  - name: test
    num_bytes: 772121.6389190493
    num_examples: 829
  download_size: 4170389
  dataset_size: 7720285.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---

## Multilingual Question Answering Dataset for Healthcare

![YeMedQA.drawio (2)](https://cdn-uploads.huggingface.co/production/uploads/6962771c8b0bef761b53df3f/_M4txQSX_wMRNsxTsyTiq.png)

# Overview

**YeMedQA** is a multilingual question-answering dataset designed for healthcare NLP applications. It focuses on **patient–doctor medical conversations** in:

- Darija
- English
- French

**Keywords:** Medical Question Answering (MedQA), Large Language Models (LLMs), Natural Language Processing (NLP), AI in Healthcare

The dataset supports the development of **culturally and linguistically adapted medical AI systems**.

## 🌐 Data Collection

YeMedQA was constructed using:

### 1. Web Scraping (Verified Medical Sources)

Medical content was collected and curated from trusted healthcare platforms:

- www.icliniq.com
- www.altibbi.com

### 2. Hugging Face Open Data

- Publicly available medical QA datasets (ANR-Maladies)

These sources were selected for their:

- High medical credibility
- Real patient–doctor interactions
- Multilingual content availability

### Dataset Splits

| Split | Examples | Size (MB) |
| :--- | :---: | :---: |
| **Train** | 7,460 | 6.95 |
| **Test** | 829 | 0.77 |
| **Total** | **8,289** | **7.72** |

## Columns

| Feature | Type | Description |
| :--- | :--- | :--- |
| `id` | `string` | Unique ID |
| `question` | `string` | The patient's question (e.g., in Darija) |
| `context_question` | `string` | Clinical context or patient background |
| `answer` | `string` | Professional medical response from a doctor |
| `article_title` | `string` | Title of the reference medical article |
| `language` | `string` | Language of the entry (Darija, FR, EN) |
| `urgency` | `string` | Severity level (Low, Medium, High) |
| `speciality` | `string` | Medical department (e.g., Cardiology, Immunology) |
| `entities` | `struct` | Named entity recognition (NER) annotations (diseases, symptoms, tests, ...) |

## NER Entities Metadata (`entities` column)

| Entity | Type | Description |
| :--- | :--- | :--- |
| `disease` | `list[string]` | Diagnosed conditions or illnesses |
| `sympt` | `list[string]` | Reported symptoms (e.g., "حكة", "fever") |
| `medicament` | `list[string]` | Prescribed or mentioned drugs |
| `medical_field` | `list[string]` | Broad medical categories (e.g., "Allergologie") |
| `age` | `list[string]` | Patient age or age group mentions |
| `Test` / `Result` | `list[string]` | Clinical exams and their respective outcomes |

## ✍️ Author & Citation

This dataset was curated and processed by **Youssef Eljaouhary**.

If you use this dataset in your research or project, please cite it as:

> Eljaouhary, Y. (2026). MedQA Multilingual Dataset (Darija/FR/EN). Hugging Face.

## ⚖️ License

This project is licensed under the **MIT License**. You are free to use, modify, and distribute this dataset for both commercial and non-commercial purposes, provided that the original author is credited.
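
To make the column and `entities` descriptions in this card more concrete, here is a minimal Python sketch of how rows with that schema could be filtered and summarized. The field names come from the card. Everything else is an assumption for illustration: the helper names (`make_record`, `filter_records`, `count_by`, `symptoms`), the sample rows, and the exact label strings (`"EN"`, `"FR"`, `"Low"`, `"High"`), which may be spelled differently in the real data.

```python
from collections import Counter

# Entity keys as listed under `entities` in the dataset card.
ENTITY_KEYS = ["age", "medicament", "sympt", "medical_field", "disease", "Test", "Result"]


def make_record(rec_id, question, language, urgency, speciality, **entities):
    """Build a dict shaped like one row of the card's schema (hypothetical helper)."""
    return {
        "id": rec_id,
        "question": question,
        "context_question": "",
        "answer": "(placeholder answer)",
        "language": language,
        "urgency": urgency,
        "speciality": speciality,
        "article_title": "",
        # Each entity field is a list of strings; unspecified ones stay empty.
        "entities": {key: list(entities.get(key, [])) for key in ENTITY_KEYS},
    }


# Invented rows, NOT taken from the dataset.
SAMPLE = [
    make_record("demo-1", "(placeholder question)", "EN", "Medium",
                "General Medicine", sympt=["fever"]),
    make_record("demo-2", "(placeholder question)", "FR", "Low",
                "Allergologie", sympt=["toux"], medical_field=["Allergologie"]),
    make_record("demo-3", "(placeholder question)", "EN", "High",
                "Cardiology", sympt=["chest pain"]),
]


def filter_records(records, language=None, urgency=None):
    """Keep records matching the given language and/or urgency (None disables a filter)."""
    return [
        r for r in records
        if (language is None or r["language"] == language)
        and (urgency is None or r["urgency"] == urgency)
    ]


def count_by(records, field):
    """Count records per value of a top-level string field."""
    return dict(Counter(r[field] for r in records))


def symptoms(records):
    """Collect the distinct `sympt` entity strings, sorted."""
    return sorted({s for r in records for s in r["entities"]["sympt"]})


if __name__ == "__main__":
    print(count_by(SAMPLE, "language"))
    print([r["id"] for r in filter_records(SAMPLE, language="EN", urgency="High")])
    print(symptoms(SAMPLE))
```

The same functions should work on real rows, for example those obtained with `load_dataset("ELJAOUHARY/YeMedQA_Mutilangual")` from the Hugging Face `datasets` library, as long as each row is converted to a plain dict. It is worth inspecting the actual values of `language` and `urgency` before filtering on them.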

Provided by:

ELJAOUHARY
