Kakyoin03/MixedDataset
收藏Hugging Face2026-04-28 更新2026-05-03 收录
下载链接:
https://hf-mirror.com/datasets/Kakyoin03/MixedDataset
下载链接
链接失效反馈官方服务:
资源简介:
一个大规模的双语医学问答数据集,包含21,941个高质量条目。该数据集是两个严格医学语料库的组合:- **英语**:13,812个条目 - **摩洛哥达里贾语**:8,129个条目。该混合数据集专门用于微调强大的多语言LLM,使其能够无缝处理高资源(英语)和低资源(摩洛哥达里贾语)环境中的医学问答。关键特征包括:- **语言**:英语和摩洛哥达里贾语 - **格式**:患者场景(`context_question`)、具体问题(`question`)、医生回答(`answer`) - **元数据**:紧急程度、医学专业、文章标题 - **实体**:丰富的症状、疾病、药物、医学测试和结果注释。
A large-scale bilingual medical question-answer dataset containing 21,941 high-quality entries. This dataset is a combination of two rigorous medical corpora: - **English**: 13,812 entries - **Moroccan Darija**: 8,129 entries. This mixed dataset is specifically tailored for fine-tuning robust multilingual LLMs capable of handling medical QA in both high-resource (English) and low-resource (Moroccan Darija) settings seamlessly. Key Features: - **Language**: English and Moroccan Darija - **Format**: Patient Scenario (`context_question`), Specific Question (`question`), Doctor Response (`answer`) - **Metadata**: Urgency level, medical specialty, article title - **Entities**: Richly annotated with arrays for symptoms, diseases, medications, medical tests, and results.
提供机构:
Kakyoin03



