BrainboxAI/medical-training-il
Source: Hugging Face. Updated 2026-04-22. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/BrainboxAI/medical-training-il
Description:
Medical-Training-IL is a bilingual (Hebrew/English) medical instruction-tuning corpus curated for training small, on-device medical models for Israeli residents preparing for Stage A exams. It combines high-quality English medical QA (USMLE-style, basic sciences, research-grounded) with ~5,000 Hebrew-native medical examples derived from Hebrew Wikipedia. The dataset is designed for continued fine-tuning on top of Google's medgemma-1.5-4b-it, a model already pre-trained on medical content, to specialize it further for Hebrew and the Israeli context. It includes examples from MedQA (USMLE), filtered MedMCQA, PubMedQA, Hebrew Wikipedia medical articles, and a handcrafted identity set, 21,432 examples in total (~35% Hebrew / 65% English). It is split into train (20,361 examples) and validation (1,071 examples) sets, and a detailed record format and intended use cases are provided.
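The split sizes quoted in the description can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of the dataset itself: the constants are copied from the description, `split_summary` and `load_split` are hypothetical helper names, and the split names `"train"` and `"validation"` are an assumption about how the Hugging Face repository is laid out. `load_split` is defined but not called, since it needs the `datasets` package and network access.

```python
# Split sizes as stated in the dataset description.
TRAIN_EXAMPLES = 20_361
VALIDATION_EXAMPLES = 1_071
STATED_TOTAL = 21_432

# Repository ID taken from the listing title.
REPO_ID = "BrainboxAI/medical-training-il"


def split_summary(train: int, validation: int) -> dict:
    """Return the total size and the fraction held by each split."""
    total = train + validation
    return {
        "total": total,
        "train_fraction": train / total,
        "validation_fraction": validation / total,
    }


def load_split(split: str = "train"):
    """Load one split from the Hugging Face Hub.

    Assumption: the splits are named "train" and "validation".
    Requires the `datasets` package and network access.
    """
    from datasets import load_dataset  # imported lazily; optional dependency

    return load_dataset(REPO_ID, split=split)


summary = split_summary(TRAIN_EXAMPLES, VALIDATION_EXAMPLES)

# The two splits should add up to the stated total.
assert summary["total"] == STATED_TOTAL

print(f"total examples:      {summary['total']}")
print(f"train fraction:      {summary['train_fraction']:.3f}")
print(f"validation fraction: {summary['validation_fraction']:.3f}")
```

The two splits sum exactly to the stated total, and the validation set works out to roughly 5% of the corpus.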
Provided by:
BrainboxAI



