
cyrille-elie/CHSA-Triage-Medic-Full-Dataset

Source: Hugging Face. Updated 2025-12-15; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/cyrille-elie/CHSA-Triage-Medic-Full-Dataset
Resource summary:
This dataset was created as part of an **AI Engineer** project (CHSA Project). It is designed to train an Intelligent Medical Assistant capable of performing **emergency triage** and providing clinical reasoning. The dataset is divided into **3 distinct subsets**, each corresponding to a different training phase (Supervised Fine-Tuning and Alignment):

1. **`sft_medical_dataset`**: General medical knowledge and clinical cases (FR/EN), balanced at 50% French and 50% English, anonymized via Microsoft Presidio.
2. **`sft_expert_dataset`**: High-quality synthetic data specialized in emergency triage (FR), using medical-synonym data augmentation to avoid overfitting.
3. **`dpo_dataset`**: Preference pairs for the alignment phase (Direct Preference Optimization), teaching the model to prefer detailed, structured, and clinically accurate responses.
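The card does not say how the medical-synonym augmentation in `sft_expert_dataset` was implemented. As a rough sketch only, one common approach is dictionary-based synonym substitution: each known clinical term is swapped for a randomly chosen equivalent, so the model sees several phrasings of the same case. The synonym table `MEDICAL_SYNONYMS` and the function `augment` below are hypothetical illustrations, not part of the dataset.

```python
import random
import re

# Hypothetical French synonym table, for illustration only (not taken from the dataset).
MEDICAL_SYNONYMS: dict[str, list[str]] = {
    "douleur thoracique": ["douleur à la poitrine", "oppression thoracique"],
    "dyspnée": ["essoufflement", "difficulté respiratoire"],
    "fièvre": ["hyperthermie", "température élevée"],
}


def augment(text: str, synonyms: dict[str, list[str]], rng: random.Random) -> str:
    """Return a paraphrase of `text` with each known term replaced by a random synonym."""
    # Longest terms first, so a multi-word expression wins over any shorter overlapping term.
    terms = sorted(synonyms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        flags=re.IGNORECASE,
    )

    def pick(match: re.Match) -> str:
        # Keys are lowercase, so normalise the matched text before the lookup.
        return rng.choice(synonyms[match.group(0).lower()])

    # re.sub scans the input once, so an inserted synonym is never substituted again.
    return pattern.sub(pick, text)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducible variants
    case = "Patient de 58 ans, douleur thoracique et dyspnée depuis 2 heures."
    for _ in range(3):
        print(augment(case, MEDICAL_SYNONYMS, rng))
```

Seeding `random.Random` keeps the generated variants reproducible, and matching all terms in a single regex pass (longest term first) prevents a freshly inserted synonym from being rewritten again.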
Provided by:
cyrille-elie