
ApolloCorpus

ModelScope Community | Updated 2025-12-04 | Indexed 2025-01-25
Download link:
https://modelscope.cn/datasets/FreedomIntelligence/ApolloCorpus
Resource description:
# Multilingual Medicine: Model, Dataset, Benchmark, Code

Covering English, Chinese, French, Hindi, Spanish, and Arabic so far

<p align="center">
👨🏻‍💻<a href="https://github.com/FreedomIntelligence/Apollo" target="_blank">Github</a> • 📃 <a href="https://arxiv.org/abs/2403.03640" target="_blank">Paper</a> • 🌐 <a href="https://apollo.llmzoo.com/" target="_blank">Demo</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a>
<br> <a href="./README_zh.md"> 中文 </a> | <a href="./README.md"> English </a>
</p>

![Apollo](assets/apollo_medium_final.png)

## 🌈 Update

* **[2024.03.07]** [Paper](https://arxiv.org/abs/2403.03640) released.
* **[2024.02.12]** <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a> and <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a> are published!🎉
* **[2024.01.23]** Apollo repo is published!🎉

## Results

🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-0.5B" target="_blank">Apollo-0.5B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-1.8B" target="_blank">Apollo-1.8B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-2B" target="_blank">Apollo-2B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-6B" target="_blank">Apollo-6B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-7B" target="_blank">Apollo-7B</a>

<details><summary>Click to expand</summary>

![Apollo](assets/result.png)

</details>

## Data: Huge, Diverse, Clean, Multilingual

![Apollo](assets/dataset.png)

## Usage

- [Zip File](https://huggingface.co/datasets/FreedomIntelligence/Medbase_data/blob/main/Medbase_data-datasets.zip)
- [Data category](https://huggingface.co/datasets/FreedomIntelligence/Medbase_data/tree/main/train)
  - Pretrain:
    - json_name: `{data_source}_{language}_{data_type}.json`
    - data_source: medicalBook, medicalGuideline, medicalPaper, medicalWeb (from online forums), medicalWiki
    - language: en (English), zh (Chinese), es (Spanish), fr (French), hi (Hindi)
    - data_type: text, qa (QA pairs generated from text)
    - data item:
      - data_type==text: list of strings
        ```
        [
          "string1",
          "string2",
          ...
        ]
        ```
      - data_type==qa: list of QA dialogues (each a list of strings)
        ```
        [
          [
            "q1",
            "a1",
            "q2",
            "a2",
            ...
          ],
          ...
        ]
        ```
  - SFT:
    - json_name: `{data_source}_{language}.json`
    - data_source: code, general, math, medicalExam, medicalPatient
    - data item: list of QA dialogues (each a list of strings)
      ```
      [
        [
          "q1",
          "a1",
          "q2",
          "a2",
          ...
        ],
        ...
      ]
      ```

## Citation

```
@misc{wang2024apollo,
  title={Apollo: Lightweight Multilingual Medical LLMs towards Democratizing Medical AI to 6B People},
  author={Xidong Wang and Nuo Chen and Junyin Chen and Yan Hu and Yidong Wang and Xiangbo Wu and Anningzhe Gao and Xiang Wan and Haizhou Li and Benyou Wang},
  year={2024},
  eprint={2403.03640},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
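As a rough illustration of the file naming and item layouts described under Usage, the following Python sketch splits a file name into its fields and walks a QA-style item list as (question, answer) pairs. The helper names `FileInfo`, `parse_json_name`, `load_items`, and `iter_qa_pairs` are hypothetical and not part of the Apollo code base. The sketch also assumes that `data_source` contains no underscore and that `language` is a two-letter code, which holds for the values listed above but is not stated by the dataset card.

```python
import json
import re
from typing import Iterator, List, NamedTuple, Optional, Tuple


class FileInfo(NamedTuple):
    """Fields recovered from a data file name (hypothetical helper type)."""
    data_source: str
    language: str
    data_type: Optional[str]  # "text" or "qa" for pretrain files, None for SFT files


# Assumption: data_source has no underscore and language is a two-letter code.
_NAME_RE = re.compile(
    r"^(?P<source>[^_]+)_(?P<lang>[a-z]{2})(?:_(?P<dtype>text|qa))?\.json$"
)


def parse_json_name(name: str) -> FileInfo:
    """Split '{data_source}_{language}[_{data_type}].json' into its fields."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return FileInfo(match["source"], match["lang"], match["dtype"])


def load_items(path: str) -> list:
    """Load one data file; the top-level JSON value is a list in both layouts."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def iter_qa_pairs(dialogues: List[List[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (question, answer) pairs from items shaped like ["q1", "a1", "q2", "a2", ...]."""
    for turns in dialogues:
        if len(turns) % 2 != 0:
            raise ValueError("a QA item should hold an even number of strings")
        for i in range(0, len(turns), 2):
            yield turns[i], turns[i + 1]
```

For a pretrain file with `data_type` of `text`, `load_items` returns a flat list of strings, so `iter_qa_pairs` only applies to `qa` pretrain files and to SFT files.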

Provider: maas
Created: 2025-01-20