ApolloCorpus
Source: ModelScope community. Updated 2025-12-04; indexed 2025-01-25.
Download link:
https://modelscope.cn/datasets/FreedomIntelligence/ApolloCorpus
Description:
# Multilingual Medicine: Model, Dataset, Benchmark, Code
Covering English, Chinese, French, Hindi, Spanish, and Arabic so far
<p align="center">
👨🏻💻<a href="https://github.com/FreedomIntelligence/Apollo" target="_blank">Github</a> •📃 <a href="https://arxiv.org/abs/2403.03640" target="_blank">Paper</a> • 🌐 <a href="https://apollo.llmzoo.com/" target="_blank">Demo</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a> • 🤗 <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a>
<br> <a href="./README_zh.md"> 中文 </a> | <a href="./README.md"> English </a>
</p>

## 🌈 Update
* **[2024.03.07]** [Paper](https://arxiv.org/abs/2403.03640) released.
* **[2024.02.12]** <a href="https://huggingface.co/datasets/FreedomIntelligence/ApolloCorpus" target="_blank">ApolloCorpus</a> and <a href="https://huggingface.co/datasets/FreedomIntelligence/XMedbench" target="_blank">XMedBench</a> are published!🎉
* **[2024.01.23]** Apollo repo is published!🎉
## Results
🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-0.5B" target="_blank">Apollo-0.5B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-1.8B" target="_blank">Apollo-1.8B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-2B" target="_blank">Apollo-2B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-6B" target="_blank">Apollo-6B</a> • 🤗 <a href="https://huggingface.co/FreedomIntelligence/Apollo-7B" target="_blank">Apollo-7B</a>
<details><summary>Click to expand</summary>

</details>
## Data: Huge, Diverse, Clean, Multilingual

## Usage
- [Zip File](https://huggingface.co/datasets/FreedomIntelligence/Medbase_data/blob/main/Medbase_data-datasets.zip)
- [Data category](https://huggingface.co/datasets/FreedomIntelligence/Medbase_data/tree/main/train)
- Pretrain:
  - json_name: `{data_source}_{language}_{data_type}.json`
  - data_type: medicalBook, medicalGuideline, medicalPaper, medicalWeb (from online forums), medicalWiki, or qa (QA pairs generated from text)
  - language: en (English), zh (Chinese), es (Spanish), fr (French), hi (Hindi)
  - data item:
    - data_type==text (any non-qa type): list of strings
      ```
      [
        "string1",
        "string2",
        ...
      ]
      ```
    - data_type==qa: list of QA pairs (each a flat list of strings)
      ```
      [
        [
          "q1",
          "a1",
          "q2",
          "a2",
          ...
        ],
        ...
      ]
      ```
- SFT:
  - json_name: `{data_source}_{language}.json`
  - data_type: code, general, math, medicalExam, medicalPatient
  - data item: list of QA pairs (each a flat list of strings)
    ```
    [
      [
        "q1",
        "a1",
        "q2",
        "a2",
        ...
      ],
      ...
    ]
    ```
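
The naming scheme and item layout described above can be handled with a few lines of Python. The following is a minimal sketch, not code from the Apollo repository: the helper names `parse_json_name`, `qa_turns`, and `load_items` are hypothetical, and the sketch assumes that `data_source` may itself contain underscores, so the fixed trailing fields are split off from the right.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def parse_json_name(path: str, pretrain: bool) -> Dict[str, str]:
    """Split a file name into the fields of the naming scheme.

    Pretrain files: {data_source}_{language}_{data_type}.json
    SFT files:      {data_source}_{language}.json
    Assumption: only data_source may contain underscores.
    """
    stem = Path(path).stem  # drop directories and the .json suffix
    if pretrain:
        data_source, language, data_type = stem.rsplit("_", 2)
        return {
            "data_source": data_source,
            "language": language,
            "data_type": data_type,
        }
    data_source, language = stem.rsplit("_", 1)
    return {"data_source": data_source, "language": language}


def qa_turns(dialogue: List[str]) -> List[Tuple[str, str]]:
    """Turn a flat [q1, a1, q2, a2, ...] list into (question, answer) pairs.

    A trailing question with no answer is dropped.
    """
    return list(zip(dialogue[0::2], dialogue[1::2]))


def load_items(path: str) -> list:
    """Load one data file: a list of strings (text) or a list of QA lists."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, with a hypothetical file name, `parse_json_name("train/example_en_medicalBook.json", pretrain=True)` yields the source `example`, the language `en`, and the data type `medicalBook`. Each QA item returned by `load_items` can then be passed to `qa_turns` to iterate over its turns.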
## Citation
```
@misc{wang2024apollo,
title={Apollo: Lightweight Multilingual Medical LLMs towards Democratizing Medical AI to 6B People},
author={Xidong Wang and Nuo Chen and Junyin Chen and Yan Hu and Yidong Wang and Xiangbo Wu and Anningzhe Gao and Xiang Wan and Haizhou Li and Benyou Wang},
year={2024},
eprint={2403.03640},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
Provider: maas
Created: 2025-01-20



