
LampsteR/Asclepius-Synthetic-Clinical-Notes

Hugging Face · Updated 2025-12-10 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/LampsteR/Asclepius-Synthetic-Clinical-Notes
Description:
---
license: cc-by-nc-sa-4.0
task_categories:
- question-answering
- summarization
- text-generation
language:
- en
tags:
- medical
- synthetic
pretty_name: 'Asclepius: Synthetic Clinical Notes & Instruction Dataset'
size_categories:
- 100K<n<1M
---

# Asclepius: Synthetic Clinical Notes & Instruction Dataset

## Dataset Description

- **Repository:** [GitHub](https://github.com/starmpcc/Asclepius)
- **Paper:** https://arxiv.org/abs/2309.00237

### Dataset Summary

This is the official dataset for Asclepius [(arXiv)](https://arxiv.org/abs/2309.00237).
It is organized in a Clinical Note - Question - Answer format for building clinical LLMs.

- We first generated synthetic clinical notes from [PMC-Patients](https://huggingface.co/datasets/zhengyun21/PMC-Patients) case reports using GPT-3.5.
- We then generated instruction-answer pairs for 157k synthetic discharge summaries.

### Supported Tasks

This dataset covers the following 8 tasks:

- Named Entity Recognition
- Abbreviation Expansion
- Relation Extraction
- Temporal Information Extraction
- Coreference Resolution
- Paraphrasing
- Summarization
- Question Answering

### Languages

English

## Dataset Structure

### Data Instances

- `synthetic.csv`: Clinical Note - Question - Answer pairs

### Data Fields

- `patient_id`: Unique case report ID from PMC-Patients
- `patient`: Case report text
- `question`: GPT-3.5-generated instruction based on the `patient` text. The prompt used is available on GitHub.
- `answer`: GPT-3.5-generated answer for the given case report and question
- `task`: Task category of the question; one of the 8 tasks listed above

## Dataset Creation

### Source Data

[PMC-Patients](https://huggingface.co/datasets/zhengyun21/PMC-Patients)

### Annotations

We used GPT-3.5-turbo (version 0314). The prompts are available on our GitHub.
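The Data Fields above describe a flat CSV schema. As a hedged illustration only, the Python sketch below parses CSV text in that schema with the standard `csv` module, checks that the documented columns are present, counts rows per `task`, and turns one row into a prompt/response pair for instruction tuning. The sample rows are made up. The helper names (`load_rows`, `task_counts`, `to_instruction_example`) and the prompt template are hypothetical: the template is not the one used to train the Asclepius models, and the exact `task` strings in the real file may differ from the task names listed above.

```python
import csv
import io
from collections import Counter

# Column names as documented under "Data Fields".
EXPECTED_FIELDS = {"patient_id", "patient", "question", "answer", "task"}


def load_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse synthetic.csv-style text and check that the documented columns exist."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = EXPECTED_FIELDS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def task_counts(rows: list[dict[str, str]]) -> Counter:
    """Count question-answer pairs per task category."""
    return Counter(row["task"] for row in rows)


def to_instruction_example(row: dict[str, str]) -> dict[str, str]:
    """Turn one row into a (prompt, response) pair.

    Hypothetical template for illustration; not the Asclepius training prompt.
    """
    prompt = f"[Clinical note]\n{row['patient']}\n\n[Instruction]\n{row['question']}"
    return {"prompt": prompt, "response": row["answer"]}


# Two made-up rows in the documented schema (not real dataset content).
SAMPLE_CSV = """patient_id,patient,question,answer,task
1,"Discharge summary: 54-year-old admitted with suspected MI.","What does ""MI"" stand for?","Myocardial infarction.",Abbreviation Expansion
2,"Discharge summary: 30-year-old with fever, treated with antibiotics.","Summarize the hospital course.","Fever treated with antibiotics.",Summarization
"""

rows = load_rows(SAMPLE_CSV)
print(task_counts(rows))
print(to_instruction_example(rows[0])["prompt"])
```

With a downloaded copy of the file, the same helpers could be applied to its contents, e.g. `load_rows(open("synthetic.csv", encoding="utf-8").read())`.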
## Additional Information

### Models

- [Asclepius-7B](https://huggingface.co/starmpcc/Asclepius-7B)
- [Asclepius-13B](https://huggingface.co/starmpcc/Asclepius-13B)
- [Asclepius-Llama2-7B](https://huggingface.co/starmpcc/Asclepius-Llama2-7B)
- [Asclepius-Llama2-13B](https://huggingface.co/starmpcc/Asclepius-Llama2-13B)
- [Asclepius-Llama3-8B](https://huggingface.co/starmpcc/Asclepius-Llama3-8B)
- [Asclepius-Mistral-7B-v0.3](https://huggingface.co/starmpcc/Asclepius-Mistral-7B-v0.3)

### Variants

- The instruction-answer pairs generated from MIMIC-III discharge summaries, and the models trained on them, are now available on [PhysioNet](https://physionet.org/content/asclepius-r/1.0.0/).

### Licensing Information

CC-BY-NC-SA 4.0

### Citation Information

```
@misc{kweon2023publicly,
      title={Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes},
      author={Sunjun Kweon and Junu Kim and Jiyoun Kim and Sujeong Im and Eunbyeol Cho and Seongsu Bae and Jungwoo Oh and Gyubok Lee and Jong Hak Moon and Seng Chan You and Seungjin Baek and Chang Hoon Han and Yoon Bin Jung and Yohan Jo and Edward Choi},
      year={2023},
      eprint={2309.00237},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
Provided by:
LampsteR