
xz97/MedInstruct

Source: Hugging Face, updated 2023-10-24, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/xz97/MedInstruct
Resource description:
---
license: cc-by-nc-4.0
language:
- en
tags:
- instruction-finetuning
pretty_name: MedInstruct
task_categories:
- text-generation
size_categories:
- 10K<n<100K
---

# Dataset Card for MedInstruct

## Dataset Description

- **Homepage:**
- **Repository:** https://github.com/XZhang97666/AlpaCare
- **Paper:**
- **Point of Contact:** [Xinlu Zhang](mailto:xinluzhang@ucsb.edu)

### Dataset Summary

**MedInstruct** encompasses:

1. **MedInstruct-52k**: A dataset comprising 52,000 medical instructions and responses. The instructions are generated by OpenAI's GPT-4 engine, and the responses by the GPT-3.5-turbo engine.
2. **MedInstruct-test**: A set of 217 clinician-crafted free-form instruction evaluation tests.
3. **med_seed**: The clinician-crafted seed set used to prompt GPT-4 for task generation.

**MedInstruct-52k** can be used for instruction-tuning of language models, with the aim of improving their instruction-following capability and medical proficiency. **MedInstruct-test**, on the other hand, can serve as an evaluator of models' instruction-following ability in the medical domain.

Specifically, the **MedInstruct-52k** dataset was obtained as follows.

*Instructions*:

- Start from a medical task seed set containing 167 clinician-crafted clinical tasks.
- Use the GPT-4 engine to generate instruction data.
- Craft a [prompt](https://github.com/XZhang97666/AlpaCare/blob/master/prompt/instruction_prompt.txt) that serves as the medical instruction generation guidelines for GPT-4.
- To reduce the cost of task generation and introduce greater randomness, generate 12 instructions per call, with 3 seed instructions selected at random from the seed set for each call.

*Responses*:

- The GPT-3.5-turbo engine is used to synthesize responses, processing each instruction individually.

Here is an illustration of an instruction-response pair:

```json
{
  "instruction": "Rewrite this instruction on how to use an inhaler to be more patient-friendly and less technical.",
  "input": "Exhale deeply, engage mouthpiece of inhaler, depress canister while simultaneously inhaling long and deep, hold the breath for ten seconds, then gently exhale",
  "output": "Here's a more patient-friendly version of the instruction on how to use an inhaler: 1. Take a deep breath and completely breathe out. 2. Put the mouthpiece of the inhaler into your mouth and seal your lips around it properly. 3. Press down the canister while breathing in slowly and deeply at the same time. 4. Hold your breath for ten seconds. 5. Finally, breathe out gently."
}
```

## Citation

If you find this repo useful, please cite the paper:

```
@misc{zhang2023alpacareinstructiontuned,
  title={AlpaCare: Instruction-tuned Large Language Models for Medical Application},
  author={Xinlu Zhang and Chenxin Tian and Xianjun Yang and Lichang Chen and Zekun Li and Linda Ruth Petzold},
  year={2023},
  eprint={2310.14558},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
Provider:
xz97
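Summary of original information

Dataset card for MedInstruct

Dataset description

Dataset summary

MedInstruct comprises:

  1. MedInstruct-52k: a dataset of 52,000 medical instructions and responses. The instructions are generated by OpenAI's GPT-4 engine and the responses by the GPT-3.5-turbo engine.
  2. MedInstruct-test: a set of 217 clinician-crafted free-form instruction evaluation tests.
  3. med_seed: the clinician-crafted seed set used to prompt GPT-4 for task generation.

MedInstruct-52k can be used for instruction-tuning of language models, with the aim of improving their instruction-following capability and medical proficiency. MedInstruct-test, on the other hand, can serve as a tool for evaluating models' instruction-following ability in the medical domain.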
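
To make the instruction-tuning use concrete, the sketch below turns one record into a prompt/completion pair. It assumes an Alpaca-style template; the card does not state which template is used, and `build_prompt` and `build_example` are hypothetical helper names introduced only for illustration.

```python
from typing import Dict

# Assumption: an Alpaca-style template. The dataset card does not specify one.
WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
WITHOUT_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(record: Dict[str, str]) -> str:
    """Format the instruction (and optional input) of one record as a prompt."""
    instruction = record["instruction"].strip()
    context = (record.get("input") or "").strip()
    if context:
        return WITH_INPUT.format(instruction=instruction, input=context)
    return WITHOUT_INPUT.format(instruction=instruction)


def build_example(record: Dict[str, str]) -> Dict[str, str]:
    """Pair the prompt with the reference output as the training target."""
    return {"prompt": build_prompt(record), "completion": record["output"].strip()}
```

Records with an empty `input` field fall back to the shorter template, so the model is not shown an empty "Input" section.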

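Specifically, the MedInstruct-52k dataset was obtained as follows.

Instructions

  • Start from a medical task seed set containing 167 clinician-crafted clinical tasks.
  • Use the GPT-4 engine to generate instruction data.
  • Craft a prompt that serves as the medical instruction generation guidelines for GPT-4.
  • To reduce the cost of task generation and introduce greater randomness, generate 12 instructions per call, with 3 seed instructions selected at random from the seed set for each call.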
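
The sampling step in the list above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the prompt wording, `build_generation_prompt`, and `calls_needed` are hypothetical, and no model call is made. Only the numbers 3 and 12 come from the card.

```python
import random
from typing import List, Tuple

NUM_SEEDS_PER_CALL = 3   # seed instructions sampled per call (from the card)
NUM_TASKS_PER_CALL = 12  # instructions requested per call (from the card)


def build_generation_prompt(
    guidelines: str, seed_tasks: List[str], rng: random.Random
) -> Tuple[str, List[str]]:
    """Sample seed tasks without replacement and assemble one generation prompt."""
    demos = rng.sample(seed_tasks, NUM_SEEDS_PER_CALL)
    lines = [
        guidelines.strip(),
        "",
        # Hypothetical wording; the real guidelines live in the linked prompt file.
        f"Generate {NUM_TASKS_PER_CALL} new tasks. Example tasks:",
    ]
    lines += [f"{i}. {task}" for i, task in enumerate(demos, start=1)]
    return "\n".join(lines), demos


def calls_needed(target: int, per_call: int = NUM_TASKS_PER_CALL) -> int:
    """Lower bound on calls, assuming every generated instruction is kept."""
    return -(-target // per_call)  # ceiling division
```

Under the assumption that nothing is filtered out, `calls_needed(52000)` gives 4334 calls; any deduplication or filtering, which the card does not describe, would raise that number.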

Responses

  • The GPT-3.5-turbo engine is used to synthesize responses, processing each instruction individually.
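
Processing each instruction individually amounts to one model call per instruction. The sketch below shows that loop with the model abstracted behind a `generate` callable; this callable and `synthesize_responses` are hypothetical stand-ins, and how the instruction and input are joined into one query is an assumption.

```python
from typing import Callable, Dict, Iterable, List


def synthesize_responses(
    instructions: Iterable[Dict[str, str]],
    generate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Call `generate` once per instruction and attach the result as `output`."""
    records = []
    for item in instructions:
        query = item["instruction"]
        if item.get("input"):
            # Assumption: the optional input is appended after a blank line.
            query = f"{query}\n\n{item['input']}"
        records.append({**item, "output": generate(query)})
    return records
```

In practice `generate` would wrap whatever chat-completion client is in use; keeping it as a parameter lets the loop be tested without network access.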

Here is an example of an instruction-response pair:

{
  "instruction": "Rewrite this instruction on how to use an inhaler to be more patient-friendly and less technical.",
  "input": "Exhale deeply, engage mouthpiece of inhaler, depress canister while simultaneously inhaling long and deep, hold the breath for ten seconds, then gently exhale",
  "output": "Here's a more patient-friendly version of the instruction on how to use an inhaler: 1. Take a deep breath and completely breathe out. 2. Put the mouthpiece of the inhaler into your mouth and seal your lips around it properly. 3. Press down the canister while breathing in slowly and deeply at the same time. 4. Hold your breath for ten seconds. 5. Finally, breathe out gently."
}
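
A record of this shape can be checked before use with a small validator. The sketch assumes each record is serialized as one JSON object with the three string fields shown in the example; `parse_record` is a hypothetical helper, not part of the dataset's tooling.

```python
import json
from typing import Dict

REQUIRED_KEYS = ("instruction", "input", "output")


def parse_record(text: str) -> Dict[str, str]:
    """Parse one JSON record and verify it has the three expected string fields."""
    record = json.loads(text)
    if not isinstance(record, dict):
        raise ValueError("record must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in record]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if not all(isinstance(record[key], str) for key in REQUIRED_KEYS):
        raise ValueError("instruction, input and output must all be strings")
    return record
```

Note that strict JSON requires quoted keys and commas between fields, so a record written with bare keys would fail at `json.loads` before the field checks run.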

Citation

If you find this resource useful, please cite the paper:

@misc{zhang2023alpacareinstructiontuned,
  title={AlpaCare: Instruction-tuned Large Language Models for Medical Application},
  author={Xinlu Zhang and Chenxin Tian and Xianjun Yang and Lichang Chen and Zekun Li and Linda Ruth Petzold},
  year={2023},
  eprint={2310.14558},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
