Slovene instruction-following dataset for large language models GaMS-Instruct-MED 1.0

Name: Slovene instruction-following dataset for large language models GaMS-Instruct-MED 1.0
Creator: hdl.handle.net
License: No description available

Indexed from hdl.handle.net on 2025-01-09

Download link:

http://hdl.handle.net/11356/1982

Resource description:

GaMS-Instruct-MED is an instruction-following dataset designed for fine-tuning Slovene large language models to follow instructions in the medical domain. It consists of prompt-response pairs from the field of medicine, particularly those concerning the use of pharmaceutical drugs and medications. The dataset was generated in several steps. After consultation with medical experts, a series of prompts was manually compiled, containing questions of interest in the context of drug and medication use. For each medication in the PoVeJMo-VeMo-Med 1.0 dataset (http://hdl.handle.net/11356/1983), approximately 10-15 questions were automatically generated using prompt tuning. The questions followed the context of the instructions for use of the medication in question. Inadequate questions were manually excluded, while the responses were generated entirely automatically using a specialized RAG system. Please note that the current version of the dataset (containing 18,897 prompt-response pairs) does not guarantee clinical accuracy and may contain errors resulting from LLM hallucinations.
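The generation steps described above can be pictured with a minimal Python sketch. This is not the authors' code: the names `Pair`, `build_pairs`, `generate_questions`, `is_adequate`, and `answer_with_rag` are hypothetical, the exclusion step was manual in the original work and is represented here by a plain callable, and the question generator and RAG system are passed in as stand-in functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Pair:
    """One prompt-response pair (hypothetical record layout)."""
    medication: str
    prompt: str
    response: str


def build_pairs(
    leaflets: Dict[str, str],
    generate_questions: Callable[[str, str], Iterable[str]],
    is_adequate: Callable[[str], bool],
    answer_with_rag: Callable[[str, str], str],
) -> List[Pair]:
    """Assemble pairs: generate questions per medication, drop
    inadequate ones, then answer each remaining question."""
    pairs: List[Pair] = []
    for medication, leaflet in leaflets.items():
        # Stand-in for the automatic question generation step.
        for question in generate_questions(medication, leaflet):
            # Stand-in for the manual exclusion of inadequate questions.
            if not is_adequate(question):
                continue
            # Stand-in for the automatic RAG-based response step.
            response = answer_with_rag(question, leaflet)
            pairs.append(Pair(medication, question, response))
    return pairs
```

Passing the three steps in as functions keeps the sketch independent of any particular model or retrieval stack, so toy stubs are enough to exercise it.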

Provider:

hdl.handle.net
