Slovene instruction-following dataset for large language models GaMS-Instruct-DH 1.0
Indexed 2025-03-25 from hdl.handle.net
Download link:
http://hdl.handle.net/11356/1975
Description:
GaMS-Instruct-DH is an instruction-following dataset designed to fine-tune Slovene large language models to follow instructions. It consists of pairs of prompts and responses, some of which contain an additional context field, as well as a field in which the source of the information included in the response is listed.
The dataset focuses on prompts from the field of digital humanities and museum documentation. Its primary goal is to provide a resource that allows existing large language models already available for the field of digital humanities to be expanded to cover Slovene and other similar, but less-resourced languages (e.g. Bosnian).
Version 1.0 includes approximately 10,000 prompt-response pairs, which were compiled entirely by hand by a team of linguists and digital humanities experts.
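As a rough illustration of the record structure described above (a prompt, a response, an optional context, and a source field), the following Python sketch models one entry. The field names `prompt`, `response`, `context`, and `source`, the `InstructionRecord` class, and both helper functions are assumptions made for this example; the actual file format and key names should be checked against the dataset itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstructionRecord:
    """One prompt-response pair. Field names are hypothetical, not confirmed."""
    prompt: str
    response: str
    context: Optional[str] = None  # present only for some records
    source: Optional[str] = None   # where the information in the response comes from


def record_from_dict(raw: dict) -> InstructionRecord:
    """Build a record from a plain dict, treating missing or empty optional fields as None."""
    return InstructionRecord(
        prompt=raw["prompt"],
        response=raw["response"],
        context=raw.get("context") or None,
        source=raw.get("source") or None,
    )


def build_model_input(record: InstructionRecord) -> str:
    """Prepend the context to the prompt when a context is available."""
    if record.context:
        return f"{record.context}\n\n{record.prompt}"
    return record.prompt
```

Keeping `context` and `source` optional mirrors the description: only some pairs carry a context, so downstream code has to handle records without one.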
Provider:
hdl.handle.net



