ctu-aic/cs_instruction_tuning_collection

Name: ctu-aic/cs_instruction_tuning_collection
Creator: ctu-aic
Published: 2025-06-15 11:37:02
License: 暂无描述

Hugging Face2025-06-15 更新2025-02-15 收录

下载链接：

https://hf-mirror.com/datasets/ctu-aic/cs_instruction_tuning_collection

下载链接

链接失效反馈

官方服务：

资源简介：

这是一个针对捷克语LLM指令微调的数据集集合。它由布拉格捷克技术大学人工智能中心策划，包含捷克语（cs，ces）数据，并遵循cc-by-nc-4.0许可。数据来源于多个资源，包括MURI-IT、Bactrian-X、OASST-2、ASK LIBRARY和QUESTIONS UJC CAS。该数据集旨在用于LLM的指令微调，以提高对捷克语言的知识。数据集包括原始ID、对话、来源、指令类型、指令是否翻译、输出类型和输出是否翻译等字段。数据集包含底层数据集的偏差、风险和限制。

This is a collection of datasets for Czech LLM instruction tuning. Curated by the Artificial Intelligence Center, FEE, CTU in Prague, it contains Czech (cs, ces) data and is licensed under cc-by-nc-4.0. The data sources include MURI-IT, Bactrian-X, OASST-2, ASK LIBRARY, and QUESTIONS UJC CAS. The dataset is intended for instruction tuning of LLMs to improve knowledge of the Czech language. The dataset includes fields such as original_id, conversations, origin, instruction_type, instruction_translated, output_type, and output_translated. The dataset carries the biases, risks, and limitations of the underlying datasets.

提供机构：

ctu-aic

5,000+

优质数据集

54 个

任务类型

进入经典数据集