ctu-aic/cs_instruction_tuning_collection
收藏Hugging Face2025-06-15 更新2025-02-15 收录
下载链接:
https://hf-mirror.com/datasets/ctu-aic/cs_instruction_tuning_collection
下载链接
链接失效反馈官方服务:
资源简介:
这是一个针对捷克语LLM指令微调的数据集集合。它由布拉格捷克技术大学人工智能中心策划,包含捷克语(cs,ces)数据,并遵循cc-by-nc-4.0许可。数据来源于多个资源,包括MURI-IT、Bactrian-X、OASST-2、ASK LIBRARY和QUESTIONS UJC CAS。该数据集旨在用于LLM的指令微调,以提高对捷克语言的知识。数据集包括原始ID、对话、来源、指令类型、指令是否翻译、输出类型和输出是否翻译等字段。数据集包含底层数据集的偏差、风险和限制。
This is a collection of datasets for Czech LLM instruction tuning. Curated by the Artificial Intelligence Center, FEE, CTU in Prague, it contains Czech (cs, ces) data and is licensed under cc-by-nc-4.0. The data sources include MURI-IT, Bactrian-X, OASST-2, ASK LIBRARY, and QUESTIONS UJC CAS. The dataset is intended for instruction tuning of LLMs to improve knowledge of the Czech language. The dataset includes fields such as original_id, conversations, origin, instruction_type, instruction_translated, output_type, and output_translated. The dataset carries the biases, risks, and limitations of the underlying datasets.
提供机构:
ctu-aic



