LVSTCK/sft-mk
Source: Hugging Face | Updated: 2025-07-06 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/LVSTCK/sft-mk
Description:
This dataset contains just over 100,000 samples across multiple categories, intended mainly for question answering, chat-style conversation, reasoning, essay writing, and code tasks. It combines several Macedonian datasets, comprising translated and synthetic data, some of which has been enhanced by human annotators. Sources include saillab's alpaca-macedonian-cleaned, LVSTCK's ultrachat-sft-mk, trajkovnikola's Capybara-mk and databricks-dolly-15k-mk, and LVSTCK's Open-Platypus-MK, among others. It also includes custom synthetic data: QA about Macedonia, general QA, code QA, and essays.
Provider:
LVSTCK
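
The listing gives a download link but not how to load the data. The following is a minimal sketch, assuming the Hugging Face `datasets` package is installed and that the mirror host in the download link accepts the standard `HF_ENDPOINT` override. The helper names `dataset_url` and `load_sft_mk` are hypothetical and used only for illustration. The split and column names are not stated in this listing, so the sketch does not assume any.

```python
import os

REPO_ID = "LVSTCK/sft-mk"          # dataset identifier from the listing
MIRROR = "https://hf-mirror.com"   # mirror host taken from the download link


def dataset_url(repo_id: str = REPO_ID, endpoint: str = MIRROR) -> str:
    """Build the dataset page URL for a given endpoint (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_sft_mk(endpoint: str = MIRROR):
    """Load the dataset through the mirror (hypothetical helper).

    Assumption: HF_ENDPOINT is read when huggingface_hub is first imported,
    so this should be called before anything else imports it.
    """
    os.environ.setdefault("HF_ENDPOINT", endpoint)
    from datasets import load_dataset  # imported lazily; needs `pip install datasets`
    return load_dataset(REPO_ID)


if __name__ == "__main__":
    print(dataset_url())
    # ds = load_sft_mk()   # requires network access
    # print(ds)            # inspect the available splits and columns before use
```

Printing the loaded object first is a reasonable way to discover the actual splits and fields before writing any processing code against them.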



