sh2orc/bccard-maywell-jojo0217-markai-lcw99-kendamarron-microsoft
Hugging Face | Updated 2024-08-19 | Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/sh2orc/bccard-maywell-jojo0217-markai-lcw99-kendamarron-microsoft
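Resource overview:
This is a Korean-language dataset that merges several sub-datasets covering question answering, finance, commerce, Wikipedia, math problems, and other domains. Its features are an instruction and an output, and it is mainly used for training and evaluating natural language processing models.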
This dataset is multilingual but consists primarily of Korean content. It merges several sub-datasets: Korean Wikidata QA, Korean RLHF dataset, BCCard Finance Korean QnA, Korean Commercial Dataset, Korean Wikipedia 20240501 1 million QnA, GPT4 evolution dataset, Jimba Wiki instruction dataset, Orca math word problems dataset, and WizardLM Orca dataset. Each record has two features, instruction and output, both of string type. The dataset provides a training split of 1,706,579 samples, with a total size of 1,347,152,241 bytes and a download size of 709,953,599 bytes.
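
As a rough illustration of the figures and fields described above, the following Python sketch derives the average record size and the download-to-full-size ratio, and shows one way a record might be turned into a training prompt. The prompt template and the helper names (`format_example`, `load_train_split`) are hypothetical and not part of the dataset. The loader assumes the third-party `datasets` package and network access, and that the repository id matches the download link. It is defined but not called here.

```python
# Figures quoted in the dataset description.
NUM_TRAIN_SAMPLES = 1_706_579
DATASET_SIZE_BYTES = 1_347_152_241
DOWNLOAD_SIZE_BYTES = 709_953_599

# Repository id taken from the download link (assumed to be the Hub id).
REPO_ID = "sh2orc/bccard-maywell-jojo0217-markai-lcw99-kendamarron-microsoft"


def average_bytes_per_sample() -> float:
    """Average size of one training record, in bytes."""
    return DATASET_SIZE_BYTES / NUM_TRAIN_SAMPLES


def download_ratio() -> float:
    """Fraction of the full dataset size that is actually downloaded."""
    return DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES


def format_example(record: dict) -> str:
    """Hypothetical prompt template; the dataset does not prescribe one."""
    return (
        f"### Instruction:\n{record['instruction']}\n\n"
        f"### Response:\n{record['output']}"
    )


def load_train_split(streaming: bool = True):
    """Load the train split; requires `pip install datasets` and network access.

    Streaming avoids downloading the whole dataset before iterating.
    """
    from datasets import load_dataset  # imported lazily: optional dependency

    return load_dataset(REPO_ID, split="train", streaming=streaming)
```

With the loader available, `format_example(next(iter(load_train_split())))` would format the first streamed record, assuming the columns are named `instruction` and `output` as the description states.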
Provider:
sh2orc



