MyeongHo0621/korean-quality-cleaned
Hugging Face | Updated 2025-10-11 | Indexed 2025-10-25
Download link:
https://hf-mirror.com/datasets/MyeongHo0621/korean-quality-cleaned
Description:
This is a cleaned and standardized Korean instruction dataset. It merges several high-quality open-source Korean datasets under a unified format with quality filtering, so the result has a clean structure and is ready to use. It contains 54,190 samples in JSON format, with a file size of 116.3 MB. The data sources include KULLM-v2 and KoAlpaca. The dataset is suited to NLP tasks such as Korean language model fine-tuning, instruction tuning, conversational AI training, question answering, and general-purpose Korean LLM training.
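As a rough illustration of how a JSON instruction file like this might be prepared for instruction tuning, the sketch below loads a list of records and renders each one as a prompt string. The listing does not document the schema, so the field names `instruction`, `input`, and `output` (Alpaca-style), the top-level list layout, the prompt template, and the helper names `load_records` and `format_example` are all assumptions made for the example.

```python
import json


def format_example(record: dict) -> str:
    """Render one instruction record as a single training prompt string.

    Field names are hypothetical (Alpaca-style); adjust them to the real schema.
    """
    instruction = record.get("instruction", "").strip()
    context = record.get("input", "").strip()
    response = record.get("output", "").strip()

    parts = [f"### Instruction:\n{instruction}"]
    if context:  # include the optional input section only when it is non-empty
        parts.append(f"### Input:\n{context}")
    parts.append(f"### Response:\n{response}")
    return "\n\n".join(parts)


def load_records(path: str) -> list[dict]:
    """Load a JSON file assumed to hold a top-level list of records."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON list of records")
    return data


# Made-up stand-in record for demonstration; it is not taken from the dataset.
sample = json.loads(
    '[{"instruction": "한국의 수도는 어디인가요?", "input": "", "output": "서울입니다."}]'
)
prompt = format_example(sample[0])
print(prompt)
```

Once the actual file has been downloaded and its keys inspected, `load_records` can be pointed at it and `format_example` adapted to whatever fields it really uses.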
Provider:
MyeongHo0621



