MyeongHo0621/korean-quality-cleaned

Name: MyeongHo0621/korean-quality-cleaned
Creator: MyeongHo0621
Published: 2025-10-11 11:53:25
License: No description available

Source: Hugging Face
Updated: 2025-10-11
Indexed: 2025-10-25

Download link:

https://hf-mirror.com/datasets/MyeongHo0621/korean-quality-cleaned

Description:

This is a cleaned and standardized Korean instruction dataset. It combines several high-quality open-source Korean datasets and applies unified formatting and quality filtering to them. Its main features are a unified format, quality filtering, a clean structure, and readiness for immediate use. The dataset contains 54,190 samples in total, with a file size of 116.3 MB, in JSON format. The data sources include KULLM-v2 and KoAlpaca. The dataset is suited to a range of NLP tasks, including Korean language model fine-tuning, instruction tuning, conversational AI training, question-answering systems, and general-purpose Korean LLM training.
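Since the data ships as JSON and targets instruction tuning, a minimal sketch of turning one record into a training prompt may help. The listing does not document the schema, so the field names `instruction`, `input`, and `output` are an assumption (Alpaca-style keys), as is the top-level JSON list layout. The helper names `to_prompt` and `load_records` and the sample record are hypothetical, made up for illustration.

```python
import json

# ASSUMPTION: Alpaca-style keys ("instruction", "input", "output").
# The listing does not state the actual schema; adjust to the real fields.


def to_prompt(record: dict) -> str:
    """Render one instruction record as a single prompt string."""
    instruction = record.get("instruction", "").strip()
    context = record.get("input", "").strip()
    answer = record.get("output", "").strip()

    parts = [f"### Instruction:\n{instruction}"]
    if context:  # include the optional context block only when it is non-empty
        parts.append(f"### Input:\n{context}")
    parts.append(f"### Response:\n{answer}")
    return "\n\n".join(parts)


def load_records(path: str) -> list[dict]:
    """Load a JSON file, assuming it holds a top-level list of records."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON list of records")
    return data


# Hypothetical record for illustration; not taken from the dataset.
sample = {
    "instruction": "한국의 수도는 어디인가요?",
    "input": "",
    "output": "서울입니다.",
}
print(to_prompt(sample))
```

If the real file uses different keys or a JSON Lines layout, only the key names in `to_prompt` and the parsing step in `load_records` would need to change.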

Provided by:

MyeongHo0621
