hugmanskj/korean-news-topic-classification

Name: hugmanskj/korean-news-topic-classification
Creator: hugmanskj
Published: 2025-12-12 02:55:25
License: No description available

Source: Hugging Face. Updated: 2025-12-12. Indexed: 2025-12-20.

Download link:

https://hf-mirror.com/datasets/hugmanskj/korean-news-topic-classification

Resource overview:

This is a synthetic dataset for Korean news topic classification, designed for educational use in natural language processing. It contains news-style Korean sentences labeled with one of four topics: economy, society, culture & lifestyle, and technology & science. The sentences are generated from templates, with approximately 20 templates per category and dozens to hundreds of domain-specific keywords. The dataset is split into training (5,000 examples), validation (500 examples), and test (500 examples) sets, each distributed evenly across the four categories. It is intended to support hands-on practice for deep learning/NLP beginners, training and evaluation of Korean text classification models, and fine-tuning exercises with pre-trained models. Because it is synthetic, it may lack the complexity and diversity of real news data and is primarily suited to educational and experimental use.
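The template-and-keyword approach described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the templates, the keywords, the label names, the `text`/`label` field names, and the `generate_split` helper are hypothetical and are not the dataset's actual contents or generation code. It only shows how filling per-category templates with per-category keywords yields class-balanced splits of the stated sizes.

```python
import random
from collections import Counter

# Hypothetical templates and keywords, invented for illustration only.
# The real dataset reportedly uses about 20 templates per category and
# dozens to hundreds of keywords; none of them are reproduced here.
TEMPLATES = {
    "economy": ["{kw} 관련 시장 전망이 발표됐다.", "전문가들은 {kw} 변동에 주목하고 있다."],
    "society": ["{kw} 문제에 대한 시민들의 관심이 커지고 있다.", "정부는 {kw} 개선 대책을 내놓았다."],
    "culture_lifestyle": ["이번 주말 {kw} 행사가 열린다.", "{kw}에 대한 관심이 높아지고 있다."],
    "it_science": ["연구진은 {kw} 분야의 새로운 기술을 공개했다.", "{kw} 산업에 대한 투자가 늘고 있다."],
}
KEYWORDS = {
    "economy": ["금리", "환율", "주가"],
    "society": ["교육 정책", "교통 안전", "복지 제도"],
    "culture_lifestyle": ["전시회", "여행", "요리"],
    "it_science": ["인공지능", "반도체", "우주 탐사"],
}


def generate_split(n_examples, seed=0):
    """Return n_examples rows with the same number of rows per label."""
    labels = list(TEMPLATES)
    if n_examples % len(labels) != 0:
        raise ValueError("n_examples must be divisible by the number of labels")
    per_class = n_examples // len(labels)
    rng = random.Random(seed)  # seeded for reproducibility
    rows = []
    for label in labels:
        for _ in range(per_class):
            template = rng.choice(TEMPLATES[label])
            keyword = rng.choice(KEYWORDS[label])
            rows.append({"text": template.format(kw=keyword), "label": label})
    rng.shuffle(rows)  # mix the classes so rows are not grouped by label
    return rows


# Split sizes taken from the description above: 5,000 / 500 / 500.
train_rows = generate_split(5000, seed=0)
validation_rows = generate_split(500, seed=1)
test_rows = generate_split(500, seed=2)

print(Counter(row["label"] for row in train_rows))
```

With so few templates and keywords, this sketch produces many repeated sentences; a larger template and keyword pool, as the description reports, would give more variety, though the limits of synthetic data noted above still apply.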

Provider:

hugmanskj
