hugmanskj/korean-ner
Source: Hugging Face. Updated 2025-12-12. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/hugmanskj/korean-ner
Description:
A synthetic dataset for Korean named entity recognition (NER), intended for teaching and practising natural language processing. It contains template-generated text in the style of Korean news headlines, labelled with the BIO (Begin-Inside-Outside) tagging scheme for token classification. Entity types include Person (PER), Organization (ORG), Location (LOC), and Date (DAT). The data is split into training (5,000 examples), validation (500 examples), and test (500 examples) sets, and is aimed at deep learning and NLP beginners, Korean NLP model training, and BERT fine-tuning practice. Because the data is synthetic, its text is likely less diverse and less complex than real-world text, so it is primarily suited to educational and experimental use.
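To illustrate how BIO tags mark entity boundaries, here is a minimal Python sketch that groups tagged tokens into entity spans. The function name `bio_to_spans` and the example sentence are hypothetical and are not taken from the dataset. The tag names (`B-PER`, `I-DAT`, and so on) assume the usual `B-`/`I-` prefix convention applied to the entity types listed above; the dataset's actual label strings and column names may differ.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans.

    Hypothetical helper for illustration; not part of the dataset or any library.
    """
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity, closing any open one.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O" (or an I- tag with no matching open entity) closes the span;
            # the stray token itself is treated as outside any entity.
            if current_type is not None:
                spans.append((current_type, " ".join(current_tokens)))
            current_type = None
            current_tokens = []

    # Flush an entity that runs to the end of the sentence.
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Made-up headline-style example (assumed tag names, not a dataset record).
example_tokens = ["한국은행", "김민수", "총재", "3월", "5일", "부산", "방문"]
example_tags = ["B-ORG", "B-PER", "O", "B-DAT", "I-DAT", "B-LOC", "O"]
example_spans = bio_to_spans(example_tokens, example_tags)
```

In this sketch, the two tokens tagged `B-DAT` and `I-DAT` are merged into a single DAT span, while each single-token entity carries only a `B-` tag.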
Provider:
hugmanskj



