nlp-kmu/kor_ner

Name: nlp-kmu/kor_ner
Creator: nlp-kmu
Published: 2024-01-18 11:07:39
License: MIT

Hugging Face · Updated 2024-01-18 · Indexed 2024-06-15

Download link:

https://hf-mirror.com/datasets/nlp-kmu/kor_ner

Overview:

KorNER is a monolingual Korean dataset for named entity recognition (NER). Each record provides the text, an annotated version of the text, tokens, part-of-speech tags, and NER tags. The dataset falls in the 1K to 10K size category and is released under the MIT License. The dataset card does not explicitly name the dataset's creators, but the annotations were produced by experts. The data is divided into training, test, and validation splits of 2928, 366, and 366 samples respectively.

Provider:

nlp-kmu

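Summary of original information

Dataset overview

Basic information

Dataset name: KorNER
Language: Korean (ko)
License: MIT
Multilinguality: monolingual
Dataset size: 1K<n<10K
Source data: original
Task category: token classification (token-classification)
Task ID: named entity recognition (named-entity-recognition)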
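
As a minimal sketch, the dataset could be loaded with the Hugging Face `datasets` library using the repository ID given on this page. The split names `train`, `test`, and `validation` are assumptions based on common Hub naming, and depending on the installed `datasets` version the call may need additional arguments. The helpers `check_split_sizes` and `load_kor_ner` are hypothetical; the first only compares split lengths against the sample counts reported on this page.

```python
REPO_ID = "nlp-kmu/kor_ner"

# Sample counts reported on this page; the split names are an assumption.
EXPECTED_SIZES = {"train": 2928, "test": 366, "validation": 366}


def check_split_sizes(splits):
    """Return the splits whose length differs from the reported count."""
    mismatches = {}
    for name, expected in EXPECTED_SIZES.items():
        actual = len(splits[name]) if name in splits else None
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches


def load_kor_ner():
    """Download the dataset from the Hub (needs network access and `datasets`)."""
    # Imported lazily so that check_split_sizes can be used without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID)
```

A possible use is `check_split_sizes(load_kor_ner())`, which should return an empty dict if the downloaded splits match the reported counts.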

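Data structure

Data fields

text: the full text (string)
annot_text: annotated text that includes part-of-speech tagging information (string)
tokens: ordered list of words extracted from the full text (sequence of strings)
pos_tags: part-of-speech tag for each word (sequence of strings)
ner_tags: named entity recognition tag for each word (sequence of strings)

Part-of-speech tags (pos_tags)

Tag list: [SO, SS, VV, XR, VCP, JC, VCN, JKB, MM, SP, XSN, SL, NNP, NP, EP, JKQ, IC, XSA, EC, EF, SE, XPN, ETN, SH, XSV, MAG, SW, ETM, JKO, NNB, MAJ, NNG, JKV, JKC, VA, NR, JKG, VX, SF, JX, JKS, SN]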
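
The field descriptions suggest that `tokens`, `pos_tags`, and `ner_tags` are parallel sequences. The following is a minimal sketch of a consistency check, assuming one tag per token. The function name `validate_record` is hypothetical, and the sample record is made up for illustration rather than taken from the dataset.

```python
# POS tag set copied from the tag list on this page.
POS_TAGS = frozenset(
    "SO SS VV XR VCP JC VCN JKB MM SP XSN SL NNP NP EP JKQ IC XSA EC EF SE "
    "XPN ETN SH XSV MAG SW ETM JKO NNB MAJ NNG JKV JKC VA NR JKG VX SF JX "
    "JKS SN".split()
)


def validate_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    n = len(record["tokens"])
    # Assumption: every token has exactly one POS tag and one NER tag.
    for field in ("pos_tags", "ner_tags"):
        if len(record[field]) != n:
            problems.append(f"{field} has {len(record[field])} items, expected {n}")
    unknown = sorted(set(record["pos_tags"]) - POS_TAGS)
    if unknown:
        problems.append(f"unknown POS tags: {unknown}")
    return problems


# Made-up record for illustration only; real records may be tokenized differently.
sample = {
    "tokens": ["서울", "에", "가", "ㄴ다"],
    "pos_tags": ["NNP", "JKB", "VV", "EF"],
    "ner_tags": ["B_LC", "O", "O", "O"],
}
print(validate_record(sample))
```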

Named entity recognition tags (ner_tags)

Tag list: ["I", "O", "B_OG", "B_TI", "B_LC", "B_DT", "B_PS"]
Tag meanings:
- B: first word of a phrase
- I: non-initial word
- OG: organization
- TI: time
- DT: date
- PS: person
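
Because the `I` label carries no entity type, a continuation token presumably inherits the type of the preceding `B_` label. The following is a minimal sketch of decoding a tag sequence into entity spans under that assumption. `extract_entities` is a hypothetical helper, and the example tags are made up. The page does not gloss `LC` or `O`; the sketch treats `O` as "outside any entity" and passes `LC` through unchanged as a type code.

```python
def extract_entities(ner_tags):
    """Group tags into (entity_type, start, end) spans; end is exclusive."""
    spans = []
    current = None  # (entity_type, start) of the span being built, if any
    for i, tag in enumerate(ner_tags):
        if tag.startswith("B_"):
            # A new entity starts; close the previous one if it is still open.
            if current is not None:
                spans.append((current[0], current[1], i))
            current = (tag[2:], i)
        elif tag == "I" and current is not None:
            continue  # extends the open span
        else:
            # "O", or a stray "I" with no open span: close any open span.
            if current is not None:
                spans.append((current[0], current[1], i))
                current = None
    if current is not None:
        spans.append((current[0], current[1], len(ner_tags)))
    return spans


# Made-up tag sequence for illustration only.
tags = ["B_PS", "I", "O", "B_OG", "O", "I"]
print(extract_entities(tags))  # [('PS', 0, 2), ('OG', 3, 4)]
```

Given a record's `tokens`, the surface tokens of each entity can then be recovered with `tokens[start:end]`.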

Data splits

Training set: 2928 samples, 3948938 bytes
Test set: 366 samples, 476850 bytes
Validation set: 366 samples, 486178 bytes

Dataset size

Download size: 3493175 bytes
Dataset size: 4911966 bytes
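
The reported figures are internally consistent, which a few lines of arithmetic can confirm. In this sketch the numbers are copied from the listing above; the variable names are illustrative, and the split keys are assumed names.

```python
# (samples, bytes) per split, as reported on this page
splits = {
    "train": (2928, 3948938),
    "test": (366, 476850),
    "validation": (366, 486178),
}

total_samples = sum(n for n, _ in splits.values())
total_bytes = sum(b for _, b in splits.values())
fractions = {name: n / total_samples for name, (n, _) in splits.items()}

print(total_samples)  # 3660, within the stated 1K<n<10K range
print(total_bytes)    # 4911966, matching the reported dataset size
print(fractions)      # {'train': 0.8, 'test': 0.1, 'validation': 0.1}
```

The split therefore corresponds to an 80/10/10 partition of 3660 samples.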
