KIND (Kessler Italian Named-entities Dataset)

Name: KIND (Kessler Italian Named-entities Dataset)
Creator: OpenDataLab
Published: 2026-05-31 08:30:18
License: No description available

OpenDataLab, updated 2026-05-31, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/KIND

Description:

KIND is an Italian dataset for named entity recognition (NER). It contains over one million tokens, annotated with three categories: person, location, and organization. Most of the dataset (approximately 600K tokens) carries manually produced gold-standard annotations across three domains: news, literature, and political discourse. To build the dataset, we used publicly available texts released under licenses that permit both research and commercial use. Specifically, we release four subsets, with texts drawn from: (i) Wikinews (WN), as a source of news texts from recent decades; (ii) a collection of Italian public-domain fiction books (FIC); (iii) the writings and speeches of the Italian politician Aldo Moro (AM); and (iv) the writings and speeches of the Italian politician Alcide De Gasperi (ADG).
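
As a rough illustration of how token-level NER annotations with these three categories might be consumed, here is a minimal Python sketch. It assumes a two-column "token<TAB>tag" layout with BIO-style tags (B-PER, I-PER, B-LOC, B-ORG, O) and blank lines between sentences; this listing does not specify the file format, so that layout, the tag names, the helper names read_sentences and count_entities, and the sample sentences are all assumptions made for illustration only.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]


def read_sentences(lines: Iterable[str]) -> List[Sentence]:
    """Group 'token<TAB>tag' lines into sentences separated by blank lines.

    The two-column layout is an assumption, not something the listing states.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.split("\t")
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def count_entities(sentences: List[Sentence]) -> Counter:
    """Count entity mentions per category; each B- tag starts one mention."""
    counts: Counter = Counter()
    for sentence in sentences:
        for _, tag in sentence:
            if tag.startswith("B-"):
                counts[tag[2:]] += 1
    return counts


# Invented sample lines, not taken from the dataset.
sample = [
    "Aldo\tB-PER", "Moro\tI-PER", "visitò\tO", "Roma\tB-LOC", ".\tO",
    "",
    "La\tO", "Democrazia\tB-ORG", "Cristiana\tI-ORG", "vinse\tO", ".\tO",
]

sentences = read_sentences(sample)
entity_counts = count_entities(sentences)
print(len(sentences), dict(entity_counts))
```

To run the same counting over a real file, the lines of an opened file object could be passed to read_sentences in place of the sample list, provided the file actually follows the assumed layout.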

Provider:

OpenDataLab

Created:

2022-06-23

Collection Summary

Dataset Introduction