KIND
Source: arXiv. Updated: 2022-06-14. Indexed: 2024-06-21.
Download link:
https://github.com/dhfbk/KIND
Description:
KIND is an Italian multi-domain named entity recognition (NER) dataset created by the Bruno Kessler Foundation. It contains more than one million tokens, of which approximately 600,000 carry manual gold-standard annotations, and spans three domains: news, literature, and political speeches. Its main strength is this multi-domain coverage, which captures different writing styles and patterns of language use. It is described as the largest Italian NER dataset currently available. The texts and annotations can be downloaded freely from the GitHub repository and are suitable for training Italian NER systems.
Provider:
Bruno Kessler Foundation, Via Sommarive 18, Trento, Italy
Created:
2021-12-30



