
Nerwip Corpus

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/6815049
Resource description:
Description. This corpus contains 408 Wikipedia articles. They are biographies, manually annotated to highlight entities of the following types: Dates, Locations, Organizations and Persons. The corpus was designed to be used by our tool Nerwip, in order to evaluate and compare existing NER tools on biographical data. The other files are NER tool-related data (models, dictionaries, etc.) that Nerwip needs to detect entities. If you want to use the tool, you need to unzip these files as explained in the README file associated with Nerwip on GitHub.

The corpus was compiled by Burcu Küpelioğlu during her end-of-study project, then cleaned and corrected by Samet Atdağ during his MSc, to get a total of 250 articles (v3). Vincent Labatut then extended it further, to reach 408 articles (v4).

Source code. The source code of our tool Nerwip is available online: https://github.com/CompNet/nerwip

License. The dataset is shared under a Creative Commons 0 license.

Citation. If you use this corpus, please cite the following article: A Comparison of Named Entity Recognition Tools Applied to Biographical Texts, S. Atdağ & V. Labatut, 2013. ⟨hal-00849797⟩ - DOI: 10.1109/IcConSCS.2013.6632052

@InProceedings{Atdag2013,
  author    = {Atdağ, Samet and Labatut, Vincent},
  title     = {A Comparison of Named Entity Recognition Tools Applied to Biographical Texts},
  booktitle = {2\textsuperscript{nd} International Conference on Systems and Computer Science},
  year      = {2013},
  pages     = {228-233},
  address   = {Lille, FR},
  publisher = {IEEE Publishing},
  doi       = {10.1109/IcConSCS.2013.6632052},
}
Created:
2024-10-01