NERetrieve
arXiv | Updated 2023-10-22 | Indexed 2024-06-21
Download link:
https://github.com/katzurik/NERetrieve
Overview:
The NERetrieve dataset was created by Bar-Ilan University. It contains approximately 4 million English Wikipedia paragraphs in which entity spans are annotated for 500 entity types. It is designed to support tasks ranging from fine-grained supervised recognition to zero-shot comprehensive retrieval, with emphasis on cross-domain robustness, on fine-grained, specific, and intersectional entity types, and on extending the zero-shot setting from recognition to retrieval. The dataset aims to advance entity recognition and to address the limitations of existing models on complex and specialized entity types.
Provider:
Bar-Ilan University
Created:
2023-10-22



