Benchmark data and models for fine-grained geographic NER with few-shot learning
Source: DataCite Commons. Updated: 2025-05-07. Indexed: 2025-09-08.
Download link:
https://figshare.com/articles/dataset/Benchmark_data_and_models_for_fine-grained_geographic_NER_with_few-shot_learning/25559193
Description:
Geographic Named Entity Recognition (GNER) extracts geographic entity names from text and classifies them into predefined categories. Previous methods have paid little attention to identifying fine-grained categories of geographic entities when data are sparse, which limits their usefulness for geographic applications. To address this limitation, this paper presents a fine-grained GNER task and proposes a model, LH-FGNER, which combines a prototype network with hierarchical contrastive learning. Specifically, the model designs label-guided sentence-level prototypes to capture the contextual semantics of geographic entities. It also introduces a hierarchy tree to guide the construction of prototypes in vector space, using the hierarchy as prior knowledge to improve discrimination between fine-grained categories. In addition, two datasets are constructed to support the study of the fine-grained GNER task. Experimental results show that the proposed model outperforms the baseline and is robust. This work provides a methodological reference for few-shot GNER, which can be used to facilitate various text-based geographic applications.

The materials include the following:
data: Data from the preparation process.
pre_model: Model files saved during the paper's experiments, which allow direct testing without retraining the model.
LH-FGNER: Code for the proposed method (LH-FGNER), along with code for the related experimental discussions, non-open-source baselines, etc. It also includes environment files for reproduction and a README file for guidance.
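The description mentions a prototype network. As a rough illustration of that general idea only, the sketch below averages the support embeddings of each label into a prototype and assigns a query to the nearest prototype. It is not the LH-FGNER code from the archive: the function names, the labels ("river", "lake"), and the 2-D vectors are all made up for the example, and the label-guided prototypes and hierarchical contrastive learning described above are not shown.

```python
from collections import defaultdict
from math import fsum


def build_prototypes(support):
    """Average the support embeddings of each label into one prototype vector."""
    grouped = defaultdict(list)
    for label, vector in support:
        grouped[label].append(vector)
    return {
        label: [fsum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return fsum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Toy 2-D "embeddings" with hypothetical fine-grained labels (not from the dataset).
support = [
    ("river", [0.9, 0.1]),
    ("river", [1.1, -0.1]),
    ("lake", [0.0, 1.0]),
    ("lake", [0.2, 0.8]),
]
prototypes = build_prototypes(support)
prediction = classify([0.8, 0.2], prototypes)
print(prediction)
```

In a real few-shot setting the vectors would come from a sentence or token encoder; the nearest-prototype step itself would stay the same.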
Provider:
figshare
Created:
2025-05-07



