Training and development dataset for information extraction in plant epidemiomonitoring
Source: DataCite Commons. Updated: 2025-05-16. Indexed: 2025-04-16.
Download link:
https://entrepot.recherche.data.gouv.fr/citation?persistentId=doi:10.57745/ZDNOGF
Description:
The “Training and development dataset for information extraction in plant epidemiomonitoring” is the annotation set for the “Corpus for the epidemiomonitoring of plant”. The annotations cover seven entity types (e.g. species, locations, diseases), their normalisation against the NCBI taxonomy and GeoNames, and seven binary relationship types plus ternary relationships. Each annotation refers to character positions within the documents of the corpus. The annotation guidelines define each annotation type and give representative examples. Both datasets (the corpus and this annotation set) are intended for the training and validation of information extraction methods.
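Because the annotations point to character positions rather than embedding the text, a consumer has to resolve each annotation against the matching corpus document. The following is a minimal Python sketch of that lookup, not the dataset's actual file format or tooling: the `EntityAnnotation` class, the `resolve_spans` helper, the half-open `[start, end)` offset convention, the type labels, and the example sentence are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityAnnotation:
    """Hypothetical stand-off annotation: a type plus character offsets."""
    entity_type: str
    start: int  # inclusive character offset (assumed convention)
    end: int    # exclusive character offset (assumed convention)


def resolve_spans(document: str, annotations: list[EntityAnnotation]) -> list[tuple[str, str]]:
    """Return (entity_type, surface_text) pairs for each annotation.

    Raises ValueError when an offset pair does not fit the document,
    which usually means the annotation belongs to a different text.
    """
    resolved = []
    for ann in annotations:
        if not (0 <= ann.start < ann.end <= len(document)):
            raise ValueError(f"offsets {ann.start}-{ann.end} outside document")
        resolved.append((ann.entity_type, document[ann.start:ann.end]))
    return resolved


# Invented example sentence and offsets, for illustration only.
document = "Xylella fastidiosa was detected on olive trees in Apulia."
annotations = [
    EntityAnnotation("Species", 0, 18),
    EntityAnnotation("Location", 50, 56),
]
spans = resolve_spans(document, annotations)
```

Checking the offsets against the document length before slicing catches the most common failure with stand-off annotations, namely pairing an annotation file with the wrong or a modified version of the text.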
Provider:
Recherche Data Gouv
Created:
2025-01-27



