DFKI-SLT/copious
Source: Hugging Face. Updated 2025-10-17; indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/DFKI-SLT/copious
Resource summary:
The COPIOUS corpus (Corpus of named entities towards extracting Species Occurrence) is a gold standard annotated corpus created for the task of named entity extraction in the biodiversity domain. It includes five key entity categories: Taxon, Geographical Location, Habitat, Temporal Expression, and Person names. The corpus consists of 668 documents randomly selected from over 169,000 English-language pages downloaded from the Biodiversity Heritage Library. The dataset is split into training, validation, and test sets. The corpus was developed to provide a sufficiently reliable and sizeable resource for training and evaluating Named Entity Recognition tools in the biodiversity domain, specifically for the task of species occurrence extraction.
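NER corpora like this one are commonly consumed as token-level BIO tags that must be grouped back into entity spans. The snippet below is a minimal sketch of that grouping step, assuming the annotations are available as parallel lists of tokens and BIO tag strings such as `B-Taxon` / `I-Taxon`. The tag strings, the function name `bio_to_spans`, and the example sentence are hypothetical and not taken from the released files.

```python
from typing import List, Tuple

# Span = (label, start_token, end_token_exclusive, surface_text)
Span = Tuple[str, int, int, str]


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Group BIO tags into entity spans (hypothetical helper, not part of COPIOUS)."""
    spans: List[Span] = []
    label, start = None, 0
    # A sentinel "O" at the end flushes any span still open after the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        # An I- tag continues the open span only if its type matches.
        inside = tag.startswith("I-") and tag[2:] == label
        if label is not None and not inside:
            spans.append((label, start, i, " ".join(tokens[start:i])))
            label = None
        # B- always opens a span; a stray I- without a matching open span
        # is treated leniently as the start of a new span.
        if tag.startswith("B-") or (tag.startswith("I-") and not inside):
            label, start = tag[2:], i
    return spans


# Made-up sentence using assumed label names for three of the five categories.
tokens = ["Quercus", "robur", "was", "collected", "near", "Bristol", "in", "May", "1890", "."]
tags = ["B-Taxon", "I-Taxon", "O", "O", "O", "B-GeographicalLocation",
        "O", "B-TemporalExpression", "I-TemporalExpression", "O"]
print(bio_to_spans(tokens, tags))
```

If the Hugging Face `datasets` library is installed, `load_dataset("DFKI-SLT/copious")` may fetch the training, validation, and test splits. The column names, and whether tags are stored as integer `ClassLabel` ids (convertible with `int2str`) rather than strings, are assumptions to verify against the dataset card before reusing this helper.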
Provided by:
DFKI-SLT



