MIM-GOLD-EL
Source: arXiv · Updated: 2022-06-10 · Indexed: 2024-06-21
Download link:
https://repository.clarin.is/repository/xmlui/handle/20.500.12537/168
Resource overview:
MIM-GOLD-EL is the first Icelandic entity linking corpus, created by the University of Iceland and other institutions. It contains over 21,000 mentions linked to their corresponding named entities in Wikidata. The corpus draws on a variety of text sources and is intended as training material for Icelandic or cross-lingual entity linking models. To build it, the research team combined the multilingual entity linking model mGENRE with Wikipedia API Search, which increased annotation coverage. The dataset is mainly used in information extraction, where it helps resolve ambiguity in entity recognition and linking.
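The combination of a model-based linker with a search-based lookup can be pictured with a minimal sketch like the one below. It is not the authors' pipeline: the overview does not say how or in what order the two components were combined, so the "primary, then fallback" order, the function names, the choice of the Icelandic Wikipedia endpoint, and the placeholder ID are all assumptions made for illustration. The two URL builders use standard MediaWiki Action API parameters (`list=search`, and `prop=pageprops` with `ppprop=wikibase_item`), and the linkers are passed in as plain callables so the sketch runs without a model or a network connection.

```python
from typing import Callable, Optional
from urllib.parse import urlencode

# Assumption: the Icelandic Wikipedia endpoint, chosen here for illustration.
WIKI_API = "https://is.wikipedia.org/w/api.php"


def search_url(mention: str, limit: int = 1) -> str:
    """Build a MediaWiki full-text search request for a mention string."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": mention,
        "srlimit": limit,
        "format": "json",
    }
    return f"{WIKI_API}?{urlencode(params)}"


def wikidata_id_url(title: str) -> str:
    """Build a request for the Wikidata item ID attached to a page title."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "titles": title,
        "format": "json",
    }
    return f"{WIKI_API}?{urlencode(params)}"


# A linker maps a mention to a Wikidata ID, or None if it finds nothing.
Linker = Callable[[str], Optional[str]]


def link_with_fallback(mention: str, primary: Linker, fallback: Linker) -> Optional[str]:
    """Return the primary linker's ID; if it has none, ask the fallback."""
    qid = primary(mention)
    return qid if qid is not None else fallback(mention)


if __name__ == "__main__":
    # Hypothetical stand-ins: a model that finds nothing and a search
    # lookup that returns a placeholder ID (not a real Wikidata item).
    no_hit = lambda mention: None
    search_hit = lambda mention: "Q0000"
    print(link_with_fallback("Reykjavík", no_hit, search_hit))
    print(search_url("Reykjavík"))
```

In a real setting, `primary` would wrap a model such as mGENRE and `fallback` would fetch `search_url(...)`, take the top title, and resolve it through `wikidata_id_url(...)`. Coverage increases because mentions the first component misses get a second chance.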
Provider:
University of Iceland
Created:
2022-06-10
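Collected summary
Background and challenges
Background overview
MIM-GOLD-EL is the first Icelandic entity linking corpus. It contains over 21,000 mentions linked to Wikidata, draws on a variety of text sources, and is intended as training material for Icelandic or cross-lingual entity linking models. The dataset was built by combining mGENRE with Wikipedia API Search, which increased annotation coverage. It is mainly used in information extraction to resolve ambiguity in entity recognition and linking.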
The content above was collected and summarized by 遇见数据集.



