Hansel
Source: arXiv · Updated: 2023-10-29 · Indexed: 2024-06-21
Download link:
https://github.com/HITsz-TMG/Hansel
Resource description:
Hansel is a few-shot and zero-shot entity linking benchmark for Chinese, created by Harbin Institute of Technology, Shenzhen. It contains 10,000 diverse documents drawn from news, social media posts, and other web articles, and uses Wikidata as its target knowledge base. The dataset was built with a novel method for collecting zero-shot entity linking data. Hansel aims to address popularity bias in entity linking systems, in particular their weaker performance on tail and emerging entities.
Provider:
Harbin Institute of Technology, Shenzhen
Created:
2022-07-27



