Wikipedia-based datasets
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/epfl-dlab/pti-candgen
Description:
This dataset is built from Wikipedia articles and is designed to evaluate candidate generation methods for low-resource languages. It includes the evaluation information needed to compare candidate generation methods under two learning settings, zero-shot learning and joint learning. It covers 9 language pairs, each going from a low-resource language to a high-, medium-, or low-resource hub language. Its primary tasks are entity linking and candidate generation.
Provider:
Wikipedia



