ISWC 2023 LM-KBC Challenge Dataset
Source: arXiv. Indexed: 2025-09-30.
Download link:
https://github.com/bohuizhang/LLMKE
Description:
This dataset comprises 21 Wikidata relation types spanning 7 domains, with relations on music, television series, sports, geography, chemistry, business, administrative divisions, and public figures. A total of 1,940 statements are provided for training, validation, and testing. The dataset can be used to study the knowledge gap between large language models and Wikidata; the ground truth for offline evaluation was generated with SPARQL queries against Wikidata. The underlying task targets knowledge engineering and knowledge base completion.
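To illustrate how ground truth for a (subject, relation) pair could be retrieved from Wikidata with SPARQL, here is a minimal sketch. It is not the challenge organizers' actual query: the helper names `build_object_query` and `parse_objects`, the choice of subject and property, and the hand-written sample response are all assumptions made for illustration. The sketch only builds the query text and parses a response in the standard SPARQL JSON results layout; it performs no network call.

```python
# Public Wikidata SPARQL endpoint (shown for reference; not called in this sketch).
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_object_query(subject_qid: str, property_pid: str) -> str:
    """Hypothetical helper: SPARQL text listing every object of (subject, property)."""
    return (
        "SELECT ?object ?objectLabel WHERE {\n"
        f"  wd:{subject_qid} wdt:{property_pid} ?object .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


def parse_objects(sparql_json: dict) -> list[str]:
    """Hypothetical helper: pull object IDs out of a SPARQL JSON results document.

    Entity values arrive as URIs such as http://www.wikidata.org/entity/Q183,
    so the trailing path segment is the QID. Literal values without a slash
    are returned unchanged.
    """
    objects = []
    for row in sparql_json["results"]["bindings"]:
        value = row["object"]["value"]
        objects.append(value.rsplit("/", 1)[-1])
    return objects


# Hand-written sample in the SPARQL JSON results layout (illustrative only,
# not a real or complete endpoint response).
SAMPLE_RESPONSE = {
    "head": {"vars": ["object", "objectLabel"]},
    "results": {
        "bindings": [
            {
                "object": {"type": "uri", "value": "http://www.wikidata.org/entity/Q183"},
                "objectLabel": {"type": "literal", "value": "Germany"},
            },
            {
                "object": {"type": "uri", "value": "http://www.wikidata.org/entity/Q39"},
                "objectLabel": {"type": "literal", "value": "Switzerland"},
            },
        ]
    },
}

if __name__ == "__main__":
    # Q142 is France and P47 is "shares border with" on Wikidata.
    print(build_object_query("Q142", "P47"))
    print(parse_objects(SAMPLE_RESPONSE))
```

To run the query for real, the generated text would be sent to the endpoint as the `query` parameter with a JSON `Accept` header, and the decoded response passed to `parse_objects`.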
Provider:
Wikidata



