Text2KGBench
Source: arXiv | Updated: 2023-08-04 | Indexed: 2024-06-21
Download link:
https://github.com/cenguix/Text2KGBench
Description:
Text2KGBench is a benchmark for evaluating how well language models generate knowledge graphs from text. It was created by researchers at IBM Research Europe, Ireland. The benchmark comprises two datasets, Wikidata-TekGen and DBpedia-WebNLG, built on the ontology structures of Wikidata and DBpedia respectively and covering domains such as film, music, and sports. Wikidata-TekGen contains 10 ontologies and 13,474 sentences; DBpedia-WebNLG contains 19 ontologies and 4,860 sentences. The datasets were created by manually constructing the ontologies, generating triples with SPARQL queries, and aligning those triples with Wikipedia sentences through distant supervision. Text2KGBench measures a model's ability to extract facts from text while adhering to a given ontology, and is intended for research in natural language processing and the Semantic Web, particularly the automatic construction and refinement of knowledge graphs.
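To make "extracting facts while adhering to an ontology" concrete, the sketch below shows one plausible way to score a model's output against ground-truth triples. It is an illustration under stated assumptions, not the benchmark's official evaluation code: the `Triple` type, the `score` function, the relation names, and the toy data are all hypothetical, and an ontology is reduced here to a plain set of allowed relation names.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Triple(NamedTuple):
    """A (subject, relation, object) fact. Hypothetical type for illustration."""
    subject: str
    relation: str
    object: str


def score(predicted: Iterable[Triple],
          gold: Iterable[Triple],
          allowed_relations: Set[str]) -> Dict[str, float]:
    """Compare predicted triples with gold triples by exact match.

    Assumption: the ontology is modelled only as a set of relation names;
    a real ontology would also carry domain and range constraints.
    """
    pred_set = set(predicted)
    gold_set = set(gold)
    true_pos = len(pred_set & gold_set)

    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # Share of predicted triples whose relation is defined in the ontology.
    conforming = sum(1 for t in pred_set if t.relation in allowed_relations)
    conformance = conforming / len(pred_set) if pred_set else 0.0

    return {"precision": precision, "recall": recall,
            "f1": f1, "ontology_conformance": conformance}


# Toy data (not taken from the benchmark).
allowed = {"director", "cast_member"}
gold = [
    Triple("Inception", "director", "Christopher Nolan"),
    Triple("Inception", "cast_member", "Leonardo DiCaprio"),
]
predicted = [
    Triple("Inception", "director", "Christopher Nolan"),
    Triple("Inception", "release_year", "2010"),  # relation not in the ontology
]

result = score(predicted, gold, allowed)
print(result)
```

In this toy case one of the two predicted triples matches a gold triple, and one uses a relation outside the allowed set, so every reported value is 0.5. Exact string matching is the simplest choice; a fuller evaluation would likely need to normalize entity names before comparing.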
Provided by:
IBM Research Europe, Ireland
Created:
2023-08-04



