PrimeKGQA, the dataset from the paper "Bridging the Gap: Generating a Comprehensive Biomedical Knowledge Graph Question Answering Dataset"

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/13348626


Description:

Despite the plethora of resources such as large-scale corpora and manually curated Knowledge Graphs (KGs), performing reasoning with natural language inputs over biomedical graphs remains challenging due to insufficient training data. We propose a novel method for automatically constructing a Biomedical Knowledge Graph Question Answering (BioKGQA) dataset sourced from PrimeKG, the largest precision medicine-oriented KG. In total, we create 83,999 question-answer pairs along with their respective SPARQL queries. Our approach generates a diverse array of contextually relevant questions covering a wide spectrum of biomedical concepts and levels of complexity. We evaluate our method using automatic metrics alongside manual annotations. We establish novel standards tailored for KGQA systems to assess the linguistic correctness and semantic faithfulness of the generated questions with respect to the extracted KG facts. The compiled dataset, PrimeKGQA, serves as a valuable benchmarking resource for advancing knowledge-driven biomedical research and evaluating KGQA systems.

Created:

2024-10-02
