BiomedSQL
Source: arXiv (indexed 2025-09-30)
Download link:
https://huggingface.co/datasets/NIH-CARD/BiomedSQL
Description:
BiomedSQL comprises 68,000 question/SQL query/answer triples grounded in a unified BigQuery knowledge base that integrates gene-disease associations, causal inference from omics data, and drug approval records. It is designed to evaluate scientific reasoning in text-to-SQL generation in the biomedical domain, and it covers a range of SQL operation types and biomedical reasoning categories.
Provided by:
NIH-CARD
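
To make the question/SQL query/answer structure in the description concrete, here is a minimal sketch of one such triple and of an execution-based comparison between a reference query and a model-generated query. Everything in it is an assumption made for illustration: an in-memory SQLite database stands in for the BigQuery knowledge base, the `drug_approvals` table, its columns, and its rows are hypothetical, and execution matching is a common text-to-SQL check rather than necessarily the metric BiomedSQL itself uses.

```python
import sqlite3


def run_query(conn, sql):
    """Execute a query and return its rows sorted, so row order is ignored."""
    return sorted(conn.execute(sql).fetchall())


def execution_match(conn, gold_sql, predicted_sql):
    """Return True when both queries produce the same set of rows."""
    try:
        predicted_rows = run_query(conn, predicted_sql)
    except sqlite3.Error:
        # A query that fails to execute counts as a mismatch.
        return False
    return predicted_rows == run_query(conn, gold_sql)


# Hypothetical toy table standing in for the real knowledge base.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drug_approvals (drug TEXT, indication TEXT, approved INTEGER)"
)
conn.executemany(
    "INSERT INTO drug_approvals VALUES (?, ?, ?)",
    [
        ("drug_a", "disease_x", 1),
        ("drug_b", "disease_x", 0),
        ("drug_c", "disease_y", 1),
    ],
)

# One illustrative triple (field names are assumptions, not the dataset schema).
triple = {
    "question": "Which drugs are approved for disease_x?",
    "sql": "SELECT drug FROM drug_approvals "
           "WHERE indication = 'disease_x' AND approved = 1",
    "answer": "drug_a",
}

# Two candidate model outputs: one equivalent to the reference, one not.
predicted_sql = (
    "SELECT drug FROM drug_approvals "
    "WHERE approved = 1 AND indication = 'disease_x'"
)
wrong_sql = "SELECT drug FROM drug_approvals WHERE indication = 'disease_x'"

match_ok = execution_match(conn, triple["sql"], predicted_sql)
match_bad = execution_match(conn, triple["sql"], wrong_sql)
```

Comparing executed results rather than query strings lets differently worded but equivalent queries count as correct, which matters when a question admits more than one valid SQL formulation.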



