EHR-SeqSQL
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/seonhee99/EHR-SeqSQL
Overview:
EHR-SeqSQL is a sequential text-to-SQL dataset for electronic health record (EHR) databases, designed around interactivity, compositionality, and efficiency. It contains sequential, context-dependent questions and is described as the largest benchmark in medical text-to-SQL. Its SQL queries incorporate specially designed tokens intended to improve execution efficiency. The dataset also introduces a new data split with a dedicated test set for evaluating compositional generalization. Question generation involves SQL decomposition and natural language query (NLQ) generation, which is intended to keep the generated questions clear and natural. The target task is text-to-SQL parsing.
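To make "sequential and context-dependent" concrete, here is a minimal sketch of a two-turn interaction in which the second question can only be interpreted given the first. Everything in it is an assumption for illustration: the `admissions` table, its columns, the questions, and the `run_interaction` helper are hypothetical and do not reflect the actual schema, file format, or special tokens of EHR-SeqSQL.

```python
import sqlite3

# Hypothetical toy schema; NOT the schema used by EHR-SeqSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE admissions (patient_id INTEGER, admit_year INTEGER, diagnosis TEXT);
INSERT INTO admissions VALUES
  (1, 2020, 'sepsis'),
  (2, 2021, 'sepsis'),
  (3, 2021, 'pneumonia'),
  (2, 2022, 'pneumonia');
""")

# A made-up interaction: each turn pairs a question with the SQL that answers it.
# The second question ("Of those, ...") depends on the context of the first.
interaction = [
    ("Which patients were diagnosed with sepsis?",
     "SELECT DISTINCT patient_id FROM admissions WHERE diagnosis = 'sepsis'"),
    ("Of those, who was admitted in 2021?",
     "SELECT DISTINCT patient_id FROM admissions "
     "WHERE diagnosis = 'sepsis' AND admit_year = 2021"),
]

def run_interaction(connection, turns):
    """Execute each turn's SQL in order and collect the sorted patient ids."""
    results = []
    for question, sql in turns:
        rows = sorted(row[0] for row in connection.execute(sql))
        results.append((question, rows))
    return results

results = run_interaction(conn, interaction)
for question, rows in results:
    print(question, "->", rows)
```

A sequential text-to-SQL parser would have to produce the second query from the follow-up question plus the preceding turn, rather than from the follow-up question alone.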



