richardr1126/spider-natsql-context-instruct
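Dataset Card for Spider NatSQL Context Instruct
Dataset Summary
Spider is a large-scale, complex, cross-domain semantic parsing and text-to-SQL dataset annotated by 11 Yale students. The goal of the Spider challenge is to develop natural language interfaces to cross-domain databases.
This dataset is intended for fine-tuning models on the Spider dataset with database context, using NatSQL.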
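As a rough illustration of what "fine-tuning with database context" can look like, the sketch below assembles an instruction-style prompt from a question and a schema. The function name `build_prompt`, the section markers, the instruction wording, and the sample schema are all assumptions made for illustration; they are not taken from the dataset itself, so check the actual records before relying on any particular layout.

```python
def build_prompt(question: str, schema: dict[str, list[str]]) -> str:
    """Assemble an instruction-style prompt with database context.

    The layout here is hypothetical, not the dataset's documented format.
    """
    # Render each table as "table(col1, col2, ...)", one table per line.
    schema_lines = [f"{table}({', '.join(cols)})" for table, cols in schema.items()]
    return (
        "### Instruction:\n"
        "Convert the question to NatSQL using the database schema below.\n\n"
        "### Database schema:\n" + "\n".join(schema_lines) + "\n\n"
        "### Question:\n" + question + "\n\n"
        "### Response:\n"
    )


# Made-up schema and question, for illustration only.
example = build_prompt(
    "How many singers are there?",
    {"singer": ["singer_id", "name", "age"]},
)
print(example)
```

Placing the schema before the question gives the model the table and column names it needs for schema linking at the point where it reads the question.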
NatSQL
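NatSQL is an intermediate representation for SQL that simplifies queries and reduces the mismatch between natural language and SQL. It preserves the core functionality of SQL but removes some clauses and keywords that are hard to infer from natural language descriptions. NatSQL also makes schema linking easier by reducing the number of schema items to predict. NatSQL can be easily converted into executable SQL queries and can improve the performance of text-to-SQL models.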
Languages
The text in the dataset is in English.
Licensing Information
The Spider dataset is licensed under CC BY-SA 4.0.
Citation
@article{yu2018spider,
  title={Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task},
  author={Yu, Tao and Zhang, Rui and Yang, Kai and Yasunaga, Michihiro and Wang, Dongxu and Li, Zifan and Ma, James and Li, Irene and Yao, Qingning and Roman, Shanelle and others},
  journal={arXiv preprint arXiv:1809.08887},
  year={2018}
}
@inproceedings{gan-etal-2021-natural-sql,
  title = "Natural {SQL}: Making {SQL} Easier to Infer from Natural Language Specifications",
  author = "Gan, Yujian and Chen, Xinyun and Xie, Jinxia and Purver, Matthew and Woodward, John R. and Drake, John and Zhang, Qiaofu",
  booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
  month = nov,
  year = "2021",
  address = "Punta Cana, Dominican Republic",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2021.findings-emnlp.174",
  doi = "10.18653/v1/2021.findings-emnlp.174",
  pages = "2030--2042",
}



