AmbiQT

Source: arXiv, 2023-10-21; updated 2024-06-21
Download link:
https://github.com/testzer0/AmbiQT
Summary:
AmbiQT is a benchmark dataset developed at the Indian Institute of Technology Bombay for evaluating how text-to-SQL models handle ambiguity. It contains over 3,000 examples, and each natural-language question can be plausibly translated into two different SQL statements because of lexical and/or structural ambiguity. The examples were generated by combining ChatGPT-based synonym generation with rule-based perturbations. The benchmark targets ambiguity that is common in real-world databases, where schema names overlap and several confusable join paths connect the same relations. Intended applications include natural-language interfaces to databases, particularly for data analysis that requires integrating multiple data sources.
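To make the "two plausible SQL readings per question" setup concrete, here is a minimal Python sketch of one way to score a model on such data: an example counts as solved only when both gold SQL statements appear among the model's top-k candidates. The record layout (`AmbiguousExample`, `gold_sql_1`, `gold_sql_2`), the table and column names in the usage example, and the string-normalization rule are hypothetical assumptions for illustration. They are not AmbiQT's published file format or official metric.

```python
import re
from dataclasses import dataclass


@dataclass
class AmbiguousExample:
    """Hypothetical record: one question paired with two gold SQL readings.

    Field names are illustrative assumptions, not AmbiQT's actual format.
    """
    question: str
    gold_sql_1: str
    gold_sql_2: str


def normalize_sql(sql: str) -> str:
    """Naive normalization: lowercase, collapse whitespace, drop a trailing ';'.

    A real evaluation would compare parsed queries or execution results.
    """
    sql = re.sub(r"\s+", " ", sql.strip().lower())
    return sql.rstrip(";").strip()


def covers_both(example: AmbiguousExample, predictions: list[str], k: int) -> bool:
    """True if both gold readings appear among the first k predictions."""
    top_k = {normalize_sql(p) for p in predictions[:k]}
    return (normalize_sql(example.gold_sql_1) in top_k
            and normalize_sql(example.gold_sql_2) in top_k)


def both_in_top_k_rate(examples, predictions_per_example, k: int) -> float:
    """Fraction of examples whose top-k candidates contain both gold readings."""
    if not examples:
        return 0.0
    hits = sum(covers_both(ex, preds, k)
               for ex, preds in zip(examples, predictions_per_example))
    return hits / len(examples)


# Hypothetical lexical ambiguity: the same column exposed under two synonym names.
ex = AmbiguousExample(
    question="List the names of all singers.",
    gold_sql_1="SELECT artist_name FROM singer",
    gold_sql_2="SELECT performer_name FROM singer",
)
preds = [
    "SELECT artist_name FROM singer;",
    "SELECT performer_name FROM singer",
    "SELECT * FROM singer",
]
print(covers_both(ex, preds, k=2))  # True
print(covers_both(ex, preds, k=1))  # False
```

Requiring both readings in the top-k rewards a model that surfaces the ambiguity rather than committing to one interpretation. Exact string matching after normalization is deliberately simple and will miss semantically equivalent SQL written differently.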
Provider:
Indian Institute of Technology Bombay
Created:
2023-10-21