CLAMBSQL
CLEAR: A Parser-Independent Disambiguation Framework for NL2SQL
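Dataset Overview
Dataset List
| Dataset | Description |
|---|---|
| testzer0/AmbiQT | Benchmark for evaluating and improving text-to-SQL generation under ambiguity (EMNLP 2023) |
| AMBROSIA | Benchmark for parsing ambiguous questions into database queries (arXiv) |
| BIRD | Large-scale benchmark for evaluating text-to-SQL generation grounded in large databases (NeurIPS 2023) |
| CLAMBSQL | Our proposed benchmark for systematically evaluating ambiguity resolution. Instances and databases are available here: CLAMBSQL |
CLAMBSQL Data Format
Each example contains the following fields: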
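- `index`: The index of the example.
- `db_id`: The name of the database the example runs against.
- `ambig_type`: The ambiguity type of the example.
- `question`: The ambiguous question.
- `schema_without_content`: The extracted database schema, without database content.
- `schema_with_content`: The extracted database schema, with sample database content.
- `ambiguous_queries`: All possible SQL queries that answer the question.
- `gold_ambiguity`: The gold candidate mapping for the ambiguity.
- `clarification_context`: The natural-language feedback used to clarify the ambiguity.
- `clear_ambiguity`: The gold selection mapping after clarification.
- `gold_query`: The gold SQL parse of the question's actual intent, corresponding to the clarification.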
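As a quick sanity check when loading the data, the field list above can be turned into a presence check. This is a minimal sketch: `REQUIRED_FIELDS` and `missing_fields` are hypothetical names introduced here for illustration, not part of any CLAMBSQL tooling, and the check only assumes that each example is loaded as a Python `dict`.

```python
# Field names copied from the documented data format.
# REQUIRED_FIELDS and missing_fields are illustrative helpers (assumptions),
# not part of the dataset release.
REQUIRED_FIELDS = (
    "index",
    "db_id",
    "ambig_type",
    "question",
    "schema_without_content",
    "schema_with_content",
    "ambiguous_queries",
    "gold_ambiguity",
    "clarification_context",
    "clear_ambiguity",
    "gold_query",
)


def missing_fields(record: dict) -> list:
    """Return the documented field names that are absent from a record."""
    return [name for name in REQUIRED_FIELDS if name not in record]
```

An empty return value means every documented field is present; extra keys in the record are ignored.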
Example
```json
{
  "index": 0,
  "db_id": "world_1",
  "ambig_type": "column",
  "db_file": "column/world_1/world_1.sqlite",
  "question": "What is the continent name which Anguilla belongs to?",
  "schema_without_content": "city : countrycode , name , population , district , id | sqlite_sequence : name , seq | country : capital , headofstate , localname , lifeexpectancy , gnp , gnpold , continent_name , code , surfacearea , population , code2 , mainland , region , indepyear , governmentform , name | countrylanguage : language , percentage , isofficial , countrycode",
  "schema_with_content": "city : countrycode (\"DMA\", \"NER\", \"NLD\"), name (\"Scottsdale\", \"Taxco de Alarcón\", \"Wellington\"), population (89423, 245772, 315382), district (\"Borsod-Abaúj-Zemplén\", \"West Java\", \"Midi-Pyrénées\"), id (3788, 3629, 340) | sqlite_sequence : name (\"city\"), seq (4079) | country : capital (2973, 3243, 3212), headofstate (\"Hamad ibn Isa al-Khalifa\", None, \"Vicente Fox Quesada\"), localname (\"México\", \"Makedonija\", \"Sverige\"), lifeexpectancy (77.6, 77.0, 54.8), gnp (340238.0, 6041.0, 211860.0), gnpold (573.0, 360478.0, 2141.0), continent_name (\"Europe\", \"Oceania\", \"South America\"), code (\"VCT\", \"SYR\", \"NFK\"), surfacearea (774815.0, 96.0, 1862.0), population (453000, 50456000, 9586000), code2 (\"AD\", \"ID\", \"SK\"), mainland (\"Europe\", \"Oceania\", \"South America\"), region (\"Eastern Europe\", \"Polynesia\", \"Polynesia\"), indepyear (836, 1143, 1581), governmentform (\"Islamic Emirate\", \"Occupied by Marocco\", \"Constitutional Monarchy\"), name (\"French Polynesia\", \"Iran\", \"Chad\") | countrylanguage : language (\"Kanem-bornu\", \"Dari\", \"Yao\"), percentage (8.2, 14.0, 11.4), isofficial (\"T\", \"F\"), countrycode (\"SYC\", \"UMI\", \"LBY\")",
  "ambiguous_queries": [
    "select mainland from country where name = Anguilla",
    "select continent_name from country where name = Anguilla"
  ],
  "gold_ambiguity": {
    "match": "{\"continent\": [{\"country\": [\"mainland\"]}, {\"country\": [\"continent_name\"]}]}",
    "query": "{}"
  },
  "clarification_context": "\"continent\" refers to the schema \"country\".\"continent_name\"",
  "clear_ambiguity": "{\"continent\": {\"country\": [\"continent_name\"]}}",
  "gold_query": "select continent_name from country where name = Anguilla"
}
```

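In the example, `gold_ambiguity.match` and `clear_ambiguity` appear to be JSON documents stored as strings, so they need a second decoding step before the mappings can be inspected. The sketch below assumes that encoding (with inner quotes escaped in the raw file) and uses a trimmed copy of the example record; `decode_ambiguity` and `choice_is_valid` are hypothetical helper names, not part of the dataset release.

```python
import json

# Trimmed copy of the example record; only the fields used below are kept.
record = {
    "ambiguous_queries": [
        "select mainland from country where name = Anguilla",
        "select continent_name from country where name = Anguilla",
    ],
    "gold_ambiguity": {
        "match": '{"continent": [{"country": ["mainland"]}, '
                 '{"country": ["continent_name"]}]}',
        "query": "{}",
    },
    "clear_ambiguity": '{"continent": {"country": ["continent_name"]}}',
    "gold_query": "select continent_name from country where name = Anguilla",
}


def decode_ambiguity(rec: dict):
    """Decode the string-encoded candidate and selection mappings."""
    candidates = json.loads(rec["gold_ambiguity"]["match"])
    chosen = json.loads(rec["clear_ambiguity"])
    return candidates, chosen


def choice_is_valid(candidates: dict, chosen: dict) -> bool:
    """Check that each clarified term selects one of its listed candidates."""
    return all(
        term in candidates and selection in candidates[term]
        for term, selection in chosen.items()
    )


candidates, chosen = decode_ambiguity(record)
print(sorted(candidates))                   # ambiguous terms in the question
print(choice_is_valid(candidates, chosen))  # selection is among candidates
print(record["gold_query"] in record["ambiguous_queries"])
```

For this record, the clarified mapping for `continent` is one of the two candidate columns, and `gold_query` is the member of `ambiguous_queries` that uses that column.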



