ramachetan22/sql-create-context-v2
sql-create-context-v2 Dataset
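Overview
The sql-create-context-v2 dataset is an enhanced version built on the WikiSQL and Spider datasets. It targets text-to-SQL tasks, with particular emphasis on reducing hallucinated column and table names. This version adopts the JSON Lines (JSONL) format to make processing and iterating over large datasets more efficient, and it uses a structured representation for the SQL query in each entry.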
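Key Enhancements
- Dataset format: Converted to JSON Lines (JSONL), which makes large datasets easier to handle and lets individual records be processed as a stream.
- Structured query representation: Each SQL query answer is now wrapped in an object keyed by SQL_Query, which cleanly separates the query text from any other metadata.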
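The two enhancements above can be illustrated with a minimal reading sketch. It assumes each line holds one JSON object with the keys question, context, and answer, where answer is an object containing SQL_Query. The helper name iter_records is hypothetical and is not part of the dataset or any library.

```python
import json


def iter_records(lines):
    """Yield (question, context, sql) tuples from an iterable of JSONL lines.

    Hypothetical helper: assumes every non-empty line is a JSON object with
    "question", "context", and "answer" -> {"SQL_Query": ...}.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        yield record["question"], record["context"], record["answer"]["SQL_Query"]


# A single in-memory record stands in for a file opened line by line.
sample_lines = [
    '{"question": "Please show the different statuses of cities and the '
    'average population of cities with each status.", '
    '"context": "CREATE TABLE city (Status VARCHAR, Population INTEGER)", '
    '"answer": {"SQL_Query": "SELECT Status, AVG(Population) FROM city GROUP BY Status"}}',
]
rows = list(iter_records(sample_lines))
```

Because each record is read and parsed independently, the same generator can be pointed at an open file handle without loading the whole dataset into memory.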
Example Entries
{"question": "Please show the themes of competitions with host cities having populations larger than 1000.", "context": "CREATE TABLE city (City_ID VARCHAR, Population INTEGER); CREATE TABLE farm_competition (Theme VARCHAR, Host_city_ID VARCHAR)", "answer": {"SQL_Query": "SELECT T2.Theme FROM city AS T1 JOIN farm_competition AS T2 ON T1.City_ID = T2.Host_city_ID WHERE T1.Population > 1000"}}
{"question": "Please show the different statuses of cities and the average population of cities with each status.", "context": "CREATE TABLE city (Status VARCHAR, Population INTEGER)", "answer": {"SQL_Query": "SELECT Status, AVG(Population) FROM city GROUP BY Status"}}
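Since the dataset aims to reduce hallucinated column and table names, one rough way to use the context field is to compare the columns a query references against the columns declared in its CREATE TABLE statements. The sketch below is an illustration only, not the dataset author's method. The function names parse_context and unknown_columns are hypothetical, and the regular expressions assume simple column types without parentheses and only check qualified references such as T1.Population.

```python
import re


def parse_context(context):
    """Map each table name to its declared column names.

    Assumes simple "CREATE TABLE name (col TYPE, ...)" statements whose
    column types contain no parentheses.
    """
    schema = {}
    pattern = r"CREATE TABLE\s+(\w+)\s*\(([^)]*)\)"
    for table, body in re.findall(pattern, context, flags=re.IGNORECASE):
        schema[table] = [col.split()[0] for col in body.split(",") if col.strip()]
    return schema


def unknown_columns(sql, schema):
    """Return qualified column references (alias.column) absent from the schema.

    Heuristic only: unqualified columns are not checked, and decimal
    literals such as 1.5 would be misread as a reference.
    """
    known = {col.lower() for cols in schema.values() for col in cols}
    referenced = re.findall(r"\b\w+\.(\w+)\b", sql)
    return [col for col in referenced if col.lower() not in known]


context = (
    "CREATE TABLE city (City_ID VARCHAR, Population INTEGER); "
    "CREATE TABLE farm_competition (Theme VARCHAR, Host_city_ID VARCHAR)"
)
sql = (
    "SELECT T2.Theme FROM city AS T1 JOIN farm_competition AS T2 "
    "ON T1.City_ID = T2.Host_city_ID WHERE T1.Population > 1000"
)
schema = parse_context(context)
missing = unknown_columns(sql, schema)
```

For the first example entry every referenced column is declared in the context, so the list of unknown columns is empty. A real validator would need a proper SQL parser to handle aliases, unqualified columns, and nested queries.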



