Snake
Snake: A large-scale SQL query dataset focused on DDL and DCL commands
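Files
snake_dataset.tar.bz2: Compressed dataset file containing 1.1 million data points, stored as tokenized SQL queries and statements in JSON format.
decompress.py: Python script that decompresses snake_dataset.tar.bz2 and extracts snake_dataset.json.
sample_dataset.py: Python script that draws a sample of 100,000 data points from snake_dataset.json.
sample_dataset.tar.bz2: A sample of the main dataset (can be decompressed with decompress.py).
requirements.txt: Libraries required to run decompress.py and sample_dataset.py.
README.md: README file for the dataset.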
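The contents of decompress.py and sample_dataset.py are not shown here, so the following is only a minimal sketch of what the two steps might look like using the Python standard library. The function names (`extract_archive`, `load_records`, `sample_records`) are hypothetical, and the sketch assumes that snake_dataset.json holds a single JSON array of records.

```python
import json
import random
import tarfile
from pathlib import Path


def extract_archive(archive_path, out_dir="."):
    """Extract a .tar.bz2 archive into out_dir and return the extracted paths."""
    with tarfile.open(archive_path, mode="r:bz2") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where the running Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(path=out_dir, **kwargs)
    return [Path(out_dir) / name for name in names]


def load_records(json_path):
    """Load the dataset, assuming the file contains one JSON array of records."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


def sample_records(records, k=100_000, seed=0):
    """Draw up to k records without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))
```

A typical call sequence would be `extract_archive("snake_dataset.tar.bz2")`, then `load_records("snake_dataset.json")`, then `sample_records(records)`. Loading the full file at once keeps the sketch short; a streaming parser may be preferable if memory is tight.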
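Dataset Overview
The dataset contains more than 1.1 million entries, each pairing a natural-language description of a SQL query with its corresponding SQL statement. The queries cover a wide range of operations and are intended for database management, query optimization, and machine learning work. The queries target the following databases: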
HR: [Employees, Projects, Departments]
Education: [Courses, Students]
Library: [Books]
eCommerce: [Orders, Products, Suppliers, Customers]
Finance: [Invoices, Payments, Expenses, Budgets, Assets, Liabilities]
Logistics: [Shipments, Categories]
Sales: [Sales, Reviews, Campaigns, Promotions, Coupons]
IT: [Tasks, Assignments, Resources]
Support: [Feedback, Complaints]
Events: [Events, Locations, Schedules]
Transport: [Tickets, Flights]
Hospitality: [Hotels, Reservations]
Membership: [Memberships, Subscriptions]
Legal: [Contracts, Leases, Policies, Claims]
Messaging: [Messages, Notifications]
Logs: [Logs]
Reports: [Reports]
Alerts: [Alerts]
Requests: [Requests, Issues]
Documents: [Documents, Notes]
Calendar: [Calendars, Agendas]
Widgets: [Widgets]
Profiles: [Profiles]
Jobs: [Jobs]
Social: [Posts, Comments, Likes, Followers, Tags]
Books: [Authors, Genres]
Monitoring: [Audits]
Actions: [Actions, Errors, Warnings]
Default: [] # for any table not covered above
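The database-to-table listing can be inverted to look up which database a table belongs to. The sketch below uses only a few of the databases for brevity; `DATABASES`, `TABLE_TO_DB`, and `db_for_table` are hypothetical names, and the fallback to "Default" follows the note on the last entry of the listing rather than any documented API.

```python
# A subset of the database-to-table listing; extend with the remaining entries as needed.
DATABASES = {
    "HR": ["Employees", "Projects", "Departments"],
    "Library": ["Books"],
    "Sales": ["Sales", "Reviews", "Campaigns", "Promotions", "Coupons"],
    "Books": ["Authors", "Genres"],
}

# Invert the listing so each table name maps to its database.
TABLE_TO_DB = {
    table: db
    for db, tables in DATABASES.items()
    for table in tables
}


def db_for_table(table):
    """Return the database for a table, or "Default" if the table is not listed."""
    return TABLE_TO_DB.get(table, "Default")
```

Note that some names appear both as a database and as a table (for example, the table Books belongs to the database Library, while Books is also a database holding Authors and Genres), so lookups should be keyed on table names only.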
Data Content and Format
{
  "query": "Show all grants for the user johnowens.",
  "query_toks": ["SHOW", "GRANTS", "FOR", "johnowens", "", ";"],
  "sql": "SHOW GRANTS FOR johnowens;",
  "question_toks": ["Show", "all", "grants", "for", "the", "user", "johnowens", "", "."],
  "db_id": "Default",
  "qid": 92003752
},
{
  "query": "Create an index named hospital on column DownloadCount in table Reviews.",
  "query_toks": ["CREATE", "INDEX", "hospital", "ON", "Reviews", "(", "DownloadCount", ")", ";"],
  "sql": "CREATE INDEX hospital ON Reviews(DownloadCount);",
  "question_toks": ["Create", "an", "index", "named", "hospital", "on", "column", "DownloadCount", "in", "table", "Reviews", "."],
  "db_id": "Sales",
  "qid": 62944826
}
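To illustrate how these fields relate, the sketch below checks that the tokens in `query_toks` reproduce the `sql` string once whitespace is ignored, and tallies records by their leading SQL keyword. The helper names are hypothetical. The token check holds for the two records shown above, but whether it holds across the whole dataset is an assumption.

```python
from collections import Counter

# The two example records, reduced to the fields used here.
RECORDS = [
    {
        "query_toks": ["SHOW", "GRANTS", "FOR", "johnowens", "", ";"],
        "sql": "SHOW GRANTS FOR johnowens;",
        "db_id": "Default",
        "qid": 92003752,
    },
    {
        "query_toks": ["CREATE", "INDEX", "hospital", "ON", "Reviews",
                       "(", "DownloadCount", ")", ";"],
        "sql": "CREATE INDEX hospital ON Reviews(DownloadCount);",
        "db_id": "Sales",
        "qid": 62944826,
    },
]


def tokens_match_sql(record):
    """Compare the concatenated tokens with the SQL string, ignoring whitespace."""
    joined = "".join(record["query_toks"])
    return joined == "".join(record["sql"].split())


def command_counts(records):
    """Count records by the first SQL keyword (e.g. SHOW, CREATE)."""
    return Counter(
        r["query_toks"][0].upper() for r in records if r["query_toks"]
    )
```

Empty-string tokens, such as the one before the semicolon in the first record, drop out of the comparison naturally because joining them adds nothing.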




