
birdsql/six-gym-pg-1.5

Source: Hugging Face. Updated 2026-03-28; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/birdsql/six-gym-pg-1.5
Description:
---
license: cc-by-sa-4.0
language:
- en
tags:
- text-to-sql
- database
- postgresql
---

## Update 2026-03-27

We release [Six-Gym-PG-1.5](https://huggingface.co/datasets/birdsql/six-gym-pg-1.5), a train split of BIRD-Critic, comprising 5,000 data instances for model training and development.

### Dataset Structure

Below is a description of the dataset fields and additional information about the structure:

- **instance_id**: Unique identifier for each task.
- **issue_sql**: The buggy SQL query written by the user.
- **dialect**: The SQL dialect (PostgreSQL).
- **version**: The dialect version (14.12).
- **db_id**: The name of the database.
- **clean_up_sql**: SQL queries to run after the test cases to revert any changes made to the database.
- **test_cases**: Test case functions (Python code).
- **sol_sql**: Ground-truth solution SQL.
- **query**: The user query rewritten in the BIRD environment.
- **preprocess_sql**: SQL queries to run before executing the solution or prediction.
- **category**: The task category (Query, Management, or Personalization).

### Quick Start

#### 1. Download the dataset

```bash
# Install the Hugging Face CLI (if not installed)
pip install -U huggingface_hub

# Login (optional, required for gated datasets)
hf login

# Download the full repository
hf download birdsql/six-gym-pg-1.5 --repo-type dataset --local-dir six-gym-pg-1.5
```

#### 2. Set up PostgreSQL 14

Make sure you have PostgreSQL 14 installed and running. See our [GitHub repo](https://github.com/bird-bench/BIRD-CRITIC-1) for how to set up the Docker container with PostgreSQL 14.

Alternatively, you can install PostgreSQL 14 locally using conda:

```bash
conda create -n pg14 python=3.10 postgresql=14 -c conda-forge -y
conda activate pg14

# Initialize and start PostgreSQL
initdb -D ~/pgdata
pg_ctl -D ~/pgdata -l ~/pgdata/logfile start

# Create the root user (used by default in init script)
createuser -s root
```

#### 3. Initialize databases

```bash
cd six-gym-pg-1.5

# Run the init script (adjust host/port/user as needed)
bash init_databases.sh -p 5432 -U root

# Or with custom settings:
# bash init_databases.sh -h localhost -p 5433 -U root -W mypassword -d ./databases
```

This will create 13 template databases and their corresponding working copies from the SQL dumps in the `databases/` folder.

#### 4. Load the dataset

```python
import json

# Each line of the JSONL file is one task instance.
with open("six_gym_postgresql.jsonl", "r") as f:
    data = [json.loads(line) for line in f]

print(f"Loaded {len(data)} instances")
print(f"First instance: {data[0]['instance_id']}, db: {data[0]['db_id']}, category: {data[0]['category']}")
```

#### 5. Evaluation

We use the same evaluation protocol as [BIRD-CRITIC-1](https://github.com/bird-bench/BIRD-CRITIC-1/tree/main/evaluation). For each instance, run the `test_cases` against the predicted SQL query on the corresponding database. The prediction is considered correct if all test cases pass without errors.
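Before running the evaluation, it can help to confirm that every loaded record carries the fields listed under Dataset Structure and to see how instances are spread across categories and databases. The following is a minimal sketch, not part of the official tooling: the helper names `missing_fields` and `summarize` are hypothetical, and the only assumption about the data is that each record is a dict keyed by the documented field names.

```python
from collections import Counter

# Field names copied from the "Dataset Structure" list in this README.
EXPECTED_FIELDS = {
    "instance_id", "issue_sql", "dialect", "version", "db_id",
    "clean_up_sql", "test_cases", "sol_sql", "query",
    "preprocess_sql", "category",
}


def missing_fields(record):
    """Return the documented field names absent from one record, sorted."""
    return sorted(EXPECTED_FIELDS - set(record))


def summarize(records):
    """Count instances per task category and per database name."""
    by_category = Counter(r.get("category") for r in records)
    by_db = Counter(r.get("db_id") for r in records)
    return by_category, by_db
```

With `data` loaded as in step 4, `[r["instance_id"] for r in data if missing_fields(r)]` lists any incomplete records, and `summarize(data)` returns the two counters.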
Provider:
birdsql