birdsql/six-gym-pg-1.5
Source: Hugging Face. Updated 2026-03-28; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/birdsql/six-gym-pg-1.5
Description:
---
license: cc-by-sa-4.0
language:
- en
tags:
- text-to-sql
- database
- postgresql
---
## Update 2026-03-27
We release [Six-Gym-PG-1.5](https://huggingface.co/datasets/birdsql/six-gym-pg-1.5), a training split of BIRD-Critic comprising 5,000 instances for model training and development.
### Dataset Structure
Each instance contains the following fields:
- **instance_id**: Unique identifier for each task.
- **issue_sql**: The buggy SQL query written by the user.
- **dialect**: The SQL dialect (PostgreSQL).
- **version**: The dialect version (14.12).
- **db_id**: The name of the database.
- **clean_up_sql**: SQL queries to run after the test cases to revert any changes made to the database.
- **test_cases**: Test case functions (Python code).
- **sol_sql**: Ground-truth solution SQL.
- **query**: The user query rewritten in the BIRD environment.
- **preprocess_sql**: SQL queries to run before executing the solution or prediction.
- **category**: The task category (Query, Management, or Personalization).
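As a quick sanity check on the fields listed above, the following minimal sketch reports which documented keys are absent from a loaded record. The helper name `missing_fields` and the sample record are illustrative assumptions and are not part of the dataset tooling.

```python
# Field names copied from the list above.
EXPECTED_FIELDS = (
    "instance_id", "issue_sql", "dialect", "version", "db_id",
    "clean_up_sql", "test_cases", "sol_sql", "query",
    "preprocess_sql", "category",
)


def missing_fields(record: dict) -> list[str]:
    """Return the documented field names that are absent from `record`."""
    return [name for name in EXPECTED_FIELDS if name not in record]


# Hypothetical record with two fields left out on purpose.
sample = {name: None for name in EXPECTED_FIELDS if name not in ("sol_sql", "query")}
print(missing_fields(sample))
```

Running a check like this over every record before training can catch a truncated or mismatched download early.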
### Quick Start
#### 1. Download the dataset
```bash
# Install the Hugging Face CLI (if not installed)
pip install -U huggingface_hub
# Log in (optional; only required for gated datasets)
hf login
# Download the full repository
hf download birdsql/six-gym-pg-1.5 --repo-type dataset --local-dir six-gym-pg-1.5
```
#### 2. Set up PostgreSQL 14
Make sure PostgreSQL 14 is installed and running. See our [GitHub repo](https://github.com/bird-bench/BIRD-CRITIC-1) for instructions on setting up a Docker container with PostgreSQL 14. Alternatively, you can install PostgreSQL 14 locally using conda:
```bash
conda create -n pg14 python=3.10 postgresql=14 -c conda-forge -y
conda activate pg14
# Initialize and start PostgreSQL
initdb -D ~/pgdata
pg_ctl -D ~/pgdata -l ~/pgdata/logfile start
# Create the root user (used by default in init script)
createuser -s root
```
#### 3. Initialize databases
```bash
cd six-gym-pg-1.5
# Run the init script (adjust host/port/user as needed)
bash init_databases.sh -p 5432 -U root
# Or with custom settings:
# bash init_databases.sh -h localhost -p 5433 -U root -W mypassword -d ./databases
```
This will create 13 template databases and their corresponding working copies from the SQL dumps in the `databases/` folder.
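One plausible use of the template databases is to restore a working copy after a task has modified it. The sketch below only builds the PostgreSQL statements for such a reset and does not connect to a server. The function name `reset_statements` and the `_template` suffix are assumptions made for illustration; check `init_databases.sh` for the naming the script actually uses.

```python
def reset_statements(db_name: str, template_suffix: str = "_template") -> list[str]:
    """Build SQL that recreates a working database from its template.

    The "<db_name><suffix>" naming scheme is an assumption, not confirmed
    by the dataset card.
    """
    # Accept only simple identifiers so the name is safe to interpolate.
    if not db_name.replace("_", "").isalnum():
        raise ValueError(f"unexpected database name: {db_name!r}")
    template = f"{db_name}{template_suffix}"
    return [
        f'DROP DATABASE IF EXISTS "{db_name}";',
        f'CREATE DATABASE "{db_name}" TEMPLATE "{template}";',
    ]


# "example_db" is a placeholder, not one of the 13 databases.
for stmt in reset_statements("example_db"):
    print(stmt)
```

Note that PostgreSQL does not allow `CREATE DATABASE` inside a transaction block, so a client would need to run these statements in autocommit mode.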
#### 4. Load the dataset
```python
import json

# Read the JSONL file: one JSON object per non-empty line.
with open("six_gym_postgresql.jsonl", "r", encoding="utf-8") as f:
    data = [json.loads(line) for line in f if line.strip()]

print(f"Loaded {len(data)} instances")
print(f"First instance: {data[0]['instance_id']}, db: {data[0]['db_id']}, category: {data[0]['category']}")
```
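Once the instances are loaded, it can help to see how they are distributed across task categories and databases. The following is a small sketch using only the standard library; `summarize` is a hypothetical helper, and the demo records stand in for the real `data` list with made-up `db_id` values.

```python
from collections import Counter


def summarize(instances: list[dict]) -> tuple[Counter, Counter]:
    """Count instances per task category and per database."""
    by_category = Counter(item["category"] for item in instances)
    by_db = Counter(item["db_id"] for item in instances)
    return by_category, by_db


# Hypothetical records standing in for the loaded `data` list.
demo = [
    {"category": "Query", "db_id": "db_a"},
    {"category": "Query", "db_id": "db_b"},
    {"category": "Management", "db_id": "db_a"},
]
by_category, by_db = summarize(demo)
print(by_category.most_common())
print(by_db.most_common())
```

With the real dataset, call `summarize(data)` instead of passing `demo`.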
#### 5. Evaluation
We use the same evaluation protocol as [BIRD-CRITIC-1](https://github.com/bird-bench/BIRD-CRITIC-1/tree/main/evaluation). For each instance, run the `test_cases` against the predicted SQL query on the corresponding database. The prediction is considered correct if all test cases pass without errors.
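The order of operations implied by the field descriptions (preprocess, test, clean up) can be sketched as follows. This is not the official harness: `execute` and `run_test_cases` are hypothetical callables that you would back with a real database connection and the BIRD-CRITIC-1 evaluation code, and treating `preprocess_sql` and `clean_up_sql` as either a string or a list of strings is an assumption.

```python
from typing import Callable


def as_statements(value) -> list[str]:
    """Normalize a SQL field that may be empty, a string, or a list of strings."""
    if not value:
        return []
    return [value] if isinstance(value, str) else list(value)


def evaluate_instance(
    instance: dict,
    pred_sql: str,
    execute: Callable[[str], object],
    run_test_cases: Callable[[dict, str], bool],
) -> bool:
    """Return True only if all test cases pass without errors."""
    try:
        # Run the setup statements before the prediction is tested.
        for stmt in as_statements(instance.get("preprocess_sql")):
            execute(stmt)
        return bool(run_test_cases(instance, pred_sql))
    except Exception:
        # Any error during setup or testing counts as a failed prediction.
        return False
    finally:
        # Revert database changes whether or not the prediction passed.
        for stmt in as_statements(instance.get("clean_up_sql")):
            execute(stmt)


# Demo with stand-ins: statements are logged instead of executed.
log = []
instance = {"preprocess_sql": ["CREATE TABLE t (id int)"], "clean_up_sql": "DROP TABLE t"}
passed = evaluate_instance(instance, "SELECT 1", log.append, lambda inst, sql: True)
print(passed, log)
```

Putting the cleanup in `finally` keeps one failing prediction from leaving the working database in a modified state for the next instance.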
Provider:
birdsql



