birdsql/bird-critic-1.0-sqlite
Source: Hugging Face (updated 2026-03-23, indexed 2026-04-05)
Download link:
https://hf-mirror.com/datasets/birdsql/bird-critic-1.0-sqlite
Resource overview:
---
license: cc-by-sa-4.0
language:
- en
tags:
- text-to-sql
- database
- sqlite
---
## 📢 Update 2026-03-23
We release BIRD-Critic-SQLite, a dataset containing 500 high-quality user issues focused on real-world SQLite database applications. Along with the dataset, we also release three RL-trained models: [BIRD-Talon-14B](https://huggingface.co/birdsql/BIRD-Talon-14b), [BIRD-Talon-7B](https://huggingface.co/birdsql/BIRD-Talon-7b), and [BIRD-Zeno-7B](https://huggingface.co/birdsql/BIRD-Zeno-7b). The schema file is included in the code repository: https://github.com/bird-bench/BIRD-CRITIC-1/blob/main/baseline/data/sqlite_schema.jsonl
# BIRD-CRITIC-1.0-SQLite
BIRD-Critic is the first SQL debugging benchmark designed to answer a critical question:
**Can large language models (LLMs) fix user issues in real-world database applications?** \
Each task in BIRD-CRITIC has been verified by human experts along the following dimensions:
1) Errors are reproduced in the BIRD environment to prevent data leakage.
2) Test case functions are carefully curated for each task individually:
   - **Soft EX**: Evaluates SELECT-only tasks.
   - **Soft EX + Parsing**: Evaluates tasks with user-specific requirements or refinements.
   - **Test Case**: For DBA tasks, such as CRUD (CREATE, READ, UPDATE, DELETE) operations, test cases are needed to verify that the logic is correct. This is also effective for user issues that require multiple sequential SQL statements to resolve.
3) Evaluation is lightweight and runs on SQLite (no Docker required).
4) New relational databases (RDBs) are created at different scales and across professional domains.
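The card does not define Soft EX precisely. As a rough illustration of execution-based checking for SELECT-only tasks, the sketch below runs a predicted query and a reference query against the same SQLite database and compares their rows as multisets, ignoring row order. The function name `results_match`, the toy table, and the order-insensitive comparison are assumptions made for illustration, not the benchmark's official metric.

```python
import sqlite3
from collections import Counter


def results_match(conn: sqlite3.Connection, predicted_sql: str, reference_sql: str) -> bool:
    """Return True if both queries yield the same rows, ignoring row order.

    Assumption: an illustrative check only, not the official Soft EX metric.
    """
    predicted_rows = conn.execute(predicted_sql).fetchall()
    reference_rows = conn.execute(reference_sql).fetchall()
    # Counter compares the results as multisets, so duplicate rows still count.
    return Counter(predicted_rows) == Counter(reference_rows)


# Toy in-memory database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 10.0), (2, 25.5), (3, 7.0)])

# Same rows in a different order: treated as a match.
same = results_match(
    conn,
    "SELECT id FROM account WHERE balance > 8 ORDER BY id DESC",
    "SELECT id FROM account WHERE balance > 8",
)
# Different row sets: treated as a mismatch.
different = results_match(
    conn,
    "SELECT id FROM account WHERE balance > 8",
    "SELECT id FROM account",
)
print(same, different)
```

A check like this only makes sense for read-only queries; tasks that modify the database need assertions on the resulting state instead.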
We are pleased to announce the release of BIRD-Critic-SQLite, `bird-critic-1.0-sqlite`, which includes 500 high-quality user issues that arise with SQLite when developing real-world applications. We curate tasks by:
- Collecting and understanding realistic user issues.
- Distilling problem definitions and SQL knowledge.
- Reproducing bugs and solutions in the BIRD environment.
- Designing test cases for evaluation.
# 📊 Model Performance Results
| Model | SR (%) | Level | Rank |
|:-----:|:------:|:-----:|:----:|
| Gemini-3.1-Pro-Preview | 48.80 | 🏆 Leading | 1 |
| **BIRD-Talon-14B** | 48.00 | 🌟 Elite | 2 |
| Claude-Opus-4-6 | 46.20 | 🌟 Elite | 3 |
| **BIRD-Zeno-7B** | 44.60 | 💎 Superior | 4 |
| **BIRD-Talon-7B** | 44.40 | 💎 Superior | 5 |
| GLM-4.7 | 42.80 | 💎 Superior | 6 |
| GPT-5.4-Pro | 42.00 | 🔸 Advanced | 7 |
| Kimi-K2.5 | 42.00 | 🔸 Advanced | 8 |
| Claude-Sonnet-4.5 | 41.80 | 🔸 Advanced | 9 |
| Qwen3-Coder-480b | 41.60 | 💫 Standard | 10 |
| Minimax-M2.1 | 35.40 | 💫 Standard | 11 |
| Qwen2.5-Coder-14B-Instruct | 33.60 | ⚪ Basic | 12 |
| Qwen2.5-Coder-7B-Instruct | 27.40 | ⚪ Basic | 13 |
**Tier Classification (By Ranking):**
- 🏆 Leading: The Best!
- 🌟 Elite: Top 15%
- 💎 Superior: Top 30%
- 🔸 Advanced: Top 45%
- 💫 Standard: Top 70%
- ⚪ Basic: Bottom 30%
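Assuming SR is the percentage of the 500 tasks that a model resolves (the card does not spell out the abbreviation), each percentage in the table above corresponds to a whole number of tasks. A small sketch of that arithmetic, using three rows copied from the table:

```python
TOTAL_TASKS = 500  # dataset size stated in the card

# SR (%) values copied from the results table.
sr_percent = {
    "Gemini-3.1-Pro-Preview": 48.80,
    "BIRD-Talon-14B": 48.00,
    "Qwen2.5-Coder-7B-Instruct": 27.40,
}

# Convert each percentage to a task count; round() guards against float noise.
solved = {model: round(sr * TOTAL_TASKS / 100) for model, sr in sr_percent.items()}

for model, count in solved.items():
    print(f"{model}: {count} of {TOTAL_TASKS} tasks")
```

Under that assumption, the gap between the top two rows amounts to four tasks.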
**Instance Categories:**
- **Query**: Instances that involve classic retrieval operations (i.e., SELECT).
- **Management**: Instances that perform database management (e.g., CREATE, UPDATE, INSERT).
- **Personalization**: Instances that require a custom approach to resolve.

The category is stored in the `category` field of each data instance.
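Because the category is an ordinary field, instances can be grouped or filtered with plain Python. The records below are hypothetical stand-ins that mimic two of the public fields; real instances come from the dataset itself.

```python
from collections import Counter

# Hypothetical records for illustration only.
instances = [
    {"instance_id": "SQLite_0", "category": "Query"},
    {"instance_id": "SQLite_1", "category": "Management"},
    {"instance_id": "SQLite_2", "category": "Query"},
    {"instance_id": "SQLite_3", "category": "Personalization"},
]

# Count how many instances fall into each category.
category_counts = Counter(item["category"] for item in instances)

# Keep only the identifiers of the Query instances.
query_ids = [item["instance_id"] for item in instances if item["category"] == "Query"]

print(category_counts)
print(query_ids)
```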
## 📁 Dataset Details
### 🔒 Accessing Complete Data
To avoid data leakage through auto-crawling, certain fields (e.g., `sol_sql`, `test_cases`) are excluded from the public dataset. To obtain the full dataset, please email 📧 bird.bench25@gmail.com with the subject tag [bird-critic-1 GT&Test Cases]; the complete data will be sent automatically within 30 minutes.
### 📋 Dataset Structure
Below is a description of the dataset fields and additional information about the structure:
- **dialect**: The SQL dialect (SQLite).
- **version**: The dialect version (3).
- **instance_id**: Unique identifier for each task (SQLite_0 to SQLite_499).
- **db_id**: The name of the database.
- **query**: The user query rewritten in the BIRD environment.
- **issue_sql**: The buggy SQL query written by the user.
- **preprocess_sql**: SQL queries to run before executing the solution or prediction.
- **clean_up_sql**: SQL queries to run after the test cases to revert any changes made to the database.
- **category**: The task category (Query, Management, or Personalization).
The SQLite database files can be found in the `database/` directory of this repository, organized by `db_id` (e.g., `database/financial/financial.sqlite`).
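The field descriptions imply an execution order: `preprocess_sql` first, then the SQL under evaluation, then the test cases, and finally `clean_up_sql`. A minimal sketch of that order with Python's built-in `sqlite3` module follows. The instance contents, the `run_instance` helper, the representation of `preprocess_sql` and `clean_up_sql` as lists of statements, and the `status_updated` check are all hypothetical; the actual evaluation harness lives in the project's code repository and may differ.

```python
import sqlite3
from typing import Callable


def run_instance(
    conn: sqlite3.Connection,
    instance: dict,
    predicted_sql: str,
    check: Callable[[sqlite3.Connection], bool],
) -> bool:
    """Run one task: preprocess, predicted SQL, check, then clean up."""
    try:
        for statement in instance["preprocess_sql"]:
            conn.execute(statement)
        conn.execute(predicted_sql)
        return check(conn)
    except sqlite3.Error:
        # A prediction that fails to execute counts as unresolved here.
        return False
    finally:
        # Revert changes so the next task sees the original database.
        for statement in instance["clean_up_sql"]:
            conn.execute(statement)
        conn.commit()


# Toy in-memory database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO loan VALUES (1, 'A'), (2, 'B')")
conn.commit()

# Hypothetical instance: back up the table first, restore it afterwards.
instance = {
    "instance_id": "SQLite_demo",
    "preprocess_sql": ["CREATE TABLE loan_backup AS SELECT * FROM loan"],
    "clean_up_sql": [
        "DELETE FROM loan",
        "INSERT INTO loan SELECT * FROM loan_backup",
        "DROP TABLE loan_backup",
    ],
}


def status_updated(c: sqlite3.Connection) -> bool:
    """Hypothetical test case: loan 2 must have status 'C' after the fix."""
    return c.execute("SELECT status FROM loan WHERE id = 2").fetchall() == [("C",)]


resolved = run_instance(
    conn, instance, "UPDATE loan SET status = 'C' WHERE id = 2", status_updated
)
restored = conn.execute("SELECT status FROM loan WHERE id = 2").fetchall()[0][0]
print(resolved, restored)
```

Running the clean-up in a `finally` block keeps the database reusable even when the predicted SQL raises an error.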
## 🚀 Quick Start
```python
from datasets import load_dataset
# Load the dataset
dataset = load_dataset("birdsql/bird-critic-1.0-sqlite")
# Browse instances
for instance in dataset["train"]:
    print(instance["instance_id"], instance["db_id"], instance["category"])
    break
```
To download the dataset files manually:
```bash
# Install the Hugging Face CLI
pip install huggingface_hub
# Download the full dataset (including database files)
huggingface-cli download birdsql/bird-critic-1.0-sqlite --repo-type dataset --local-dir ./bird-critic-sqlite
```
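Once the files are downloaded, a database can be opened with Python's built-in `sqlite3` module. The sketch below assumes every database follows the `database/<db_id>/<db_id>.sqlite` layout suggested by the `financial` example; the helper names `database_path` and `list_tables` are hypothetical. The demonstration builds a throwaway database in a temporary directory so that it runs without the download.

```python
import sqlite3
import tempfile
from pathlib import Path


def database_path(root: Path, db_id: str) -> Path:
    """Build the path to a database file, assuming database/<db_id>/<db_id>.sqlite."""
    return root / "database" / db_id / f"{db_id}.sqlite"


def list_tables(db_file: Path) -> list[str]:
    """Return the table names stored in a SQLite database file, sorted by name."""
    conn = sqlite3.connect(db_file)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Demonstration with a throwaway database (hypothetical tables).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    db_file = database_path(root, "financial")
    db_file.parent.mkdir(parents=True)
    conn = sqlite3.connect(db_file)
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE loan (id INTEGER PRIMARY KEY)")
    conn.commit()
    conn.close()
    tables = list_tables(db_file)

print(tables)
```

With the real download, `root` would be the directory passed to `--local-dir`, for example `Path("./bird-critic-sqlite")`.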
## License
This dataset is released under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).
## 📄 Paper
If you find our work helpful, please cite as:
```
@article{li2025swe,
title={SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications},
author={Li, Jinyang and Li, Xiaolong and Qu, Ge and Jacobsson, Per and Qin, Bowen and Hui, Binyuan and Si, Shuzheng and Huo, Nan and Xu, Xiaohan and Zhang, Yue and others},
journal={arXiv preprint arXiv:2506.18951},
year={2025}
}
```
# Todo Lists
- [x] Release lite version, bird-critic-1.0-flash (200).
- [x] Open source code, leaderboard page.
- [x] Release Full bird-critic-1.0-open (570 w/ 4 dialects).
- [x] Release Full bird-critic-1.0-postgresql (530 pg tasks).
- [x] LiveSQLBench Base
- [x] Release bird-critic-1.0-sqlite (500 sqlite tasks).
- [x] Release RL-trained models (BIRD-Talon-14B, BIRD-Talon-7B, BIRD-Zeno-7B).
- [ ] BIRD-Nest, a Gym-like training set for bird-critic-1.0
- [ ] BIRD-CRITIC 1.5 Lite on track!
Provided by:
birdsql



