Shumatsurontek/neo-sql-reasoning-combined
Source: Hugging Face | Updated: 2026-04-06 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/Shumatsurontek/neo-sql-reasoning-combined
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
tags:
- sql
- reasoning
- math
- sft
- chat
- fine-tuning
language:
- en
pretty_name: Neo SQL + Reasoning Combined
size_categories:
- 1K<n<10K
configs:
- config_name: combined
  data_files:
  - split: train
    path: combined/train-*
  - split: test
    path: combined/test-*
  default: true
- config_name: math_only
  data_files:
  - split: train
    path: math_only/train-*
  - split: test
    path: math_only/test-*
- config_name: reasoning_only
  data_files:
  - split: train
    path: reasoning_only/train-*
  - split: test
    path: reasoning_only/test-*
- config_name: sql_only
  data_files:
  - split: train
    path: sql_only/train-*
  - split: test
    path: sql_only/test-*
dataset_info:
- config_name: combined
  features:
  - name: messages
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: source
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: train
    num_bytes: 7577123
    num_examples: 7650
  - name: test
    num_bytes: 841902
    num_examples: 850
  download_size: 8261040
  dataset_size: 8419025
- config_name: math_only
  features:
  - name: messages
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: source
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: train
    num_bytes: 861756
    num_examples: 1260
  - name: test
    num_bytes: 95750
    num_examples: 140
  download_size: 935748
  dataset_size: 957506
- config_name: reasoning_only
  features:
  - name: difficulty
    dtype: string
  - name: messages
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 3660790
    num_examples: 1890
  - name: test
    num_bytes: 406754
    num_examples: 210
  download_size: 4027617
  dataset_size: 4067544
- config_name: sql_only
  features:
  - name: messages
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: source
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: train
    num_bytes: 3054576
    num_examples: 4500
  - name: test
    num_bytes: 339397
    num_examples: 500
  download_size: 3300863
  dataset_size: 3393973
---
# Neo SQL + Reasoning Combined Dataset
A combined supervised fine-tuning (SFT) dataset for training models on SQL generation, reasoning, and math.
Built for the [neo-deep-agent-lab](https://github.com/Shumatsurontek/neo-deep-agent-lab) project.
## Sources & Proportions
| Source | Proportion | Records | Focus |
|--------|-----------|---------|-------|
| `gretelai/synthetic_text_to_sql` | 50% | ~5,000 | SQL generation |
| `nohurry/Opus-4.6-Reasoning-3000x-filtered` | 30% | ~2,100 | Reasoning |
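| `openai/gsm8k` | 20% | ~1,400 | Math problems |

The record counts match the per-config sizes in the metadata (5,000 + 2,100 + 1,400 = 8,500 examples across train and test). By count, the actual shares are about 59%, 25%, and 16%, so the proportions above read as nominal targets rather than exact shares.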
## Format
All samples are normalized to **SFT chat format**:
```json
{
  "messages": [
    {"role": "system", "content": "<task-specific prompt>"},
    {"role": "user", "content": "<question>"},
    {"role": "assistant", "content": "<answer>"}
  ],
  "source": "sql|reasoning|math",
  "difficulty": "basic|medium|hard"
}
```
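A record in this layout can be sanity-checked before training. The sketch below is a minimal validator, not part of the dataset tooling: the function name `is_valid_record` and the allowed-value sets are assumptions read off the placeholders in the format block above, and the real data may contain other values.

```python
# Allowed values are assumptions taken from the "a|b|c" placeholders above.
ALLOWED_ROLES = {"system", "user", "assistant"}
ALLOWED_SOURCES = {"sql", "reasoning", "math"}
ALLOWED_DIFFICULTIES = {"basic", "medium", "hard"}


def is_valid_record(record: dict) -> bool:
    """Return True if `record` matches the chat layout described above."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    for message in messages:
        if not isinstance(message, dict):
            return False
        if message.get("role") not in ALLOWED_ROLES:
            return False
        if not isinstance(message.get("content"), str):
            return False
    return (
        record.get("source") in ALLOWED_SOURCES
        and record.get("difficulty") in ALLOWED_DIFFICULTIES
    )


# Hand-written example record (illustrative, not taken from the dataset).
example = {
    "messages": [
        {"role": "system", "content": "You are a SQL assistant."},
        {"role": "user", "content": "Count the rows in the orders table."},
        {"role": "assistant", "content": "SELECT COUNT(*) FROM orders;"},
    ],
    "source": "sql",
    "difficulty": "basic",
}

print(is_valid_record(example))  # True
```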
## Configs
| Config | Description | Default |
|--------|-------------|---------|
| `combined` | All 3 sources mixed and shuffled | Yes |
| `sql_only` | SQL generation only | |
| `reasoning_only` | Analytical reasoning only | |
| `math_only` | Math problems only | |
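Because every record carries a `source` field, the `combined` config can also be partitioned by task after loading. The sketch below groups plain Python dicts by that field; `group_by_source` is a hypothetical helper and the sample rows are made up. Whether this reproduces the single-source configs row for row is not stated by the card.

```python
from collections import defaultdict


def group_by_source(records):
    """Group records by their `source` field, preserving input order."""
    groups = defaultdict(list)
    for record in records:
        groups[record["source"]].append(record)
    return dict(groups)


# Tiny made-up sample standing in for rows of the `combined` config.
sample = [
    {"messages": [], "source": "sql", "difficulty": "basic"},
    {"messages": [], "source": "math", "difficulty": "medium"},
    {"messages": [], "source": "sql", "difficulty": "hard"},
    {"messages": [], "source": "reasoning", "difficulty": "medium"},
]

counts = {source: len(rows) for source, rows in group_by_source(sample).items()}
print(counts)  # {'sql': 2, 'math': 1, 'reasoning': 1}
```

With the `datasets` library, the same selection for one task can be written as `ds["train"].filter(lambda r: r["source"] == "sql")`.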
## Usage
```python
from datasets import load_dataset

# Load the default config (combined)
ds = load_dataset("Shumatsurontek/neo-sql-reasoning-combined")

# Load a specific config by name
sql = load_dataset("Shumatsurontek/neo-sql-reasoning-combined", "sql_only")
# Named math_ds so the variable does not shadow the standard-library math module
math_ds = load_dataset("Shumatsurontek/neo-sql-reasoning-combined", "math_only")
```
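Once loaded, each row is a dict with a `messages` list, which many SFT pipelines split into a prompt and a target. The following is a minimal sketch of that split, assuming the last message is always the assistant turn as in the format example; `to_prompt_completion` is a hypothetical helper and the record is hand-written.

```python
def to_prompt_completion(record):
    """Split a chat record into (prompt messages, assistant answer text)."""
    messages = record["messages"]
    # Assumption: the final turn is the assistant answer, as in the format example.
    if not messages or messages[-1]["role"] != "assistant":
        raise ValueError("expected the last message to come from the assistant")
    return messages[:-1], messages[-1]["content"]


# Hand-written record in the documented layout (not taken from the dataset).
record = {
    "messages": [
        {"role": "system", "content": "Solve the math problem step by step."},
        {"role": "user", "content": "What is 12 * 7?"},
        {"role": "assistant", "content": "12 * 7 = 84"},
    ],
    "source": "math",
    "difficulty": "basic",
}

prompt, completion = to_prompt_completion(record)
print([m["role"] for m in prompt])  # ['system', 'user']
print(completion)                   # 12 * 7 = 84
```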
## Training
This dataset is designed for SFT (Supervised Fine-Tuning) with models like:
- **LiquidAI/LFM2.5-350M** (350M params, 32K context)
- **Qwen/Qwen3.5-\*** (0.8B to 9B)

Best results were obtained with LoRA in bf16: 3 epochs, learning rate 2e-4, batch size 4.
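
To get a feel for the training budget these settings imply, the arithmetic below estimates the number of optimizer steps on the `combined` train split (7,650 examples per the metadata). It is a rough sketch that assumes a single device, no gradient accumulation, and that the final partial batch is kept; `GRAD_ACCUM` is an illustrative parameter, not a setting from the card.

```python
import math

TRAIN_EXAMPLES = 7650  # `combined` train split size from the dataset metadata
BATCH_SIZE = 4         # batch size quoted in the card
EPOCHS = 3             # epoch count quoted in the card
GRAD_ACCUM = 1         # assumption: no gradient accumulation

# Each optimizer step consumes BATCH_SIZE * GRAD_ACCUM examples.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / (BATCH_SIZE * GRAD_ACCUM))
total_steps = steps_per_epoch * EPOCHS

print(steps_per_epoch)  # 1913
print(total_steps)      # 5739
```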
## License
Apache 2.0 (inherits from source datasets)
Provider:
Shumatsurontek



