
Shumatsurontek/neo-sql-reasoning-combined

Source: Hugging Face. Updated: 2026-04-06. Indexed: 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Shumatsurontek/neo-sql-reasoning-combined
Description:
---
license: apache-2.0
task_categories:
- text-generation
tags:
- sql
- reasoning
- math
- sft
- chat
- fine-tuning
language:
- en
pretty_name: Neo SQL + Reasoning Combined
size_categories:
- 1K<n<10K
configs:
- config_name: combined
  data_files:
  - split: train
    path: combined/train-*
  - split: test
    path: combined/test-*
  default: true
- config_name: math_only
  data_files:
  - split: train
    path: math_only/train-*
  - split: test
    path: math_only/test-*
- config_name: reasoning_only
  data_files:
  - split: train
    path: reasoning_only/train-*
  - split: test
    path: reasoning_only/test-*
- config_name: sql_only
  data_files:
  - split: train
    path: sql_only/train-*
  - split: test
    path: sql_only/test-*
dataset_info:
- config_name: combined
  features:
  - name: messages
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: source
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: train
    num_bytes: 7577123
    num_examples: 7650
  - name: test
    num_bytes: 841902
    num_examples: 850
  download_size: 8261040
  dataset_size: 8419025
- config_name: math_only
  features:
  - name: messages
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: source
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: train
    num_bytes: 861756
    num_examples: 1260
  - name: test
    num_bytes: 95750
    num_examples: 140
  download_size: 935748
  dataset_size: 957506
- config_name: reasoning_only
  features:
  - name: difficulty
    dtype: string
  - name: messages
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: source
    dtype: string
  splits:
  - name: train
    num_bytes: 3660790
    num_examples: 1890
  - name: test
    num_bytes: 406754
    num_examples: 210
  download_size: 4027617
  dataset_size: 4067544
- config_name: sql_only
  features:
  - name: messages
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: source
    dtype: string
  - name: difficulty
    dtype: string
  splits:
  - name: train
    num_bytes: 3054576
    num_examples: 4500
  - name: test
    num_bytes: 339397
    num_examples: 500
  download_size: 3300863
  dataset_size: 3393973
---

# Neo SQL + Reasoning Combined Dataset

Combined SFT dataset for fine-tuning SQL, reasoning, and math models. Built for the [neo-deep-agent-lab](https://github.com/Shumatsurontek/neo-deep-agent-lab) project.

## Sources & Proportions

| Source | Proportion | Records | Focus |
|--------|-----------|---------|-------|
| `gretelai/synthetic_text_to_sql` | 50% | ~5,000 | SQL generation |
| `nohurry/Opus-4.6-Reasoning-3000x-filtered` | 30% | ~2,100 | Reasoning |
| `openai/gsm8k` | 20% | ~1,400 | Math problems |

## Format

All samples are normalized to **SFT chat format**:

```json
{
  "messages": [
    {"role": "system", "content": "<task-specific prompt>"},
    {"role": "user", "content": "<question>"},
    {"role": "assistant", "content": "<answer>"}
  ],
  "source": "sql|reasoning|math",
  "difficulty": "basic|medium|hard"
}
```

## Configs

| Config | Description | Default |
|--------|-------------|---------|
| `combined` | All 3 sources mixed and shuffled | Yes |
| `sql_only` | SQL generation only | |
| `reasoning_only` | Analytical reasoning only | |
| `math_only` | Math problems only | |

## Usage

```python
from datasets import load_dataset

# Load default (combined)
ds = load_dataset("Shumatsurontek/neo-sql-reasoning-combined")

# Load specific config
sql = load_dataset("Shumatsurontek/neo-sql-reasoning-combined", "sql_only")
math = load_dataset("Shumatsurontek/neo-sql-reasoning-combined", "math_only")
```

## Training

This dataset is designed for SFT (Supervised Fine-Tuning) with models like:

- **LiquidAI/LFM2.5-350M** (350M params, 32K context)
- **Qwen/Qwen3.5-\*** (0.8B to 9B)

Best results with LoRA bf16, 3 epochs, lr=2e-4, batch=4.

## License

Apache 2.0 (inherits from source datasets)
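
## Checking a record against the chat format

The SFT chat format described in the Format section can be checked with a small validator before training. This is a minimal sketch, not part of the dataset's tooling: `validate_record` is a hypothetical helper, and it assumes that `"sql|reasoning|math"` and `"basic|medium|hard"` in the card list the only allowed values and that every record carries exactly one system, one user, and one assistant message in that order.

```python
from typing import Any

# Assumption: the "a|b|c" notation in the card enumerates the allowed values.
EXPECTED_ROLES = ("system", "user", "assistant")
ALLOWED_SOURCES = {"sql", "reasoning", "math"}
ALLOWED_DIFFICULTIES = {"basic", "medium", "hard"}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one record (empty if none)."""
    messages = record.get("messages")
    if not isinstance(messages, list):
        return ["'messages' is missing or not a list"]

    problems: list[str] = []

    # The card shows a fixed system -> user -> assistant sequence.
    roles = tuple(m.get("role") for m in messages if isinstance(m, dict))
    if roles != EXPECTED_ROLES:
        problems.append(f"unexpected role sequence: {roles}")

    # Every message needs a string 'content' field.
    for i, m in enumerate(messages):
        if not isinstance(m, dict) or not isinstance(m.get("content"), str):
            problems.append(f"message {i} has no string 'content'")

    if record.get("source") not in ALLOWED_SOURCES:
        problems.append(f"unexpected source: {record.get('source')!r}")
    if record.get("difficulty") not in ALLOWED_DIFFICULTIES:
        problems.append(f"unexpected difficulty: {record.get('difficulty')!r}")

    return problems


# Hand-written illustrative record, not taken from the dataset.
example = {
    "messages": [
        {"role": "system", "content": "You are a SQL assistant."},
        {"role": "user", "content": "Count the rows in table orders."},
        {"role": "assistant", "content": "SELECT COUNT(*) FROM orders;"},
    ],
    "source": "sql",
    "difficulty": "basic",
}

print(validate_record(example))  # prints []
```

Applied to a loaded split, the same function could be run over each row (for example, over `ds["train"]` from the Usage section) to report any record that deviates from the documented shape.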
Provider:
Shumatsurontek