LocoreMind/qwen3.5-27b-cli-reasoning-3632x
Hugging Face · Updated 2026-02-28 · Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/LocoreMind/qwen3.5-27b-cli-reasoning-3632x
Resource summary:
---
language:
- en
license: cc-by-4.0
size_categories:
- 1K<n<10K
task_categories:
- text-generation
- question-answering
tags:
- reasoning
- cli
- terminal
- bash
- linux
- distillation
- synthetic
- thinking
- chain-of-thought
pretty_name: Qwen3.5-27B CLI Reasoning 3632x
dataset_info:
  features:
  - name: messages
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  splits:
  - name: train
    num_examples: 3632
---
# Qwen3.5-27B CLI Reasoning 3632x
A synthetic reasoning dataset for CLI/terminal command assistance, distilled from **Qwen3.5-27B** with thinking mode enabled.
Each sample contains a realistic user scenario describing a terminal task, paired with the model's reasoning chain (`<think>`) and a structured JSON answer (`command` + `description`).
## Dataset Summary
| Property | Value |
|---|---|
| **Source model** | Qwen3.5-27B (DashScope API) |
| **Samples** | 3,632 |
| **Thinking mode** | Enabled (budget: 4096 tokens) |
| **Source data** | [b-mc2/cli-commands-explained](https://huggingface.co/datasets/b-mc2/cli-commands-explained) (16K CLI commands) |
| **Command coverage** | 3,034 unique command prefixes (100% of source) |
| **Quality filter** | Parseable JSON answer and a passing `bash -n` syntax check |
| **License** | CC-BY-4.0 |
## Creation Pipeline
This dataset was built using a three-phase distillation pipeline:
1. **Phase 1 — Prompt Generation** (non-thinking mode): Each raw CLI command entry (`code`, `title`, `description`, `date`) was fed to Qwen3.5-27B to generate a realistic, scenario-based user prompt with environment context (`Current time`, `Current OS`, `Shell`). The model infers OS/Shell from the command itself.
2. **Phase 2 — Thinking Distillation**: The generated prompts were sent back to Qwen3.5-27B with thinking mode enabled. The model reasons about the user's problem and outputs a structured JSON answer.
3. **Phase 3 — Quality Filtering**: Responses were validated for JSON parseability and command syntax (`bash -n`). 89 samples with broken JSON escaping or truncated multi-line commands were filtered out (2.4% rejection rate).
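The card does not publish the filtering code. The following is a minimal Python sketch of a Phase 3 style check. It assumes that the answer is the fenced `json` block following `</think>` (as in the Data Format example) and that each command is checked by passing it to `bash -n` on standard input. The names `extract_answer`, `bash_syntax_ok`, and `passes_quality_filter` are hypothetical.

```python
import json
import re
import subprocess

# Assumption: the answer is the fenced json block that follows </think>,
# as in the "Data Format" example. The authors' own regex is not published.
FENCE_RE = re.compile(r"```json\s*(\{.*?\})\s*```", re.DOTALL)


def extract_answer(assistant_text):
    """Return the parsed answer dict, or None if it is missing or malformed."""
    answer_part = assistant_text.split("</think>", 1)[-1]  # skip the reasoning
    match = FENCE_RE.search(answer_part)
    if match is None:
        return None  # no complete fenced JSON object (e.g. truncated output)
    try:
        answer = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None  # e.g. broken escaping inside a string value
    if not isinstance(answer, dict) or not isinstance(answer.get("command"), str):
        return None
    return answer


def bash_syntax_ok(command):
    """Parse (but never execute) the command with `bash -n`, reading stdin."""
    result = subprocess.run(
        ["bash", "-n"], input=command, text=True, capture_output=True
    )
    return result.returncode == 0


def passes_quality_filter(assistant_text):
    """Hypothetical Phase 3 check: valid JSON answer and bash-parseable command."""
    answer = extract_answer(assistant_text)
    return answer is not None and bash_syntax_ok(answer["command"])
```

Because `bash -n` only parses its input, this check catches unbalanced quotes or a `for` loop cut off before `done` without running anything. It does not verify that the command is correct for the scenario.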
### Stratified Sampling
3,721 samples were selected from the 16K source dataset using stratified sampling:
- All high-vote commands (votes ≥ 11): 797 classic commands
- One representative per unique command prefix: 2,924 additional commands
- 100% coverage of all 3,034 command prefixes in the source
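The selection rule can be sketched in Python under several assumptions the card does not confirm:

- Each source row exposes `code` and a hypothetical `votes` field.
- A command's prefix is its first whitespace-separated token.
- The 2,924 additional commands are one pick for each prefix not already covered by the high-vote set. This sketch picks the highest-voted row for each such prefix.

```python
from collections import defaultdict

VOTE_THRESHOLD = 11  # "votes ≥ 11", from the card


def command_prefix(code):
    # Assumption: the prefix is the first whitespace-separated token.
    parts = code.split()
    return parts[0] if parts else ""


def stratified_sample(rows):
    """rows: dicts with 'code' and a hypothetical 'votes' field."""
    # 1. Keep every high-vote command.
    selected = [r for r in rows if r["votes"] >= VOTE_THRESHOLD]
    covered = {command_prefix(r["code"]) for r in selected}

    # 2. Group the remaining rows by prefixes the high-vote set missed.
    by_prefix = defaultdict(list)
    for r in rows:
        prefix = command_prefix(r["code"])
        if prefix not in covered:
            by_prefix[prefix].append(r)

    # 3. Add one representative per uncovered prefix. Picking the
    #    highest-voted row is an assumed tie-break, not the authors' rule.
    for group in by_prefix.values():
        selected.append(max(group, key=lambda r: r["votes"]))
    return selected
```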
## Data Format
```json
{
"messages": [
{
"role": "system",
"content": ""
},
{
"role": "user",
"content": "Current time: 2009-01-26 10:26:48\nCurrent OS: Linux\nShell: Bash\n\nI just tried to install a package but got a \"permission denied\" error because I forgot to run it with sudo. Is there a quick way to re-run that exact same command as root without having to type the whole thing out again?"
},
{
"role": "assistant",
"content": "<think>The user ran a command without sudo and got a permission denied error. They want to quickly re-run the same command but with sudo. In bash, there's a quick way to do this using `sudo !!`...</think>\n```json\n{\n \"command\": \"sudo !!\",\n \"description\": \"This uses bash's history expansion feature where '!!' represents the last command you executed. Prepending 'sudo' allows you to immediately re-run that exact command with elevated privileges without retyping it.\"\n}\n```"
}
]
}
```
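To work with the `<think>` reasoning and the JSON answer separately, a record can be split as in the following sketch. It assumes single-turn records with one message per role, as in the example above. `split_sample` is a hypothetical helper, not part of the `datasets` API.

```python
import json
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
JSON_FENCE_RE = re.compile(r"```json\s*(.*?)\s*```", re.DOTALL)


def split_sample(sample):
    """Return (user_prompt, reasoning, answer) for one single-turn record.

    answer is the parsed JSON dict with "command" and "description",
    or None if no fenced json block follows the reasoning.
    """
    # Assumes one message per role (system, user, assistant).
    by_role = {m["role"]: m["content"] for m in sample["messages"]}
    assistant = by_role["assistant"]

    think = THINK_RE.search(assistant)
    reasoning = think.group(1).strip() if think else ""

    # Only search after </think> so a fence quoted in the reasoning is ignored.
    tail = assistant.split("</think>", 1)[-1]
    fence = JSON_FENCE_RE.search(tail)
    answer = json.loads(fence.group(1)) if fence else None
    return by_role["user"], reasoning, answer
```

After loading the dataset as shown under Usage, `split_sample(dataset["train"][0])` would return the prompt, reasoning, and parsed answer of the first record.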
## Statistics
| Metric (characters) | Min | Avg | Max |
|--------|-----|-----|-----|
| User prompt length | 243 | 411 | 622 |
| Reasoning (`<think>`) length | 272 | 1,920 | 16,135 |
| Answer length | 100 | 411 | 1,958 |
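The table does not state exactly how lengths were measured. One plausible reading, sketched below, counts Python string characters (`len`) for three fields:

- The user prompt.
- The text between `<think>` and `</think>`.
- The stripped text after `</think>`.

These definitions are assumptions, so figures computed this way may not match the table exactly.

```python
from statistics import mean


def message(sample, role):
    # Assumes exactly one message with the given role.
    return next(m["content"] for m in sample["messages"] if m["role"] == role)


def length_stats(samples):
    """(min, rounded avg, max) character counts per field."""
    lengths = {"prompt": [], "reasoning": [], "answer": []}
    for s in samples:
        assistant = message(s, "assistant")
        head, sep, tail = assistant.partition("</think>")
        # Assumed definitions: reasoning is the text between the think tags,
        # answer is everything after </think> with surrounding whitespace removed.
        reasoning = head.replace("<think>", "", 1) if sep else ""
        answer = tail.strip() if sep else assistant.strip()
        lengths["prompt"].append(len(message(s, "user")))
        lengths["reasoning"].append(len(reasoning))
        lengths["answer"].append(len(answer))
    return {k: (min(v), round(mean(v)), max(v)) for k, v in lengths.items()}
```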
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("LocoreMind/qwen3.5-27b-cli-reasoning-3632x")
```
## Command Prefix Coverage (Top 10)
`find`(1079) · `for`(600) · `echo`(579) · `sudo`(451) · `curl`(364) · `ls`(325) · `cat`(304) · `grep`(254) · `git`(251) · `sed`(241)
*Counts refer to the full source dataset; the sampled subset covers all prefixes with at least one representative.*
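Prefix counts like these can be reproduced with `collections.Counter`. This sketch assumes the prefix is the first whitespace-separated token of each command (the card does not define it). `top_prefixes` is a hypothetical helper.

```python
from collections import Counter


def top_prefixes(commands, n=10):
    """Most common command prefixes, assuming prefix = first token."""
    # Blank or whitespace-only commands are skipped.
    counts = Counter(cmd.split()[0] for cmd in commands if cmd.strip())
    # Ties keep first-seen order (documented Counter.most_common behavior).
    return counts.most_common(n)
```

Under this definition, a loop such as `for f in *; do ...; done` is counted under `for`, which is presumably why `for` appears in the list above.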
## Intended Use
- Fine-tuning smaller models for CLI assistance and terminal command generation
- Training reasoning capabilities for shell/DevOps tasks
- Building agentic tool-calling systems that operate in terminal environments
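For the fine-tuning use case, a record usually needs light preprocessing before a chat template is applied. This sketch drops the empty system turn and can optionally strip the `<think>` block so the same data can train a model without visible reasoning. Both steps are assumptions about a typical SFT setup, not steps prescribed by the dataset. `to_sft_messages` is a hypothetical helper.

```python
import re

# Matches the reasoning block plus any whitespace that follows it.
THINK_BLOCK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def to_sft_messages(sample, keep_reasoning=True):
    """Return a cleaned chat message list for one record."""
    out = []
    for m in sample["messages"]:
        # The dataset's system turns are empty strings; skip them.
        if m["role"] == "system" and not m["content"].strip():
            continue
        content = m["content"]
        if m["role"] == "assistant" and not keep_reasoning:
            content = THINK_BLOCK_RE.sub("", content, count=1)
        out.append({"role": m["role"], "content": content})
    return out
```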
## Limitations
- Commands are biased toward Linux/Bash; macOS and other shells are underrepresented
- The model occasionally suggests modernized alternatives instead of the exact original command (e.g., `python3 -m http.server` instead of `python -m SimpleHTTPServer`)
- Complex commands with heavy quoting/escaping were filtered out, slightly underrepresenting `awk`/`sed`/`perl` one-liners
Provided by:
LocoreMind



