
LocoreMind/qwen3.5-27b-cli-reasoning-3632x

Hugging Face · Updated 2026-02-28 · Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/LocoreMind/qwen3.5-27b-cli-reasoning-3632x
Resource description:

---
language:
- en
license: cc-by-4.0
size_categories:
- 1K<n<10K
task_categories:
- text-generation
- question-answering
tags:
- reasoning
- cli
- terminal
- bash
- linux
- distillation
- synthetic
- thinking
- chain-of-thought
pretty_name: Qwen3.5-27B CLI Reasoning 3632x
dataset_info:
  features:
  - name: messages
    list:
    - name: role
      dtype: string
    - name: content
      dtype: string
  splits:
  - name: train
    num_examples: 3632
---

# Qwen3.5-27B CLI Reasoning 3632x

A synthetic reasoning dataset for CLI/terminal command assistance, distilled from **Qwen3.5-27B** with thinking mode enabled. Each sample pairs a realistic user scenario describing a terminal task with the model's reasoning chain (`<think>`) and a structured JSON answer (`command` + `description`).

## Dataset Summary

| Property | Value |
|---|---|
| **Source model** | Qwen3.5-27B (DashScope API) |
| **Samples** | 3,632 |
| **Thinking mode** | Enabled (budget: 4,096 tokens) |
| **Source data** | [b-mc2/cli-commands-explained](https://huggingface.co/datasets/b-mc2/cli-commands-explained) (16K CLI commands) |
| **Command coverage** | 3,034 unique command prefixes (100% of source) |
| **Quality filter** | Parseable JSON + passing `bash -n` syntax check |
| **License** | CC-BY-4.0 |

## Creation Pipeline

This dataset was built with a three-phase distillation pipeline:

1. **Phase 1: Prompt Generation** (non-thinking mode). Each raw CLI command entry (`code`, `title`, `description`, `date`) was fed to Qwen3.5-27B, which generated a realistic, scenario-based user prompt with environment context (`Current time`, `Current OS`, `Shell`). The model infers the OS and shell from the command itself.
2. **Phase 2: Thinking Distillation**. The generated prompts were sent back to Qwen3.5-27B with thinking mode enabled. The model reasons about the user's problem and outputs a structured JSON answer.
3. **Phase 3: Quality Filtering**. Responses were validated for JSON parseability and command syntax (`bash -n`). Of the 3,721 sampled commands, 89 responses with broken JSON escaping or truncated multi-line commands were rejected (2.4% rejection rate), leaving 3,632 samples.

### Stratified Sampling

The 3,721 commands were selected from the 16K-entry source dataset by stratified sampling:

- All high-vote commands (votes ≥ 11): 797 classic commands
- One representative per unique command prefix: 2,924 additional commands
- Together, these cover all 3,034 command prefixes in the source (100%)

## Data Format

```json
{
  "messages": [
    {
      "role": "system",
      "content": ""
    },
    {
      "role": "user",
      "content": "Current time: 2009-01-26 10:26:48\nCurrent OS: Linux\nShell: Bash\n\nI just tried to install a package but got a \"permission denied\" error because I forgot to run it with sudo. Is there a quick way to re-run that exact same command as root without having to type the whole thing out again?"
    },
    {
      "role": "assistant",
      "content": "<think>The user ran a command without sudo and got a permission denied error. They want to quickly re-run the same command but with sudo. In bash, there's a quick way to do this using `sudo !!`...</think>\n```json\n{\n \"command\": \"sudo !!\",\n \"description\": \"This uses bash's history expansion feature where '!!' represents the last command you executed. Prepending 'sudo' allows you to immediately re-run that exact command with elevated privileges without retyping it.\"\n}\n```"
    }
  ]
}
```

## Statistics

| Metric (characters) | Min | Avg | Max |
|--------|-----|-----|-----|
| User prompt length | 243 | 411 | 622 |
| Reasoning (`<think>`) length | 272 | 1,920 | 16,135 |
| Answer length | 100 | 411 | 1,958 |

## Usage

```python
from datasets import load_dataset

# The dataset has a single "train" split with 3,632 samples
dataset = load_dataset("LocoreMind/qwen3.5-27b-cli-reasoning-3632x")
print(dataset["train"][0]["messages"])
```

## Command Prefix Coverage (Top 10)

`find` (1,079) · `for` (600) · `echo` (579) · `sudo` (451) · `curl` (364) · `ls` (325) · `cat` (304) · `grep` (254) · `git` (251) · `sed` (241)

*Counts refer to the full source dataset; the sampled subset covers every prefix with at least one representative.*

## Intended Use

- Fine-tuning smaller models for CLI assistance and terminal command generation
- Training reasoning capabilities for shell/DevOps tasks
- Building agentic tool-calling systems that operate in terminal environments

## Limitations

- Commands are biased toward Linux/Bash; macOS and other shells are underrepresented.
- The model occasionally suggests a modernized alternative instead of the exact original command (e.g., `python3 -m http.server` instead of `python -m SimpleHTTPServer`).
- Complex commands with heavy quoting/escaping were disproportionately removed by the quality filter, so `awk`/`sed`/`perl` one-liners are slightly underrepresented.
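
The Phase 3 quality filter from the Creation Pipeline section can be sketched in Python as follows. This is a minimal sketch, not the authors' actual filtering script: it assumes each assistant turn looks like the Data Format example (a `</think>` tag followed by a fenced JSON object with `command` and `description` keys), and the helper names `extract_answer`, `bash_syntax_ok`, and `passes_filter` are hypothetical. `bash -n` only parses its input and does not execute it, so commands are checked without being run.

```python
import json
import re
import shutil
import subprocess
from typing import Optional

# Assumed answer layout: "</think>", then a fenced json block holding one object.
ANSWER_RE = re.compile(r"</think>\s*```json\s*(\{.*?\})\s*```", re.DOTALL)


def extract_answer(content: str) -> Optional[dict]:
    """Return the parsed {"command", "description"} object, or None if unusable."""
    match = ANSWER_RE.search(content)
    if match is None:
        return None  # no fenced JSON answer after the reasoning
    try:
        answer = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None  # e.g. broken escaping inside a JSON string
    if not isinstance(answer, dict):
        return None
    if not all(key in answer for key in ("command", "description")):
        return None
    return answer


def bash_syntax_ok(command: str) -> bool:
    """Parse the command with `bash -n` (no execution); True if it parses."""
    bash = shutil.which("bash")
    if bash is None:
        raise RuntimeError("bash not found on PATH")
    result = subprocess.run(
        [bash, "-n"], input=command, text=True, capture_output=True
    )
    return result.returncode == 0


def passes_filter(content: str) -> bool:
    """Hypothetical Phase 3 check: parseable JSON answer AND valid bash syntax."""
    answer = extract_answer(content)
    return answer is not None and bash_syntax_ok(answer["command"])
```

A truncated multi-line command such as `for f in *; do echo $f` fails the `bash -n` step (unexpected end of file), while `sudo !!` passes because history expansion is not performed in a non-interactive shell, so `!!` is parsed as an ordinary word.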
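
The two-stage selection described under Stratified Sampling can be sketched as below. The card does not say how a command prefix is derived or which entry represents a prefix, so this sketch assumes that the prefix is the first whitespace-delimited token of `code`, that each source entry carries an integer `votes` field, and that the highest-voted entry is chosen as a prefix's representative. `command_prefix` and `stratified_sample` are hypothetical names.

```python
from typing import Dict, List

VOTE_THRESHOLD = 11  # from the card: "votes ≥ 11"


def command_prefix(code: str) -> str:
    """Assumption: a command's prefix is its first whitespace-delimited token."""
    tokens = code.split()
    return tokens[0] if tokens else ""


def stratified_sample(entries: List[dict]) -> List[dict]:
    """Sketch of the two-stage selection; each entry needs 'code' and 'votes'."""
    # Stage 1: keep every high-vote command.
    selected = [e for e in entries if e["votes"] >= VOTE_THRESHOLD]
    covered = {command_prefix(e["code"]) for e in selected}

    # Stage 2: add one representative per prefix that stage 1 did not cover.
    # Assumption: the highest-voted entry represents its prefix.
    best: Dict[str, dict] = {}
    for e in entries:
        prefix = command_prefix(e["code"])
        if prefix in covered:
            continue
        if prefix not in best or e["votes"] > best[prefix]["votes"]:
            best[prefix] = e
    selected.extend(best.values())
    return selected
```

Stage 2 only adds entries for prefixes that stage 1 left uncovered, which is one way to read the card's phrase "2,924 additional commands"; the real pipeline may have made a different choice.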
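
The character counts in the Statistics table could be reproduced along these lines. This is a sketch under stated assumptions: a length is `len()` of the Python string, the reasoning is the text between `<think>` and `</think>`, and the answer is everything after `</think>`. The card does not state exactly how it measured these, so results may differ slightly. `split_assistant` and `length_stats` are hypothetical helper names.

```python
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def split_assistant(content: str) -> Tuple[str, str]:
    """Split an assistant turn into (reasoning, answer) around </think>."""
    head, sep, tail = content.partition("</think>")
    if not sep:
        return "", content.strip()  # no reasoning block found
    if head.startswith("<think>"):
        head = head[len("<think>"):]
    return head.strip(), tail.strip()


def length_stats(samples: Iterable[dict]) -> Dict[str, Tuple[int, float, int]]:
    """Return (min, mean, max) character lengths per metric."""
    lengths: Dict[str, List[int]] = {"user_prompt": [], "reasoning": [], "answer": []}
    for sample in samples:
        for msg in sample["messages"]:
            if msg["role"] == "user":
                lengths["user_prompt"].append(len(msg["content"]))
            elif msg["role"] == "assistant":
                reasoning, answer = split_assistant(msg["content"])
                lengths["reasoning"].append(len(reasoning))
                lengths["answer"].append(len(answer))
    return {k: (min(v), mean(v), max(v)) for k, v in lengths.items() if v}
```

After loading the dataset, `length_stats(dataset["train"])` iterates over the rows, each of which is a dict with a `messages` key.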
Provided by: LocoreMind