
emirkaanozdemr/bash_command_data_6K

Hugging Face · Updated 2026-03-25 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/emirkaanozdemr/bash_command_data_6K
Resource overview:
---
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: completion
    dtype: string
  splits:
  - name: train
    num_bytes: 942238
    num_examples: 6153
  download_size: 476358
  dataset_size: 942238
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- code
pretty_name: lds
size_categories:
- 1K<n<10K
---

# 📦 Bash Command Dataset v1

A high-quality dataset of **natural language instructions paired with their equivalent Bash commands**, designed for training and fine-tuning large language models (LLMs) that translate English tasks into shell commands.

This dataset is ideal for researchers, developers, and machine learning engineers interested in **natural language to Bash command translation**, command-line automation, and building intelligent terminal assistants.

---

## 📁 Dataset Structure

The dataset contains a single split (`train`) in JSONL / JSON format. Each example consists of two fields:

- **`prompt`** *(string)*: A natural language description of a task that a user wants to perform in a Unix-like shell.
- **`completion`** *(string)*: The corresponding Bash command (or command sequence) that fulfills the described task.

### Example Entry

```json
{"prompt": "List all files in the current directory, including hidden ones, in long format.", "completion": "ls -la\n"}
```

# 🔧 Use Cases

This dataset can be used for:

- 🧠 **Fine-tuning LLMs** to convert English instructions into Bash commands.
- 💻 **Building AI assistants** for command-line automation.
- 📊 **Evaluating model performance** on shell command generation tasks.
- 🚀 **Research** on NL2SH (Natural Language to Shell) translation systems.

## ⚠️ Safety and Usage Notes

- **Execution Caution:** Some commands may be destructive (e.g., deletion of files). Always execute generated commands in a safe environment (sandbox or container) before running them on real systems.
- **Environment Specifics:** Commands are written for Linux-like systems (e.g., Ubuntu). Some commands or options might behave differently on other shells or distributions.
- Responsible use is strongly recommended.

## 📜 License

This dataset is shared under the **Apache 2.0 License**.

## 📈 Download and Usage

You can load the dataset directly using the Hugging Face Datasets library:

```python
from datasets import load_dataset

dataset = load_dataset("emirkaanozdemr/bash_command_data_6K")
```

# 📚 Citation

If you use this dataset in your work, please cite it as:

```latex
@misc{ozdemir2026bash,
  author = {Emir Kaan Ozdemir},
  title = {Bash Command Dataset 6K},
  year = {2026},
  publisher = {Hugging Face},
  journal = {Hugging Face Dataset},
  url = {https://huggingface.co/datasets/emirkaanozdemr/bash_command_data_6K}
}
```
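As a rough illustration of the record format described under Dataset Structure and the execution caution in the Safety and Usage Notes, the sketch below parses one JSONL line into its `prompt` and `completion` fields and applies a coarse check for potentially destructive commands. The helper names `parse_record` and `looks_risky` and the `RISKY_COMMANDS` set are hypothetical and chosen only for this example; they are not part of the dataset or of any library. The check is a simple token match, so it may flag harmless commands and miss destructive ones (for instance `find ... -delete`).

```python
import json
import re

# Hypothetical denylist for illustration only; extend or replace as needed.
RISKY_COMMANDS = {"rm", "dd", "mkfs", "shred"}

# The example entry from the dataset card, kept as a raw JSONL line.
EXAMPLE_LINE = (
    r'{"prompt": "List all files in the current directory, '
    r'including hidden ones, in long format.", "completion": "ls -la\n"}'
)


def parse_record(line: str) -> tuple[str, str]:
    """Parse one JSONL line into (prompt, completion).

    Surrounding whitespace, including the trailing newline seen in the
    example entry, is stripped from the completion.
    """
    record = json.loads(line)
    return record["prompt"], record["completion"].strip()


def looks_risky(command: str) -> bool:
    """Return True if any token's basename is listed in RISKY_COMMANDS.

    Tokens are split on whitespace and common shell separators, so this
    is deliberately conservative and ignores quoting.
    """
    tokens = re.split(r"[\s;&|()]+", command)
    return any(tok.rsplit("/", 1)[-1] in RISKY_COMMANDS for tok in tokens)


if __name__ == "__main__":
    prompt, completion = parse_record(EXAMPLE_LINE)
    print(prompt)
    print(completion, "(risky)" if looks_risky(completion) else "(not flagged)")
```

A filter like this could serve as a first pass before running generated commands in a sandbox or container; it is not a substitute for that isolation.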
Provider:
emirkaanozdemr