
b-mc2/cli-commands-explained

Hugging Face · Updated 2024-04-12 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/b-mc2/cli-commands-explained
Resource description:
---
license: cc0-1.0
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- terminal
- CLI
- code
- NLP
- commandlinefu
- cheatsheets
pretty_name: cli-commands-explained
size_categories:
- 10K<n<100K
---

#### Overview

This dataset is a collection of **16,098** command line instructions sourced from [Commandlinefu](https://www.commandlinefu.com/commands/browse) and [Cheatsheets](https://github.com/cheat/cheatsheets/tree/master). It includes an array of commands, each with an id, title, description, code, date, url to source, author, votes, and a flag indicating whether the description is AI generated. The descriptions are primarily authored by the original contributors; for entries where descriptions were absent, they have been generated using [NeuralBeagle14-7B](https://huggingface.co/mlabonne/NeuralBeagle14-7B). Out of the total entries, **10,039** descriptions are originally human-written, while **6,059** have been generated by AI.

Format:

| Key | Description | Type |
|--------------|-----------|------------|
| **id** | ID provided by Commandlinefu; content from Cheatsheets has IDs incremented afterwards | int |
| **votes** | User votes for a command from Commandlinefu; Cheatsheets entries default to `0` | int |
| **url** | URL to data source | str |
| **title** | Title provided by source | str |
| **description** | Description provided by author or AI generated by NeuralBeagle14-7B | str |
| **code** | The actual CLI/Terminal code | str |
| **author** | Author credited with code creation | str |
| **date** | Date code was created (estimate) | str |
| **ai_generated_description** | Flag to indicate if description was human written or AI written | bool |

```
ai_generated_description
False    10039
True      6059
```

#### Cleansing and Augmentation

Cleansing and data augmentation have been done on the combined Commandlinefu and Cheatsheets data. Some content from both sources has been removed due to formatting issues.

For Cheatsheets, I attempted to attribute an author and date using results from `git log --diff-filter=A --pretty="format:%ai,%an" --follow $file`

#### TODO

If you have any edits you'd like to see in a version 2 of this dataset, let me know.

Random sample:

```json
[
  {
    "id": 13,
    "votes": 1219,
    "url": "http://www.commandlinefu.com/commands/view/13/run-the-last-command-as-root",
    "title": "Run the last command as root",
    "description": "Useful when you forget to use sudo for a command. \"!!\" grabs the last run command.",
    "code": "sudo !!",
    "author": "root",
    "date": "2009-01-26 10:26:48",
    "ai_generated_description": false
  },
  {
    "id": 71,
    "votes": 846,
    "url": "http://www.commandlinefu.com/commands/view/71/serve-current-directory-tree-at-httphostname8000",
    "title": "Serve current directory tree at http://$HOSTNAME:8000/",
    "description": "This Python command, using the module SimpleHTTPServer, creates a basic web server that serves the current directory and its contents over HTTP on port 8000. When executed, it allows anyone with access to the specified URL (in this case, http://$HOSTNAME:8000/) to view and download files from the current directory as if it were a simple website.",
    "code": "python -m SimpleHTTPServer",
    "author": "pixelbeat",
    "date": "2009-02-05 11:57:43",
    "ai_generated_description": true
  }
]
```

#### Citing this work

```TeX
@misc{b-mc2_2024_cli-commands-explained,
  title = {cli-commands-explained Dataset},
  author = {b-mc2},
  year = {2023},
  url = {https://huggingface.co/datasets/b-mc2/cli-commands-explained},
  note = {This dataset was created by modifying data from the following sources: commandlinefu.com, https://github.com/cheat/cheatsheets/tree/master},
}
```
Provider:
b-mc2
Summary of original information

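Dataset overview

Dataset name: cli-commands-explained

Dataset size: 16,098 commands

Data sources:

  • Commandlinefu
  • Cheatsheets

Data content:

  • Each command includes id, title, description, date, url, author, votes, and a flag (ai_generated_description) indicating whether the description was AI generated.

How descriptions were produced:

  • 10,039 descriptions were written by humans
  • 6,059 descriptions were generated by AI (NeuralBeagle14-7B)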

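Dataset format

| Key | Description | Type |
|--------------|-----------|------------|
| id | Command ID from Commandlinefu; Cheatsheets IDs are incremented afterwards | int |
| votes | Number of user votes; defaults to 0 for Cheatsheets | int |
| url | URL of the data source | str |
| title | Command title, provided by the source | str |
| description | Command description, provided by the author or generated by NeuralBeagle14-7B | str |
| code | The actual CLI/terminal code | str |
| author | Creator of the command | str |
| date | Date the command was created (estimate) | str |
| ai_generated_description | Flag indicating whether the description was AI generated | bool |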
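
A minimal sketch of how records with these keys might be handled in Python, using only the standard library. The two records are taken from the random sample in the dataset card, with the second description abridged; the `Command` dataclass and the `count_by_flag` helper are hypothetical names introduced here for illustration and are not part of the dataset or of any library.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Command:
    """One record, mirroring the keys in the format listing above."""
    id: int
    votes: int
    url: str
    title: str
    description: str
    code: str
    author: str
    date: str  # stored as a string, e.g. "2009-01-26 10:26:48"
    ai_generated_description: bool

    def parsed_date(self) -> datetime:
        # Assumption: the "YYYY-MM-DD HH:MM:SS" layout seen in the sample
        # records; other rows in the dataset may use a different layout.
        return datetime.strptime(self.date, "%Y-%m-%d %H:%M:%S")


def count_by_flag(commands):
    """Count records by their ai_generated_description flag."""
    return Counter(c.ai_generated_description for c in commands)


SAMPLE = [
    Command(
        id=13,
        votes=1219,
        url="http://www.commandlinefu.com/commands/view/13/run-the-last-command-as-root",
        title="Run the last command as root",
        description='Useful when you forget to use sudo for a command. "!!" grabs the last run command.',
        code="sudo !!",
        author="root",
        date="2009-01-26 10:26:48",
        ai_generated_description=False,
    ),
    Command(
        id=71,
        votes=846,
        url="http://www.commandlinefu.com/commands/view/71/serve-current-directory-tree-at-httphostname8000",
        title="Serve current directory tree at http://$HOSTNAME:8000/",
        description="(abridged) Serves the current directory over HTTP on port 8000.",
        code="python -m SimpleHTTPServer",
        author="pixelbeat",
        date="2009-02-05 11:57:43",
        ai_generated_description=True,
    ),
]

if __name__ == "__main__":
    print(count_by_flag(SAMPLE))
    print([c.code for c in SAMPLE if not c.ai_generated_description])
```

Filtering on `ai_generated_description` in this way is one option for keeping only the human-written descriptions, for example when the generated text should not be used as training targets.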

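Data cleansing and augmentation

  • The combined Commandlinefu and Cheatsheets data was cleansed and augmented.
  • Some content was removed due to formatting issues.
  • For Cheatsheets, an attempt was made to add author and date information using the git log command.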
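
The dataset card gives the attribution command as `git log --diff-filter=A --pretty="format:%ai,%an" --follow $file`, which prints the author date (`%ai`) and author name (`%an`) separated by a comma. How that output was turned into the `author` and `date` fields is not documented; the following is only a sketch of one way to parse it. The function names and the sample output are hypothetical, and picking the earliest entry is an assumption made here in case the log prints more than one line.

```python
from datetime import datetime


def parse_git_log_line(line: str):
    """Split one '%ai,%an' line into (datetime, author)."""
    # %ai is an ISO-like author date such as "2019-10-02 14:03:11 -0400".
    # It contains no comma, so split on the first comma only; the author
    # name (%an) may itself contain commas.
    date_text, author = line.strip().split(",", 1)
    return datetime.strptime(date_text, "%Y-%m-%d %H:%M:%S %z"), author


def first_added(log_output: str):
    """Return (date, author) for the earliest entry in the log output."""
    entries = [
        parse_git_log_line(line)
        for line in log_output.splitlines()
        if line.strip()
    ]
    return min(entries, key=lambda entry: entry[0])


# Hypothetical output, not taken from the cheatsheets repository.
EXAMPLE_OUTPUT = (
    "2020-03-01 09:00:00 +0000,Jane Doe\n"
    "2019-10-02 14:03:11 -0400,Doe, John\n"
)

if __name__ == "__main__":
    added_on, added_by = first_added(EXAMPLE_OUTPUT)
    print(added_on.isoformat(), added_by)
```

Because the result depends on git history, the card's description of `date` as an estimate applies to values derived this way as well.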