b-mc2/cli-commands-explained
Hugging Face · Updated 2024-04-12 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/b-mc2/cli-commands-explained
Resource description:
---
license: cc0-1.0
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- terminal
- CLI
- code
- NLP
- commandlinefu
- cheatsheets
pretty_name: cli-commands-explained
size_categories:
- 10K<n<100K
---
#### Overview
This dataset is a collection of **16,098** command line instructions sourced from [Commandlinefu](https://www.commandlinefu.com/commands/browse) and [Cheatsheets](https://github.com/cheat/cheatsheets/tree/master). Each entry has an id, title, description, code, date, URL to source, author, votes, and a flag indicating whether the description is AI generated. Most descriptions were written by the original contributors; where a description was absent, one was generated using [NeuralBeagle14-7B](https://huggingface.co/mlabonne/NeuralBeagle14-7B). Of the total entries, **10,039** descriptions are human-written and **6,059** are AI generated.
Format:
| Key | Description | Type |
|--------------|-----------|------------|
| **id** | ID provided by Commandlinefu; Cheatsheets entries are given IDs that continue incrementing after these | int |
| **votes** | User votes for a command on Commandlinefu; Cheatsheets entries default to `0`. | int |
| **url** | URL to data source | str |
| **title** | Title provided by source | str |
| **description** | Description provided by author or AI generated by NeuralBeagle14-7B | str |
| **code** | The actual CLI/Terminal Code | str |
| **author** | Author credited with code creation | str |
| **date** | Date code was created (estimate) | str |
| **ai_generated_description** | Flag to indicate if description was human written or AI written | bool |
```
ai_generated_description
False 10039
True 6059
```
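The tally above can be reproduced from the records themselves. The following is a minimal sketch in Python, assuming the records are available as a list of dictionaries keyed as in the table; the two inline records reuse a subset of fields from the sample entries in this card, and `count_by_flag` is a hypothetical helper name, not part of the dataset.

```python
from collections import Counter

# Two records with a subset of the fields described in the table above.
records = [
    {"id": 13, "votes": 1219, "code": "sudo !!",
     "ai_generated_description": False},
    {"id": 71, "votes": 846, "code": "python -m SimpleHTTPServer",
     "ai_generated_description": True},
]


def count_by_flag(rows):
    """Count records by their ai_generated_description flag (hypothetical helper)."""
    return Counter(row["ai_generated_description"] for row in rows)


counts = count_by_flag(records)
print("False", counts[False])
print("True", counts[True])
```

Run over the full dataset, the same tally would be expected to match the figures reported above.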
#### Cleansing and Augmentation
Cleansing and data augmentation have been done on the combined Commandlinefu and Cheatsheets data. Some content from both sources has been removed due to formatting issues. For Cheatsheets, I attempted to attribute an author and date using results from `git log --diff-filter=A --pretty="format:%ai,%an" --follow $file`.
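The `git log` command above prints one `date,author` line per matching commit. Below is a minimal sketch of how such output could be parsed, not the script actually used for this dataset. The sample output string is invented for illustration, the helper names are hypothetical, and picking the earliest entry when several lines are returned is an assumption.

```python
from datetime import datetime


def parse_added_line(line):
    """Split one '%ai,%an' line into (timezone-aware datetime, author name)."""
    # Split on the first comma only, since author names may contain commas.
    date_part, author = line.split(",", 1)
    # %ai is formatted like '2019-10-16 06:43:51 -0400'.
    added_at = datetime.strptime(date_part.strip(), "%Y-%m-%d %H:%M:%S %z")
    return added_at, author.strip()


def first_added(log_output):
    """Return the earliest (date, author) pair (assumed to be the file's creation)."""
    entries = [parse_added_line(l) for l in log_output.splitlines() if l.strip()]
    return min(entries, key=lambda entry: entry[0])


# Invented sample output, for illustration only.
sample_output = (
    "2020-03-01 12:00:00 +0000,Second Author\n"
    "2019-10-16 06:43:51 -0400,Example Author"
)
added_at, author = first_added(sample_output)
print(added_at.strftime("%Y-%m-%d %H:%M:%S"), author)
```

Formatting the result with `strftime("%Y-%m-%d %H:%M:%S")` gives a string in the same shape as the `date` field of the Commandlinefu entries.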
#### TODO
If you have any edits you'd like to see in a version 2 of this dataset, let me know.
Random sample:
```json
[
{
"id": 13,
"votes": 1219,
"url": "http://www.commandlinefu.com/commands/view/13/run-the-last-command-as-root",
"title": "Run the last command as root",
"description": "Useful when you forget to use sudo for a command. \"!!\" grabs the last run command.",
"code": "sudo !!",
"author": "root",
"date": "2009-01-26 10:26:48",
"ai_generated_description": false
},
{
"id": 71,
"votes": 846,
"url": "http://www.commandlinefu.com/commands/view/71/serve-current-directory-tree-at-httphostname8000",
"title": "Serve current directory tree at http://$HOSTNAME:8000/",
"description": "This Python command, using the module SimpleHTTPServer, creates a basic web server that serves the current directory and its contents over HTTP on port 8000. When executed, it allows anyone with access to the specified URL (in this case, http://$HOSTNAME:8000/) to view and download files from the current directory as if it were a simple website.",
"code": "python -m SimpleHTTPServer",
"author": "pixelbeat",
"date": "2009-02-05 11:57:43",
"ai_generated_description": true
}
]
```
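Since the card lists text-generation and question-answering as task categories, records like the ones above might be turned into question/answer pairs. This is a minimal sketch under assumptions: the prompt wording and the helper names `to_qa_pair` and `human_written` are hypothetical, and the inline records carry only the fields the sketch needs.

```python
# Records with only the fields used below, taken from the sample above.
records = [
    {
        "code": "sudo !!",
        "description": "Useful when you forget to use sudo for a command. "
                       "\"!!\" grabs the last run command.",
        "ai_generated_description": False,
    },
    {
        "code": "python -m SimpleHTTPServer",
        "description": "This Python command, using the module SimpleHTTPServer, "
                       "creates a basic web server.",
        "ai_generated_description": True,
    },
]


def human_written(rows):
    """Keep only records whose description was not AI generated."""
    return [row for row in rows if not row["ai_generated_description"]]


def to_qa_pair(record):
    """Build a question/answer pair; the prompt template is an assumption."""
    question = "What does this command do?\n" + record["code"]
    return {"question": question, "answer": record["description"]}


pairs = [to_qa_pair(row) for row in human_written(records)]
print(pairs[0]["question"])
```

Filtering on `ai_generated_description` first is one possible design choice, for cases where only human-written descriptions are wanted as targets.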
#### Citing this work
```TeX
@misc{b-mc2_2024_cli-commands-explained,
title = {cli-commands-explained Dataset},
author = {b-mc2},
year = {2023},
url = {https://huggingface.co/datasets/b-mc2/cli-commands-explained},
note = {This dataset was created by modifying data from the following sources: commandlinefu.com, https://github.com/cheat/cheatsheets/tree/master},
}
```
Provider:
b-mc2
Summary of original information
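#### Dataset overview
Dataset name: cli-commands-explained
Dataset size: 16,098 commands
Data sources:
- Commandlinefu
- Cheatsheets

Data content:
- Each command includes id, title, description, date, url, author, votes, and a flag (ai_generated_description) indicating whether the description was AI generated.

Description sources:
- 10,039 descriptions were written by humans
- 6,059 descriptions were generated by AI (NeuralBeagle14-7B)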
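#### Dataset format
| Key | Description | Type |
|---|---|---|
| id | Command ID from Commandlinefu; Cheatsheets IDs are incremented afterwards | int |
| votes | Number of user votes; Cheatsheets entries default to 0 | int |
| url | URL to data source | str |
| title | Command title, provided by source | str |
| description | Command description, provided by the author or generated by NeuralBeagle14-7B | str |
| code | The actual CLI/terminal code | str |
| author | Creator of the command | str |
| date | Date the command was created (estimate) | str |
| ai_generated_description | Flag indicating whether the description was AI generated | bool |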
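#### Data cleansing and augmentation
- Data from Commandlinefu and Cheatsheets has been cleansed and augmented.
- Some content was removed due to formatting issues.
- For Cheatsheets, an attempt was made to add author and date information using the `git log` command.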



