andyburgin/kubefix
Source: Hugging Face. Updated: 2024-05-11. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/andyburgin/kubefix
Resource overview:
---
language:
- en
license: cc-by-4.0
configs:
- config_name: default
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: source
    dtype: string
tags:
- kubernetes
---
**Kubernetes Fault Analysis and Resolution Dataset**
====================================================
**Introduction**
---------------
This dataset is intended for fine-tuning a model for use with [K8sGPT](https://k8sgpt.ai/) for fault analysis and resolution. Ultimately, the resulting LLM is intended to be self-hosted in a GPU-free environment, running under [local-ai](https://localai.io/basics/kubernetes/) in Kubernetes.
For a detailed description of the method used to generate the dataset and the resultant [andyburgin/Phi-3-mini-4k-instruct-kubefix-v0.1-gguf](https://huggingface.co/andyburgin/Phi-3-mini-4k-instruct-kubefix-v0.1-gguf) model, please see the [kubefix-llm repo](https://github.com/andyburgin/kubefix-llm).
**Data Sources**
----------------
The dataset contains a series of question-and-answer pairs in alpaca format, generated from a subset of the Kubernetes documentation's [English markdown files](https://github.com/kubernetes/website/tree/main/content/en/docs). The Q&A pairs were generated from the documents using an open-source model (to avoid the licensing issues attached to some free models and SaaS services). After much trial and error, the [openchat-3.5-0106](https://huggingface.co/TheBloke/openchat-3.5-0106-GGUF) model was found to be the least problematic.
**Dataset Statistics**
---------------------
* Total rows: 2564
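
The row count can be checked locally. The following is a minimal sketch, assuming the dataset loads through the Hugging Face `datasets` library under a single `train` split (the split name is an assumption). The helper names `load_rows` and `summarize` are hypothetical, as is the example path shape in the comment. It counts the rows and groups them by the first component of each row's `source` path.

```python
from collections import Counter


def load_rows():
    """Load the dataset from the Hugging Face Hub (needs `pip install datasets`).

    Assumption: the data is exposed as a single "train" split.
    """
    # Imported lazily so that summarize() can be used offline on any iterable of rows.
    from datasets import load_dataset

    return load_dataset("andyburgin/kubefix", split="train")


def summarize(rows):
    """Return (total row count, Counter of rows per top-level `source` directory)."""
    total = 0
    per_area = Counter()
    for row in rows:
        total += 1
        # Hypothetical path shape: "concepts/workloads/pods.md" -> "concepts"
        area = row["source"].strip("/").split("/")[0]
        per_area[area] += 1
    return total, per_area


if __name__ == "__main__":
    total, per_area = summarize(load_rows())
    print(f"Total rows: {total}")
    for area, count in per_area.most_common():
        print(f"{area}: {count}")
```

Because `summarize` accepts any iterable of dict-like rows, it can also be run on a filtered subset before fine-tuning.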
**Dataset Structure**
---------------------
The dataset is stored in alpaca format and consists of four columns:
* `instruction`: the question of the Q&A pair
* `input`: left blank; can hold additional training information
* `output`: the generated output for the instruction (the answer of the Q&A pair)
* `source`: the path within the [documentation](https://github.com/kubernetes/website/tree/main/content/en/docs) that was used to create the Q&A pair
The dataset contains 2,564 instructions, each with its corresponding output, providing fault-finding information for training and fine-tuning language models.
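
To illustrate how a row in this layout might be turned into training text, here is a minimal sketch that renders the four columns with the prompt wording popularised by the Stanford Alpaca project. Whether the kubefix model was trained with exactly this template is an assumption; the kubefix-llm repo is the place to confirm it. The function name `to_training_text` and the example row are hypothetical and not taken from the dataset.

```python
# Alpaca-style templates; assumed here, not confirmed as the ones used for kubefix.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def to_training_text(row):
    """Render one row as prompt + answer.

    `source` is provenance only and is not included in the text.
    """
    # `input` is blank in this dataset, so the no-input template is the usual path.
    has_input = bool((row.get("input") or "").strip())
    template = PROMPT_WITH_INPUT if has_input else PROMPT_NO_INPUT
    return template.format(**row) + row["output"]


if __name__ == "__main__":
    # Invented example row, for illustration only.
    example = {
        "instruction": "What does the CrashLoopBackOff status mean?",
        "input": "",
        "output": "The container keeps crashing and Kubernetes is backing off before restarting it.",
        "source": "hypothetical/path.md",
    }
    print(to_training_text(example))
```

Keeping `source` out of the rendered text means the model does not learn to emit file paths, while the column stays available for tracing an answer back to the documentation page it came from.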
**License**
---------
The Kubernetes Fault Analysis and Resolution Dataset is licensed under the [CC-BY 4.0 license](https://creativecommons.org/licenses/by/4.0/), as is the source [documentation](https://github.com/kubernetes/website/tree/main/content/en/docs).
Provider:
andyburgin
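Summary of original information
Kubernetes Fault Analysis and Resolution Dataset
Dataset overview
Data sources
- Content: question-and-answer pairs extracted from the English Markdown files of the Kubernetes documentation.
- Generation method: generated with the open-source openchat-3.5-0106 model to avoid licensing issues.
Dataset statistics
- Total rows: 2564
Dataset structure
- Format: alpaca format
- Columns:
  - instruction: the question of the Q&A pair
  - input: blank; can be used for additional training information
  - output: the generated output for the instruction (the answer of the Q&A pair)
  - source: the documentation path used to create the Q&A pair
License
- License: CC-BY 4.0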



