andyburgin/kubefix

Name: andyburgin/kubefix
Creator: andyburgin
Published: 2024-05-11 18:25:01
License: no description provided

Hugging Face · Updated 2024-05-11 · Indexed 2024-06-12

Download link:

https://hf-mirror.com/datasets/andyburgin/kubefix

Resource description:

---
language:
- en
license: cc-by-4.0
configs:
- config_name: default
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: source
    dtype: string
tags:
- kubernetes
---

**Kubernetes Fault Analysis and Resolution Dataset**
====================================================

**Introduction**
---------------

The purpose of this dataset is to fine-tune a model for use with [K8sGPT](https://k8sgpt.ai/) for fault analysis and resolution. Ultimately, the resulting LLM is intended to be self-hosted in a GPU-free environment, running under [local-ai](https://localai.io/basics/kubernetes/) in Kubernetes.

For a detailed description of the method used to generate the dataset and the resultant [andyburgin/Phi-3-mini-4k-instruct-kubefix-v0.1-gguf](https://huggingface.co/andyburgin/Phi-3-mini-4k-instruct-kubefix-v0.1-gguf) model, please see the [kubefix-llm repo](https://github.com/andyburgin/kubefix-llm).

**Data Sources**
----------------

The dataset contains a series of question-and-answer pairs in alpaca format, generated from a subset of the Kubernetes documentation in the [English markdown files](https://github.com/kubernetes/website/tree/main/content/en/docs).

The Q&A pairs were generated from the documents using an open-source model, to avoid the licencing issues that come with some free models and SaaS services. After much trial and error, the [openchat-3.5-0106](https://huggingface.co/TheBloke/openchat-3.5-0106-GGUF) model was found to be the least problematic.

**Dataset Statistics**
---------------------

* Total rows: 2564

**Dataset Structure**
---------------------

The dataset is stored in alpaca format and consists of four columns:

* `instruction`: the question of the Q&A pair
* `input`: blank; can hold additional training information
* `output`: the generated output to the instruction (the answer of the Q&A pair)
* `source`: the path within the [documentation](https://github.com/kubernetes/website/tree/main/content/en/docs) that was used to create the Q&A pair

The dataset contains more than 2,500 instructions, each with its corresponding output, providing a rich source of fault-finding information for training and fine-tuning language models.

**License**
---------

The Kubernetes Fault Analysis and Resolution Dataset is licensed under the [CC-BY 4.0 license](https://creativecommons.org/licenses/by/4.0/), as is the source [documentation](https://github.com/kubernetes/website/tree/main/content/en/docs).

Provided by:

andyburgin

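Summary of original information

Kubernetes Fault Analysis and Resolution Dataset

Dataset overview

Purpose: fine-tuning a model for use with K8sGPT for fault analysis and resolution.
Intended deployment: self-hosted in Kubernetes under local-ai, in a GPU-free environment.

Data sources

Content: question-and-answer pairs generated from the English Markdown files of the Kubernetes documentation.
Generation method: generated with the open-source openchat-3.5-0106 model, to avoid licensing issues.

Dataset statistics

Total rows: 2564

Dataset structure

Format: alpaca format
Columns:
- instruction: the question of the Q&A pair
- input: blank; can hold additional training information
- output: the generated output for the instruction (the answer of the Q&A pair)
- source: the path of the document used to create the Q&A pair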
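
As a rough illustration of how the four columns might be used, the sketch below checks that a row has the expected string fields and then joins it into a single training string. The column names come from the dataset card. The prompt wording is the widely circulated Stanford Alpaca template and is an assumption here, since the card does not say which template was used for fine-tuning. The helper names (`validate_row`, `build_training_text`) and the example row, including its `source` path, are hypothetical and not taken from the dataset.

```python
from typing import Dict

# Column names as listed in the dataset card.
COLUMNS = ("instruction", "input", "output", "source")

# Assumption: the common Stanford Alpaca prompt wording. The dataset card
# does not state which template was actually used for fine-tuning.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def validate_row(row: Dict[str, str]) -> None:
    """Raise if a row lacks one of the four columns or holds a non-string."""
    missing = [name for name in COLUMNS if name not in row]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for name in COLUMNS:
        if not isinstance(row[name], str):
            raise TypeError(f"column {name!r} must be a string")


def build_training_text(row: Dict[str, str]) -> str:
    """Render one row as prompt plus answer (hypothetical helper)."""
    validate_row(row)
    # The card describes `input` as blank, so the no-input template is the
    # usual path; the with-input template covers rows that do carry context.
    template = PROMPT_WITH_INPUT if row["input"].strip() else PROMPT_NO_INPUT
    prompt = template.format(instruction=row["instruction"], input=row["input"])
    return prompt + row["output"]


if __name__ == "__main__":
    # Made-up row for illustration only; not copied from the dataset.
    example_row = {
        "instruction": "What is a Pod?",
        "input": "",
        "output": "A Pod is the smallest deployable unit in Kubernetes.",
        "source": "example/path.md",
    }
    print(build_training_text(example_row))
```

The `source` column is validated but left out of the rendered text, on the assumption that it serves as provenance rather than training input.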

License

License: CC-BY 4.0
