
andyburgin/kubefix

Hugging Face · Updated 2024-05-11 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/andyburgin/kubefix
Description:
---
language:
- en
license: cc-by-4.0
configs:
- config_name: default
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: source
    dtype: string
tags:
- kubernetes
---

**Kubernetes Fault Analysis and Resolution Dataset**
====================================================

**Introduction**
---------------

This dataset is intended for fine-tuning a model for use with [K8sGPT](https://k8sgpt.ai/) for fault analysis and resolution. Ultimately, the resulting LLM is intended to be self-hosted in a GPU-free environment, running under [local-ai](https://localai.io/basics/kubernetes/) in Kubernetes.

For a detailed description of the method used to generate the dataset and the resulting [andyburgin/Phi-3-mini-4k-instruct-kubefix-v0.1-gguf](https://huggingface.co/andyburgin/Phi-3-mini-4k-instruct-kubefix-v0.1-gguf) model, please see the [kubefix-llm repo](https://github.com/andyburgin/kubefix-llm).

**Data Sources**
----------------

The dataset contains a series of question and answer pairs in alpaca format, generated from a subset of the Kubernetes documentation taken from the [English markdown files](https://github.com/kubernetes/website/tree/main/content/en/docs).

The Q&A pairs were generated from the documents using an open-source model, to avoid the licensing issues that come with some free models and SaaS services. After much trial and error, the [openchat-3.5-0106](https://huggingface.co/TheBloke/openchat-3.5-0106-GGUF) model was found to be the least problematic.

**Dataset Statistics**
---------------------

* Total rows: 2564

**Dataset Structure**
---------------------

The dataset is stored in alpaca format and consists of four columns:

* `instruction`: the question of the Q&A pair
* `input`: blank; can hold additional training information
* `output`: the generated output for the instruction (the answer of the Q&A pair)
* `source`: the path within the [documentation](https://github.com/kubernetes/website/tree/main/content/en/docs) that was used to create the Q&A pair

With 2564 instructions, each paired with its corresponding output, the dataset provides a source of fault-finding information for training and fine-tuning language models.

**License**
---------

The Kubernetes Fault Analysis and Resolution Dataset is licensed under the [CC-BY 4.0 license](https://creativecommons.org/licenses/by/4.0/), as is the source [documentation](https://github.com/kubernetes/website/tree/main/content/en/docs).
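
The four-column alpaca layout described under Dataset Structure can be turned into training text in a few lines. The sketch below is illustrative only: the prompt wording follows the widely used Stanford Alpaca template, and whether the kubefix-llm training run used exactly this wording is an assumption. The function name `format_alpaca` and the example row are hypothetical and are not taken from the dataset.

```python
from typing import Mapping

# Prompt wording follows the common Stanford Alpaca template. That the
# kubefix model was trained with exactly this wording is an assumption.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def format_alpaca(row: Mapping[str, str]) -> str:
    """Render one row as prompt plus answer.

    The Input block is skipped when `input` is blank, which the dataset
    card says is the normal case for this dataset.
    """
    user_input = (row.get("input") or "").strip()
    if user_input:
        prompt = PROMPT_WITH_INPUT.format(
            instruction=row["instruction"], input=user_input
        )
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=row["instruction"])
    return prompt + row["output"]


# Hypothetical row written for illustration; not copied from the dataset.
example_row = {
    "instruction": "What does the CrashLoopBackOff status mean for a Pod?",
    "input": "",
    "output": (
        "A container in the Pod keeps failing, and the kubelet waits "
        "longer between each restart attempt."
    ),
    "source": "/content/en/docs/example.md",
}

print(format_alpaca(example_row))
```

Real rows could be fetched with the Hugging Face `datasets` library, for example `load_dataset("andyburgin/kubefix")`, and passed through the same function one at a time.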
Provider:
andyburgin
Summary of the original information

Kubernetes Fault Analysis and Resolution Dataset

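Dataset overview

  • Purpose: fine-tuning a model to support K8sGPT in fault analysis and resolution.
  • Intended deployment: self-hosted under local-ai in Kubernetes, in an environment without GPUs.

Data sources

  • Content: question and answer pairs generated from the English Markdown files of the Kubernetes documentation.
  • Generation method: produced with the open-source openchat-3.5-0106 model, to avoid licensing issues.

Dataset statistics

  • Total rows: 2564

Dataset structure

  • Format: alpaca format
  • Columns:
    • instruction: the question of the Q&A pair
    • input: blank; can hold additional training information
    • output: the generated output for the instruction (the answer of the Q&A pair)
    • source: the path of the documentation file used to create the Q&A pair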
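
The column list above lends itself to a quick sanity check before training. The following is a minimal sketch, assuming each row arrives as a dictionary keyed by column name; `check_row`, `rows_per_source`, and the sample rows are hypothetical names and data made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Mapping

EXPECTED_COLUMNS = ("instruction", "input", "output", "source")


def check_row(row: Mapping[str, object]) -> List[str]:
    """Return the problems found in one row; an empty list means it looks fine."""
    problems = []
    for column in EXPECTED_COLUMNS:
        if column not in row:
            problems.append(f"missing column: {column}")
        elif not isinstance(row[column], str):
            problems.append(f"column is not a string: {column}")
    # `input` may be blank, but a Q&A pair needs both a question and an answer.
    for column in ("instruction", "output"):
        value = row.get(column)
        if isinstance(value, str) and not value.strip():
            problems.append(f"column is empty: {column}")
    return problems


def rows_per_source(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Count how many Q&A pairs were generated from each documentation path."""
    return Counter(row["source"] for row in rows)


# Hypothetical rows written for illustration; not copied from the dataset.
sample_rows = [
    {"instruction": "What is a Pod?", "input": "",
     "output": "The smallest deployable unit in Kubernetes.",
     "source": "docs/example-a.md"},
    {"instruction": "What is a Node?", "input": "",
     "output": "A worker machine in Kubernetes.",
     "source": "docs/example-a.md"},
]

for row in sample_rows:
    print(check_row(row))
print(rows_per_source(sample_rows))
```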

License

  • License: CC-BY 4.0, the same license as the source Kubernetes documentation.
