roanbrasil/k8s-rag-corpus

Name: roanbrasil/k8s-rag-corpus
Creator: roanbrasil
Published: 2026-04-12 04:09:00
License: apache-2.0

Source: Hugging Face. Updated: 2026-04-12. Indexed: 2026-04-26.

Download link:

https://hf-mirror.com/datasets/roanbrasil/k8s-rag-corpus

Description:

---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- kubernetes
- devops
- rag
- retrieval-augmented-generation
- helm
- argocd
- gitops
pretty_name: K8s RAG Corpus
size_categories:
- 1K<n<10K
---

# K8s RAG Corpus

A curated corpus of **4,794 Kubernetes-specific documents** (7.5 MB) used as the retrieval index for the BM25 RAG component of the K8s Multi-Agent Debate system.

## Contents

| Source | Documents |
|--------|-----------|
| Official k8s.io examples (kubernetes/website) | ~200 |
| Helm chart examples | ~150 |
| Flux CD / HelmRelease examples | ~100 |
| ArgoCD Application templates | ~100 |
| RBAC, NetworkPolicy, HPA patterns | ~200 |
| Curated hardcoded examples (29 complex patterns) | 29 |
| Local production YAML files | 5,443 |
| **Total** | **~4,794 indexed docs** |

## Key Patterns Covered

- HPA v2 with CPU/memory metrics
- StatefulSet with volumeClaimTemplates
- Ingress with TLS + pathType
- PVC with StorageClass
- CronJob with restartPolicy
- NetworkPolicy with ingress/egress rules
- ClusterRole + ClusterRoleBinding
- HelmRelease (Flux CD)
- Kustomization overlays
- ArgoCD Application with syncPolicy
- Multi-container pods with sidecars
- Resource limits and requests

## Format

Plain text, one document per line, with a `---DOC---` separator between documents. Intended for retrieval with `rank-bm25` (`BM25Okapi`).

## Usage

```python
import re

from rank_bm25 import BM25Okapi

# Load the corpus and split it into documents on the ---DOC--- separator.
with open("rag_k8s.txt") as f:
    raw = f.read()
docs = [d.strip() for d in raw.split("---DOC---") if d.strip()]

# Tokenize on runs of word characters plus ':', '/', '.', and '-'.
tokenized = [re.findall(r"[\w:/.-]+", d.lower()) for d in docs]
bm25 = BM25Okapi(tokenized)

# Retrieve the three highest-scoring documents for a query.
query = "HorizontalPodAutoscaler cpu utilization"
tokens = re.findall(r"[\w:/.-]+", query.lower())
top3 = bm25.get_top_n(tokens, docs, n=3)
```

## Paper

> Brasil, R. (2025).
> *Can Small Domain-Specific LLMs Compete with General 7B Models on Kubernetes Configuration Generation?*
> Code: https://github.com/roanbrasil/llm-pocs

## Related

- K8sBench benchmark: https://huggingface.co/datasets/roanbrasil/k8sbench
- MDA Demo: https://huggingface.co/spaces/roanbrasil/k8s-multi-agent
- AttnRes model: https://huggingface.co/roanbrasil/attnres-devops-gpt
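
## Tokenizer Behavior (Sketch)

The splitting and tokenizing steps from the Usage section can be tried without the corpus file or `rank-bm25`. The sketch below is a minimal illustration using only the standard library: the helper names `split_corpus` and `tokenize` and the two-document `sample` string are hypothetical and not part of the dataset. Only the `---DOC---` separator and the regex come from the card.

```python
import re

DOC_SEPARATOR = "---DOC---"
# Same pattern as in the Usage section: word characters plus ':', '/', '.', '-'.
TOKEN_PATTERN = re.compile(r"[\w:/.-]+")


def split_corpus(raw: str) -> list[str]:
    """Split raw corpus text on the separator, dropping empty chunks."""
    return [chunk.strip() for chunk in raw.split(DOC_SEPARATOR) if chunk.strip()]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract tokens with the card's regex."""
    return TOKEN_PATTERN.findall(text.lower())


# Hypothetical sample in the described layout; not taken from the corpus.
sample = (
    "apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler\n"
    "---DOC---\n"
    "apiVersion: networking.k8s.io/v1 kind: NetworkPolicy\n"
)

docs = split_corpus(sample)
tokens = tokenize(docs[0])
print(tokens)
```

With this regex, API groups and versions such as `autoscaling/v2` or `networking.k8s.io/v1` stay intact as single tokens. YAML keys keep their trailing colon (for example `kind:`), so a bare query term like `kind` would not match that token under exact-match BM25; queries that name resource kinds or API versions are likely to work better.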

Provider:

roanbrasil
