roanbrasil/k8s-rag-corpus
Hugging Face · Updated 2026-04-12 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/roanbrasil/k8s-rag-corpus
Description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- kubernetes
- devops
- rag
- retrieval-augmented-generation
- helm
- argocd
- gitops
pretty_name: K8s RAG Corpus
size_categories:
- 1K<n<10K
---
# K8s RAG Corpus
A curated corpus of **4,794 Kubernetes-specific documents** (7.5 MB), used as the retrieval index for the BM25 RAG component of the K8s Multi-Agent Debate system.
## Contents
| Source | Documents |
|--------|-----------|
| Official k8s.io examples (kubernetes/website) | ~200 |
| Helm chart examples | ~150 |
| Flux CD / HelmRelease examples | ~100 |
| ArgoCD Application templates | ~100 |
| RBAC, NetworkPolicy, HPA patterns | ~200 |
| Curated hardcoded examples (29 complex patterns) | 29 |
| Local production YAML files | 5,443 |
| **Total** | **~4,794 indexed docs** |
## Key Patterns Covered
- HPA v2 with CPU/memory metrics
- StatefulSet with volumeClaimTemplates
- Ingress with TLS + pathType
- PVC with StorageClass
- CronJob with restartPolicy
- NetworkPolicy with ingress/egress rules
- ClusterRole + ClusterRoleBinding
- HelmRelease (Flux CD)
- Kustomization overlays
- ArgoCD Application with syncPolicy
- Multi-container pods with sidecars
- Resource limits and requests
## Format
Plain text, one document per line, with a `---DOC---` separator between documents.
The corpus is intended for retrieval with `rank-bm25` (`BM25Okapi`).
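As a rough illustration of this layout, the sketch below splits a small in-memory string on the `---DOC---` separator and tokenizes the first document. The two sample documents and the helper names `split_docs` and `tokenize` are hypothetical, made up for this example; they are not taken from the corpus.

```python
import re

SEPARATOR = "---DOC---"
TOKEN_RE = re.compile(r"[\w:/.-]+")

# Hypothetical sample text, not real corpus content.
sample = (
    "apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler\n"
    "---DOC---\n"
    "apiVersion: apps/v1 kind: StatefulSet\n"
    "---DOC---\n"
)


def split_docs(raw: str) -> list[str]:
    """Split raw corpus text on the separator, dropping empty chunks."""
    return [d.strip() for d in raw.split(SEPARATOR) if d.strip()]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep ':', '/', '.', '-' inside tokens."""
    return TOKEN_RE.findall(text.lower())


docs = split_docs(sample)
tokens = tokenize(docs[0])
print(len(docs))
print(tokens)
```

Because the character class includes `:`, `/`, `.`, and `-`, a value such as `autoscaling/v2` stays a single token, and YAML keys keep their trailing colon (`kind:`), so a bare query term `kind` would not match them.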
## Usage
```python
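import re

from rank_bm25 import BM25Okapi

# Load the corpus and split it into documents on the separator.
with open("rag_k8s.txt", encoding="utf-8") as f:
    raw = f.read()
docs = [d.strip() for d in raw.split("---DOC---") if d.strip()]

# Tokenize: lowercase, keeping ':', '/', '.', and '-' inside tokens.
tokenized = [re.findall(r"[\w:/.-]+", d.lower()) for d in docs]
bm25 = BM25Okapi(tokenized)

# Retrieve the three highest-scoring documents for a query.
query = "HorizontalPodAutoscaler cpu utilization"
tokens = re.findall(r"[\w:/.-]+", query.lower())
top3 = bm25.get_top_n(tokens, docs, n=3)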
```
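To make the ranking step less opaque, here is a minimal, dependency-free sketch of Okapi BM25 scoring. It is an illustration under stated assumptions, not the `rank-bm25` implementation: it uses the common `log(1 + (N - n + 0.5) / (n + 0.5))` IDF variant with `k1 = 1.5` and `b = 0.75`, so absolute scores may differ from the library's. The function name `bm25_scores` and the toy documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Same tokenization as the usage example: lowercase + regex."""
    return re.findall(r"[\w:/.-]+", text.lower())


def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Return one BM25 score per document (illustrative variant)."""
    n_docs = len(tokenized_docs)
    avgdl = sum(len(d) for d in tokenized_docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized_docs for term in set(d))
    scores = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        # Length normalization: longer-than-average docs are penalized.
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


# Hypothetical toy documents, not real corpus content.
toy_docs = [
    "kind: HorizontalPodAutoscaler metrics cpu utilization",
    "kind: StatefulSet volumeClaimTemplates storage",
    "kind: CronJob restartPolicy OnFailure",
]
tokenized = [tokenize(d) for d in toy_docs]
scores = bm25_scores(tokenize("HorizontalPodAutoscaler cpu utilization"), tokenized)
best = max(range(len(toy_docs)), key=scores.__getitem__)
print(toy_docs[best])
```

Only the first toy document shares any term with the query, so it is the only one with a non-zero score. On the real corpus, rarer terms such as resource kinds would carry more weight than terms that appear in most documents.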
## Paper
> Brasil, R. (2025). *Can Small Domain-Specific LLMs Compete with General 7B Models on Kubernetes Configuration Generation?*
> Code: https://github.com/roanbrasil/llm-pocs
## Related
- K8sBench benchmark: https://huggingface.co/datasets/roanbrasil/k8sbench
- MDA Demo: https://huggingface.co/spaces/roanbrasil/k8s-multi-agent
- AttnRes model: https://huggingface.co/roanbrasil/attnres-devops-gpt
Provider:
roanbrasil



