
roanbrasil/k8s-rag-corpus

Source: Hugging Face · Updated 2026-04-12 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/roanbrasil/k8s-rag-corpus
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- kubernetes
- devops
- rag
- retrieval-augmented-generation
- helm
- argocd
- gitops
pretty_name: K8s RAG Corpus
size_categories:
- 1K<n<10K
---

# K8s RAG Corpus

A curated corpus of **4,794 Kubernetes-specific documents** (7.5MB) used as the retrieval index for the BM25 RAG component of the K8s Multi-Agent Debate system.

## Contents

| Source | Documents |
|--------|-----------|
| Official k8s.io examples (kubernetes/website) | ~200 |
| Helm chart examples | ~150 |
| Flux CD / HelmRelease examples | ~100 |
| ArgoCD Application templates | ~100 |
| RBAC, NetworkPolicy, HPA patterns | ~200 |
| Curated hardcoded examples (29 complex patterns) | 29 |
| Local production YAML files | 5,443 |
| **Total** | **~4,794 indexed docs** |

## Key Patterns Covered

- HPA v2 with CPU/memory metrics
- StatefulSet with volumeClaimTemplates
- Ingress with TLS + pathType
- PVC with StorageClass
- CronJob with restartPolicy
- NetworkPolicy with ingress/egress rules
- ClusterRole + ClusterRoleBinding
- HelmRelease (Flux CD)
- Kustomization overlays
- ArgoCD Application with syncPolicy
- Multi-container pods with sidecars
- Resource limits and requests

## Format

Plain text, one document per line, with `---DOC---` separator between documents. Used with `rank-bm25` (BM25Okapi) for retrieval.

## Usage

```python
from rank_bm25 import BM25Okapi
import re

with open("rag_k8s.txt") as f:
    raw = f.read()

docs = [d.strip() for d in raw.split("---DOC---") if d.strip()]
tokenized = [re.findall(r"[\w:/.-]+", d.lower()) for d in docs]
bm25 = BM25Okapi(tokenized)

query = "HorizontalPodAutoscaler cpu utilization"
tokens = re.findall(r"[\w:/.-]+", query.lower())
top3 = bm25.get_top_n(tokens, docs, n=3)
```

## Paper

> Brasil, R. (2025). *Can Small Domain-Specific LLMs Compete with General 7B Models on Kubernetes Configuration Generation?*
> Code: https://github.com/roanbrasil/llm-pocs

## Related

- K8sBench benchmark: https://huggingface.co/datasets/roanbrasil/k8sbench
- MDA Demo: https://huggingface.co/spaces/roanbrasil/k8s-multi-agent
- AttnRes model: https://huggingface.co/roanbrasil/attnres-devops-gpt
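
To make the Format and Usage descriptions above easier to inspect without downloading the corpus or installing `rank-bm25`, here is a minimal standard-library sketch. It reuses the `---DOC---` separator and the `[\w:/.-]+` token pattern from the card; everything else is an assumption for illustration: the three-document `sample` is made up, the helper names `split_docs`, `tokenize`, and `bm25_scores` are hypothetical, and the scoring is a textbook Okapi BM25 with a non-negative IDF and assumed `k1`/`b` values, which may differ in detail from what `BM25Okapi` computes.

```python
import math
import re
from collections import Counter

SEPARATOR = "---DOC---"               # separator named in the Format section
TOKEN_RE = re.compile(r"[\w:/.-]+")   # same pattern as the Usage snippet


def split_docs(raw):
    """Split raw corpus text on the separator, dropping empty chunks."""
    return [d.strip() for d in raw.split(SEPARATOR) if d.strip()]


def tokenize(text):
    """Lowercase, then keep runs of word characters plus ':', '/', '.', '-'."""
    return TOKEN_RE.findall(text.lower())


def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Textbook Okapi BM25 (illustrative; k1 and b are assumed values)."""
    n = len(tokenized_docs)
    avgdl = sum(len(d) for d in tokenized_docs) / n
    # Document frequency: in how many documents each token appears.
    df = Counter(t for d in tokenized_docs for t in set(d))
    scores = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical three-document sample in the corpus format (not real corpus data).
sample = """
apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler metrics: cpu utilization
---DOC---
apiVersion: batch/v1 kind: CronJob restartPolicy: OnFailure
---DOC---
apiVersion: networking.k8s.io/v1 kind: NetworkPolicy policyTypes: ingress egress
"""

docs = split_docs(sample)
tokenized = [tokenize(d) for d in docs]
query = tokenize("HorizontalPodAutoscaler cpu utilization")
scores = bm25_scores(query, tokenized)
best = max(range(len(docs)), key=scores.__getitem__)

print(tokenized[0][:2])  # shows how YAML keys and API versions are tokenized
print(docs[best])        # highest-scoring document for the query
```

One consequence of the token pattern worth knowing: because `:` is in the character class, YAML keys keep their trailing colon (`kind:` rather than `kind`), while values such as `autoscaling/v2` and `networking.k8s.io/v1` stay whole. Queries that name resource kinds or API versions match directly, whereas a bare key such as `kind` would not match `kind:`.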
Provider:
roanbrasil