moikapy/0xKobolds

Name: moikapy/0xKobolds
Creator: moikapy
Published: 2026-04-11 07:18:36
License: MIT

Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/moikapy/0xKobolds

Resource description:

---
license: mit
task_categories:
- text-generation
- conversational
tags:
- agent-traces
- coding-agent
- pi
- 0xkobold
- ollama
- open-source
- developer-tools
size_categories:
- n<1K
---

# 0xKobolds: AI Coding Agent Sessions

A growing public dataset of real coding agent sessions from building [0xKobold](https://github.com/0xKobold/0xkobolds), an open-source AI assistant framework built on [pi](https://pi.dev) with multi-agent orchestration, hot-reload skills, and local LLM support via Ollama.

## What this is

Every session in this dataset is an **unedited, redacted trace** of me working with AI to build and debug 0xKobold. This includes:

- **Architecture design**: multi-agent orchestration, event bus, extension system
- **Implementation**: TypeScript/Bun code generation, testing, debugging
- **Integration work**: Ollama providers, Discord bot, WebSocket gateway, memory systems
- **Tool development**: custom pi extensions, skills, hot-reload system
- **Problem-solving**: real debugging sessions, fixing type errors, handling API quirks
- **Experimentation**: trying approaches, backtracking, iterating on designs

## What you'll find in each session

Sessions are in [pi session format](https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/docs/session.md), which is newline-delimited JSON with:

| Field | Description |
|-------|-------------|
| User prompts | What I asked the agent to do |
| Assistant responses | Code, explanations, analysis |
| Tool calls | `read`, `bash`, `edit`, `write`: the actual operations performed |
| Tool results | Command output, file contents, errors |
| Thinking blocks | Agent reasoning (when using reasoning models) |

## Privacy & Safety

Every session goes through a three-layer pipeline before publication:

1. **Deterministic redaction**: known secrets from `.env`, `.npmrc`, shell configs are replaced with `[REDACTED]`
2. **TruffleHog scan**: verified secret detection as a backstop for anything missed
3. **LLM review**: checks for private infrastructure, personal info, and non-project content

Sessions that fail any check are **automatically blocked**. Common reasons sessions get rejected:

- Contains literal secrets that survived redaction
- Reveals private infrastructure (VPN IPs, self-hosted service hostnames)
- Contains content unrelated to the OSS project

## Tools used

These sessions were generated using:

- **[pi](https://pi.dev)**: AI coding agent harness
- **[0xKobold](https://github.com/0xKobold/0xkobolds)**: Custom extensions (pi-orchestration, pi-ollama, pi-learn, pi-secret-guardian)
- **[Ollama](https://ollama.com)**: Local and cloud LLM inference (kimi-k2.5, glm-5.1, qwen3-coder)
- **[pi-share-hf](https://github.com/badlogic/pi-share-hf)**: Incremental pipeline for collecting, redacting, reviewing, and uploading sessions

## Updates

This dataset is updated automatically via `pi-share-hf` on an incremental basis: only new or changed sessions are processed each run. Check the `manifest.jsonl` for the full list of included sessions and their hashes.

## License

MIT: use these traces for training, analysis, research, or whatever you find useful.
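
The deterministic redaction layer described under Privacy & Safety can be illustrated with a short sketch. This is not the `pi-share-hf` implementation; it only shows the general idea of collecting literal values from a `.env`-style file and replacing them with `[REDACTED]`. The function names `parse_env_values` and `redact`, and the `min_length` cutoff, are assumptions made for this example.

```python
REDACTED = "[REDACTED]"


def parse_env_values(env_text: str, min_length: int = 8) -> set[str]:
    """Collect candidate secret values from KEY=VALUE lines.

    min_length is a hypothetical cutoff: very short values such as "1"
    are skipped because replacing them would mangle ordinary text.
    """
    values: set[str] = set()
    for line in env_text.splitlines():
        line = line.strip()
        # Ignore blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        _, _, value = line.partition("=")
        value = value.strip().strip("'\"")
        if len(value) >= min_length:
            values.add(value)
    return values


def redact(text: str, secrets: set[str]) -> str:
    """Replace every literal occurrence of a known secret."""
    # Longest first, so a secret that contains a shorter one is replaced whole.
    for secret in sorted(secrets, key=len, reverse=True):
        text = text.replace(secret, REDACTED)
    return text
```

Because the replacement is a plain string match, the result is reproducible from run to run, which is what makes this layer deterministic. It cannot catch secrets that are not listed in a known config file, which is why the card describes the TruffleHog scan and the LLM review as further layers.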

Provider:

moikapy

Summary

Dataset Introduction

Construction

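In AI-assisted programming, the 0xKobolds dataset systematically collects real coding agent sessions through an automated pipeline. Built on the pi framework and the 0xKobold extensions, it records the full cycle of interaction, from architecture design to code debugging, during development of an open-source AI assistant framework. Every session passes through a three-layer privacy pipeline: deterministic redaction first removes sensitive values drawn from environment configuration; TruffleHog then scans for secrets; finally, a large language model review confirms that the session contains no private infrastructure details or unrelated content. Processing is incremental, so only new or changed sessions are synced, which keeps the dataset continuously updated without reprocessing sessions that have already been checked.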
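
The incremental step can be sketched as a hash comparison against the manifest. This is a minimal illustration, not the actual pipeline: the dataset card only says that `manifest.jsonl` lists sessions and their hashes, so the SHA-256 choice and the `file` and `hash` key names below are assumptions.

```python
import hashlib
import json


def content_hash(data: bytes) -> str:
    """Hex digest of the session bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def load_manifest(manifest_text: str) -> dict[str, str]:
    """Map session name to recorded hash.

    The 'file' and 'hash' keys are hypothetical; check the real
    manifest.jsonl for the field names it actually uses.
    """
    known: dict[str, str] = {}
    for line in manifest_text.splitlines():
        if line.strip():
            entry = json.loads(line)
            known[entry["file"]] = entry["hash"]
    return known


def sessions_to_process(sessions: dict[str, bytes], known: dict[str, str]) -> list[str]:
    """Return names of sessions that are new or whose content changed."""
    return sorted(
        name for name, data in sessions.items()
        if known.get(name) != content_hash(data)
    )
```

A session whose bytes still hash to the recorded value is skipped, so each run only pays for redaction and review on sessions that are new or have been modified.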

Features

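The dataset's core value lies in its authenticity and completeness: all sessions preserve the original interaction traces and are not manually edited. They cover advanced programming scenarios such as multi-agent orchestration, event bus design, and TypeScript/Bun code generation, and show the technical decisions and problem-solving process under AI assistance in full. Data is stored in the standardized pi session format, newline-delimited JSON that records user prompts, agent responses, tool calls, and tool results. Notably, the dataset includes thinking blocks that capture the agent's reasoning when a reasoning model was used, which is useful material for studying how coding agents reason. Structured data from real workflows of this kind is scarce in open-source coding assistant research.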

Usage

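Researchers can access the dataset directly on the Hugging Face platform and analyze it along multiple dimensions using its standardized JSON structure. Session records can serve as training data to improve instruction following in code generation models, or as an evaluation benchmark for agent performance in real development scenarios. Developers can draw on the architecture patterns and problem-solving strategies in the sessions to improve their own AI programming workflows. For academic research, the dataset supports quantitative analysis of multi-agent collaboration, tool-call patterns, and debugging behavior. Before use, consult the manifest.jsonl file for the list of sessions; under the MIT license, the traces may be used freely for training, analysis, and research.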
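
A first pass over a downloaded session file can be done without knowing the schema in advance. The sketch below reads newline-delimited JSON and counts which top-level keys occur; it deliberately assumes nothing about the pi session format beyond "one JSON value per line". The helper names `iter_records` and `key_histogram` are made up for this example.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one JSON object per non-empty line, skipping non-object values."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if isinstance(record, dict):
            yield record


def key_histogram(lines: Iterable[str]) -> Counter:
    """Count how often each top-level key appears across all records."""
    counts: Counter = Counter()
    for record in iter_records(lines):
        counts.update(record.keys())
    return counts
```

Passing an open file handle for a session file (or for manifest.jsonl) to `key_histogram` shows which fields are present, which is a reasonable starting point before writing analysis code against specific field names from the pi session documentation.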

Background and Challenges

Background Overview

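At the intersection of artificial intelligence and software engineering, AI coding assistants are gradually changing how software is developed, and the 0xKobolds dataset was created to record real coding agent sessions. It was published on 2026-04-11 by moikapy, the developer of the 0xKobold open-source project. Its core research questions concern the practical use of multi-agent collaborative architecture, hot-reload skill systems, and local LLM integration in code generation and debugging. By capturing unedited interaction traces (architecture design, TypeScript/Bun implementation, tool development, and problem solving), the dataset provides a resource for studying the behavior patterns, tool-call logic, and collaboration efficiency of AI-assisted programming, and is intended to inform the evolution of open-source AI development frameworks.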

Current Challenges

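The dataset addresses the challenge of evaluating AI coding agents on complex software development tasks, specifically multi-agent coordination, real-time debugging of code errors, and integration of heterogeneous toolchains. Building it posed several challenges. First, privacy and security requirements are strict: a three-layer pipeline of deterministic redaction, automated secret detection, and LLM review must ensure that session data leaks no sensitive information. Second, data must be unified into the pi session structure so that user prompts, assistant responses, tool calls, and tool results are integrated in a compatible way. Third, the incremental update mechanism must process new or changed sessions efficiently while preserving data integrity and traceability.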
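
The "blocked if any layer fails" behavior of the pipeline can be sketched as a gate that runs every check and publishes only when none of them reports a problem. The two checks below are trivial stand-ins written for this example; they are not TruffleHog or an LLM review, and the marker strings they look for are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A check returns None on success, or a short rejection reason.
Check = Callable[[str], Optional[str]]


@dataclass
class Verdict:
    publish: bool
    reasons: list[str]


def review_session(text: str, checks: list[Check]) -> Verdict:
    """Run every check; the session is published only if all of them pass."""
    reasons: list[str] = []
    for check in checks:
        reason = check(text)
        if reason is not None:
            reasons.append(reason)
    return Verdict(publish=not reasons, reasons=reasons)


def no_literal_secret(text: str) -> Optional[str]:
    """Stand-in for a secret scanner; 'sk-live-' is a made-up marker."""
    return "literal secret" if "sk-live-" in text else None


def no_private_host(text: str) -> Optional[str]:
    """Stand-in for an infrastructure review; '.internal' is a made-up marker."""
    return "private infrastructure" if ".internal" in text else None
```

Collecting all reasons instead of stopping at the first failure is a design choice made here so that a rejected session can be reported with every problem at once; the real pipeline may behave differently.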

Common Scenarios

Typical Use Cases

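In AI-driven software development, the 0xKobolds dataset offers researchers real coding agent session traces. These traces record the full interaction sequences in which the developer, while building the open-source AI assistant framework 0xKobold, worked with an AI agent on architecture design, code generation, debugging, and integration. The sessions are presented unedited but redacted, and cover the whole chain from user prompts and assistant responses to tool calls and results, giving an empirical basis for analyzing the behavior patterns, decision logic, and tool-use strategies of AI coding agents.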

Derived Related Work

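Several lines of research on intelligent coding agents have grown out of the 0xKobolds dataset. For example, analyses of session traces have explored how multiple agents divide work and cooperate in software architecture design; studies of tool-call sequences have improved the accuracy of agent decisions about reading and writing files, debugging, and running system commands; and extension work combining local Ollama models with the pi framework has advanced low-latency, privacy-preserving coding assistants in edge computing scenarios. Together, this work enriches the methodology and tooling of AI-driven software development.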

Recent Research on the Dataset