JohnBeanerson/pi-mono-test
Source: Hugging Face. Updated 2026-04-08. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/JohnBeanerson/pi-mono-test
Resource description:
---
pretty_name: coding agent session traces
task_categories:
- text-generation
tags:
- agent-traces
- coding-agent
- pi-share-hf
language:
- en
- code
license: other
---
# Coding agent session traces
Duplicated from badlogicgames/pi-mono for testing purposes. The original README follows.
This dataset contains redacted coding agent session traces collected while working on https://github.com/badlogic/pi-mono.git. The traces were exported with [pi-share-hf](https://github.com/badlogic/pi-share-hf) from a local [pi](https://pi.dev) workspace and filtered to keep only sessions that passed deterministic redaction and LLM review.
## Data description
Each `*.jsonl` file is a redacted pi session. Sessions are stored as JSON Lines files where each line is a structured session entry. Entries include session headers, user and assistant messages, tool results, model changes, thinking level changes, compaction summaries, branch summaries, and custom extension data.
Pi session files are tree-structured via `id` and `parentId`, so a single session file may contain multiple branches of work. See the upstream session format documentation for the exact schema:
- https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/docs/session.md
Source git repo: https://github.com/badlogic/pi-mono.git
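As a rough illustration of the tree structure described above, the following sketch parses JSON Lines text and lists every root-to-leaf branch by following `id` and `parentId`. Only those two field names come from this README. The sample entries, the `type` field, and the helper names are hypothetical, and entries without an `id` are simply skipped here; consult the upstream session format documentation for the real schema.

```python
import json
from collections import defaultdict


def load_entries(lines):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def build_children(entries):
    """Map each parent id to the ids of its direct children, in file order."""
    children = defaultdict(list)
    for entry in entries:
        if "id" in entry:
            children[entry.get("parentId")].append(entry["id"])
    return children


def branches(entries):
    """Return every root-to-leaf path as a list of entry ids."""
    children = build_children(entries)
    known = {e["id"] for e in entries if "id" in e}
    # Assumption: an entry whose parent is missing or unknown is a root.
    roots = [e["id"] for e in entries
             if "id" in e and e.get("parentId") not in known]
    paths = []
    stack = [[root] for root in reversed(roots)]
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            paths.append(path)  # reached a leaf, so the branch is complete
        for kid in reversed(kids):
            stack.append(path + [kid])
    return paths


# Hypothetical sample; real entries carry more fields.
sample = [
    '{"id": "a", "parentId": null, "type": "message"}',
    '{"id": "b", "parentId": "a", "type": "message"}',
    '{"id": "c", "parentId": "a", "type": "message"}',
    '{"id": "d", "parentId": "b", "type": "message"}',
]
session_branches = branches(load_entries(sample))
print(session_branches)
```

To process a real session, pass an open file handle instead of `sample`, for example `branches(load_entries(open(path, encoding="utf-8")))`, where `path` points at one of the `*.jsonl` files.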
## Redaction and review
The data was processed with [pi-share-hf](https://github.com/badlogic/pi-share-hf) using deterministic secret redaction plus an LLM review step. Deterministic redaction targets exact known secrets and curated credential patterns. The LLM review decides whether a session is about the OSS project, whether it is fit to share publicly, and whether any sensitive content appears to have been missed.
Embedded images may be preserved in the uploaded sessions unless the workspace was initialized with `--no-images`.
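The deterministic pass described in this section can be pictured with a minimal sketch: replace exact known secrets first, then apply a small set of credential-shaped patterns. This is not the pi-share-hf implementation. The function name, the placeholder string, and the two patterns below are assumptions chosen for illustration, and a real curated pattern list would be far more extensive.

```python
import re

# Illustrative patterns only; not the curated list that pi-share-hf uses.
CREDENTIAL_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),  # shape of a GitHub personal access token
    re.compile(r"AKIA[0-9A-Z]{16}"),     # shape of an AWS access key ID
]


def redact(text, known_secrets=(), placeholder="[REDACTED]"):
    """Replace exact known secrets, then anything matching a credential pattern."""
    # Longest secrets first, so a secret containing a shorter one
    # is replaced whole rather than leaving a fragment behind.
    for secret in sorted(set(known_secrets), key=len, reverse=True):
        if secret:
            text = text.replace(secret, placeholder)
    for pattern in CREDENTIAL_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


# Hypothetical transcript line with made-up values.
line = "export TOKEN=hunter2-local and key AKIAABCDEFGHIJKLMNOP"
redacted = redact(line, known_secrets=["hunter2-local"])
print(redacted)
```

Pattern matching of this kind only catches secrets with a recognizable shape, which is one reason a review step on top of it is useful.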
## Limitations
This dataset is best-effort redacted. Coding agent transcripts can still contain sensitive or off-topic content, especially if a session mixed OSS work with unrelated private tasks. Use with appropriate caution.
Provider:
JohnBeanerson



