zenlm/zen-agentic-dataset
Source: Hugging Face. Updated: 2026-02-26. Indexed: 2026-01-03.
Download link:
https://hf-mirror.com/datasets/zenlm/zen-agentic-dataset
Description:
The Zen Agentic Dataset is a large-scale programming dataset of 8.47 billion tokens covering real-world agentic AI programming, blockchain development, and infrastructure code. It combines Claude Code interactions with the full git history of more than 1,400 repositories spanning 15 years of professional development (2010-2025). Data composition: Claude Code debug sessions (29%), Claude conversations (13%), Claude interactions (10%), and git history (48%). Domain coverage: 1) agentic AI and LLM infrastructure: Model Context Protocol (MCP), multi-agent orchestration, and agent frameworks; 2) Web3 and blockchain: smart contracts, consensus engines, and cross-chain bridges; 3) cryptography and security: post-quantum cryptography, threshold cryptography, and zero-knowledge (ZK) proofs; 4) modern development: full-stack TypeScript, systems programming, and DevOps. The dataset is available for research and commercial licensing and has been used to train multiple models.
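As a rough illustration of the composition figures in the description, the sketch below converts the stated percentages into approximate token counts. It assumes the percentages are shares of the 8.47 billion token total, which the description does not state explicitly; the names `TOTAL_TOKENS`, `SHARES_PERCENT`, and `approximate_tokens` are hypothetical and used only for this example.

```python
# Figures quoted in the dataset description.
TOTAL_TOKENS = 8.47e9
SHARES_PERCENT = {
    "Claude Code debug sessions": 29,
    "Claude conversations": 13,
    "Claude interactions": 10,
    "Git history": 48,
}


def approximate_tokens(total, shares_percent):
    """Return an approximate token count per component.

    ASSUMPTION: each percentage is a share of the total token count.
    The description does not say whether shares are by token, file, or sample.
    """
    return {name: total * pct / 100 for name, pct in shares_percent.items()}


breakdown = approximate_tokens(TOTAL_TOKENS, SHARES_PERCENT)
for name, tokens in breakdown.items():
    # Report in billions of tokens, rounded to two decimals.
    print(f"{name}: ~{tokens / 1e9:.2f}B tokens")
```

If the shares are measured some other way (for example by number of sessions or files), these per-component token counts would not hold.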
Provider:
zenlm



