
luxfi/zen-agentic-dataset

Source: Hugging Face | Updated: 2025-12-31 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/luxfi/zen-agentic-dataset
Description:
---
license: other
license_name: commercial
license_link: https://hanzo.ai/contact
language:
- en
tags:
- agentic
- coding
- llm
- training
- claude
- programming
size_categories:
- 1B<n<10B
---

# Zen Agentic Dataset

**8.47 billion tokens** of real-world agentic AI programming, blockchain development, and modern infrastructure code.

## Dataset Overview

A training dataset that combines Claude Code interactions with the full git history of 1,400+ repositories, spanning 15 years of professional development.

| Metric | Value |
|--------|-------|
| **Total Tokens** | 8.47 billion |
| **Training Samples** | 3.35 million |
| **Validation Samples** | 100,000 |
| **Total Size** | ~27 GB |
| **Repositories** | 1,452 |
| **Time Span** | 15 years (2010-2025) |

## Data Composition

| Component | Tokens | Share of total |
|-----------|--------|----------------|
| Claude Code Debug Sessions | 2.42B | 29% |
| Claude Conversations | 1.14B | 13% |
| Claude Interactions | 0.86B | 10% |
| Git History | 4.03B | 48% |

## Domain Coverage

### Agentic AI & LLM Infrastructure

- Model Context Protocol (MCP) - 260+ tool implementations
- Multi-agent orchestration - Claude, GPT-4, Gemini integrations
- Agent frameworks - planning, memory, tool use, reflection

### Web3 & Blockchain

- Smart contracts - Solidity, Vyper (ERC20, ERC721, DeFi)
- Consensus engines - Snow family, BFT, DAG-based protocols
- Cross-chain bridges and DeFi protocols

### Cryptography & Security

- Post-quantum cryptography implementations
- Threshold cryptography and MPC
- Zero-knowledge proof experiments

### Modern Development

- Full-stack TypeScript - Next.js 14+, React 18+
- Systems programming - Rust, Go, Python, C/C++
- DevOps - Docker, Kubernetes, CI/CD

## Licensing & Access

**This dataset is available for research and commercial licensing.**

### For Developers & Researchers

We award grants to individuals and teams who want to train models on this dataset, particularly those building:

- Models for specific blockchain ecosystems
- Open-source AI tools using OpenAI-compatible protocols
- Research advancing agentic AI capabilities

### To Request Access

**Contact:** z@hanzo.ai

Please include:

- Intended use case (training, research, evaluation)
- Organization/affiliation
- Target ecosystem (if applicable)
- Licensing requirements

### Supported Organizations

Dataset mirrors are maintained by:

- [Hanzo AI](https://hanzo.ai) - AI infrastructure platform
- [Lux Network](https://lux.network) - AI compute settlement layer
- [Zen LM](https://zenlm.org) - Open model research
- [Zoo Labs](https://zoo.ngo) - Decentralized AI research

## Models Trained on This Dataset

| Model | Parameters | Architecture | Status |
|-------|------------|--------------|--------|
| Zen Coder 4B | 4B | Qwen3 | Trained |
| Zen Coder 24B | 24B | Devstral Small 2 | Trained |
| Zen Coder 123B | 123B | Devstral 2 | In training |
| Zen Coder Max | 358B | GLM-4.7 (MoE) | Planned |
| Zen Coder Ultra | 1T | Kimi K2 (MoE) | Planned |

## Training Framework

Use [Zen Trainer](https://github.com/zenlm/zen-trainer) for fine-tuning:

```python
from zen_trainer import ZenTrainer

trainer = ZenTrainer(
    model_key="qwen3-4b",
    dataset_path="hanzoai/zen-agentic-dataset-private",  # Private: requires approved access
    output_dir="./output/my-model",
)
trainer.train()
```

## Related Projects

- [Zen Trainer](https://github.com/zenlm/zen-trainer) - Training framework
- [Hanzo MCP](https://github.com/hanzoai/mcp) - Model Context Protocol (260+ tools)
- [Hanzo AI](https://hanzo.ai) - AI infrastructure platform
- [Lux Network](https://lux.network) - AI compute settlement layer
- [Zoo Labs](https://zoo.ngo) - Decentralized AI research

## Citation

```bibtex
@dataset{zen_agentic_dataset,
  author    = {Kelling, Zach},
  title     = {Zen Agentic Dataset: 8.47B Tokens of Agentic AI Programming},
  year      = {2025},
  publisher = {Zoo Labs Foundation},
  url       = {https://huggingface.co/datasets/hanzoai/zen-agentic-dataset}
}
```

---

**Maintainer:** z@hanzo.ai

**License:** Commercial (contact for licensing terms)
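As a quick consistency check on the Dataset Overview and Data Composition tables, the sketch below recomputes each component's share and a few derived averages from the published figures. It is plain arithmetic on the card's numbers, not part of any official tooling, and the constant and function names are illustrative. It assumes two things the card does not state: that the 8.47B-token total covers both the training and validation splits, and that "~27 GB" means 27 × 10^9 bytes of stored data.

```python
"""Arithmetic check of the figures published in the Zen Agentic Dataset card.

All inputs are copied from the card's tables. Derived values rest on two
assumptions the card does not state: the 8.47B-token total covers both the
training and validation splits, and "~27 GB" means 27 * 10**9 bytes.
"""

TOTAL_TOKENS_B = 8.47        # "Total Tokens", in billions
TRAIN_SAMPLES = 3_350_000    # "Training Samples"
VAL_SAMPLES = 100_000        # "Validation Samples"
TOTAL_SIZE_GB = 27           # "~27 GB" (assumed decimal gigabytes)

# "Data Composition" table, token counts in billions.
COMPONENTS_B = {
    "Claude Code Debug Sessions": 2.42,
    "Claude Conversations": 1.14,
    "Claude Interactions": 0.86,
    "Git History": 4.03,
}


def composition_shares(components: dict[str, float], total: float) -> dict[str, float]:
    """Each component's share of `total` as a percentage, rounded to 2 decimals."""
    return {name: round(100 * tokens / total, 2) for name, tokens in components.items()}


def summary() -> dict[str, object]:
    component_sum = sum(COMPONENTS_B.values())
    all_samples = TRAIN_SAMPLES + VAL_SAMPLES  # assumption: total tokens span both splits
    return {
        "component_sum_b": round(component_sum, 2),
        "gap_vs_total_b": round(TOTAL_TOKENS_B - component_sum, 2),
        "shares_pct": composition_shares(COMPONENTS_B, TOTAL_TOKENS_B),
        "val_fraction_pct": round(100 * VAL_SAMPLES / all_samples, 1),
        "avg_tokens_per_sample": round(TOTAL_TOKENS_B * 1e9 / all_samples),
        # Stored bytes per token (GB / billions of tokens); depends on file format and compression.
        "bytes_per_token": round(TOTAL_SIZE_GB / TOTAL_TOKENS_B, 2),
    }


if __name__ == "__main__":
    for key, value in summary().items():
        print(f"{key}: {value}")
```

Run as a script, it prints each value. The four components sum to 8.45B rather than the stated 8.47B; a 0.02B gap is within what rounding each component to two decimals could produce (four values, each off by at most 0.005B). The recomputed shares (about 28.57%, 13.46%, 10.15% and 47.58%) round to the 29/13/10/48 split shown in the table. Under the stated assumptions, the validation split is about 2.9% of all samples, samples average roughly 2,455 tokens, and the stored data works out to about 3.19 bytes per token.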
Provider:
luxfi