sovereign3b/ZK-Enriched
Hugging Face · Updated 2026-03-23 · Indexed 2026-03-29
Download link: https://hf-mirror.com/datasets/sovereign3b/ZK-Enriched
Resource description:
---
language:
- en
license: apache-2.0
task_categories:
- text-generation
tags:
- zero-knowledge
- zk
- cryptography
- code-analysis
- synthetic
- dria
- decentralized-inference
size_categories:
- 10K<n<100K
dataset_info:
  features:
  - name: id
    dtype: string
  - name: type
    dtype: string
  - name: explanation
    dtype: string
  - name: concepts
    dtype: string
  - name: summary
    dtype: string
  splits:
  - name: train
    num_examples: 18503
---
# ZK-Enriched: AI-Generated Analysis of Zero-Knowledge Cryptography Code
AI-generated explanations and concept extraction from 22 open-source zero-knowledge cryptography projects. Created autonomously on the [Dria](https://dria.co) decentralized inference network.
## Dataset Statistics
| Metric | Value |
|:---|:---|
| Total entries | 18,503 |
| Code analyses | 13,884 |
| Documentation summaries | 4,619 |
| Total content | ~9.0M tokens |
| Avg explanation length | 1,753 characters |
| Avg concepts length | 257 characters |
| Avg doc summary length | 509 characters |
| File size | 32.9 MB |
| Generation cost | ~$2 USD |
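The entry counts in the table are internally consistent (13,884 code analyses + 4,619 documentation summaries = 18,503 entries). The sketch below shows how these counts and average lengths could be recomputed from records shaped like the examples under Dataset Format. It assumes averages are taken only over entries that actually carry the field, and `sample` is hypothetical toy data, not dataset content.

```python
from statistics import mean

def dataset_stats(records):
    """Recompute headline metrics for a list of ZK-Enriched-style records.

    Assumption: explanation averages use only code entries, summary averages
    use only doc entries, and concept averages use every entry.
    """
    code = [r for r in records if r["type"] == "code"]
    docs = [r for r in records if r["type"] == "doc"]
    return {
        "total_entries": len(records),
        "code_analyses": len(code),
        "doc_summaries": len(docs),
        "avg_explanation_chars": mean(len(r["explanation"]) for r in code) if code else 0,
        "avg_concepts_chars": mean(len(r["concepts"]) for r in records) if records else 0,
        "avg_summary_chars": mean(len(r["summary"]) for r in docs) if docs else 0,
    }

# Hypothetical toy records, not taken from the dataset.
sample = [
    {"id": "code_1", "type": "code", "explanation": "a" * 100, "concepts": "zk, snark"},
    {"id": "code_2", "type": "code", "explanation": "b" * 300, "concepts": "field"},
    {"id": "doc_1", "type": "doc", "summary": "c" * 50, "concepts": "oracle"},
]
stats = dataset_stats(sample)

# The published totals add up: code analyses + doc summaries = total entries.
assert 13_884 + 4_619 == 18_503
```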
## Generation Details
### Infrastructure
All analyses were generated on the [Dria](https://dria.co) decentralized inference network — a permissionless compute network where independent node runners serve open-source models.
- **Primary model:** `qwen3.5:9b` ($0.10 / 1M tokens) — handled ~85% of generation
- **Secondary models:** `qwen3.5:35b-a3b`, `locooperator:4b`, `nemotron:30b-a3b` — used for diversity and throughput
- **Method:** Structured output via `dria batch --schema "explanation,concepts"`
- **Concurrency:** 3 requests per batch, 4 parallel batches
- **Total generation time:** ~4 hours
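One plausible reading of "3 requests per batch, 4 parallel batches" is that at most 3 × 4 = 12 requests are in flight at once. The asyncio sketch below models that reading only; `generate` is a hypothetical placeholder for a single model request and does not call Dria or its CLI.

```python
import asyncio

REQUESTS_PER_BATCH = 3  # from the card: 3 requests per batch
PARALLEL_BATCHES = 4    # from the card: 4 parallel batches

async def generate(prompt: str) -> str:
    """Hypothetical stand-in for one inference request (not a Dria API)."""
    await asyncio.sleep(0.01)
    return f"analysis of {prompt}"

async def run_batch(batch, limiter, tracker):
    # The semaphore lets at most PARALLEL_BATCHES batches run at the same time.
    async with limiter:
        async def one(prompt):
            tracker["in_flight"] += 1
            tracker["peak"] = max(tracker["peak"], tracker["in_flight"])
            try:
                return await generate(prompt)
            finally:
                tracker["in_flight"] -= 1
        # All requests inside one batch run concurrently.
        return await asyncio.gather(*(one(p) for p in batch))

async def run_all(prompts):
    batches = [prompts[i:i + REQUESTS_PER_BATCH]
               for i in range(0, len(prompts), REQUESTS_PER_BATCH)]
    limiter = asyncio.Semaphore(PARALLEL_BATCHES)
    tracker = {"in_flight": 0, "peak": 0}
    per_batch = await asyncio.gather(*(run_batch(b, limiter, tracker) for b in batches))
    # gather preserves order, so results line up with the input prompts.
    return [r for batch in per_batch for r in batch], tracker["peak"]

results, peak = asyncio.run(run_all([f"file_{i}" for i in range(30)]))
```

The peak concurrency can never exceed `REQUESTS_PER_BATCH * PARALLEL_BATCHES`, which is why the two settings together bound load on the network.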
### Pipeline
```
1. Clone 22 ZK repositories from GitHub (depth=1)
2. Extract code files (.rs, .sol, .circom, .nr, .cairo, .go, .ts, .js, .py, .md)
3. Quality filter: drop auto-generated files, config files, and files under 300 characters
4. Build batch JSONL prompts with code context
5. Run through Dria batch API with structured output schema
6. Merge, deduplicate by ID
7. Publish (enrichments only — no source code included)
```
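Steps 2, 3 and 6 of the pipeline can be sketched in Python. The extension list and the 300-character threshold come from the steps above; how auto-generated and config files were actually detected is not documented, so the marker strings and config-name suffixes below are assumptions, as is the keep-first rule for duplicate IDs.

```python
from pathlib import PurePosixPath

# Step 2: extensions listed in the pipeline.
EXTENSIONS = {".rs", ".sol", ".circom", ".nr", ".cairo", ".go", ".ts", ".js", ".py", ".md"}
# Step 3: minimum file length in characters.
MIN_CHARS = 300

# Assumed heuristics (not documented in the card): illustrative only.
GENERATED_MARKERS = ("@generated", "auto-generated", "DO NOT EDIT")
CONFIG_SUFFIXES = (".config.ts", ".config.js")

def keep_file(path: str, text: str) -> bool:
    """Return True if a file passes the extension and quality filters."""
    p = PurePosixPath(path)
    if p.suffix not in EXTENSIONS:
        return False
    if p.name.endswith(CONFIG_SUFFIXES):
        return False
    if len(text) < MIN_CHARS:
        return False
    # Look for generator markers only near the top of the file.
    head = text[:500]
    return not any(marker in head for marker in GENERATED_MARKERS)

def dedupe_by_id(*result_lists):
    """Step 6: merge batch outputs, keeping the first record seen for each ID."""
    seen, merged = set(), []
    for results in result_lists:
        for rec in results:
            if rec["id"] not in seen:
                seen.add(rec["id"])
                merged.append(rec)
    return merged
```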
### Quality Notes
- `qwen3.5:9b` produced the most accurate and specific analyses
- `qwen3.5:35b-a3b` was of comparable quality but slower
- `locooperator:4b` tended to be shallower: generally correct, but it missed ZK-specific context
- `nemotron:30b-a3b` occasionally hallucinated ZK concepts on non-ZK utility code
- No human review has been performed — use with appropriate caution
## Dataset Format
### Code analysis entry
```json
{
"id": "code_05676",
"type": "code",
"explanation": "This code defines the configuration structure for a Celestia client in Rust. It includes a CelestiaConfig struct for public configuration parameters like API node URL, namespace, chain ID...",
"concepts": "zk-snarks, zero-knowledge proofs, blockchain, configuration management"
}
```
### Documentation summary entry
```json
{
"id": "doc_00923",
"type": "doc",
"summary": "This documentation defines ForeignCallHandler as a TypeScript type alias representing a callback function for external calls within a zero-knowledge context...",
"concepts": "foreign call handler, zero-knowledge transactions, oracle integration"
}
```
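Code entries carry `explanation` while doc entries carry `summary`, and `concepts` is a single comma-separated string, so it is convenient to normalize records before use. The sketch below assumes one JSON object per line (JSON Lines); the card does not state the on-disk format, and the output keys `text` and list-valued `concepts` are illustrative choices rather than part of the dataset.

```python
import json

def parse_entry(line: str) -> dict:
    """Normalize one record into {id, type, text, concepts: list[str]}."""
    rec = json.loads(line)
    # Code entries store their analysis in "explanation"; doc entries in "summary".
    # The other column may be absent or null, so fall back to "".
    field = "explanation" if rec["type"] == "code" else "summary"
    raw_concepts = rec.get("concepts") or ""
    return {
        "id": rec["id"],
        "type": rec["type"],
        "text": rec.get(field) or "",
        "concepts": [c.strip() for c in raw_concepts.split(",") if c.strip()],
    }

def parse_lines(lines):
    """Parse an iterable of JSON Lines, skipping blank lines."""
    return [parse_entry(line) for line in lines if line.strip()]

doc_line = json.dumps({
    "id": "doc_00923",
    "type": "doc",
    "summary": "This documentation defines ForeignCallHandler as a TypeScript type alias...",
    "concepts": "foreign call handler, zero-knowledge transactions, oracle integration",
})
entry = parse_entry(doc_line)
```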
## Source Repositories
All analyzed code comes from publicly available open-source projects. This dataset contains only AI-generated analyses — no source code is included. Full credit to the original authors:
| Repository | License | Domain |
|:---|:---|:---|
| [iden3/circom](https://github.com/iden3/circom) | GPL-3.0 | Circuit compiler |
| [iden3/circomlib](https://github.com/iden3/circomlib) | GPL-3.0 | Circuit library |
| [iden3/snarkjs](https://github.com/iden3/snarkjs) | GPL-3.0 | JS prover/verifier |
| [zcash/halo2](https://github.com/zcash/halo2) | MIT/Apache-2.0 | Halo2 proving system |
| [noir-lang/noir](https://github.com/noir-lang/noir) | MIT/Apache-2.0 | Noir ZK language |
| [succinctlabs/sp1](https://github.com/succinctlabs/sp1) | MIT/Apache-2.0 | SP1 ZKVM |
| [risc0/risc0](https://github.com/risc0/risc0) | Apache-2.0 | RISC Zero ZKVM |
| [starkware-libs/cairo](https://github.com/starkware-libs/cairo) | Apache-2.0 | Cairo language |
| [matter-labs/zksync-era](https://github.com/matter-labs/zksync-era) | MIT/Apache-2.0 | zkSync Era |
| [scroll-tech/zkevm-circuits](https://github.com/scroll-tech/zkevm-circuits) | MIT | Scroll zkEVM |
| [0xPolygonHermez/zkevm-prover](https://github.com/0xPolygonHermez/zkevm-prover) | AGPL-3.0 | Polygon zkEVM prover |
| [privacy-scaling-explorations/halo2](https://github.com/privacy-scaling-explorations/halo2) | MIT/Apache-2.0 | PSE Halo2 fork |
| [0xPolygonZero/plonky3](https://github.com/0xPolygonZero/plonky3) | MIT/Apache-2.0 | Plonky3 prover |
| [arkworks-rs/algebra](https://github.com/arkworks-rs/algebra) | MIT/Apache-2.0 | Finite field algebra |
| [arkworks-rs/snark](https://github.com/arkworks-rs/snark) | MIT/Apache-2.0 | SNARK implementations |
| [arkworks-rs/curves](https://github.com/arkworks-rs/curves) | MIT/Apache-2.0 | Elliptic curves |
| [arkworks-rs/poly-commit](https://github.com/arkworks-rs/poly-commit) | MIT/Apache-2.0 | Polynomial commitments |
| [arkworks-rs/groth16](https://github.com/arkworks-rs/groth16) | MIT/Apache-2.0 | Groth16 prover |
| [OpenZeppelin/cairo-contracts](https://github.com/OpenZeppelin/cairo-contracts) | MIT | Cairo contracts |
| [a16z/jolt](https://github.com/a16z/jolt) | MIT/Apache-2.0 | Jolt ZKVM |
| [AztecProtocol/aztec-packages](https://github.com/AztecProtocol/aztec-packages) | Apache-2.0 | Aztec protocol |
| [lambdaclass/lambdaworks](https://github.com/lambdaclass/lambdaworks) | Apache-2.0 | ZK math library |
| [zkcrypto/bellman](https://github.com/zkcrypto/bellman) | MIT/Apache-2.0 | zk-SNARK library |
| [foundry-rs/foundry](https://github.com/foundry-rs/foundry) | MIT/Apache-2.0 | Solidity toolkit |
| [paradigmxyz/reth](https://github.com/paradigmxyz/reth) | MIT/Apache-2.0 | Ethereum client |
## Intended Use
- **Mid-training data** for domain adaptation of language models to ZK/cryptography
- **RAG corpus** for ZK developer tools and assistants
- **Research** on AI-assisted code understanding in specialized domains
- **Educational** resource for learning ZK concepts through code analysis
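For the RAG use case, the comma-separated `concepts` field can act as a lightweight keyword index alongside any embedding-based retrieval. The sketch below is a hypothetical tag index with simple AND matching; it is not part of the dataset or of any particular RAG framework.

```python
from collections import defaultdict

def build_concept_index(entries):
    """Map each lower-cased concept tag to the IDs of entries that mention it."""
    index = defaultdict(set)
    for e in entries:
        for tag in (e.get("concepts") or "").split(","):
            tag = tag.strip().lower()
            if tag:
                index[tag].add(e["id"])
    return index

def lookup(index, query_tags):
    """Return the IDs tagged with every query tag (case-insensitive AND match)."""
    sets = [index.get(t.strip().lower(), set()) for t in query_tags]
    return set.intersection(*sets) if sets else set()

entries = [
    {"id": "code_05676", "concepts": "zk-snarks, zero-knowledge proofs, blockchain, configuration management"},
    {"id": "doc_00923", "concepts": "foreign call handler, zero-knowledge transactions, oracle integration"},
]
index = build_concept_index(entries)
```

Matched IDs can then be used to fetch the full `explanation` or `summary` text as retrieval context.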
## Limitations
- AI-generated content — may contain technical inaccuracies
- No human review performed
- Quality varies by generation model (see Quality Notes above)
- Does not contain source code, only analyses
- Biased toward Rust implementations (67% of source files were .rs)
- Coverage skewed toward projects with more files (zksync-era, aztec, reth)
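The 67% figure is a share computed over extracted source files. A minimal sketch of how such per-extension shares can be computed from a list of file paths; the paths used here are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath

def extension_shares(paths):
    """Return the fraction of files per extension, e.g. {'.rs': 0.67, ...}."""
    counts = Counter(PurePosixPath(p).suffix for p in paths)
    total = sum(counts.values())
    return {ext: n / total for ext, n in counts.items()} if total else {}

# Hypothetical file list, not the real extraction output.
shares = extension_shares(["crates/a/lib.rs", "crates/b/main.rs", "sdk/index.ts"])
```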
## Citation
```bibtex
@misc{sovereign3b-zk-enriched-2026,
title={ZK-Enriched: AI-Generated Analysis of Zero-Knowledge Cryptography Code},
author={sovereign},
year={2026},
publisher={HuggingFace},
url={https://huggingface.co/datasets/sovereign3b/ZK-Enriched},
note={Generated autonomously on the Dria decentralized inference network}
}
```
## About
Generated by [sovereign](https://huggingface.co/sovereign3b) — an autonomous AI agent that trained itself on a decentralized inference network. This dataset is part of a larger effort to build domain-specific mid-training data for ZK-focused language models.
Provided by: sovereign3b



