
sovereign3b/ZK-Enriched

Source: Hugging Face | Updated: 2026-03-23 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/sovereign3b/ZK-Enriched
Resource description:
---
language:
- en
license: apache-2.0
task_categories:
- text-generation
tags:
- zero-knowledge
- zk
- cryptography
- code-analysis
- synthetic
- dria
- decentralized-inference
size_categories:
- 10K<n<100K
dataset_info:
  features:
  - name: id
    dtype: string
  - name: type
    dtype: string
  - name: explanation
    dtype: string
  - name: concepts
    dtype: string
  - name: summary
    dtype: string
  splits:
  - name: train
    num_examples: 18503
---

# ZK-Enriched: AI-Generated Analysis of Zero-Knowledge Cryptography Code

AI-generated explanations and concept extraction from 22 open-source zero-knowledge cryptography projects. Created autonomously on the [Dria](https://dria.co) decentralized inference network.

## Dataset Statistics

| Metric | Value |
|:---|:---|
| Total entries | 18,503 |
| Code analyses | 13,884 |
| Documentation summaries | 4,619 |
| Total content | ~9.0M tokens |
| Avg explanation length | 1,753 characters |
| Avg concepts length | 257 characters |
| Avg doc summary length | 509 characters |
| File size | 32.9 MB |
| Generation cost | ~$2 USD |

## Generation Details

### Infrastructure

All analyses were generated on the [Dria](https://dria.co) decentralized inference network, a permissionless compute network where independent node runners serve open-source models.

- **Primary model:** `qwen3.5:9b` ($0.10 / 1M tokens), which handled ~85% of generation
- **Secondary models:** `qwen3.5:35b-a3b`, `locooperator:4b`, `nemotron:30b-a3b`, used for diversity and throughput
- **Method:** Structured output via `dria batch --schema "explanation,concepts"`
- **Concurrency:** 3 requests per batch, 4 parallel batches
- **Total generation time:** ~4 hours

### Pipeline

```
1. Clone 22 ZK repositories from GitHub (depth=1)
2. Extract code files (.rs, .sol, .circom, .nr, .cairo, .go, .ts, .js, .py, .md)
3. Quality filter: remove auto-generated, config, <300 char files
4. Build batch JSONL prompts with code context
5. Run through Dria batch API with structured output schema
6. Merge, deduplicate by ID
7. Publish (enrichments only; no source code included)
```

### Quality Notes

- `qwen3.5:9b` produced the most accurate and specific analyses
- `qwen3.5:35b-a3b` was comparable in quality but slower
- `locooperator:4b` tended to be shallower: correct, but it missed ZK-specific context
- `nemotron:30b-a3b` occasionally hallucinated ZK concepts on non-ZK utility code
- No human review has been performed; use with appropriate caution

## Dataset Format

### Code analysis entry

```json
{
  "id": "code_05676",
  "type": "code",
  "explanation": "This code defines the configuration structure for a Celestia client in Rust. It includes a CelestiaConfig struct for public configuration parameters like API node URL, namespace, chain ID...",
  "concepts": "zk-snarks, zero-knowledge proofs, blockchain, configuration management"
}
```

### Documentation summary entry

```json
{
  "id": "doc_00923",
  "type": "doc",
  "summary": "This documentation defines ForeignCallHandler as a TypeScript type alias representing a callback function for external calls within a zero-knowledge context...",
  "concepts": "foreign call handler, zero-knowledge transactions, oracle integration"
}
```

## Source Repositories

All analyzed code comes from publicly available open-source projects. This dataset contains only AI-generated analyses; no source code is included.
Full credit to the original authors:

| Repository | License | Domain |
|:---|:---|:---|
| [iden3/circom](https://github.com/iden3/circom) | GPL-3.0 | Circuit compiler |
| [iden3/circomlib](https://github.com/iden3/circomlib) | GPL-3.0 | Circuit library |
| [iden3/snarkjs](https://github.com/iden3/snarkjs) | GPL-3.0 | JS prover/verifier |
| [zcash/halo2](https://github.com/zcash/halo2) | MIT/Apache-2.0 | Halo2 proving system |
| [noir-lang/noir](https://github.com/noir-lang/noir) | MIT/Apache-2.0 | Noir ZK language |
| [succinctlabs/sp1](https://github.com/succinctlabs/sp1) | MIT/Apache-2.0 | SP1 ZKVM |
| [risc0/risc0](https://github.com/risc0/risc0) | Apache-2.0 | RISC Zero ZKVM |
| [starkware-libs/cairo](https://github.com/starkware-libs/cairo) | Apache-2.0 | Cairo language |
| [matter-labs/zksync-era](https://github.com/matter-labs/zksync-era) | MIT/Apache-2.0 | zkSync Era |
| [scroll-tech/zkevm-circuits](https://github.com/scroll-tech/zkevm-circuits) | MIT | Scroll zkEVM |
| [0xPolygonHermez/zkevm-prover](https://github.com/0xPolygonHermez/zkevm-prover) | AGPL-3.0 | Polygon zkEVM prover |
| [privacy-scaling-explorations/halo2](https://github.com/privacy-scaling-explorations/halo2) | MIT/Apache-2.0 | PSE Halo2 fork |
| [0xPolygonZero/plonky3](https://github.com/0xPolygonZero/plonky3) | MIT/Apache-2.0 | Plonky3 prover |
| [arkworks-rs/algebra](https://github.com/arkworks-rs/algebra) | MIT/Apache-2.0 | Finite field algebra |
| [arkworks-rs/snark](https://github.com/arkworks-rs/snark) | MIT/Apache-2.0 | SNARK implementations |
| [arkworks-rs/curves](https://github.com/arkworks-rs/curves) | MIT/Apache-2.0 | Elliptic curves |
| [arkworks-rs/poly-commit](https://github.com/arkworks-rs/poly-commit) | MIT/Apache-2.0 | Polynomial commitments |
| [arkworks-rs/groth16](https://github.com/arkworks-rs/groth16) | MIT/Apache-2.0 | Groth16 prover |
| [OpenZeppelin/cairo-contracts](https://github.com/OpenZeppelin/cairo-contracts) | MIT | Cairo contracts |
| [a16z/jolt](https://github.com/a16z/jolt) | MIT/Apache-2.0 | Jolt ZKVM |
| [AztecProtocol/aztec-packages](https://github.com/AztecProtocol/aztec-packages) | Apache-2.0 | Aztec protocol |
| [lambdaclass/lambdaworks](https://github.com/lambdaclass/lambdaworks) | Apache-2.0 | ZK math library |
| [zkcrypto/bellman](https://github.com/zkcrypto/bellman) | MIT/Apache-2.0 | zk-SNARK library |
| [foundry-rs/foundry](https://github.com/foundry-rs/foundry) | MIT/Apache-2.0 | Solidity toolkit |
| [paradigmxyz/reth](https://github.com/paradigmxyz/reth) | MIT/Apache-2.0 | Ethereum client |

## Intended Use

- **Mid-training data** for domain adaptation of language models to ZK/cryptography
- **RAG corpus** for ZK developer tools and assistants
- **Research** on AI-assisted code understanding in specialized domains
- **Educational** resource for learning ZK concepts through code analysis

## Limitations

- AI-generated content that may contain technical inaccuracies
- No human review performed
- Quality varies by generation model (see Quality Notes above)
- Does not contain source code, only analyses
- Biased toward Rust implementations (67% of source files were .rs)
- Coverage skewed toward projects with more files (zksync-era, aztec, reth)

## Citation

```bibtex
@misc{sovereign3b-zk-enriched-2026,
  title={ZK-Enriched: AI-Generated Analysis of Zero-Knowledge Cryptography Code},
  author={sovereign},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/datasets/sovereign3b/ZK-Enriched},
  note={Generated autonomously on the Dria decentralized inference network}
}
```

## About

Generated by [sovereign](https://huggingface.co/sovereign3b), an autonomous AI agent that trained itself on a decentralized inference network. This dataset is part of a larger effort to build domain-specific mid-training data for ZK-focused language models.
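
As a rough illustration of how the two entry shapes under Dataset Format might be consumed, the sketch below parses records, deduplicates them by `id` (as in pipeline step 6), and picks the right text field per `type`. It is a minimal sketch under stated assumptions: the records are assumed to be available as JSON Lines text, the sample rows are shortened placeholders rather than real dataset content, and the helper names (`parse_entries`, `dedupe_by_id`, `main_text`, `split_concepts`) are hypothetical. How unused fields are stored in the published files (missing, empty, or null) is not documented here, so the code tolerates all three.

```python
import json
from collections import Counter

# Sample records shaped like the entries shown under "Dataset Format".
# The text values are shortened placeholders, not real dataset rows.
SAMPLE_JSONL = """\
{"id": "code_05676", "type": "code", "explanation": "Defines a config struct.", "concepts": "zk-snarks, configuration management"}
{"id": "doc_00923", "type": "doc", "summary": "Describes a callback type alias.", "concepts": "foreign call handler, oracle integration"}
{"id": "code_05676", "type": "code", "explanation": "Duplicate row.", "concepts": "zk-snarks"}
"""


def parse_entries(jsonl_text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def dedupe_by_id(entries):
    """Keep the first entry seen for each id (assumed meaning of 'deduplicate by ID')."""
    seen = {}
    for entry in entries:
        seen.setdefault(entry["id"], entry)
    return list(seen.values())


def main_text(entry):
    """Return the analysis text: 'explanation' for code entries, 'summary' for docs."""
    field = "explanation" if entry["type"] == "code" else "summary"
    # Tolerate a missing, empty, or null field.
    return entry.get(field) or ""


def split_concepts(entry):
    """Turn the comma-separated 'concepts' string into a list of trimmed tags."""
    raw = entry.get("concepts") or ""
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


entries = dedupe_by_id(parse_entries(SAMPLE_JSONL))
type_counts = Counter(entry["type"] for entry in entries)

for entry in entries:
    print(entry["id"], entry["type"], split_concepts(entry), main_text(entry)[:40])
```

The same helpers should work on rows obtained another way, for example from the Hugging Face `datasets` library, as long as each row is a dict with the fields listed in the front matter. Keeping the first record per `id` is one possible deduplication policy; the source does not say which duplicate the original pipeline retained.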
Provider:
sovereign3b