Jcrandall541/ethereum-arbitrage
Source: Hugging Face. Updated 2025-11-23. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Jcrandall541/ethereum-arbitrage
Description:
---
license: mit
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- crypto
- defi
- ethereum
- solidity
- trading
- documentation
- code
size_categories:
- 100K<n<1M
---
# Crypto & DeFi Documentation Dataset
A dataset of cryptocurrency, DeFi, and blockchain documentation and code, intended for LLM training.
## Dataset Description
This dataset contains scraped and processed documentation and code from crypto/DeFi sources, including:
- **Rust Ethereum libraries** (ethers-rs, etc.)
- **Solidity documentation** (official Solidity language docs)
- **Smart contracts** (Uniswap, Aave, Balancer, SushiSwap, etc.)
- **Trading bots** (MEV, flashloans, arbitrage)
- **Protocol documentation** (Tenderly, Alchemy, etc.)
## Dataset Statistics
- **Total Records**: 794,655
- **Estimated Tokens**: 75,890,740
- **Created**: 2025-11-23T02:36:21.391841
### By Category
| Category | Count |
|----------|-------|
| code | 9,153 |
| data | 885 |
| documentation | 698,443 |
| infrastructure | 5,954 |
| smart_contract | 76,307 |
| trading_bot | 3,913 |
### By Language
| Language | Count |
|----------|-------|
| rust | 483,803 |
| unknown | 177,872 |
| javascript | 71,476 |
| solidity | 47,370 |
| typescript | 9,912 |
| python | 2,871 |
| markdown | 1,235 |
| toml | 76 |
| console | 22 |
| ts14 | 5 |
| json | 3 |
| b | 3 |
| md | 1 |
| ts90 | 1 |
| ts304 | 1 |
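
As a quick sanity check, the published counts can be summed directly. The sketch below copies the numbers from the statistics and tables above; the variable names are illustrative. The category counts add up to the stated total, while the language counts come to 794,651, four fewer than the total, a gap this card does not explain.

```python
# Figures copied from the "Dataset Statistics" section of this card.
TOTAL_RECORDS = 794_655
ESTIMATED_TOKENS = 75_890_740

by_category = {
    "code": 9_153,
    "data": 885,
    "documentation": 698_443,
    "infrastructure": 5_954,
    "smart_contract": 76_307,
    "trading_bot": 3_913,
}

by_language = {
    "rust": 483_803, "unknown": 177_872, "javascript": 71_476,
    "solidity": 47_370, "typescript": 9_912, "python": 2_871,
    "markdown": 1_235, "toml": 76, "console": 22, "ts14": 5,
    "json": 3, "b": 3, "md": 1, "ts90": 1, "ts304": 1,
}

category_sum = sum(by_category.values())
language_sum = sum(by_language.values())
avg_tokens_per_record = ESTIMATED_TOKENS / TOTAL_RECORDS

print(f"category sum: {category_sum:,} (stated total: {TOTAL_RECORDS:,})")
print(f"language sum: {language_sum:,} (stated total: {TOTAL_RECORDS:,})")
print(f"average tokens per record: {avg_tokens_per_record:.1f}")
```

On average a record carries roughly 95 estimated tokens, so most records are short chunks rather than whole files.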
## Data Format
Each record is a JSON object with the following fields:
```json
{
"id": "unique_hash_id",
"source": "https://github.com/...",
"file": "original_filename.sol",
"chunk_id": 0,
"category": "smart_contract",
"language": "solidity",
"content": "// SPDX-License-Identifier...",
"token_estimate": 150
}
```
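
A minimal sketch of how a record could be typed and checked before use. The field names come from the example above; the Python types are assumptions inferred from that single example (for instance, that `chunk_id` and `token_estimate` are always integers), and `Record` and `validate_record` are hypothetical names, not part of the dataset.

```python
from typing import Any, TypedDict


class Record(TypedDict):
    """One dataset row; types are inferred from the example record."""
    id: str
    source: str
    file: str
    chunk_id: int
    category: str
    language: str
    content: str
    token_estimate: int


# Expected runtime type for each field (an assumption, see above).
EXPECTED_TYPES = {
    "id": str, "source": str, "file": str, "chunk_id": int,
    "category": str, "language": str, "content": str,
    "token_estimate": int,
}


def validate_record(obj: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means the record looks fine."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in obj:
            problems.append(f"missing field: {field}")
            continue
        value = obj[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{field}: expected {expected.__name__}, "
                            f"got {type(value).__name__}")
    return problems


example = {
    "id": "unique_hash_id",
    "source": "https://github.com/...",
    "file": "original_filename.sol",
    "chunk_id": 0,
    "category": "smart_contract",
    "language": "solidity",
    "content": "// SPDX-License-Identifier...",
    "token_estimate": 150,
}
print(validate_record(example))
```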
## Usage
```python
from datasets import load_dataset

# Load the train split from the Hugging Face Hub
dataset = load_dataset("Jcrandall541/ethereum-arbitrage", split="train")

# Keep only smart-contract records
contracts = dataset.filter(lambda x: x["category"] == "smart_contract")

# Keep only Solidity records
solidity = dataset.filter(lambda x: x["language"] == "solidity")
```
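
Beyond filtering, the `category`, `language`, and `token_estimate` fields make it easy to tally records and estimated tokens per group. The sketch below assumes each row behaves like a dictionary with those fields; `tally` is a hypothetical helper, and the three sample rows are made-up stand-ins so the snippet runs without downloading anything.

```python
from collections import Counter
from typing import Iterable, Mapping


def tally(records: Iterable[Mapping], key: str) -> tuple[Counter, Counter]:
    """Count records and sum token_estimate for each distinct value of `key`."""
    counts: Counter = Counter()
    tokens: Counter = Counter()
    for rec in records:
        counts[rec[key]] += 1
        tokens[rec[key]] += rec["token_estimate"]
    return counts, tokens


# Made-up rows standing in for real dataset records.
sample = [
    {"category": "smart_contract", "language": "solidity", "token_estimate": 150},
    {"category": "documentation", "language": "rust", "token_estimate": 90},
    {"category": "documentation", "language": "rust", "token_estimate": 110},
]

counts, tokens = tally(sample, "category")
for name, n in counts.most_common():
    print(f"{name}: {n} records, {tokens[name]} estimated tokens")
```

Passing the loaded dataset in place of `sample` should work the same way, since iterating over it yields one dictionary per row, though a full pass over all records will take some time.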
## Sources
- docs.rs (Rust crate documentation)
- docs.soliditylang.org (Solidity official docs)
- GitHub repositories (Uniswap, Flashbots, etc.)
- Protocol documentation (Tenderly, Alchemy, Balancer, etc.)
## License
This dataset is provided for educational and research purposes. Individual components may have their own licenses. Please check the original sources for licensing information.
Provided by:
Jcrandall541



