Crypto-Address-Annotation-10K
Source: ModelScope community · Updated 2025-12-05 · Indexed 2025-12-06
Download link:
https://modelscope.cn/datasets/Codatta/Crypto-Address-Annotation-10K
Resource description:
# Codatta Crypto Address Annotations (Sample)
## Dataset Summary
This repository contains a **10,000-row sample** of the Codatta Crypto Address Annotations database. The full dataset comprises over **500 million** labeled (chain, address) pairs across multiple blockchains.
The data provides critical metadata such as entity names, categories (e.g., Exchanges, DeFi, Scam), and risk levels. It is curated through a unique hybrid approach, combining high-quality community contributions from the **Codatta** platform and collaborative data from the **Microscope Protocol**.
By bridging crowdsourced intelligence with standardized open protocols, this dataset aims to solve the problem of fragmented and siloed blockchain metadata.
## Data Sources & Protocols
The high fidelity of this dataset is achieved through two primary sources:
1. **Codatta Platform:** Data aggregated from Codatta's decentralized data intelligence network, verified by the contributor community.
2. **Microscope Protocol:** Labels contributed through the Microscope Protocol, an open-source initiative for collaboratively labeling crypto addresses.
   * **Official Website:** [https://microscopeprotocol.net/](https://microscopeprotocol.net/)
   * **Context:** [Microscope: Protocol for Collaboratively Labeling Crypto Addresses (Coinbase Blog)](https://www.coinbase.com/blog/microscope-protocol-for-collaboratively-labeling-crypto-addresses)
## Dataset Structure
### Data Fields
* **`chain`** (string): The blockchain network (e.g., `bitcoin`, `ethereum`).
* **`address`** (string): The wallet or contract address.
* **`name`** (string): Specific label name (e.g., "Binance Deposit Address").
* **`entity`** (string): The entity owning the address (e.g., "Binance").
* **`category`** (string): Functional classification (e.g., `CEX`, `DeFi`, `Scam`). See full list at [Microscope Protocol Categories](https://docs.microscopeprotocol.xyz/onboarding/data/categories).
* **`source`** (string): Origin of the label (`ground_truth`, `external`, `heuristics`, `machine_learning`).
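
The field list above can be turned into a small validation step. The following is a minimal sketch, assuming the sample is available as CSV with one column per field; the source does not state the file format, and the helper name `parse_rows`, the `AddressLabel` record, and the sample rows (placeholder addresses, not real annotations) are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass

# Field names and allowed `source` values are taken from the field list above.
FIELDS = ("chain", "address", "name", "entity", "category", "source")
ALLOWED_SOURCES = {"ground_truth", "external", "heuristics", "machine_learning"}


@dataclass(frozen=True)
class AddressLabel:
    chain: str
    address: str
    name: str
    entity: str
    category: str
    source: str


def parse_rows(text: str) -> list[AddressLabel]:
    """Parse CSV text into AddressLabel records, rejecting malformed input."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    labels = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["source"] not in ALLOWED_SOURCES:
            raise ValueError(f"line {line_no}: unknown source {row['source']!r}")
        labels.append(AddressLabel(**{field: row[field] for field in FIELDS}))
    return labels


# Hypothetical rows for illustration only (assumed CSV layout).
SAMPLE_CSV = """chain,address,name,entity,category,source
ethereum,0x0000000000000000000000000000000000000001,Example Deposit Address,ExampleExchange,CEX,ground_truth
bitcoin,placeholder-btc-address,Example Scam Wallet,,Scam,heuristics
"""

labels = parse_rows(SAMPLE_CSV)
```

Rejecting unknown `source` values early keeps later analysis from silently mixing label origins; whether empty `entity` values actually occur in the sample is an assumption here.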
## Full Dataset Statistics
While this repository provides a 10k sample for testing and research, the complete database is far larger. The tables below show the distribution of the full **517.61 million** labeled pairs.
### 1. Distribution by Chain
*Total labeled (chain, address) pairs: 517,609,906*
| Chain | Address Count |
| :--- | :--- |
| **Bitcoin** | 185,818,811 |
| **Polygon** | 129,251,435 |
| **Ethereum** | 107,420,551 |
| **BSC** (BNB Chain) | 61,043,196 |
| **Tron** | 22,055,631 |
| **Optimism** | 4,926,779 |
| **Arbitrum** | 3,488,054 |
| **Avalanche** | 3,369,625 |
| **Base** | 233,495 |
| *Others (Litecoin, Solana, etc.)* | < 2,000 |
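
To read the table as proportions rather than raw counts, each chain's share of the stated total can be computed directly. This is a minimal sketch using only the figures in the table; the names `CHAIN_COUNTS`, `TOTAL_PAIRS`, and `chain_shares` are illustrative, and the "Others" row is derived from the stated total because the table gives it only as a bound.

```python
# Counts copied from the table above. "Others" is listed only as "< 2,000",
# so it is omitted here and derived from the stated total instead.
TOTAL_PAIRS = 517_609_906
CHAIN_COUNTS = {
    "Bitcoin": 185_818_811,
    "Polygon": 129_251_435,
    "Ethereum": 107_420_551,
    "BSC": 61_043_196,
    "Tron": 22_055_631,
    "Optimism": 4_926_779,
    "Arbitrum": 3_488_054,
    "Avalanche": 3_369_625,
    "Base": 233_495,
}


def chain_shares(counts: dict[str, int], total: int) -> dict[str, float]:
    """Return each chain's fraction of the total labeled pairs."""
    return {chain: count / total for chain, count in counts.items()}


shares = chain_shares(CHAIN_COUNTS, TOTAL_PAIRS)

# Pairs on chains that the table does not list individually.
remainder = TOTAL_PAIRS - sum(CHAIN_COUNTS.values())

for chain, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{chain:<10} {share:7.2%}")
print(f"{'Others':<10} {remainder / TOTAL_PAIRS:7.2%}")
```

By these figures, Bitcoin, Polygon, and Ethereum together account for roughly 82% of all labeled pairs. The source does not say whether the "< 2,000" bound applies to each remaining chain or to the group as a whole.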
### 2. Distribution by Category (Top Categories)
| Category | Address Count |
| :--- | :--- |
| **Smart Contract** | 235,023,302 |
| **Exchange** | 187,716,225 |
| **Scam** | 24,261,860 |
| **Gambling** | 21,677,685 |
| **Ransom** | 17,991,759 |
| **DeFi User** | 16,825,002 |
| **Bridging User** | 11,516,923 |
| **DEX User** | 7,286,609 |
| **ERC20 Token** | 5,442,026 |
| **Mixer** | 5,186,762 |
| **Wallet** | 2,724,811 |
| **FIAT** | 2,145,036 |
| **Darknet** | 2,110,659 |
| **Sanctioned** | 1,101,861 |
*(Table truncated for brevity; other categories include Lending User, Liquid Staking, NFT, etc.)*
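
The same kind of breakdown can be computed on the 10k sample with a tally over the `category` field. The sketch below assumes the rows are already loaded as dictionaries keyed by the field names in the Data Fields section; the rows shown are hypothetical, and the category strings stored in the sample may differ from the display names used in the table.

```python
from collections import Counter

# Hypothetical rows; only the fields needed for the tally are shown.
rows = [
    {"chain": "ethereum", "category": "CEX"},
    {"chain": "ethereum", "category": "Scam"},
    {"chain": "bitcoin", "category": "CEX"},
    {"chain": "bitcoin", "category": "Mixer"},
    {"chain": "ethereum", "category": "CEX"},
]


def category_distribution(rows: list[dict[str, str]]) -> Counter:
    """Count rows per category."""
    return Counter(row["category"] for row in rows)


def per_chain_distribution(rows: list[dict[str, str]]) -> dict[str, Counter]:
    """Count rows per category, grouped by chain."""
    result: dict[str, Counter] = {}
    for row in rows:
        result.setdefault(row["chain"], Counter())[row["category"]] += 1
    return result


by_category = category_distribution(rows)
by_chain = per_chain_distribution(rows)

# Print categories from most to least frequent.
for category, count in by_category.most_common():
    print(f"{category:<8} {count}")
```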
## Access to Full Dataset
We are releasing this **10k sample** publicly to foster research and demonstrate data quality.
**If you are interested in accessing the full dataset (500M+ labels) for commercial use, advanced analytics, or security integration, please contact us.**
Provider:
maas
Created:
2025-11-28



