Crypto-Address-Annotation-10K

ModelScope Community · Updated 2025-12-05 · Indexed 2025-12-06
Download link:
https://modelscope.cn/datasets/Codatta/Crypto-Address-Annotation-10K
Resource summary:
# Codatta Crypto Address Annotations (Sample)

## Dataset Summary

This repository contains a **10,000-row sample** of the Codatta Crypto Address Annotations database. The full dataset comprises over **500 million** labeled (chain, address) pairs across multiple blockchains.

The data provides key metadata such as entity names, categories (e.g., Exchanges, DeFi, Scam), and risk levels. It is curated through a hybrid approach that combines high-quality community contributions from the **Codatta** platform with collaborative data from the **Microscope Protocol**.

By bridging crowdsourced intelligence with standardized open protocols, this dataset aims to address the fragmentation and siloing of blockchain metadata.

## Data Sources & Protocols

The dataset's labels come from two primary sources:

1. **Codatta Platform:** Data aggregated from Codatta's decentralized data intelligence network and verified by its contributor community.
2. **Microscope Protocol:** An open-source initiative for collaboratively labeling crypto addresses.
   * **Official Website:** [https://microscopeprotocol.net/](https://microscopeprotocol.net/)
   * **Context:** [Microscope: Protocol for Collaboratively Labeling Crypto Addresses (Coinbase Blog)](https://www.coinbase.com/blog/microscope-protocol-for-collaboratively-labeling-crypto-addresses)

## Dataset Structure

### Data Fields

* **`chain`** (string): The blockchain network (e.g., `bitcoin`, `ethereum`).
* **`address`** (string): The wallet or contract address.
* **`name`** (string): Specific label name (e.g., "Binance Deposit Address").
* **`entity`** (string): The entity owning the address (e.g., "Binance").
* **`category`** (string): Functional classification (e.g., `CEX`, `DeFi`, `Scam`). See the full list at [Microscope Protocol Categories](https://docs.microscopeprotocol.xyz/onboarding/data/categories).
* **`source`** (string): Origin of the label (`ground_truth`, `external`, `heuristics`, `machine_learning`).

## Full Dataset Statistics

This repository provides a 10k sample for testing and research; the complete database covers a much larger set of on-chain entities. Below is the distribution of all **517.61 million** labeled pairs in the full dataset.

### 1. Distribution by Chain

*Total labeled (chain, address) pairs: 517,609,906*

| Chain | Address Count |
| :--- | ---: |
| **Bitcoin** | 185,818,811 |
| **Polygon** | 129,251,435 |
| **Ethereum** | 107,420,551 |
| **BSC** (BNB Chain) | 61,043,196 |
| **Tron** | 22,055,631 |
| **Optimism** | 4,926,779 |
| **Arbitrum** | 3,488,054 |
| **Avalanche** | 3,369,625 |
| **Base** | 233,495 |
| *Others (Litecoin, Solana, etc.)* | < 2,000 |

### 2. Distribution by Category (Top Categories)

| Category | Address Count |
| :--- | ---: |
| **Smart Contract** | 235,023,302 |
| **Exchange** | 187,716,225 |
| **Scam** | 24,261,860 |
| **Gambling** | 21,677,685 |
| **Ransom** | 17,991,759 |
| **DeFi User** | 16,825,002 |
| **Bridging User** | 11,516,923 |
| **DEX User** | 7,286,609 |
| **ERC20 Token** | 5,442,026 |
| **Mixer** | 5,186,762 |
| **Wallet** | 2,724,811 |
| **FIAT** | 2,145,036 |
| **Darknet** | 2,110,659 |
| **Sanctioned** | 1,101,861 |

*(Table truncated for brevity; other categories include Lending User, Liquid Staking, NFT, etc.)*

## Access to Full Dataset

We are releasing this **10k sample** publicly to foster research and demonstrate data quality.

**If you are interested in accessing the full dataset (500M+ labels) for commercial use, advanced analytics, or security integration, please contact us.**
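To make the field list concrete, here is a minimal Python sketch of one sample row. It assumes each row carries exactly the six fields listed under Data Fields and that `source` only takes the four documented values. The names `AddressLabel`, `KNOWN_SOURCES`, and `parse_row` are hypothetical illustrations, not part of any Codatta or ModelScope API.

```python
from dataclasses import dataclass, fields

# Documented values for the `source` field (assumed to be exhaustive).
KNOWN_SOURCES = frozenset({"ground_truth", "external", "heuristics", "machine_learning"})


@dataclass(frozen=True)
class AddressLabel:
    """One labeled row; field names follow the Data Fields list (hypothetical class)."""

    chain: str     # e.g. "bitcoin", "ethereum"
    address: str   # wallet or contract address
    name: str      # specific label, e.g. "Binance Deposit Address"
    entity: str    # owning entity, e.g. "Binance"
    category: str  # e.g. "CEX", "DeFi", "Scam"
    source: str    # expected to be one of KNOWN_SOURCES

    @property
    def key(self) -> tuple[str, str]:
        # The statistics count labeled (chain, address) pairs.
        return (self.chain, self.address)


def parse_row(row: dict[str, str]) -> AddressLabel:
    """Build an AddressLabel from a dict-like row (hypothetical helper).

    Extra keys are ignored; a missing field raises KeyError, and an
    undocumented `source` value raises ValueError.
    """
    label = AddressLabel(**{f.name: row[f.name] for f in fields(AddressLabel)})
    if label.source not in KNOWN_SOURCES:
        raise ValueError(f"unexpected source: {label.source!r}")
    return label
```

Rows read from the sample, for example with `csv.DictReader` if the sample is a CSV file (the file format is an assumption here), can be passed to `parse_row` one at a time. Rejecting unknown `source` values early makes it easy to filter labels by provenance, e.g. keeping only `ground_truth` rows.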

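The chain distribution table is easier to read as proportions. The short Python sketch below converts the listed per-chain counts into shares of the stated 517,609,906 total. The counts are copied from the table; the "Others" row is left out because only an upper bound is given. The names `CHAIN_COUNTS`, `TOTAL_PAIRS`, and `chain_shares` are illustrative.

```python
# Per-chain counts copied from the "Distribution by Chain" table.
CHAIN_COUNTS = {
    "Bitcoin": 185_818_811,
    "Polygon": 129_251_435,
    "Ethereum": 107_420_551,
    "BSC": 61_043_196,
    "Tron": 22_055_631,
    "Optimism": 4_926_779,
    "Arbitrum": 3_488_054,
    "Avalanche": 3_369_625,
    "Base": 233_495,
}
TOTAL_PAIRS = 517_609_906  # stated total of labeled (chain, address) pairs


def chain_shares(counts: dict[str, int], total: int) -> dict[str, float]:
    """Return each chain's fraction of `total`, ordered largest first."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return {chain: n / total for chain, n in ranked}


if __name__ == "__main__":
    # Print one line per chain, e.g. "Bitcoin     35.90%".
    for chain, share in chain_shares(CHAIN_COUNTS, TOTAL_PAIRS).items():
        print(f"{chain:<10} {share:7.2%}")
```

Bitcoin comes out at about 35.9% of all labeled pairs, and Bitcoin, Polygon, and Ethereum together account for roughly 81.6%, so the full database is concentrated on a few chains.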
Provided by: maas
Created: 2025-11-28