BlockDB ERC20 Tokens Details | Ethereum & EVM Chains | Historical, EOD, Real-Time | Crypto Token ...
Databricks Marketplace, indexed 2025-10-08
Download link:
https://marketplace.databricks.com/details/4a2beed2-2546-4ff9-8bbb-2c24c9211eb2/BlockDB_BlockDB-ERC20-Tokens-Details-Ethereum-&-EVM-Chains-Historical,-EOD,-Real-Time-Crypto-Token-
🟦 What this is
Canonical ERC-20 token reference with deterministic tracing at the row level. One row per token contract, with audit-grade lineage to the first recognition event and to parent/genesis derivations.
• Schema-stable, versioned, audit-ready
• Historical + real-time options
🌐 Chains / Coverage
ETH, BSC, Base, Arbitrum, Unichain, Avalanche, Polygon, Celo, Linea, Optimism (others on request).
Full history from chain genesis; reorg-aware real-time ingestion and updates.
📑 Schema
Columns are listed exactly as delivered; names and types match the files.
• contract_address BYTEA - PK; 20-byte ERC-20 contract address
• tracing_id BYTEA - deterministic row-level hash (proof-of-derivation)
• parent_tracing_ids BYTEA - salted hash(es) of immediate parent rows in the derivation graph
• genesis_tracing_ids BYTEA - salted hash(es) of original sources (genesis of the derivation path)
• genesis_block_number BIGINT - first block where the token was recognized
• genesis_tx_index INTEGER - tx index for that event
• genesis_log_index INTEGER - log index for that event
• name TEXT - ERC-20 name()
• symbol TEXT - ERC-20 symbol()
• decimals SMALLINT - ERC-20 decimals()
Notes
• Use encode(contract_address, 'hex') for hex presentation; the result has no 0x prefix.
• Metadata (name, symbol, decimals) is populated by calling the contract's name(), symbol(), and decimals() functions (ABI reads).
• If these ABI reads fail, the token is omitted from this table, because the columns are NOT NULL by design.
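A minimal Python sketch of one row as a typed record, assuming rows have already been loaded with BYTEA columns as bytes. The class name Erc20Token is hypothetical. The 20-byte check follows the contract_address description above, and address_hex() produces the same lowercase digits as encode(contract_address, 'hex'), with a 0x prefix added for display.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Erc20Token:
    """Hypothetical in-memory form of one row; fields mirror the documented schema."""

    contract_address: bytes     # BYTEA, PK, 20-byte contract address
    tracing_id: bytes           # BYTEA, row-level hash
    parent_tracing_ids: bytes   # BYTEA, salted parent hash(es)
    genesis_tracing_ids: bytes  # BYTEA, salted genesis hash(es)
    genesis_block_number: int   # BIGINT
    genesis_tx_index: int       # INTEGER
    genesis_log_index: int      # INTEGER
    name: str                   # TEXT, from name()
    symbol: str                 # TEXT, from symbol()
    decimals: int               # SMALLINT, from decimals()

    def __post_init__(self) -> None:
        if len(self.contract_address) != 20:
            raise ValueError("contract_address must be exactly 20 bytes")
        # Columns are NOT NULL by design, so a None here indicates a loading bug.
        if self.name is None or self.symbol is None or self.decimals is None:
            raise ValueError("name, symbol and decimals must not be NULL")

    def address_hex(self) -> str:
        # bytes.hex() yields lowercase hex without a prefix, like
        # PostgreSQL's encode(contract_address, 'hex'); add 0x for display.
        return "0x" + self.contract_address.hex()
```

The frozen dataclass keeps a loaded row immutable, which matches the table's role as a reference registry.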
🔑 Keys & Joins
• Primary key: contract_address
• Lineage triple for joins to raw events: (genesis_block_number, genesis_tx_index, genesis_log_index)
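The lineage triple behaves like a composite foreign key into a raw log table. The Python sketch below assumes a hypothetical raw-event feed whose rows expose block_number, tx_index and log_index fields, and that both datasets come from the same chain (no chain column appears in the schema above).

```python
from typing import Any, Iterable, Mapping, Optional

LineageKey = tuple[int, int, int]


def index_raw_events(events: Iterable[Mapping[str, Any]]) -> dict[LineageKey, Mapping[str, Any]]:
    """Index hypothetical raw log rows by (block_number, tx_index, log_index)."""
    return {(e["block_number"], e["tx_index"], e["log_index"]): e for e in events}


def genesis_event(
    token: Mapping[str, Any],
    events_by_key: Mapping[LineageKey, Mapping[str, Any]],
) -> Optional[Mapping[str, Any]]:
    """Return the raw event where the token was first recognized, if it is indexed."""
    key = (
        token["genesis_block_number"],
        token["genesis_tx_index"],
        token["genesis_log_index"],
    )
    return events_by_key.get(key)
```

In SQL the same lookup is an equality join on all three columns.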
🧬 Lineage & Reproducibility
Every row has a verifiable path back to the originating raw events via the lineage triple and tracing graph:
• tracing_id - this row’s identity
• parent_tracing_ids - immediate sources
• genesis_tracing_ids - original on-chain sources
This supports audits and exact reprocessing back to the source transactions, logs, and function calls.
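One way to check the tracing graph is to walk parent links until reaching nodes with no recorded parents, then compare those with genesis_tracing_ids. The Python sketch below assumes the hash lists have already been split into individual hashes and that each parent hash can be matched to an upstream row's identity; the parents mapping is hypothetical, since this listing does not specify how multiple salted hashes are packed into one BYTEA value or how they resolve to upstream rows.

```python
from collections import deque
from typing import Mapping, Sequence


def trace_to_genesis(
    tracing_id: bytes,
    parents: Mapping[bytes, Sequence[bytes]],
) -> set[bytes]:
    """Breadth-first walk from a row to the nodes that have no recorded parents."""
    roots: set[bytes] = set()
    seen = {tracing_id}
    queue = deque([tracing_id])
    while queue:
        node = queue.popleft()
        node_parents = parents.get(node, ())
        if not node_parents:
            roots.add(node)  # no parents recorded: treat as a genesis source
            continue
        for parent in node_parents:
            if parent not in seen:  # a derivation graph can share ancestors
                seen.add(parent)
                queue.append(parent)
    return roots


def lineage_is_consistent(
    tracing_id: bytes,
    genesis_ids: Sequence[bytes],
    parents: Mapping[bytes, Sequence[bytes]],
) -> bool:
    """Check that walking parent links reaches exactly the row's genesis ids."""
    return trace_to_genesis(tracing_id, parents) == set(genesis_ids)
```

The seen set keeps shared ancestors from being visited twice, so the walk stays linear in the size of the reachable graph.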
📈 Common uses
• Token registry to normalize joins for swaps, transfers, pools, and prices
• Amount scaling via decimals for analytics, PnL, and model features
• App backends: display names/symbols and validate token addresses
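Amount scaling divides a raw on-chain integer by 10**decimals. A minimal Python sketch, assuming raw amounts arrive as Python ints and that a decimals lookup keyed by contract_address has been built from this table (decimals_by_address is a hypothetical name). Because tokens with failed ABI reads have no row, the lookup returns None for them instead of guessing a default.

```python
from decimal import Decimal
from typing import Mapping, Optional


def scale_amount(raw_amount: int, decimals: int) -> Decimal:
    """Exact raw_amount / 10**decimals, e.g. a transfer value in base units to token units."""
    if decimals < 0:
        raise ValueError("decimals must be non-negative")
    # Building the Decimal from (sign, digits, exponent) is exact, so even
    # 78-digit uint256 values are not rounded to the default 28-digit context.
    sign = 1 if raw_amount < 0 else 0
    digits = tuple(int(ch) for ch in str(abs(raw_amount)))
    return Decimal((sign, digits, -decimals))


def scale_for_token(
    raw_amount: int,
    contract_address: bytes,
    decimals_by_address: Mapping[bytes, int],
) -> Optional[Decimal]:
    """Scale using the registry; None when the token has no row in the table."""
    decimals = decimals_by_address.get(contract_address)
    return None if decimals is None else scale_amount(raw_amount, decimals)
```

For example, scale_amount(1_500_000, 6) gives Decimal("1.500000"). Decimal avoids the float rounding that would otherwise creep into PnL and feature calculations.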
🚚 Delivery
By default
• WebSocket (API/WSS): reorg-aware live emissions whenever an update is available; median latency under 140 ms on ETH streams (measured over 7 days).
• SFTP server for archives and daily End-of-Day (EOD) snapshots.
• Model Context Protocol (MCP) for AI workflows (pull slices, schemas, lineage).
Optional
• Integrations to Amazon S3, Azure Blob Storage, Snowflake, and other enterprise platforms on request.
🗂️ Files (time-partitioned in UTC, compressed)
• Parquet
• CSV
• XLS
• JSON
💡 Quality and operations
• Reorg-aware ingestion.
• 99.95% uptime SLA.
• Backfills to chain genesis.
• Versioned, schema-stable datasets; changes are additive and announced.
🔄 Change policy
Schema is stable. Any breaking change ships as a new version (e.g., erc20_tokens_v2) with migration notes. Content updates are additive (new rows/fields filled); types aren’t changed in place.
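Because content changes are additive and breaking changes ship as a new versioned table, a consumer can safely ignore columns it does not recognize while failing loudly if a documented column disappears. A minimal Python sketch for the CSV delivery, assuming a header row that uses the column names from the schema above; values stay as raw strings because this listing does not specify how BYTEA columns are encoded in CSV.

```python
import csv
from typing import Iterable, Iterator

# Column names as listed in the schema section.
DOCUMENTED_COLUMNS = (
    "contract_address", "tracing_id", "parent_tracing_ids", "genesis_tracing_ids",
    "genesis_block_number", "genesis_tx_index", "genesis_log_index",
    "name", "symbol", "decimals",
)


def read_documented_columns(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield each CSV row projected onto the documented columns only."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [c for c in DOCUMENTED_COLUMNS if c not in header]
    if missing:
        # Removing a documented column would be a breaking change, which the
        # policy says ships as a new versioned table instead.
        raise ValueError(f"missing documented columns: {missing}")
    for row in reader:
        # Additive changes may append columns; unknown ones are dropped here.
        yield {c: row[c] for c in DOCUMENTED_COLUMNS}
```

Since this is a generator, the header check runs when iteration starts, not when the function is called.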
Provider:
BlockDB



