Ethereum Transaction Datasets for Training and Evaluating LLMs, ML, and DL Models
Source: DataCite Commons | Updated: 2026-05-03 | Indexed: 2026-05-07
Download link:
https://zenodo.org/doi/10.5281/zenodo.19810156
Description:
Primary Corpus (Unlabeled). We sourced the complete set of Ethereum transaction traces for calendar year 2024 (blocks 18,908,895 to 21,525,890). The raw dataset comprises 429,745,984 transactions. Most of these are simple value transfers or contract interactions with negligible fees, which are less indicative of sophisticated attack patterns. To focus on transactions with substantial economic weight and to keep computation feasible, we filtered the dataset to transactions with a miner fee (including the priority fee) of at least 0.01 ETH (approximately 18 USD at the time of analysis). This filtering yielded a final training corpus of 1,074,346 transactions for unsupervised representation learning and detector training.

Evaluation Benchmark (Labeled). To evaluate generalization to novel threats, we constructed a separate, manually verified benchmark. We collected 439 transactions from 2023, 2024, and 2025 that are absent from the primary corpus. Each transaction was investigated via block explorers, security reports (e.g., Rekt News), and community analysis to assign a ground-truth label: malicious (confirmed exploit, hack, or scam) or benign. The benchmark covers diverse attack vectors (e.g., price oracle manipulation, logic bugs, phishing, flash loan attacks, reentrancy, and access control vulnerabilities) across various protocols, providing a rigorous test of out-of-sample detection.

We have restricted access to the dataset until our research paper is accepted. In the meantime, please contact us via this anonymous email: blocksec12345@gmail.com, or access the dataset here: https://anonymous.4open.science/r/tx-lens-artifacts-EB4B/README.md
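The corpus filtering and the benchmark's disjointness requirement described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `Tx` record, its field names, and the helper functions are hypothetical. It also assumes that "miner fee (including the priority fee)" means the total fee paid, `gas_used * effective_gas_price`. The block range and the 0.01 ETH threshold come from the description; the threshold is expressed in wei so the comparison is exact. (The stated counts imply that about 0.25% of the raw transactions pass the filter.)

```python
from dataclasses import dataclass
from typing import Iterable, List

WEI_PER_ETH = 10**18
# 0.01 ETH threshold from the description, in wei to avoid float rounding.
FEE_THRESHOLD_WEI = WEI_PER_ETH // 100
# Calendar-year 2024 block range stated in the description.
START_BLOCK, END_BLOCK = 18_908_895, 21_525_890


@dataclass(frozen=True)
class Tx:
    """Hypothetical minimal transaction record; field names are assumptions."""
    tx_hash: str
    block_number: int
    gas_used: int
    effective_gas_price: int  # wei per unit of gas


def fee_wei(tx: Tx) -> int:
    # Assumption: the fee is the total paid (base fee + priority fee).
    return tx.gas_used * tx.effective_gas_price


def filter_primary_corpus(txs: Iterable[Tx]) -> List[Tx]:
    """Keep transactions in the 2024 block range whose fee is >= 0.01 ETH."""
    return [
        tx for tx in txs
        if START_BLOCK <= tx.block_number <= END_BLOCK
        and fee_wei(tx) >= FEE_THRESHOLD_WEI
    ]


def benchmark_is_disjoint(benchmark_hashes: Iterable[str], corpus: Iterable[Tx]) -> bool:
    """Out-of-sample check: no benchmark transaction appears in the corpus."""
    corpus_hashes = {tx.tx_hash.lower() for tx in corpus}
    return all(h.lower() not in corpus_hashes for h in benchmark_hashes)
```

For example, a 21,000-gas transfer at 500 gwei pays 0.0105 ETH and is kept, while the same transfer at 30 gwei pays 0.00063 ETH and is dropped. Hash comparison is case-insensitive because hex transaction hashes may appear in either case in different tools.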
Provider:
Zenodo
Created:
2026-04-27



