
PolyData/polymarket_trade_capture_5Mar2026

Source: Hugging Face · Updated 2026-04-06 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/PolyData/polymarket_trade_capture_5Mar2026
Resource description:
---
license: cc-by-4.0
task_categories:
- tabular-classification
- time-series-forecasting
tags:
- polymarket
- prediction-markets
- blockchain
- trading
- orderbook
size_categories:
- 1B<n<10B
---

# Polymarket Trade Capture (through March 5, 2026)

Every filled order on Polymarket's CLOB (Central Limit Order Book) from launch through March 5, 2026. Sourced from on-chain events via the Goldsky indexer.

## Overview

| Metric | Value |
|--------|-------|
| Date range | 2022-11-21 to 2026-03-05 |
| Partitions | 1,194 days |
| Parquet files | 1,643 |
| Size on disk | ~21 GB |
| Estimated total rows | ~1-2 billion |

## Schema

| Column | Type | Description |
|--------|------|-------------|
| `timestamp` | int64 | Unix timestamp (seconds) of the fill |
| `maker` | string | Maker wallet address (0x...) |
| `makerAssetId` | string | Maker's asset token ID (USDC = "0", otherwise CTF token ID) |
| `makerAmountFilled` | int64 | Amount filled on maker side (6 decimal USDC or CTF shares) |
| `taker` | string | Taker wallet address (0x...) |
| `takerAssetId` | string | Taker's asset token ID |
| `takerAmountFilled` | int64 | Amount filled on taker side |
| `transactionHash` | string | On-chain transaction hash |
| `date` | string | Partition key (YYYY-MM-DD) |

## How to use

```python
from datasets import load_dataset

# Stream rows without downloading the full dataset
ds = load_dataset("PolyData/polymarket_trade_capture_5Mar2026", streaming=True)
for row in ds["train"]:
    print(row)
    break

# Load a single date partition
ds_day = load_dataset(
    "PolyData/polymarket_trade_capture_5Mar2026",
    data_files="data/date=2026-01-15/*.parquet",
)
```

## Interpreting the data

**Buy vs. Sell:** When `makerAssetId == "0"`, the maker is paying USDC (buying tokens). When `takerAssetId == "0"`, the taker is paying USDC (buying tokens), so the maker is selling.

**Price calculation:** `price = usdc_amount / token_amount`, where the two amounts come from opposite sides of the fill. For example, 60 USDC exchanged for 100 shares implies a price of 0.60 per share.

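
The buy/sell and price rules can be sketched as a small helper. This is a minimal illustration, not part of the dataset's tooling: `interpret_fill` and its output keys are hypothetical names, it assumes each row is a dict with the schema columns above, and it assumes the price per share is the USDC amount divided by the token amount (both amounts use 6 decimals, so the scale cancels in the ratio).

```python
USDC_ASSET_ID = "0"   # per the schema, USDC is encoded as asset ID "0"
SCALE = 1_000_000     # USDC and CTF amounts both use 6 decimal places


def interpret_fill(row):
    """Hypothetical helper: classify one fill and derive its price."""
    maker_raw = int(row["makerAmountFilled"])
    taker_raw = int(row["takerAmountFilled"])

    if row["makerAssetId"] == USDC_ASSET_ID:
        # Maker pays USDC and receives tokens: the maker is buying.
        maker_side, usdc_raw, token_raw = "BUY", maker_raw, taker_raw
        token_id = row["takerAssetId"]
    elif row["takerAssetId"] == USDC_ASSET_ID:
        # Taker pays USDC and receives tokens: the maker is selling.
        maker_side, usdc_raw, token_raw = "SELL", taker_raw, maker_raw
        token_id = row["makerAssetId"]
    else:
        # Assumption: fills with no USDC leg are out of scope for this sketch.
        raise ValueError("neither side of the fill is USDC")

    # Assumed price definition: USDC paid per share received.
    price = usdc_raw / token_raw if token_raw else None
    return {
        "maker_side": maker_side,
        "token_id": token_id,
        "usdc": usdc_raw / SCALE,
        "shares": token_raw / SCALE,
        "price": price,
    }


example = {
    "makerAssetId": "0",
    "makerAmountFilled": 60_000_000,   # 60 USDC
    "takerAssetId": "123",             # placeholder token ID
    "takerAmountFilled": 100_000_000,  # 100 shares
}
result = interpret_fill(example)
print(result["maker_side"], result["price"])
```

The helper raises `ValueError` when neither asset ID is `"0"`; whether such fills occur in this dataset is not stated in the card, so callers may prefer to skip those rows instead.
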

**Token IDs:** Each binary market has two token IDs (YES and NO). Map token IDs to markets using Polymarket's API: `GET https://clob.polymarket.com/markets/{condition_id}`

**Amounts:** Both USDC and CTF token amounts use 6 decimal places. Divide by 1,000,000 for human-readable values.

## Data source

Raw `OrderFilled` events from the Polymarket CTF Exchange contract on Polygon, indexed by Goldsky. This is public blockchain data.

## Partitioning

Data is Hive-partitioned by date (`date=YYYY-MM-DD/`). High-volume dates (typically >5M fills) are split across multiple parquet files (`data_0.parquet`, `data_1.parquet`).

## Limitations

- No market metadata (condition IDs, questions, categories). Join with Polymarket's API or CLOB endpoints.
- No order-level data (only fills). Resting order placement/cancellation events are not included.
- Token IDs are raw CTF ERC-1155 IDs. You need external mapping to determine YES vs NO side.
- Coverage ends March 5, 2026. No updates planned.

## License

CC-BY-4.0. This is public blockchain data. Attribution appreciated but not required.
Provider:
PolyData