PolyData/polymarket_trade_capture_5Mar2026
Source: Hugging Face. Updated 2026-04-06, indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/PolyData/polymarket_trade_capture_5Mar2026
Resource summary:
---
license: cc-by-4.0
task_categories:
- tabular-classification
- time-series-forecasting
tags:
- polymarket
- prediction-markets
- blockchain
- trading
- orderbook
size_categories:
- 1B<n<10B
---
# Polymarket Trade Capture (through March 5, 2026)
Every order fill (`OrderFilled` event) on Polymarket's CLOB (Central Limit Order Book) from launch through March 5, 2026. Sourced from on-chain events via the Goldsky indexer.
## Overview
| Metric | Value |
|--------|-------|
| Date range | 2022-11-21 to 2026-03-05 |
| Daily partitions | 1,194 (the range spans 1,201 calendar days) |
| Parquet files | 1,643 |
| Size on disk | ~21 GB |
| Estimated total rows | ~1-2 billion |
## Schema
| Column | Type | Description |
|--------|------|-------------|
| `timestamp` | int64 | Unix timestamp (seconds) of the fill |
| `maker` | string | Maker wallet address (0x...) |
| `makerAssetId` | string | Maker's asset token ID (USDC = "0", otherwise CTF token ID) |
| `makerAmountFilled` | int64 | Amount of the maker's asset filled, in raw 6-decimal base units (USDC or CTF shares) |
| `taker` | string | Taker wallet address (0x...) |
| `takerAssetId` | string | Taker's asset token ID (same encoding as `makerAssetId`) |
| `takerAmountFilled` | int64 | Amount of the taker's asset filled, in the same raw 6-decimal units |
| `transactionHash` | string | On-chain transaction hash |
| `date` | string | Partition key (YYYY-MM-DD) |
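A minimal sketch of one row as a typed Python record, assuming rows arrive as plain dicts keyed by the column names above (as they do when iterating a `datasets` split). The `Fill` class and its `from_row` helper are hypothetical, not part of the dataset.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Fill:
    """One dataset row, typed after the schema table (hypothetical helper)."""

    timestamp: datetime        # fill time as a timezone-aware UTC datetime
    maker: str                 # maker wallet address (0x...)
    maker_asset_id: str        # "0" for USDC, otherwise a CTF token ID
    maker_amount_filled: int   # raw 6-decimal base units
    taker: str                 # taker wallet address (0x...)
    taker_asset_id: str        # same encoding as maker_asset_id
    taker_amount_filled: int   # raw 6-decimal base units
    transaction_hash: str
    date: Optional[str]        # partition key; may be absent depending on the reader

    @classmethod
    def from_row(cls, row: dict) -> "Fill":
        # `timestamp` is Unix seconds; convert it to an aware UTC datetime.
        return cls(
            timestamp=datetime.fromtimestamp(int(row["timestamp"]), tz=timezone.utc),
            maker=row["maker"],
            maker_asset_id=str(row["makerAssetId"]),
            maker_amount_filled=int(row["makerAmountFilled"]),
            taker=row["taker"],
            taker_asset_id=str(row["takerAssetId"]),
            taker_amount_filled=int(row["takerAmountFilled"]),
            transaction_hash=row["transactionHash"],
            date=row.get("date"),
        )
```

Asset IDs stay strings, matching the schema: CTF token IDs are 256-bit values and do not fit in an int64.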
## How to use
```python
from datasets import load_dataset

# Stream rows without downloading the full ~21 GB dataset
ds = load_dataset("PolyData/polymarket_trade_capture_5Mar2026", streaming=True)
for row in ds["train"]:
    print(row)
    break

# Load a single date partition (only the matching parquet files are downloaded)
day = load_dataset(
    "PolyData/polymarket_trade_capture_5Mar2026",
    data_files="data/date=2026-01-15/*.parquet",
)
print(day["train"].num_rows)
```
## Interpreting the data
**Buy vs. Sell:** When `makerAssetId == "0"`, the maker is paying USDC (buying tokens). When `takerAssetId == "0"`, the taker is paying USDC (so the maker is selling tokens).
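The buy/sell rule can be written as a small classifier. This sketch assumes exactly one side of each fill is USDC (asset ID `"0"`); the function name `maker_side` and the `"BUY"`/`"SELL"` labels are illustrative, and rows where neither or both IDs are `"0"` are rejected rather than guessed.

```python
USDC_ASSET_ID = "0"  # per the schema: USDC is encoded as asset ID "0"


def maker_side(maker_asset_id: str, taker_asset_id: str) -> str:
    """Return the maker's side: "BUY" if the maker pays USDC, "SELL" if the taker does."""
    maker_pays_usdc = maker_asset_id == USDC_ASSET_ID
    taker_pays_usdc = taker_asset_id == USDC_ASSET_ID
    if maker_pays_usdc and not taker_pays_usdc:
        return "BUY"   # maker gives USDC, receives outcome tokens
    if taker_pays_usdc and not maker_pays_usdc:
        return "SELL"  # maker gives outcome tokens, receives USDC
    raise ValueError(
        f"expected exactly one USDC side, got {maker_asset_id!r} / {taker_asset_id!r}"
    )
```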
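**Price calculation:** `price = usdc_amount / token_amount`, where `usdc_amount` is the filled amount on the USDC side of the fill and `token_amount` is the filled amount on the opposite (CTF token) side. Both use 6 decimal places, so the scaling cancels and the result is the price per share in USDC. For example, paying 60 USDC for 100 shares gives a price of 0.60.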
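A minimal sketch of the per-share price for a single fill: the USDC amount divided by the outcome-token amount from the opposite side. Both amounts use 6 decimals, so the scaling cancels. It assumes rows are dicts keyed by the schema's column names; `fill_price` is a hypothetical helper, and `fractions.Fraction` is used here to keep the ratio exact.

```python
from fractions import Fraction

USDC_ASSET_ID = "0"  # per the schema: USDC is encoded as asset ID "0"


def fill_price(row: dict) -> Fraction:
    """Per-share price in USDC for one fill: USDC amount / outcome-token amount."""
    if row["makerAssetId"] == USDC_ASSET_ID:
        # Maker pays USDC; taker delivers outcome tokens.
        usdc, tokens = row["makerAmountFilled"], row["takerAmountFilled"]
    elif row["takerAssetId"] == USDC_ASSET_ID:
        # Taker pays USDC; maker delivers outcome tokens.
        usdc, tokens = row["takerAmountFilled"], row["makerAmountFilled"]
    else:
        raise ValueError("fill has no USDC side")
    if int(tokens) == 0:
        raise ValueError("zero token amount; price is undefined")
    # Both amounts are 6-decimal base units, so the scaling cancels in the ratio.
    return Fraction(int(usdc), int(tokens))
```

For a fill where the maker pays 60,000,000 raw USDC units for 100,000,000 raw token units, `fill_price` returns `Fraction(3, 5)`, i.e. 0.60 USDC per share; call `float()` on the result when exactness is not needed.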
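**Token IDs:** Each binary market has two token IDs (YES and NO). Polymarket's API can map them back to markets, but `GET https://clob.polymarket.com/markets/{condition_id}` is keyed by condition ID, not token ID, so you first need a token-ID lookup built from the market data (each market lists its token IDs) before joining.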
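A sketch of such a token-ID lookup. It assumes the CLOB market listing `GET https://clob.polymarket.com/markets` is cursor-paginated (`next_cursor`, starting at `"MA=="` and ending at `"LTE="`) and returns market objects with `condition_id`, `question`, and a `tokens` list of `{token_id, outcome}` entries. All of these response details are assumptions; check them against Polymarket's current API documentation. Fetching is kept separate from the pure `build_token_index` helper (hypothetical name) so the mapping logic can be tested offline.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Iterable, Iterator, Tuple

CLOB_MARKETS_URL = "https://clob.polymarket.com/markets"  # assumed listing endpoint
END_CURSOR = "LTE="  # assumed end-of-pagination marker


def iter_markets(cursor: str = "MA==") -> Iterator[dict]:
    """Yield market objects page by page (response shape is an assumption)."""
    while cursor != END_CURSOR:
        query = urllib.parse.urlencode({"next_cursor": cursor})
        with urllib.request.urlopen(f"{CLOB_MARKETS_URL}?{query}", timeout=30) as resp:
            page = json.load(resp)
        yield from page.get("data", [])
        # Stop if the cursor is missing or empty rather than looping forever.
        cursor = page.get("next_cursor") or END_CURSOR


def build_token_index(markets: Iterable[dict]) -> Dict[str, Tuple[str, str, str]]:
    """Map token ID -> (condition_id, question, outcome label)."""
    index: Dict[str, Tuple[str, str, str]] = {}
    for market in markets:
        for token in market.get("tokens", []):
            token_id = str(token.get("token_id", ""))
            if token_id:
                index[token_id] = (
                    market.get("condition_id", ""),
                    market.get("question", ""),
                    token.get("outcome", ""),
                )
    return index
```

Usage: `index = build_token_index(iter_markets())`, then `index.get(row["takerAssetId"])` for a fill whose maker is paying USDC. Walking the whole listing takes many requests, so caching the index to disk is worthwhile.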
**Amounts:** Both USDC and CTF token amounts use 6 decimal places. Divide by 1,000,000 for human-readable values.
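For exact conversions (no float rounding when summing large volumes), a minimal sketch using `decimal.Decimal`; the helper name `to_units` is hypothetical.

```python
from decimal import Decimal

SCALE = Decimal(1_000_000)  # 6 decimal places for both USDC and CTF shares


def to_units(raw_amount: int) -> Decimal:
    """Convert a raw on-chain amount (6-decimal base units) to a human-readable Decimal."""
    return Decimal(int(raw_amount)) / SCALE
```

For example, `to_units(123_456_789)` returns `Decimal('123.456789')`.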
## Data source
Raw `OrderFilled` events from the Polymarket CTF Exchange contract on Polygon, indexed by Goldsky. This is public blockchain data.
## Partitioning
Data is Hive-partitioned by date (`date=YYYY-MM-DD/`). High-volume dates (typically >5M fills) are split across multiple parquet files (`data_0.parquet`, `data_1.parquet`).
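To load a date range rather than a single day, one option is to generate one glob per day and pass the list as `data_files`. A minimal sketch; `partition_globs` is a hypothetical helper, and the `data/` root is taken from the loading example in this card. The `*.parquet` pattern covers both single-file days and split days.

```python
from datetime import date, timedelta
from typing import List


def partition_globs(start: date, end: date, root: str = "data") -> List[str]:
    """Return one Hive-style glob per day in [start, end], inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    n_days = (end - start).days + 1
    return [
        f"{root}/date={(start + timedelta(days=i)).isoformat()}/*.parquet"
        for i in range(n_days)
    ]
```

Usage: `load_dataset("PolyData/polymarket_trade_capture_5Mar2026", data_files=partition_globs(date(2026, 1, 1), date(2026, 1, 7)))`. Since 1,194 partitions cover 1,201 calendar days, a few dates have no folder; a pattern that matches nothing may make `load_dataset` raise, in which case drop those dates from the list.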
## Limitations
- No market metadata (condition IDs, questions, categories). Join with Polymarket's API (e.g. the CLOB market endpoints) to recover it.
- No order-level data (only fills). Resting order placement/cancellation events are not included.
- Token IDs are raw CTF ERC-1155 IDs. You need external mapping to determine YES vs NO side.
- Coverage ends March 5, 2026. No updates planned.
## License
CC-BY-4.0, which requires attribution when you share or adapt the data. The underlying events are public blockchain data.
Provided by: PolyData