wolframko/pipes-lie-embargo-dominus
Hugging Face · Updated 2026-03-23 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/wolframko/pipes-lie-embargo-dominus
Description:
---
license: mit
task_categories:
- tabular-classification
language:
- en
tags:
- dota2
- esports
- gaming
- replay-parsing
- time-series
size_categories:
- 100M<n<1B
---
# Betty Dota 2 Pro Matches Dataset
9,388 professional Dota 2 matches parsed from replay files with per-second game state snapshots and combat log events.
## Dataset Structure
### matches.parquet (9,388 rows)
One row per match. Contains metadata, STRATZ player statistics, and draft data.
| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Dota 2 match ID |
| league_name | string | Tournament name |
| league_tier | string | League tier (e.g. PROFESSIONAL, MAJOR) |
| duration_sec | int64 | Match duration in seconds |
| radiant_win | bool | True if Radiant won |
| radiant_team_id / dire_team_id | int64 | Team IDs |
| p0..p9_hero_id | int64 | Hero ID per player (STRATZ) |
| p0..p9_kills/deaths/assists | int64 | Final KDA per player |
| p0..p9_gpm/xpm/networth | int64 | Economy stats per player |
| p0..p9_gpm_per_min | list[int] | Gold per minute time-series |
| p0..p9_networth_per_min | list[int] | Net worth per minute |
| p0..p9_hero_damage_per_min | list[int] | Hero damage per minute |
| stratz_pb_hero_id | list[int] | STRATZ pick/ban hero IDs |
| replay_pb_hero_id | list[int] | Replay-parsed pick/ban hero IDs |
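
The per-player columns are stored wide (one set per slot), which can be awkward for per-player analysis. A minimal sketch of reshaping one match row into ten per-player rows follows. It assumes the columns are spelled `p0_hero_id`, `p0_kills`, and so on (the table only gives the `p0..p9_` shorthand), and it runs on a synthetic row rather than real data.

```python
from typing import Any

# Per-player stat suffixes taken from the table above. The exact column
# spelling (e.g. "p0_hero_id") is an assumption, not confirmed by the card.
STATS = ("hero_id", "kills", "deaths", "assists", "gpm", "xpm", "networth")


def players_long(match: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn one wide match row into ten per-player rows."""
    rows = []
    for slot in range(10):
        row = {"match_id": match["match_id"], "slot": slot}
        for stat in STATS:
            # .get() leaves None where a column is absent from the row.
            row[stat] = match.get(f"p{slot}_{stat}")
        rows.append(row)
    return rows


# Synthetic row for illustration only; not real match data.
example = {"match_id": 1, "radiant_win": True}
for s in range(10):
    example[f"p{s}_hero_id"] = 100 + s
    example[f"p{s}_kills"] = s

long_rows = players_long(example)
```

The same function should work on a row dict taken from the loaded `matches` dataset, provided the column names match the assumed pattern.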
### ticks/ (189,120,930 rows)
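One row per hero per snapshot. Snapshots are taken once per second of game time and capture the full hero state; `tick` records the raw replay tick at which each snapshot was taken. With 10 heroes per match, the row total works out to about 2,000 snapshots per hero per match on average.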
| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| tick | int32 | Raw tick number |
| game_time | float | Game time in seconds |
| slot | int8 | Player slot (0-9) |
| hero | string | Hero internal name |
| x, y | float | Map position coordinates |
| hp, max_hp | int32 | Current and max health |
| mana, max_mana | float | Current and max mana |
| gold | int32 | Current gold |
| net_worth | int32 | Total net worth |
| xp | int32 | Total experience |
| kills, deaths, assists | int16 | KDA at this moment |
| last_hits, denies | int16 | Creep score (last hits and denies) at this moment |
| level | int8 | Hero level |
| str, agi, int_ | float | Strength, agility, and intelligence values |
| armor | float | Armor value |
| move_speed | int16 | Movement speed |
| item_0..item_5 | string | Items in 6 inventory slots |
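
One common use of the tick snapshots is a team net worth lead over time. The sketch below sums `net_worth` per snapshot with a sign per team. It assumes slots 0-4 are Radiant and 5-9 are Dire, and that all ten heroes in a snapshot share the same `tick` value; neither is stated in the table. The input rows are synthetic.

```python
from collections import defaultdict


def networth_lead(tick_rows):
    """Map each tick to Radiant net worth minus Dire net worth."""
    lead = defaultdict(int)
    for row in tick_rows:
        # Assumption: slots 0-4 are Radiant and slots 5-9 are Dire.
        sign = 1 if row["slot"] < 5 else -1
        lead[row["tick"]] += sign * row["net_worth"]
    return dict(sorted(lead.items()))


# Synthetic rows for one match: two snapshots, ten heroes each.
rows = []
for tick, radiant_nw, dire_nw in [(900, 600, 600), (930, 700, 650)]:
    for slot in range(10):
        rows.append({
            "tick": tick,
            "slot": slot,
            "net_worth": radiant_nw if slot < 5 else dire_nw,
        })

lead = networth_lead(rows)
```

The function expects rows from a single match; rows from several matches would need to be grouped by `match_id` first.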
### events/ (295,810,843 rows)
One row per combat log event. Damage, kills, gold gains, XP, healing, purchases.
| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| tick | int32 | Raw tick number |
| game_time | float | Game time in seconds |
| timestamp | float | Exact event timestamp |
| event_type | string | One of DOTA_COMBATLOG_DAMAGE, _DEATH, _GOLD, _HEAL, _PURCHASE, _XP (all with the DOTA_COMBATLOG prefix) |
| source | string | Source entity (e.g. npc_dota_hero_invoker) |
| target | string | Target entity |
| value | int32 | Damage/gold/xp amount |
| item | string | Item name (for PURCHASE events) |
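
As an illustration of working with the combat log, the sketch below totals hero-to-hero damage per source. It assumes `event_type` holds the full string `DOTA_COMBATLOG_DAMAGE` and that hero entities carry the `npc_dota_hero_` prefix, as in the `npc_dota_hero_invoker` example above. The events are synthetic.

```python
from collections import Counter

HERO_PREFIX = "npc_dota_hero_"  # assumed prefix, based on the example in the table


def hero_damage_dealt(event_rows):
    """Sum damage dealt by each hero to other hero entities."""
    totals = Counter()
    for ev in event_rows:
        if ev["event_type"] != "DOTA_COMBATLOG_DAMAGE":
            continue
        source = ev.get("source") or ""
        target = ev.get("target") or ""
        if source.startswith(HERO_PREFIX) and target.startswith(HERO_PREFIX):
            totals[source] += ev["value"]
    return totals


# Synthetic events for illustration only.
events = [
    {"event_type": "DOTA_COMBATLOG_DAMAGE", "source": "npc_dota_hero_invoker",
     "target": "npc_dota_hero_axe", "value": 120},
    {"event_type": "DOTA_COMBATLOG_DAMAGE", "source": "npc_dota_hero_invoker",
     "target": "npc_dota_hero_axe", "value": 180},
    {"event_type": "DOTA_COMBATLOG_DAMAGE", "source": "npc_dota_hero_axe",
     "target": "npc_dota_creep_badguys_melee", "value": 60},
    {"event_type": "DOTA_COMBATLOG_GOLD", "source": "npc_dota_hero_axe",
     "target": "npc_dota_hero_axe", "value": 40},
]

totals = hero_damage_dealt(events)
```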
## Statistics
- **Matches**: 9,388 professional games from 2025
- **Ticks**: 189,120,930 per-second hero state snapshots
- **Events**: 295,810,843 combat log entries
- **Size**: 5.97 GB compressed Parquet (from 103 GB JSON)
- **Source**: Valve replay files (.dem) parsed with [manta](https://github.com/dotabuff/manta) + STRATZ API metadata
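
A few derived figures follow directly from the totals above. The short calculation below uses only the numbers listed in this section and computes average snapshots per hero per match, average events per match, and the JSON-to-Parquet size ratio.

```python
# Totals copied from the Statistics list above.
matches = 9_388
tick_rows = 189_120_930
event_rows = 295_810_843
parquet_gb = 5.97
json_gb = 103

snapshots_per_hero = tick_rows / matches / 10  # 10 heroes per match
events_per_match = event_rows / matches
compression_ratio = json_gb / parquet_gb

print(f"Snapshots per hero per match: {snapshots_per_hero:,.0f}")
print(f"Events per match: {events_per_match:,.0f}")
print(f"JSON to Parquet size ratio: {compression_ratio:.1f}x")
```

At one snapshot per second, the first figure also approximates the average recorded match length in seconds.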
## Usage
```python
from datasets import load_dataset
# Load matches metadata
matches = load_dataset("wolframko/pipes-lie-embargo-dominus", data_files="matches.parquet", split="train")
# Load ticks (large - use streaming)
ticks = load_dataset("wolframko/pipes-lie-embargo-dominus", data_files="ticks/*.parquet", split="train", streaming=True)
# Load events
events = load_dataset("wolframko/pipes-lie-embargo-dominus", data_files="events/*.parquet", split="train", streaming=True)
```
```python
import pandas as pd
# Quick analysis with pandas (reading hf:// paths requires huggingface_hub to be installed)
matches_df = pd.read_parquet("hf://datasets/wolframko/pipes-lie-embargo-dominus/matches.parquet")
print(f"Matches: {len(matches_df)}, Radiant winrate: {matches_df.radiant_win.mean():.1%}")
```
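
Streaming datasets yield one dict per row, so plain Python iteration tools apply. The sketch below lazily picks rows for a single match from any iterable of row dicts. `rows_for_match` is a hypothetical helper, not part of the `datasets` library, and the demonstration uses a fake in-memory stream so it runs without network access.

```python
from itertools import islice


def rows_for_match(rows, match_id, limit=None):
    """Lazily yield rows with the given match_id, stopping after `limit` hits."""
    hits = (row for row in rows if row["match_id"] == match_id)
    return islice(hits, limit)  # limit=None means no cap


def fake_stream():
    """Stand-in for a streaming dataset: yields synthetic row dicts."""
    for match_id in (1, 2, 3):
        for tick in range(5):
            yield {"match_id": match_id, "tick": tick}


subset = list(rows_for_match(fake_stream(), match_id=2, limit=3))
```

Applied to the streaming `ticks` or `events` objects, this is a linear scan from the start of the stream, so it can be slow for matches stored late in the files. Whether rows are grouped by match within the Parquet shards is not stated here.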
## Data Pipeline
1. **Fetch**: Match IDs from STRATZ GraphQL API (pro leagues 2025)
2. **Download**: Replay files (.dem.bz2) from Valve CDN
3. **Parse**: Per-second hero states + combat log via manta (Go)
4. **Enrich**: Player/team stats from STRATZ API
5. **Convert**: JSON to Parquet with full validation
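
The validation in step 5 is not described in detail. As a rough illustration of what a per-record check before conversion might look like, the sketch below tests a tick record against expected types and two sanity rules. The schema subset, the rules, and the assumption that the intermediate JSON uses the same field names as the Parquet columns are all hypothetical.

```python
import json

# Expected Python types for a subset of tick fields, following the ticks table.
# Assumption: the intermediate JSON uses the Parquet column names.
TICK_SCHEMA = {
    "match_id": int,
    "tick": int,
    "game_time": (int, float),
    "slot": int,
    "hero": str,
    "hp": int,
    "max_hp": int,
}


def validate_tick(record: dict) -> list[str]:
    """Return a list of problems found in one tick record (empty if valid)."""
    problems = []
    for field, expected in TICK_SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(record[field], bool) or not isinstance(record[field], expected):
            problems.append(f"bad type for {field}: {type(record[field]).__name__}")
    if not problems:
        # Range checks only run once all fields are present and well typed.
        if not 0 <= record["slot"] <= 9:
            problems.append("slot out of range 0-9")
        if record["hp"] > record["max_hp"]:
            problems.append("hp exceeds max_hp")
    return problems


# Synthetic JSON records for illustration only.
good = json.loads('{"match_id": 1, "tick": 900, "game_time": 0.0, "slot": 3, '
                  '"hero": "npc_dota_hero_invoker", "hp": 500, "max_hp": 600}')
bad = json.loads('{"match_id": 1, "tick": 900, "game_time": 0.0, "slot": 12, '
                 '"hero": "npc_dota_hero_invoker", "hp": 700, "max_hp": 600}')

good_problems = validate_tick(good)
bad_problems = validate_tick(bad)
```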
## License
MIT
Provider:
wolframko



