wolframko/betty-dota2
Hugging Face · Updated 2026-03-30 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/wolframko/betty-dota2
Resource description:
---
license: mit
task_categories:
- tabular-classification
- time-series-forecasting
language:
- en
tags:
- dota2
- esports
- gaming
- replay-parsing
- time-series
- decision-context
size_categories:
- 1B<n<10B
---
# Betty Dota 2 — Decision Context Dataset
## Overview
**9,385 professional Dota 2 matches** parsed from replay files (.dem) into a rich, per-second decision context: hero states, ability cooldowns, building HP, combat events, modifiers, ward placements, and objectives.
Built to train Transformer and RL models that understand the game state at each moment in time.
## Dataset Structure
### matches.parquet — 9,385 rows
One row per match. Match metadata, STRATZ player statistics, and draft data.
| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Dota 2 match ID |
| league_name | string | Tournament name |
| league_tier | string | PROFESSIONAL, MAJOR, etc. |
| start_time | int64 | Unix timestamp |
| duration_sec | int64 | Match duration in seconds |
| radiant_win | bool | True if Radiant won |
| radiant_team_id / dire_team_id | int64 | Team IDs |
| p0..p9_hero_id | int64 | Hero ID per player |
| p0..p9_kills/deaths/assists | int64 | Final KDA per player |
| p0..p9_gpm/xpm/networth | int64 | Final economy stats |
| p0..p9_position / p0..p9_role | string | Lane/role from STRATZ |
| p0..p9_gpm_per_min | list[int64] | Gold per minute time series |
| p0..p9_networth_per_min | list[int64] | Net worth per minute |
| p0..p9_hero_damage_per_min | list[int64] | Hero damage per minute |
| p0..p9_tower_damage_per_min | list[int64] | Tower damage per minute |
| p0..p9_last_hits_per_min | list[int64] | Last hits per minute |
| stratz_pb_hero_id | list[int64] | STRATZ pick/ban hero IDs |
| replay_pb_hero_id | list[int64] | Replay-parsed pick/ban hero IDs |
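
The per-minute list columns can be combined into team-level curves. The sketch below computes the Radiant net-worth lead per minute for a single row of `matches`. It assumes that `p0`..`p4` are Radiant and `p5`..`p9` are Dire (mirroring the slot convention documented for the ticks table); the helper name `networth_lead_per_min` is illustrative and not part of the dataset tooling.

```python
import numpy as np
import pandas as pd


def networth_lead_per_min(match: pd.Series) -> np.ndarray:
    """Radiant minus Dire net worth for each minute of one match row."""
    # Assumption: p0..p4 are Radiant and p5..p9 are Dire.
    series = [
        np.asarray(match[f"p{i}_networth_per_min"], dtype=np.int64)
        for i in range(10)
    ]
    # Per-player lists may differ in length, so truncate to the shortest.
    n = min(len(s) for s in series)
    radiant = sum(s[:n] for s in series[:5])
    dire = sum(s[:n] for s in series[5:])
    return radiant - dire
```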
---
### ticks/ — ~211,000,000 rows
Per-second hero state snapshots. One row per hero per game second (10 heroes × ~2,100 one-second snapshots per match; the `tick` column itself stores the raw engine tick).
| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| tick | int32 | Raw engine tick |
| game_time | double | Game time in seconds |
| slot | int8 | Player slot (0–9; 0–4 Radiant, 5–9 Dire) |
| hero | string | Hero internal name (e.g. `npc_dota_hero_invoker`) |
| x, y | double | Map position |
| hp, max_hp | int32 | Current and max health |
| mana, max_mana | double | Current and max mana |
| gold | int32 | Current unreliable gold |
| net_worth | int32 | Total net worth |
| xp | int32 | Total experience |
| is_alive | bool | Whether hero is alive this tick |
| respawn_seconds | double | Seconds until respawn (0 if alive) |
| kills / deaths / assists | int16 | KDA at this moment |
| last_hits / denies | int16 | CS at this moment |
| level | int8 | Hero level |
| str / agi / int_ | double | Current attribute values |
| armor | double | Physical armor |
| move_speed | int16 | Movement speed |
| item_0..item_5 | string | Items in inventory slots |
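
For sequence models, the rows of one match can be reshaped into a dense array indexed by snapshot, slot, and feature. This is a minimal sketch, assuming the input frame holds a single match with at most one row per (`tick`, `slot`) pair. The feature list and the name `match_to_tensor` are illustrative choices, not something the dataset prescribes.

```python
import numpy as np
import pandas as pd

# Illustrative numeric feature subset taken from the ticks schema.
FEATURES = ["x", "y", "hp", "max_hp", "mana", "max_mana", "net_worth", "level"]


def match_to_tensor(ticks: pd.DataFrame, features=FEATURES) -> np.ndarray:
    """Return an array of shape (n_snapshots, 10, n_features) for one match."""
    # Map each raw engine tick to a consecutive snapshot index.
    tick_ids, t_idx = np.unique(ticks["tick"].to_numpy(), return_inverse=True)
    s_idx = ticks["slot"].to_numpy()
    # Missing (snapshot, slot) combinations stay NaN.
    out = np.full((len(tick_ids), 10, len(features)), np.nan, dtype=np.float32)
    out[t_idx, s_idx] = ticks[features].to_numpy(dtype=np.float32)
    return out
```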
---
### abilities/ — ~720,000,000 rows
Per-second ability state for every ability on every hero. One row per ability per hero per tick (~20 abilities × 10 heroes × 2,100 ticks per match).
| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| tick | int32 | Raw engine tick |
| game_time | double | Game time in seconds |
| slot | int8 | Player slot |
| hero | string | Hero internal name |
| ability_points | int16 | Unspent ability points |
| ability_name | string | Ability internal name |
| ability_level | int16 | Current level (0 = unlearned) |
| cooldown | double | Remaining cooldown in seconds |
| cooldown_length | double | Full cooldown duration |
| charges | int16 | Charges remaining (charge-based abilities) |
| mana_cost | int32 | Mana cost to cast |
| activated | bool | Whether ability is toggled on |
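
Joining ability rows with hero state on (`match_id`, `tick`, `slot`) gives a rough "can cast now" signal. The sketch below is deliberately simplified: it treats an ability as castable when it is learned, off cooldown, affordable, and the hero is alive. It ignores silences, stuns, passives, and charge mechanics, and the function name is hypothetical.

```python
import pandas as pd


def castable_abilities(abilities: pd.DataFrame, ticks: pd.DataFrame) -> pd.DataFrame:
    """Rows of (match_id, tick, slot, ability_name) that look castable."""
    # Bring the hero's current mana and alive flag next to each ability row.
    merged = abilities.merge(
        ticks[["match_id", "tick", "slot", "mana", "is_alive"]],
        on=["match_id", "tick", "slot"],
        how="inner",
    )
    # Simplifying assumption: disables and charges are not considered.
    ready = (
        (merged["ability_level"] > 0)
        & (merged["cooldown"] <= 0)
        & (merged["mana_cost"] <= merged["mana"])
        & merged["is_alive"]
    )
    return merged.loc[ready, ["match_id", "tick", "slot", "ability_name"]]
```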
---
### buildings/ — ~465,000,000 rows
Per-second state of all towers, barracks, and Ancients (`fort`). One row per building per tick.
| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| tick | int32 | Raw engine tick |
| game_time | double | Game time in seconds |
| entity_name | string | Building internal name |
| entity_type | string | `tower`, `barracks`, `fort` |
| team | int16 | 2 = Radiant, 3 = Dire |
| lane | string | `top`, `mid`, `bot`, `base` |
| tier | int16 | Tower tier (1–4) |
| hp | int32 | Current HP |
| max_hp | int32 | Max HP |
| alive | bool | Whether building is alive |
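
A common derived feature is the number of standing towers per team over time. The sketch below counts them for a single match, using the `team` codes from the table (2 = Radiant, 3 = Dire). The function name and the output column labels are illustrative.

```python
import pandas as pd

TEAM_NAMES = {2: "radiant", 3: "dire"}  # team codes as documented above


def towers_alive(buildings: pd.DataFrame) -> pd.DataFrame:
    """Number of living towers per team, indexed by game_time (one match)."""
    towers = buildings[(buildings["entity_type"] == "tower") & buildings["alive"]]
    counts = (
        towers.groupby(["game_time", "team"])
        .size()
        .unstack("team", fill_value=0)
        # Keep both team columns even if one side has no towers left.
        .reindex(columns=[2, 3], fill_value=0)
    )
    return counts.rename(columns=TEAM_NAMES)
```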
---
### events/ — ~287,000,000 rows
Combat log events: damage, deaths, gold gains, XP, healing, purchases.
| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| tick | int32 | Raw engine tick |
| game_time | double | Game time in seconds |
| timestamp | double | Exact event timestamp |
| event_type | string | `DOTA_COMBATLOG_DAMAGE`, `DOTA_COMBATLOG_DEATH`, `DOTA_COMBATLOG_GOLD`, `DOTA_COMBATLOG_HEAL`, `DOTA_COMBATLOG_PURCHASE`, `DOTA_COMBATLOG_XP` |
| source | string | Source entity |
| target | string | Target entity |
| value | int32 | Amount (damage/gold/xp/heal) |
| inflictor | string | Ability or item that caused the event |
| gold_reason / xp_reason | int32 | Reason code for gold/XP events |
| item | string | Item name (PURCHASE events) |
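
Combat log rows aggregate naturally with a filter plus a group-by. As a sketch, the function below sums hero-versus-hero damage per source. It assumes hero entities in `source` and `target` carry the same `npc_dota_hero_` prefix as the `hero` column of the ticks table, which the card does not state explicitly.

```python
import pandas as pd

HERO_PREFIX = "npc_dota_hero_"  # assumption: same naming as ticks.hero


def hero_damage_by_source(events: pd.DataFrame) -> pd.Series:
    """Total damage dealt by each hero to enemy or allied heroes."""
    dmg = events[events["event_type"] == "DOTA_COMBATLOG_DAMAGE"]
    # na=False treats missing entity names as non-heroes.
    is_hero_pair = (
        dmg["source"].str.startswith(HERO_PREFIX, na=False)
        & dmg["target"].str.startswith(HERO_PREFIX, na=False)
    )
    totals = dmg[is_hero_pair].groupby("source")["value"].sum()
    return totals.sort_values(ascending=False)
```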
---
### modifiers/ — ~196,000,000 rows
Modifier (buff/debuff) add, remove, and stack events. One row per modifier lifecycle event.
| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| timestamp | double | Event timestamp |
| modifier_type | string | `add`, `remove`, `stack` |
| slot | int8 | Affected player slot |
| actor | string | Caster entity |
| target | string | Target entity |
| modifier_name | string | Modifier internal name |
| modifier_category | string | `stun`, `silence`, `slow`, `buff`, `debuff`, etc. |
| duration | double | Total duration in seconds |
| elapsed_duration | double | Time elapsed when event fired |
| stack_count | int32 | Stack count at event time |
| hidden / invisibility / silence / root / aura / armor_debuff | bool | Effect flags |
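
From the `duration` and `elapsed_duration` columns, the time a modifier is expected to end can be estimated as `timestamp + duration - elapsed_duration`. The sketch below applies this to `add` events only. It assumes that a non-positive `duration` marks a permanent or aura-driven modifier (an assumption, not something the schema states), and the result is only an expectation, since dispels or death can remove a modifier earlier.

```python
import pandas as pd


def expected_expiry(modifiers: pd.DataFrame) -> pd.DataFrame:
    """Add an expected_end column to timed 'add' events."""
    adds = modifiers[modifiers["modifier_type"] == "add"]
    # Assumption: duration <= 0 means no natural expiry, so skip those rows.
    timed = adds[adds["duration"] > 0].copy()
    timed["expected_end"] = (
        timed["timestamp"] + timed["duration"] - timed["elapsed_duration"]
    )
    return timed
```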
---
### objectives/ — ~176,000 rows
Objective events: tower and barracks kills, rune pickups, Aegis events, and Roshan kills.
| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| timestamp | double | Event timestamp |
| objective_type | string | `building_kill`, `rune_pickup`, `aegis`, `roshan_kill`, etc. |
| slot | int8 | Player slot |
| actor / target | string | Entities involved |
| team | int16 | Acting team |
| entity_type / entity_subtype | string | Building or rune type |
| lane / tier | string/int16 | For building kills |
| rune_type | string | For rune pickups |
| value | int32 | Gold bounty or other numeric value |
| x, y | double | Map position |
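
Because objectives are sparse, per-match summaries are cheap to compute. As an illustration, the hypothetical helper below returns the timestamp of the first event of a given `objective_type` in each match, using `roshan_kill` from the list above as the default.

```python
import pandas as pd


def first_objective_times(
    objectives: pd.DataFrame, objective_type: str = "roshan_kill"
) -> pd.Series:
    """Earliest timestamp of the given objective type, indexed by match_id."""
    subset = objectives[objectives["objective_type"] == objective_type]
    # Matches without such an event are simply absent from the result.
    return subset.groupby("match_id")["timestamp"].min()
```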
---
### wards/ — ~971,000 rows
Ward placement, destruction, and expiry events.
| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| timestamp | double | Event timestamp |
| ward_event_type | string | `placed`, `destroyed`, `expired` |
| slot | int8 | Placing player slot |
| actor / target | string | Entities involved |
| team | int16 | Placing team |
| ward_type | string | `observer`, `sentry` |
| owner_slot | int8 | Original owner slot |
| x, y | double | Map position |
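
Ward positions lend themselves to a simple placement heatmap. The sketch below bins `placed` events on a regular grid with `numpy.histogram2d`. The card does not document the coordinate range of `x` and `y`, so the bin edges are inferred from the data; the function name and default bin count are arbitrary.

```python
import numpy as np
import pandas as pd


def ward_heatmap(wards: pd.DataFrame, ward_type: str = "observer", bins: int = 32):
    """2D histogram of ward placements; returns (counts, x_edges, y_edges)."""
    placed = wards[
        (wards["ward_event_type"] == "placed") & (wards["ward_type"] == ward_type)
    ]
    # Bin edges come from the data because the map extent is not documented.
    return np.histogram2d(
        placed["x"].to_numpy(), placed["y"].to_numpy(), bins=bins
    )
```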
---
## Statistics
| Table | Rows | Size |
|-------|------|------|
| matches | 9,385 | — |
| ticks | ~211,000,000 | 4.1 GB |
| abilities | ~720,000,000 | 503 MB |
| buildings | ~465,000,000 | 332 MB |
| events | ~287,000,000 | 2.3 GB |
| modifiers | ~196,000,000 | 1.6 GB |
| actions | ~29,000,000 | 271 MB |
| objectives | ~176,000 | 7.4 MB |
| wards | ~971,000 | 14 MB |
| **Total** | **~1.9 billion** | **~9.1 GB** |
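
As a quick sanity check on the table above, average rows per match can be derived from the listed row counts. The figures are copied from the table (rounded as given there); nothing is measured from the data itself.

```python
N_MATCHES = 9_385

# Approximate row counts, copied from the statistics table.
ROWS = {
    "ticks": 211_000_000,
    "abilities": 720_000_000,
    "buildings": 465_000_000,
    "events": 287_000_000,
    "modifiers": 196_000_000,
    "actions": 29_000_000,
    "objectives": 176_000,
    "wards": 971_000,
}

rows_per_match = {table: rows / N_MATCHES for table, rows in ROWS.items()}
total_rows = sum(ROWS.values()) + N_MATCHES  # plus one row per match

for table, avg in rows_per_match.items():
    print(f"{table:>10}: {avg:>10,.0f} rows per match")
print(f"{'total':>10}: {total_rows / 1e9:.2f} billion rows")
```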
---
## Usage
```python
import pandas as pd
import pyarrow.dataset as ds
from datasets import load_dataset
from huggingface_hub import HfFileSystem

# Match metadata (small enough to load fully into memory)
matches = pd.read_parquet("hf://datasets/wolframko/betty-dota2/matches.parquet")
print(f"Matches: {len(matches)}, Radiant winrate: {matches.radiant_win.mean():.1%}")

# Hero states (streamed, because the table is large)
ticks = load_dataset(
    "wolframko/betty-dota2",
    data_files="ticks/*.parquet",
    split="train",
    streaming=True,
)

# Ability cooldowns for a single match. The Hub filesystem is passed
# explicitly because older pyarrow versions do not resolve hf:// URIs.
abilities = ds.dataset(
    "datasets/wolframko/betty-dota2/abilities",
    format="parquet",
    filesystem=HfFileSystem(),
)
match_abilities = abilities.to_table(
    filter=ds.field("match_id") == 8106964131
).to_pandas()
```
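
A streaming dataset yields one row dictionary at a time, so per-match processing needs the rows regrouped. The sketch below does this with `itertools.groupby` for any iterable of row dicts, for example `for match_id, rows in iter_matches(ticks): ...`. It assumes that all rows of a match are stored contiguously within the shards, which the card does not guarantee; the helper name is illustrative.

```python
from itertools import groupby
from operator import itemgetter


def iter_matches(rows):
    """Yield (match_id, list_of_row_dicts) from an iterable of row dicts."""
    # Assumption: rows belonging to one match arrive back to back.
    # If they do not, the same match_id is yielded more than once.
    for match_id, group in groupby(rows, key=itemgetter("match_id")):
        yield match_id, list(group)
```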
---
## Data Pipeline
1. **Fetch** — Match IDs from STRATZ GraphQL API (professional leagues 2025)
2. **Download** — Replay files (.dem.bz2) from Valve CDN
3. **Parse** — Per-second hero states, ability states, building HP, combat log, modifiers, wards, objectives via [manta](https://github.com/dotabuff/manta) (Go)
4. **Enrich** — Player/team/draft stats from STRATZ API
5. **Convert** — JSON → Parquet (snappy compression, 10 matches/shard)
Source code: [github.com/wolframko/betty](https://github.com/wolframko/betty) — branch `decision-context-v1`
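
The conversion step (10 matches per shard, Snappy compression) can be sketched as follows. This is not the project's actual converter: the function names are hypothetical, sorting the IDs is an arbitrary choice made here, and writing relies on pandas with a Parquet engine such as pyarrow installed.

```python
import pandas as pd


def shard_match_ids(match_ids, shard_size: int = 10):
    """Split match IDs into consecutive groups of at most shard_size."""
    ids = sorted(match_ids)  # sorting is a choice of this sketch
    return [ids[i:i + shard_size] for i in range(0, len(ids), shard_size)]


def write_shard(records, path: str) -> None:
    """Write parsed rows (a list of dicts) to one Snappy-compressed Parquet file."""
    frame = pd.DataFrame.from_records(records)
    frame.to_parquet(path, compression="snappy", index=False)
```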
---
## License
MIT
Provided by:
wolframko



