wolframko/betty-dota2

Source: Hugging Face. Updated 2026-03-30; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/wolframko/betty-dota2
Resource description:
---
license: mit
task_categories:
- tabular-classification
- time-series-forecasting
language:
- en
tags:
- dota2
- esports
- gaming
- replay-parsing
- time-series
- decision-context
size_categories:
- 1B<n<10B
---

# Betty Dota 2: Decision Context Dataset

## Overview

**9,385 professional Dota 2 matches** parsed from replay files (.dem) into a rich, per-second decision context: hero states, ability cooldowns, building HP, combat events, modifiers, ward placements, and objectives. Built to train Transformer and RL models that understand the game state at each moment in time.

## Dataset Structure

### matches.parquet (9,385 rows)

One row per match. Match metadata, STRATZ player statistics, and draft data.

| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Dota 2 match ID |
| league_name | string | Tournament name |
| league_tier | string | PROFESSIONAL, MAJOR, etc. |
| start_time | int64 | Unix timestamp |
| duration_sec | int64 | Match duration in seconds |
| radiant_win | bool | True if Radiant won |
| radiant_team_id / dire_team_id | int64 | Team IDs |
| p0..p9_hero_id | int64 | Hero ID per player |
| p0..p9_kills/deaths/assists | int64 | Final KDA per player |
| p0..p9_gpm/xpm/networth | int64 | Final economy stats |
| p0..p9_position / p0..p9_role | string | Lane/role from STRATZ |
| p0..p9_gpm_per_min | list[int64] | Gold per minute time series |
| p0..p9_networth_per_min | list[int64] | Net worth per minute |
| p0..p9_hero_damage_per_min | list[int64] | Hero damage per minute |
| p0..p9_tower_damage_per_min | list[int64] | Tower damage per minute |
| p0..p9_last_hits_per_min | list[int64] | Last hits per minute |
| stratz_pb_hero_id | list[int64] | STRATZ pick/ban hero IDs |
| replay_pb_hero_id | list[int64] | Replay-parsed pick/ban hero IDs |

---

### ticks/ (~211,000,000 rows)

Per-second hero state snapshot. One row per hero per game second (~10 heroes × ~2,100 ticks per match).

| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| tick | int32 | Raw engine tick |
| game_time | double | Game time in seconds |
| slot | int8 | Player slot (0–9, 0–4 Radiant, 5–9 Dire) |
| hero | string | Hero internal name (e.g. `npc_dota_hero_invoker`) |
| x, y | double | Map position |
| hp, max_hp | int32 | Current and max health |
| mana, max_mana | double | Current and max mana |
| gold | int32 | Current unreliable gold |
| net_worth | int32 | Total net worth |
| xp | int32 | Total experience |
| is_alive | bool | Whether hero is alive this tick |
| respawn_seconds | double | Seconds until respawn (0 if alive) |
| kills / deaths / assists | int16 | KDA at this moment |
| last_hits / denies | int16 | CS at this moment |
| level | int8 | Hero level |
| str / agi / int_ | double | Current attribute values |
| armor | double | Physical armor |
| move_speed | int16 | Movement speed |
| item_0..item_5 | string | Items in inventory slots |

---

### abilities/ (~720,000,000 rows)

Per-second ability state for every ability on every hero. One row per ability per hero per tick (~20 abilities × 10 heroes × 2,100 ticks per match).

| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| tick | int32 | Raw engine tick |
| game_time | double | Game time in seconds |
| slot | int8 | Player slot |
| hero | string | Hero internal name |
| ability_points | int16 | Unspent ability points |
| ability_name | string | Ability internal name |
| ability_level | int16 | Current level (0 = unlearned) |
| cooldown | double | Remaining cooldown in seconds |
| cooldown_length | double | Full cooldown duration |
| charges | int16 | Charges remaining (charge-based abilities) |
| mana_cost | int32 | Mana cost to cast |
| activated | bool | Whether ability is toggled on |

---

### buildings/ (~465,000,000 rows)

Per-second state of all towers, barracks, and the ancient. One row per building per tick.

| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| tick | int32 | Raw engine tick |
| game_time | double | Game time in seconds |
| entity_name | string | Building internal name |
| entity_type | string | `tower`, `barracks`, `fort` |
| team | int16 | 2 = Radiant, 3 = Dire |
| lane | string | `top`, `mid`, `bot`, `base` |
| tier | int16 | Tower tier (1–4) |
| hp | int32 | Current HP |
| max_hp | int32 | Max HP |
| alive | bool | Whether building is alive |

---

### events/ (~287,000,000 rows)

Combat log events: damage, deaths, gold gains, XP, healing, purchases.

| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| tick | int32 | Raw engine tick |
| game_time | double | Game time in seconds |
| timestamp | double | Exact event timestamp |
| event_type | string | `DOTA_COMBATLOG_DAMAGE`, `_DEATH`, `_GOLD`, `_HEAL`, `_PURCHASE`, `_XP` |
| source | string | Source entity |
| target | string | Target entity |
| value | int32 | Amount (damage/gold/xp/heal) |
| inflictor | string | Ability or item that caused the event |
| gold_reason / xp_reason | int32 | Reason code for gold/XP events |
| item | string | Item name (PURCHASE events) |

---

### modifiers/ (~196,000,000 rows)

Modifier (buff/debuff) add and remove events. One row per modifier lifecycle event.

| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| timestamp | double | Event timestamp |
| modifier_type | string | `add`, `remove`, `stack` |
| slot | int8 | Affected player slot |
| actor | string | Caster entity |
| target | string | Target entity |
| modifier_name | string | Modifier internal name |
| modifier_category | string | `stun`, `silence`, `slow`, `buff`, `debuff`, etc. |
| duration | double | Total duration in seconds |
| elapsed_duration | double | Time elapsed when event fired |
| stack_count | int32 | Stack count at event time |
| hidden / invisibility / silence / root / aura / armor_debuff | bool | Effect flags |

---

### objectives/ (~176,000 rows)

Objective events: tower kills, rune pickups, aegis, barracks, Roshan.

| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| timestamp | double | Event timestamp |
| objective_type | string | `building_kill`, `rune_pickup`, `aegis`, `roshan_kill`, etc. |
| slot | int8 | Player slot |
| actor / target | string | Entities involved |
| team | int16 | Acting team |
| entity_type / entity_subtype | string | Building or rune type |
| lane / tier | string/int16 | For building kills |
| rune_type | string | For rune pickups |
| value | int32 | Gold bounty or other numeric value |
| x, y | double | Map position |

---

### wards/ (~971,000 rows)

Ward placement and destruction events.

| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| timestamp | double | Event timestamp |
| ward_event_type | string | `placed`, `destroyed`, `expired` |
| slot | int8 | Placing player slot |
| actor / target | string | Entities involved |
| team | int16 | Placing team |
| ward_type | string | `observer`, `sentry` |
| owner_slot | int8 | Original owner slot |
| x, y | double | Map position |

---

## Statistics

| Table | Rows | Size |
|-------|------|------|
| matches | 9,385 | - |
| ticks | ~211,000,000 | 4.1 GB |
| abilities | ~720,000,000 | 503 MB |
| buildings | ~465,000,000 | 332 MB |
| events | ~287,000,000 | 2.3 GB |
| modifiers | ~196,000,000 | 1.6 GB |
| actions | ~29,000,000 | 271 MB |
| objectives | ~176,000 | 7.4 MB |
| wards | ~971,000 | 14 MB |
| **Total** | **~1.9 billion** | **~9.1 GB** |

---

## Usage

```python
import pandas as pd

# Match metadata
matches = pd.read_parquet("hf://datasets/wolframko/betty-dota2/matches.parquet")
print(f"Matches: {len(matches)}, Radiant winrate: {matches.radiant_win.mean():.1%}")

# Hero states (streaming, because the table is large)
from datasets import load_dataset
ticks = load_dataset(
    "wolframko/betty-dota2",
    data_files="ticks/*.parquet",
    split="train",
    streaming=True,
)

# Ability cooldowns for a single match
import pyarrow.parquet as pq
import pyarrow.dataset as ds
abilities = ds.dataset("hf://datasets/wolframko/betty-dota2/abilities/", format="parquet")
match_abilities = abilities.to_table(
    filter=ds.field("match_id") == 8106964131
).to_pandas()
```

---

## Data Pipeline

1. **Fetch**: Match IDs from STRATZ GraphQL API (professional leagues 2025)
2. **Download**: Replay files (.dem.bz2) from Valve CDN
3. **Parse**: Per-second hero states, ability states, building HP, combat log, modifiers, wards, objectives via [manta](https://github.com/dotabuff/manta) (Go)
4. **Enrich**: Player/team/draft stats from STRATZ API
5. **Convert**: JSON → Parquet (snappy compression, 10 matches/shard)

Source code: [github.com/wolframko/betty](https://github.com/wolframko/betty), branch `decision-context-v1`

---

## License

MIT
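
## Example: Radiant net-worth lead from ticks/

The ticks/ schema stores `net_worth` per hero and encodes the side in `slot` (0–4 Radiant, 5–9 Dire), so a per-second team lead can be derived with a group-by. The following is a minimal sketch, not part of the dataset's own tooling. The function name `team_networth_lead` and the tiny in-memory sample are hypothetical, and the sample values are made up so the example runs offline without downloading any shards.

```python
import pandas as pd


def team_networth_lead(ticks: pd.DataFrame) -> pd.DataFrame:
    """Return Radiant-minus-Dire net worth per (match_id, game_time).

    Expects the ticks/ columns match_id, game_time, slot, net_worth.
    """
    # Slots 0-4 are Radiant and 5-9 are Dire, per the ticks/ column description.
    side = ticks["slot"].lt(5).map({True: "radiant", False: "dire"})
    totals = (
        ticks.assign(side=side)
        .groupby(["match_id", "game_time", "side"])["net_worth"]
        .sum()
        .unstack("side")  # one column per side: "dire", "radiant"
    )
    totals["radiant_lead"] = totals["radiant"] - totals["dire"]
    return totals.reset_index()


# Hypothetical sample: one hero per side, two game seconds (values are invented).
sample = pd.DataFrame(
    {
        "match_id": [1, 1, 1, 1],
        "game_time": [0.0, 0.0, 1.0, 1.0],
        "slot": [0, 5, 0, 5],
        "net_worth": [600, 600, 650, 610],
    }
)
lead = team_networth_lead(sample)
print(lead["radiant_lead"].tolist())  # [0, 40]
```

On the real data the same function could be applied to one match at a time (for example, a table filtered by `match_id` as in the Usage section), since the full ticks/ table is too large to load at once.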
Provider:
wolframko