
wolframko/pipes-lie-embargo-dominus

Hugging Face | Updated 2026-03-23 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/wolframko/pipes-lie-embargo-dominus
Resource description:
---
license: mit
task_categories:
- tabular-classification
language:
- en
tags:
- dota2
- esports
- gaming
- replay-parsing
- time-series
size_categories:
- 100M<n<1B
---

# Betty Dota 2 Pro Matches Dataset

9,388 professional Dota 2 matches parsed from replay files, with per-second game state snapshots and combat log events.

## Dataset Structure

### matches.parquet (9,388 rows)

One row per match. Contains metadata, STRATZ player statistics, and draft data.

| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Dota 2 match ID |
| league_name | string | Tournament name |
| league_tier | string | PROFESSIONAL, MAJOR, etc. |
| duration_sec | int64 | Match duration in seconds |
| radiant_win | bool | True if Radiant won |
| radiant_team_id / dire_team_id | int64 | Team IDs |
| p0..p9_hero_id | int64 | Hero ID per player (STRATZ) |
| p0..p9_kills/deaths/assists | int64 | Final KDA per player |
| p0..p9_gpm/xpm/networth | int64 | Economy stats per player |
| p0..p9_gpm_per_min | list[int] | Gold per minute time-series |
| p0..p9_networth_per_min | list[int] | Net worth per minute |
| p0..p9_hero_damage_per_min | list[int] | Hero damage per minute |
| stratz_pb_hero_id | list[int] | STRATZ pick/ban hero IDs |
| replay_pb_hero_id | list[int] | Replay-parsed pick/ban hero IDs |

### ticks/ (189,120,930 rows)

One row per game-tick per hero. ~1,500 ticks per match × 10 heroes. Per-second snapshots of full game state.

| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| tick | int32 | Raw tick number |
| game_time | float | Game time in seconds |
| slot | int8 | Player slot (0-9) |
| hero | string | Hero internal name |
| x, y | float | Map position coordinates |
| hp, max_hp | int32 | Current and max health |
| mana, max_mana | float | Current and max mana |
| gold | int32 | Current gold |
| net_worth | int32 | Total net worth |
| xp | int32 | Total experience |
| kills, deaths, assists | int16 | KDA at this moment |
| last_hits, denies | int16 | CS at this moment |
| level | int8 | Hero level |
| str, agi, int_ | float | Attribute values |
| armor | float | Armor value |
| move_speed | int16 | Movement speed |
| item_0..item_5 | string | Items in 6 inventory slots |

### events/ (295,810,843 rows)

One row per combat log event: damage, kills, gold gains, XP, healing, purchases.

| Column | Type | Description |
|--------|------|-------------|
| match_id | int64 | Match ID |
| tick | int32 | Raw tick number |
| game_time | float | Game time in seconds |
| timestamp | float | Exact event timestamp |
| event_type | string | DOTA_COMBATLOG_DAMAGE, _DEATH, _GOLD, _HEAL, _PURCHASE, _XP |
| source | string | Source entity (e.g. npc_dota_hero_invoker) |
| target | string | Target entity |
| value | int32 | Damage/gold/xp amount |
| item | string | Item name (for PURCHASE events) |

## Statistics

- **Matches**: 9,388 professional games from 2025
- **Ticks**: 189,120,930 per-second hero state snapshots
- **Events**: 295,810,843 combat log entries
- **Size**: 5.97 GB compressed Parquet (from 103 GB JSON)
- **Source**: Valve replay files (.dem) parsed with [manta](https://github.com/dotabuff/manta) + STRATZ API metadata

## Usage

```python
from datasets import load_dataset

# Load matches metadata
matches = load_dataset("wolframko/pipes-lie-embargo-dominus", data_files="matches.parquet", split="train")

# Load ticks (large - use streaming)
ticks = load_dataset("wolframko/pipes-lie-embargo-dominus", data_files="ticks/*.parquet", split="train", streaming=True)

# Load events
events = load_dataset("wolframko/pipes-lie-embargo-dominus", data_files="events/*.parquet", split="train", streaming=True)
```

```python
import pandas as pd

# Quick analysis with pandas
matches_df = pd.read_parquet("hf://datasets/wolframko/pipes-lie-embargo-dominus/matches.parquet")
print(f"Matches: {len(matches_df)}, Radiant winrate: {matches_df.radiant_win.mean():.1%}")
```

## Data Pipeline

1. **Fetch**: Match IDs from STRATZ GraphQL API (pro leagues 2025)
2. **Download**: Replay files (.dem.bz2) from Valve CDN
3. **Parse**: Per-second hero states + combat log via manta (Go)
4. **Enrich**: Player/team stats from STRATZ API
5. **Convert**: JSON to Parquet with full validation

## License

MIT
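
Returning to the events/ table described above: the following is a minimal sketch of one way to aggregate hero-versus-hero damage from combat log rows using only the documented columns (`match_id`, `event_type`, `source`, `target`, `value`). The helper name `hero_damage_totals`, the `npc_dota_hero_` prefix filter (inferred from the `npc_dota_hero_invoker` example), and the toy rows are assumptions made for illustration and are not part of the dataset. It is also an assumption that `value` holds the damage amount on `DOTA_COMBATLOG_DAMAGE` rows.

```python
import pandas as pd

# Assumption: hero entities share this prefix, as in "npc_dota_hero_invoker".
HERO_PREFIX = "npc_dota_hero_"


def hero_damage_totals(events: pd.DataFrame) -> pd.DataFrame:
    """Sum damage dealt by heroes to heroes, per (match_id, source hero).

    Hypothetical helper; expects the events/ columns listed in the schema.
    """
    mask = (
        (events["event_type"] == "DOTA_COMBATLOG_DAMAGE")
        & events["source"].str.startswith(HERO_PREFIX, na=False)
        & events["target"].str.startswith(HERO_PREFIX, na=False)
    )
    return (
        events.loc[mask]
        .groupby(["match_id", "source"], as_index=False)["value"]
        .sum()
        .rename(columns={"source": "hero", "value": "hero_damage"})
        .sort_values(["match_id", "hero_damage"], ascending=[True, False])
        .reset_index(drop=True)
    )


# Toy rows standing in for a slice of events/ (not real data).
sample = pd.DataFrame(
    {
        "match_id": [1, 1, 1, 1, 1],
        "event_type": [
            "DOTA_COMBATLOG_DAMAGE",
            "DOTA_COMBATLOG_DAMAGE",
            "DOTA_COMBATLOG_GOLD",
            "DOTA_COMBATLOG_DAMAGE",
            "DOTA_COMBATLOG_DAMAGE",
        ],
        "source": [
            "npc_dota_hero_invoker",
            "npc_dota_hero_invoker",
            "npc_dota_hero_invoker",
            "npc_dota_creep_goodguys_melee",
            "npc_dota_hero_axe",
        ],
        "target": [
            "npc_dota_hero_axe",
            "npc_dota_hero_axe",
            None,
            "npc_dota_hero_axe",
            "npc_dota_hero_invoker",
        ],
        "value": [100, 50, 200, 30, 70],
    }
)

print(hero_damage_totals(sample))
```

Because events/ holds roughly 296 million rows, a function like this would more plausibly be applied to one Parquet shard or one match at a time than to the whole table in memory.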
Provider:
wolframko