
blanchon/hltv-cs2-demos-test

Source: Hugging Face. Updated 2026-04-21; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/blanchon/hltv-cs2-demos-test
Resource description:
---
license: cc-by-4.0
task_categories:
- other
language:
- en
tags:
- counter-strike
- csgo
- cs2
- esports
- hltv
- demos
pretty_name: "HLTV CS2 Demos Dataset"
size_categories:
- 100K<n<1M
configs:
- config_name: default
  data_files:
  - split: train
    path: data/metadata-*.parquet
---

# HLTV CS2 Demos Dataset

Counter-Strike 2 match demos scraped from [HLTV.org](https://www.hltv.org/results) plus a compact per-map analysis JSON. Each row of the metadata Parquet is **one `.dem` file (one played CS2 map)**; a best-of-3 match contributes 2 or 3 rows depending on whether it went 2-0 or 2-1.

- Parquet holds everything you typically filter on: `map_name`, `patch_version`, `rounds_played`, per-player `kast` / `adr` / `rating`, every kill tick with weapon + headshot, every round's winner + end reason.
- `.dem` binaries live alongside as loose files; each match subfolder also carries a `meta.json` (HLTV sidecar) and a `<demo>.analysis.json` (the exact JSON that was embedded into the Parquet row, so the shard is self-describing if you grab it via the filesystem).

## Repository layout

```
data/                                   # metadata Parquet (self-describing)
  metadata-<machine>-<uuid>.parquet
demos/
  shard-<machine>-<uuid>/
    <match_id>/
      meta.json                         # HLTV sidecar (teams, event, demo id …)
      <demo1>.dem                       # bo3 map 1
      <demo1>.analysis.json             # per-map analysis (full)
      <demo2>.dem
      <demo2>.analysis.json
      <demo3>.dem                       # absent if bo3 ended 2-0
      <demo3>.analysis.json
```

Shard names carry the producing machine id + a UUID so any number of machines can upload to the dataset in parallel without colliding.
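The layout above implies that the sidecar files for a demo can be derived from the `.dem` path alone. The following is a minimal sketch of that idea, assuming `file_name` values are repo-relative and follow the `demos/<shard>/<match_id>/<demo>.dem` pattern exactly; the helper names `parse_demo_path` and `sidecar_paths` and the example path are hypothetical, not part of the dataset tooling.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class DemoPath(NamedTuple):
    shard: str      # e.g. "shard-<machine>-<uuid>"
    match_id: str   # HLTV numeric id, kept as a string
    demo: str       # .dem file name without the extension


def parse_demo_path(file_name: str) -> DemoPath:
    """Split 'demos/<shard>/<match_id>/<demo>.dem' into its components."""
    parts = PurePosixPath(file_name).parts
    if len(parts) != 4 or parts[0] != "demos" or not parts[3].endswith(".dem"):
        raise ValueError(f"unexpected demo path: {file_name!r}")
    return DemoPath(shard=parts[1], match_id=parts[2], demo=parts[3][: -len(".dem")])


def sidecar_paths(file_name: str) -> tuple[str, str]:
    """Return the (analysis.json, meta.json) paths that sit next to a .dem."""
    p = parse_demo_path(file_name)
    base = f"demos/{p.shard}/{p.match_id}"
    return f"{base}/{p.demo}.analysis.json", f"{base}/meta.json"


# Hypothetical path, for illustration only.
example = "demos/shard-machineA-0001/2393304/example-map1.dem"
analysis_path, meta_path = sidecar_paths(example)
```

In practice the `analysis_file_name` and `meta_file_name` columns already carry these paths; a helper like this is only useful when working from a raw file listing.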
## Schema (one row per `.dem`)

Promoted top-level columns let you filter fast:

| column | type | source |
|---|---|---|
| `file_name` | string | `.dem` path inside the repo |
| `analysis_file_name` | string | `<demo>.analysis.json` path |
| `meta_file_name` | string | per-match `meta.json` path |
| `match_id` | string | HLTV numeric id (same across the bo3 maps) |
| `event`, `team1`, `team2` | string | HLTV `/results` row |
| `score1`, `score2` | int32 | map score of the full bo3 |
| `format` | string | `bo3` / `bo5` / map name for bo1s |
| `match_date` | timestamp | UTC, from HLTV |
| `map_index` | int32 | 1, 2, 3 position of this map in the bo3 |
| `map_name` | string | e.g. `de_overpass` |
| `patch_version` | string | CS2 build, e.g. `"14141"` = `1.41.4.1` |
| `rounds_played` | int32 | actual rounds this map went |
| `winner_side` | string | `ct` / `t` |

Native nested columns for the full analysis:

| column | type |
|---|---|
| `header` | `struct<map_name, patch_version, server_name, network_protocol, demo_version_name>` |
| `match` | `struct<map_name, patch_version, tick_rate, first_tick, last_tick, total_ticks, rounds_played, score_ct, score_t, winner_side>` |
| `teams` | `list<struct<side_start, players: list<struct<name, steamid>>>>` |
| `players` | `list<struct<name, steamid, side, n_rounds, kills, headshots, deaths, assists, hs_rate, adr, dmg, kast, kast_rounds, rating, impact>>` |
| `rounds` | `list<struct<round_num, start, freeze_end, end, official_end, winner, reason, bomb_plant, bomb_site, duration_ticks>>` |
| `kills` | `list<struct<tick, round_num, attacker_name, attacker_steamid, attacker_side, victim_name, victim_steamid, victim_side, assister_name, assister_steamid, weapon, headshot, distance, noscope, thrusmoke, penetrated, attackerblind, dmg_health, hitgroup>>` |
| `meta` | `struct` mirror of the on-disk `meta.json` |

## Query with DuckDB (no download)

The Hub supports `hf://` paths in DuckDB `≥ 0.10.3`, so you can run SQL directly against the
parquet without fetching the `.dem`s. See the [HF DuckDB docs](https://huggingface.co/docs/hub/datasets-duckdb).

```sql
-- Quick peek
FROM 'hf://datasets/blanchon/hltv-cs2-demos-test/data/*.parquet' LIMIT 3;

-- All de_overpass maps on patch 14141
-- (score_ct / score_t live inside the nested `match` struct)
SELECT match_id, team1, team2, map_name, match.score_ct, match.score_t, rounds_played
FROM 'hf://datasets/blanchon/hltv-cs2-demos-test/data/*.parquet'
WHERE patch_version = '14141' AND map_name = 'de_overpass';

-- Dot-access into the nested `match` struct
SELECT match_id, match.tick_rate, match.winner_side
FROM 'hf://datasets/blanchon/hltv-cs2-demos-test/data/*.parquet';

-- UNNEST kills to one row per kill
SELECT match_id, k.weapon, k.headshot, k.distance
FROM 'hf://datasets/blanchon/hltv-cs2-demos-test/data/*.parquet',
     UNNEST(kills) AS t(k)
WHERE k.headshot;

-- Inline list predicates (avoid UNNEST when you want aggregates)
SELECT match_id, map_name,
       len(kills) AS n_kills,
       len(filter(kills, x -> x.headshot)) AS n_hs,
       list_avg(list_transform(players, p -> p.adr)) AS avg_adr
FROM 'hf://datasets/blanchon/hltv-cs2-demos-test/data/*.parquet';
```

Also works from DuckDB's Python client:

```python
import duckdb

duckdb.sql("SELECT count(*) FROM 'hf://datasets/blanchon/hltv-cs2-demos-test/data/*.parquet'")
```

Or via the Hub's auto-converted parquet branch (useful when the dataset gets huge and the Viewer shards it for you):

```sql
SELECT count(*) FROM 'hf://datasets/blanchon/hltv-cs2-demos-test@~parquet/**/*.parquet';
```

## Stream with `datasets`

```python
from datasets import load_dataset

ds = load_dataset("blanchon/hltv-cs2-demos-test", split="train", streaming=True)

# Take the first 10 rows: metadata only, no .dem transferred
for row in ds.take(10):
    print(row["match_id"], row["map_name"], row["patch_version"],
          row["match"]["score_ct"], row["match"]["score_t"])

# Filter in the stream: patch 14141 + tier-1 events
ds = ds.filter(
    lambda r: r["patch_version"] == "14141" and "IEM" in (r["event"] or "")
)
```

## Stream single files with `HfFileSystem`

For random access (read one Parquet row group, one `.analysis.json` without downloading neighbours, etc.) use [`HfFileSystem`](https://huggingface.co/docs/huggingface_hub/package_reference/hf_file_system):

```python
import json

import pyarrow.parquet as pq
from huggingface_hub import HfFileSystem

hffs = HfFileSystem()

# Read one .analysis.json
with hffs.open("datasets/blanchon/hltv-cs2-demos-test/demos/shard-<id>/<match_id>/<demo>.analysis.json") as f:
    analysis = json.load(f)
print(analysis["match"]["rounds_played"], len(analysis["kills"]))

# List every .analysis.json across all shards
for p in hffs.glob("datasets/blanchon/hltv-cs2-demos-test/demos/shard-*/*/*.analysis.json"):
    print(p)

# Row-group-level streaming on the metadata Parquet:
# open the remote file through HfFileSystem and hand the file object to pyarrow
with hffs.open("datasets/blanchon/hltv-cs2-demos-test/data/metadata-xyz.parquet", "rb") as f:
    pf = pq.ParquetFile(f)
    for i in range(pf.num_row_groups):
        tbl = pf.read_row_group(i)
        # process tbl …
```

## Partial download with the `hf` CLI

The [`hf` CLI](https://huggingface.co/docs/huggingface_hub/guides/cli) ships with `huggingface_hub`. `hf_transfer` is the Rust-backed uploader/downloader that can saturate a gigabit link on a single file.
```bash
pip install -U huggingface_hub hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1

# Metadata only (fast, ~tens of MB for the whole dataset)
hf download blanchon/hltv-cs2-demos-test --repo-type dataset --include "data/*.parquet"

# One match: all files under a given <match_id>/
hf download blanchon/hltv-cs2-demos-test --repo-type dataset --include "demos/shard-*/2393304/*"

# One shard (one machine's batch) worth of data
hf download blanchon/hltv-cs2-demos-test --repo-type dataset --include "demos/shard-machineA-*/**"

# Only the .analysis.json sidecars (tens of KB each, no .dem)
hf download blanchon/hltv-cs2-demos-test --repo-type dataset --include "demos/shard-*/**/*.analysis.json"
```

The equivalent in Python:

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="blanchon/hltv-cs2-demos-test",
    repo_type="dataset",
    allow_patterns=["data/*", "demos/shard-*/2393304/*"],
)
```

## Example workflow: metadata-first, fetch on demand

```python
import duckdb
from huggingface_hub import hf_hub_download

# 1. Decide which matches you care about via SQL; the query runs against the Hub
#    (score_ct / score_t live inside the nested `match` struct)
paths = duckdb.sql('''
    SELECT file_name
    FROM 'hf://datasets/blanchon/hltv-cs2-demos-test/data/*.parquet'
    WHERE patch_version = '14141'
      AND map_name = 'de_overpass'
      AND match.score_ct + match.score_t >= 24
''').fetchall()

# 2. Fetch only the matching .dem files
for (rel,) in paths:
    local = hf_hub_download(repo_id="blanchon/hltv-cs2-demos-test",
                            repo_type="dataset", filename=rel)
    print(local)
```

## Collection & licensing

Demos are mirrored from HLTV's public `/download/demo/<id>` endpoint, which hosts demo recordings provided by tournament organizers. Downstream users are responsible for respecting the original terms from those tournaments. This dataset exists for research / analytics / ML use on competitive CS2 play.
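Once rows from the metadata Parquet are in hand, the nested `kills` column described in the schema can be summarized without touching any `.dem` file. The sketch below computes a per-weapon headshot share in plain Python. It assumes each row exposes `kills` as a list of dicts with `weapon` and `headshot` keys (the shape the schema suggests for streamed rows); the function name `headshot_rate_by_weapon` and the sample rows are hypothetical.

```python
from collections import defaultdict


def headshot_rate_by_weapon(rows):
    """Map weapon -> (kill count, headshot share) across an iterable of rows."""
    kills = defaultdict(int)
    headshots = defaultdict(int)
    for row in rows:
        # Treat a missing or null `kills` column as an empty list.
        for k in row.get("kills") or []:
            kills[k["weapon"]] += 1
            if k["headshot"]:
                headshots[k["weapon"]] += 1
    return {w: (n, headshots[w] / n) for w, n in kills.items()}


# Hypothetical rows, for illustration only (not real dataset values).
sample = [
    {"kills": [{"weapon": "ak47", "headshot": True},
               {"weapon": "ak47", "headshot": False},
               {"weapon": "awp", "headshot": False}]},
    {"kills": [{"weapon": "ak47", "headshot": True}]},
]
stats = headshot_rate_by_weapon(sample)
```

The same function should accept a streamed `datasets` iterable in place of `sample`, provided the row shape matches the assumption above.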
Provider:
blanchon