lukeslp/steam-co-review-network
Source: Hugging Face. Updated 2026-02-19, indexed 2026-03-29.
Download link: https://hf-mirror.com/datasets/lukeslp/steam-co-review-network
---
license: cc-by-4.0
task_categories:
- graph-ml
- feature-extraction
language:
- en
tags:
- steam
- games
- network-analysis
- co-review
- graph-data
- gaming
- social-network
pretty_name: Steam Co-Review Network
size_categories:
- 1M<n<10M
dataset_info:
features:
- name: nodes
dtype: list
- name: links
dtype: list
- name: meta
dtype: dict
---
# Steam Co-Review Network
Two Steam games share an edge when at least two users reviewed both. Built from 128 million user reviews of 80,000 games (2012-2024), this dataset maps how the Steam catalog is connected through player overlap.
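Stated as a formula, with $R_g$ denoting the set of users who reviewed game $g$ (notation introduced here for illustration, not taken from the pipeline), the weight of a pair and the edge rule are:

$$
w(a, b) = \lvert R_a \cap R_b \rvert, \qquad \{a, b\} \in E \iff w(a, b) \ge 2
$$

Here $w(a, b)$ corresponds to the `weight` field stored on each link.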
## Files
### steam_network_full.json
The complete co-review graph with minimal filtering:
- 48,362 game nodes (every game with 10+ reviews)
- 33,041,298 weighted edges (2+ shared reviewers per pair)
- No per-node neighbor cap (all pairs meeting the 2-reviewer threshold are kept)
- Edge weights range from 2 to 420,410 shared reviewers
**Node format:**
```json
{
"id": "620",
"title": "Portal 2",
"year": "2011",
"rating": "Overwhelmingly Positive",
"ratio": 97,
"reviews": 263842,
"price": 9.99
}
```
**Link format:**
```json
{"source": 0, "target": 42, "weight": 1523}
```
Source and target are indices into the nodes array. Weight is the number of users who reviewed both games.
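A minimal Python sketch for loading the full graph and checking its headline numbers. It assumes the file is a single JSON object with top-level `nodes`, `links`, and `meta` keys, matching the `dataset_info` features in the card header; the helper names `load_network`, `summarize`, and `describe_link` are hypothetical.

```python
import json
from collections import defaultdict


def load_network(path):
    """Read steam_network_full.json into a dict.

    Assumes top-level "nodes", "links" and "meta" keys, as suggested by the
    dataset_info features in the card header.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(graph):
    """Recompute node/edge counts, the weight range, and the maximum degree."""
    degree = defaultdict(int)
    min_w, max_w = None, None
    for link in graph["links"]:
        # Links are treated as undirected: each one adds a neighbor to both ends.
        degree[link["source"]] += 1
        degree[link["target"]] += 1
        w = link["weight"]
        min_w = w if min_w is None else min(min_w, w)
        max_w = w if max_w is None else max(max_w, w)
    return {
        "nodes": len(graph["nodes"]),
        "edges": len(graph["links"]),
        "min_weight": min_w,
        "max_weight": max_w,
        "max_degree": max(degree.values(), default=0),
    }


def describe_link(graph, link):
    """Resolve index-based source/target into game titles plus the weight."""
    nodes = graph["nodes"]
    return nodes[link["source"]]["title"], nodes[link["target"]]["title"], link["weight"]
```

For example, `summarize(load_network("steam_network_full.json"))` should reproduce the node count, edge count, and weight range listed above if the file matches the card, and `max_degree` shows directly whether any per-node neighbor cap was applied. Note that `json.load` materializes every link as a Python dict; with roughly 33 million links this is likely to need many gigabytes of RAM, so a streaming JSON parser may be preferable for the full file.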
### steam_all_2005.json
Metadata for 82,928 games released 2005-2025, with each game packed as a positional array to keep the JSON compact:
```
[name, year, ratio, reviews, price, ratingIdx, genreIdxs, tagIdxs, developer]
 [0]   [1]   [2]    [3]      [4]    [5]        [6]        [7]      [8]
```
Genres and tags are stored as index arrays referencing top-level `genres[]` and `tags[]` lookup tables in the same file.
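A minimal sketch of unpacking these rows in Python. The field order and the `genres[]`/`tags[]` lookup tables come from the description above; the key holding the array of packed rows is not documented here, so `rows_key="games"` is a hypothetical default to check against the actual file. `ratingIdx` is left as a raw index because its lookup table is not described.

```python
import json

# Field order of each packed game array, as documented above.
FIELDS = ["name", "year", "ratio", "reviews", "price",
          "ratingIdx", "genreIdxs", "tagIdxs", "developer"]


def decode_game(row, genres, tags):
    """Unpack one positional array into a dict with genre/tag names resolved."""
    game = dict(zip(FIELDS, row))
    # genreIdxs / tagIdxs index into the file's top-level genres[] and tags[] tables.
    game["genres"] = [genres[i] for i in game.pop("genreIdxs")]
    game["tags"] = [tags[i] for i in game.pop("tagIdxs")]
    # ratingIdx is kept as-is: the card does not say which table it indexes.
    return game


def load_games(path, rows_key="games"):
    """Decode every game in steam_all_2005.json.

    rows_key is a hypothetical name for the array of packed rows; adjust it
    to the real file. "genres" and "tags" are the documented lookup tables.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return [decode_game(row, data["genres"], data["tags"]) for row in data[rows_key]]
```

Resolving indices at load time trades the file's compactness for convenience; for very large batch jobs it may be cheaper to keep the integer indices and join against the lookup tables only when needed.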
### steam_force_layout.json
Genre-aware pre-computed layout positions for the top ~9K nodes, clustered by primary genre with hub games as anchors. Use as warm-start coordinates for force-directed visualization.
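As an illustration of warm-starting, here is a generic Fruchterman-Reingold-style refinement in pure Python that takes starting coordinates and nudges them with spring forces. It is not the author's visualization code, and the schema of `steam_force_layout.json` is not documented here, so converting it into a `{node_id: (x, y)}` dict is left as an assumption. Scaling attraction by `log1p(weight)` is our own choice to keep heavy edges from dominating. The pairwise repulsion is O(n²) per iteration, so this is only practical on small subsets; a layout over all ~9K nodes would call for a vectorized or compiled implementation.

```python
import math


def refine_layout(positions, edges, iterations=50, k=1.0, max_step=0.1):
    """Refine warm-start 2-D positions with a Fruchterman-Reingold-style model.

    positions: {node_id: (x, y)} starting coordinates (assumed input shape).
    edges:     iterable of (source_id, target_id, weight) tuples.
    k:         ideal edge length; max_step: largest move per node per iteration.
    """
    pos = {n: [float(x), float(y)] for n, (x, y) in positions.items()}
    nodes = list(pos)
    # Ignore edges whose endpoints have no starting position.
    links = [(s, t, w) for s, t, w in edges if s in pos and t in pos]

    for it in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion k^2 / d between every pair of nodes pushes them apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f
        # Attraction d^2 / k along edges, scaled by log(1 + weight).
        for s, t, w in links:
            dx, dy = pos[s][0] - pos[t][0], pos[s][1] - pos[t][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k * math.log1p(w)
            disp[s][0] -= dx / d * f
            disp[s][1] -= dy / d * f
            disp[t][0] += dx / d * f
            disp[t][1] += dy / d * f
        # Linear cooling: the maximum move shrinks each iteration.
        temp = max_step * (1 - it / iterations)
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, temp)
                pos[n][0] += dx / length * step
                pos[n][1] += dy / length * step
    return {n: (x, y) for n, (x, y) in pos.items()}
```

Because the cooling schedule caps total movement, good starting coordinates matter: the genre-clustered positions should mostly survive, with only local adjustments.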
## Sources
- **Game metadata**: [FronkonGames Steam Games Dataset](https://huggingface.co/datasets/FronkonGames/steam-games-dataset) — Jan 2026 snapshot, 122K games
- **User reviews**: [artermiloff Steam Reviews 2024](https://www.kaggle.com/datasets/artermiloff/steam-games-reviews-2024) — 128M reviews across 80K games, one CSV per game, 2012-June 2024
## Pipeline
1. Load game metadata from FronkonGames enriched CSV
2. Scan 30K+ per-game review CSVs, extract steamid-to-game mappings
3. For each user who reviewed 2+ games, generate all game pairs
4. Count shared reviewers per pair to produce edge weights
5. Filter: minimum 2 shared reviewers (no neighbor cap)
Full pipeline: [github.com/lukeslp/steam-network-data](https://github.com/lukeslp/steam-network-data)
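Steps 2-5 can be sketched in Python as follows. This is an illustrative in-memory version, not the code in the linked repository: the CSV column name `steamid` is an assumption about the review files, the node filter (10+ reviews) is not applied, and game IDs must be mutually sortable (all strings or all integers). A user who reviewed k games contributes k(k-1)/2 pairs, and the number of pairs before filtering is likely far larger than the 33 million edges kept, so at full scale a single in-memory `Counter` may not fit and a sharded or on-disk approach would probably be needed.

```python
import csv
from collections import Counter, defaultdict
from itertools import combinations


def reviewers_from_csv(fileobj, steamid_column="steamid"):
    """Step 2: collect the reviewer IDs from one per-game review CSV.

    "steamid" is an assumed column name; check the source CSV header.
    """
    return {row[steamid_column] for row in csv.DictReader(fileobj)
            if row.get(steamid_column)}


def co_review_edges(reviewers_by_game, min_shared=2):
    """Steps 3-5: count shared reviewers per game pair and apply the threshold."""
    # Invert game -> reviewers into user -> games.
    games_by_user = defaultdict(set)
    for game, reviewers in reviewers_by_game.items():
        for user in reviewers:
            games_by_user[user].add(game)

    # Each user with 2+ games adds 1 to every pair of games they reviewed.
    counts = Counter()
    for games in games_by_user.values():
        if len(games) >= 2:
            counts.update(combinations(sorted(games), 2))

    # Keep only pairs with at least min_shared common reviewers (no neighbor cap).
    return {pair: w for pair, w in counts.items() if w >= min_shared}
```

Sorting each user's games before pairing ensures every unordered pair is counted under a single key, so `("A", "B")` and `("B", "A")` never split the weight.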
## Use Cases
- **Graph ML**: Node classification (predict genre/rating from network position), link prediction, community detection
- **Recommendation systems**: Games connected by high edge weights share audiences
- **Market analysis**: Which genres cluster together? Where are the gaps?
- **Visualization**: Force-directed layouts, chord diagrams, genre timelines of the Steam ecosystem
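For the recommendation use case, a simple starting point is to rank a game's neighbors by edge weight. The sketch below assumes the node and link formats documented above and that each unordered pair appears once in `links`. Raw weights tend to favor very popular games, so it offers an optional normalization by the geometric mean of the two games' `reviews` totals; this is a heuristic of our own, and since `reviews` is the metadata review count rather than the number of reviewers in the review dataset, it is only an approximation. The function names are hypothetical.

```python
import math
from collections import defaultdict


def build_adjacency(links):
    """Map each node index to a list of (neighbor_index, weight) pairs.

    Assumes each unordered pair appears once, so both directions are added.
    """
    adj = defaultdict(list)
    for link in links:
        s, t, w = link["source"], link["target"], link["weight"]
        adj[s].append((t, w))
        adj[t].append((s, w))
    return adj


def recommend(nodes, adj, index, k=5, normalize=False):
    """Return up to k neighbor titles of nodes[index], highest score first.

    normalize=False ranks by raw shared-reviewer count. normalize=True divides
    by sqrt(reviews_a * reviews_b), a heuristic popularity adjustment.
    """
    scored = []
    for j, w in adj.get(index, []):
        score = w
        if normalize:
            denom = math.sqrt(nodes[index]["reviews"] * nodes[j]["reviews"])
            score = w / denom if denom else 0.0
        scored.append((score, nodes[j]["title"]))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [title for _, title in scored[:k]]
```

With normalization on, a niche game that shares a modest audience with another niche game can outrank a blockbuster that everyone has reviewed, which is often closer to what "similar audience" means in practice.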
## Live Visualization
[dr.eamer.dev/datavis/interactive/steam/](https://dr.eamer.dev/datavis/interactive/steam/)
Four interactive Canvas-rendered views: universe scatter, chord diagram, force-directed network, and genre timeline.
## Distribution
- **GitHub**: [lukeslp/steam-network-data](https://github.com/lukeslp/steam-network-data)
- **Kaggle**: [lucassteuber/steam-universe-network](https://www.kaggle.com/datasets/lucassteuber/steam-universe-network)
## Author
Luke Steuber — [lukesteuber.com](https://lukesteuber.com) — [@lukesteuber.com on Bluesky](https://bsky.app/profile/lukesteuber.com)
Provided by: lukeslp