
lukeslp/steam-co-review-network

Source: Hugging Face | Updated 2026-02-19 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/lukeslp/steam-co-review-network
Description:
---
license: cc-by-4.0
task_categories:
- graph-ml
- feature-extraction
language:
- en
tags:
- steam
- games
- network-analysis
- co-review
- graph-data
- gaming
- social-network
pretty_name: Steam Co-Review Network
size_categories:
- 1M<n<10M
dataset_info:
  features:
  - name: nodes
    dtype: list
  - name: links
    dtype: list
  - name: meta
    dtype: dict
---

# Steam Co-Review Network

Two Steam games share an edge when multiple users reviewed both. Built from 128 million user reviews across 80,000 games (2012-2024), this dataset maps how the Steam catalog is connected through player overlap.

## Files

### steam_network_full.json

The complete co-review graph with minimal filtering:

- 48,362 game nodes (every game with 10+ reviews)
- 33,041,298 weighted edges (2+ shared reviewers per pair)
- Per-node cap of 50 neighbors (densest connections preserved)
- Edge weights range from 2 to 420,410 shared reviewers

**Node format:**

```json
{
  "id": "620",
  "title": "Portal 2",
  "year": "2011",
  "rating": "Overwhelmingly Positive",
  "ratio": 97,
  "reviews": 263842,
  "price": 9.99
}
```

**Link format:**

```json
{"source": 0, "target": 42, "weight": 1523}
```

Source and target are indices into the nodes array. Weight is the number of users who reviewed both games.

### steam_all_2005.json

82,928 games released 2005-2025. Packed as arrays for compact JSON:

```
[name, year, ratio, reviews, price, ratingIdx, genreIdxs, tagIdxs, developer]
[0]    [1]   [2]    [3]      [4]    [5]        [6]        [7]      [8]
```

Genres and tags are stored as index arrays referencing top-level `genres[]` and `tags[]` lookup tables in the same file.

### steam_force_layout.json

Genre-aware pre-computed layout positions for the top ~9K nodes, clustered by primary genre with hub games as anchors. Use as warm-start coordinates for force-directed visualization.

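
The file formats described above can be read with a few lines of Python. The following is a minimal sketch, not the author's code: it assumes the network file is a JSON object with top-level `nodes` and `links` keys (as the `dataset_info` features suggest), treats edges as undirected, and uses made-up sample values in place of the real files. The helper names `build_adjacency`, `top_neighbors`, and `decode_row` are hypothetical.

```python
from collections import defaultdict

def build_adjacency(graph):
    """Map each node index to a list of (neighbor_index, weight) pairs."""
    adjacency = defaultdict(list)
    for link in graph["links"]:
        s, t, w = link["source"], link["target"], link["weight"]
        adjacency[s].append((t, w))
        adjacency[t].append((s, w))  # assumption: co-review edges are undirected
    return adjacency

def top_neighbors(graph, adjacency, node_index, k=5):
    """Return the k heaviest neighbors of a node as (title, weight) pairs."""
    ranked = sorted(adjacency[node_index], key=lambda pair: pair[1], reverse=True)[:k]
    return [(graph["nodes"][i]["title"], w) for i, w in ranked]

# Field order of one packed row in steam_all_2005.json, as documented above.
PACKED_FIELDS = ("name", "year", "ratio", "reviews", "price",
                 "ratingIdx", "genreIdxs", "tagIdxs", "developer")

def decode_row(row, genres, tags):
    """Turn one packed array into a dict, resolving genre and tag indices."""
    game = dict(zip(PACKED_FIELDS, row))
    game["genres"] = [genres[i] for i in game.pop("genreIdxs")]
    game["tags"] = [tags[i] for i in game.pop("tagIdxs")]
    return game  # ratingIdx stays a raw index; its lookup table is not documented here

# Hand-made sample in the documented shape; titles, weights, and tables are invented.
sample = {
    "nodes": [
        {"id": "620", "title": "Portal 2"},
        {"id": "1", "title": "Game A"},
        {"id": "2", "title": "Game B"},
    ],
    "links": [
        {"source": 0, "target": 1, "weight": 10},
        {"source": 0, "target": 2, "weight": 25},
        {"source": 1, "target": 2, "weight": 3},
    ],
}
adjacency = build_adjacency(sample)
print(top_neighbors(sample, adjacency, 0, k=2))  # [('Game B', 25), ('Game A', 10)]

genres = ["Action", "Puzzle"]
tags = ["Co-op", "Singleplayer"]
row = ["Portal 2", 2011, 97, 263842, 9.99, 0, [1], [0, 1], "Valve"]
print(decode_row(row, genres, tags)["genres"])  # ['Puzzle']
```

For the real file, `json.load` on `steam_network_full.json` would take the place of `sample`. With roughly 33 million links, the adjacency map will need a few gigabytes of memory, so a sparse matrix may be a better fit for large-scale work.
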
## Sources

- **Game metadata**: [FronkonGames Steam Games Dataset](https://huggingface.co/datasets/FronkonGames/steam-games-dataset) (Jan 2026 snapshot, 122K games)
- **User reviews**: [artermiloff Steam Reviews 2024](https://www.kaggle.com/datasets/artermiloff/steam-games-reviews-2024) (128M reviews across 80K games, one CSV per game, 2012 to June 2024)

## Pipeline

1. Load game metadata from FronkonGames enriched CSV
2. Scan 30K+ per-game review CSVs, extract steamid-to-game mappings
3. For each user who reviewed 2+ games, generate all game pairs
4. Count shared reviewers per pair to produce edge weights
5. Filter: minimum 2 shared reviewers (no neighbor cap)

Full pipeline: [github.com/lukeslp/steam-network-data](https://github.com/lukeslp/steam-network-data)

## Use Cases

- **Graph ML**: Node classification (predict genre/rating from network position), link prediction, community detection
- **Recommendation systems**: Games connected by high edge weights share audiences
- **Market analysis**: Which genres cluster together? Where are the gaps?
- **Visualization**: Force-directed layouts, chord diagrams, genre timelines of the Steam ecosystem

## Live Visualization

[dr.eamer.dev/datavis/interactive/steam/](https://dr.eamer.dev/datavis/interactive/steam/)

Four interactive Canvas-rendered views: universe scatter, chord diagram, force-directed network, and genre timeline.

## Distribution

- **GitHub**: [lukeslp/steam-network-data](https://github.com/lukeslp/steam-network-data)
- **Kaggle**: [lucassteuber/steam-universe-network](https://www.kaggle.com/datasets/lucassteuber/steam-universe-network)

## Author

Luke Steuber, [lukesteuber.com](https://lukesteuber.com), [@lukesteuber.com on Bluesky](https://bsky.app/profile/lukesteuber.com)
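
As an illustration of steps 3 to 5 in the Pipeline section, the edge-counting logic might look roughly like the sketch below. It is not taken from the linked repository: the function name `count_co_reviews`, the `(steamid, appid)` input shape, and the sample reviews are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def count_co_reviews(reviews, min_shared=2):
    """Count shared reviewers per game pair.

    reviews: iterable of (steamid, appid) pairs (hypothetical input shape).
    Returns {(appid_a, appid_b): weight} for pairs with weight >= min_shared.
    """
    # Step 2 equivalent: collect the set of games each user reviewed.
    games_by_user = {}
    for steamid, appid in reviews:
        games_by_user.setdefault(steamid, set()).add(appid)

    # Steps 3 and 4: every user with 2+ games contributes 1 to each of their pairs.
    pair_counts = Counter()
    for games in games_by_user.values():
        if len(games) < 2:
            continue
        # Sorting gives each unordered pair a single canonical key.
        for pair in combinations(sorted(games), 2):
            pair_counts[pair] += 1

    # Step 5: keep only pairs with enough shared reviewers.
    return {pair: w for pair, w in pair_counts.items() if w >= min_shared}

# Invented sample: four users, three games.
reviews = [
    ("u1", "620"), ("u1", "400"),
    ("u2", "620"), ("u2", "400"), ("u2", "70"),
    ("u3", "620"), ("u3", "70"),
    ("u4", "70"),
]
print(count_co_reviews(reviews))  # {('400', '620'): 2, ('620', '70'): 2}
```

Pair generation grows quadratically with the number of games a user reviewed, so at the scale of 128 million reviews a real run would likely need batching or an on-disk counter rather than a single in-memory `Counter`.
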

Provider:
lukeslp