
yanivohayon1/ucl-2021-22

Source: Hugging Face. Updated 2026-04-11. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/yanivohayon1/ucl-2021-22
Resource description:
---
language:
- en
tags:
- football
- sports
- ucl
size_categories:
- 1K<n<10K
---

# UCL 2021/22: UEFA Champions League Player Statistics

## Dataset Overview

This dataset contains player statistics from the 2021/22 UEFA Champions League season. It consists of 8 CSV files covering different aspects of player performance, with a total of 3,524 rows across all files. After merging all files, the working dataset contains 751 unique players across 44 columns.

**Source:** [Kaggle: UCL Matches & Players Data 2021/22](https://www.kaggle.com/datasets/azminetoushikwasi/ucl-202122-uefa-champions-league)

**Target Variable:** `position` (Forward / Midfielder / Defender / Goalkeeper)

---

## Feature Selection

The dataset consists of 8 separate files, each covering a different aspect of player performance. The key features used in this analysis are:

- **Attacking:** goals, assists, shot attempts, shots on target
- **Defending:** tackles, clearances, blocks, interceptions
- **Disciplinary:** yellow cards, red cards, fouls committed
- **Distribution:** passes, pass accuracy
- **Goalkeeping:** saves, clean sheets, goals conceded
- **Key Stats:** minutes played, matches played, position, club

---

## Files

| File | Description |
|------|-------------|
| `attacking.csv` | Goals, assists, shot attempts |
| `attempts.csv` | Shot attempts breakdown |
| `defending.csv` | Tackles, clearances, blocks |
| `disciplinary.csv` | Yellow cards, red cards, fouls |
| `distributon.csv` | Passes, pass accuracy |
| `goalkeeping.csv` | Saves, clean sheets |
| `goals.csv` | Goal details |
| `key_stats.csv` | General player stats |
| `notebook_1.ipynb` | Full EDA Notebook |

---

## Data Cleaning & Decision Making

**Step 1 (Merging):** I merged all 8 CSV files into one Master DataFrame using `player_name`, `club`, and `position` as keys. The result was 751 unique players × 44 columns.

**Step 2 (Missing Values):** I found that goalkeeping columns (saved, cleansheets, etc.)
had 92.9% missing values. This is expected, since only 53 out of 751 players are goalkeepers. I filled those with 0 for non-goalkeepers, and all other numeric columns were filled with the median per position. Final result: 0 missing values.

**Step 3 (Duplicates):** No duplicate rows were found.

**Step 4 (Scaling & Normalization):** I checked the range of each key column and found that columns exist on very different scales (e.g. minutes_played up to 1,230 vs goals up to 15). I chose not to apply normalization at this stage since I am performing EDA only and not building an ML model. The original scales are clear and interpretable as-is.

**Step 5 (Outlier Detection):** I used the IQR method to detect outliers and visualized them using box plots. I found 183 outliers in goals and 176 in assists. I decided to keep all outliers because they represent genuine elite performers, like Benzema with 15 goals, not data errors.

---

## Research Questions & Visual Insights

### Question 1: What is the unique statistical profile of each position?

![Q1](1_position_profile.png)

Forwards dominate all attacking statistics with an average of 1.23 goals per player. Goalkeepers play the most minutes (418 on average). Each position has a clearly distinct statistical profile that reflects its role on the pitch.

---

### Question 2: Do more shot attempts lead to more goals, or does accuracy matter more?

![Q2](q2_shots_vs_goals.png)

Shots on target correlate more strongly with goals (r=0.85) than total attempts (r=0.75). This confirms that accuracy matters more than quantity: a player who shoots less but more accurately scores more goals.

---

### Question 3: Does a player's position affect the number of fouls committed?

![Q3](q3_fouls_cards.png)

Midfielders commit the most fouls, but defenders receive the most yellow cards (0.065 average). This suggests that defensive positioning leads to more consequential fouls than midfield ones.
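Per-position averages like those quoted in Questions 1 and 3 can be reproduced with a group-by on `position`. The sketch below is illustrative only: the rows are invented, and the column names `fouls_committed` and `yellow` are assumptions about the merged table, not confirmed names from the notebook.

```python
import pandas as pd

# Invented sample rows; column names other than `position` are assumed.
players = pd.DataFrame({
    "position": ["Forward", "Forward", "Midfielder", "Midfielder", "Defender", "Goalkeeper"],
    "goals": [4, 1, 1, 0, 0, 0],
    "fouls_committed": [3, 2, 6, 5, 4, 0],
    "yellow": [0, 1, 1, 0, 2, 0],
})

# Mean of each statistic per position: one row per position.
profile = players.groupby("position")[["goals", "fouls_committed", "yellow"]].mean()

# Position with the highest average for each disciplinary statistic.
most_fouls = profile["fouls_committed"].idxmax()
most_yellows = profile["yellow"].idxmax()
```

On the real data, `profile` would hold the averages behind the bar charts; `idxmax` simply names the leading position for a given column.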
---

### Question 4: Which club was the most efficient (most goals per shot attempt)?

![Q4](q4efficientclubs.png)

Villarreal was the most efficient club, converting 15% of their shots into goals. Surprisingly, big clubs like Real Madrid and Bayern did not top the list: efficiency is not always about having the best players.

---

### Question 5: Do clubs with more assists necessarily score more goals?

![Q5](q5.png)

Yes. There is a near-perfect correlation of r=0.98 between assists and goals at the club level. Teamwork is the key to attacking success in the UCL.

---

### Question 6: Does a player's position affect their goal contributions (goals + assists)?

![Q6](6.png)

Forwards and midfielders together account for 84% of all goal contributions (42.5% and 41.3% respectively). Goalkeepers contribute only 0.2%, exactly as expected.

---

## Key Findings Summary

- Forwards average 1.23 goals per player, significantly more than any other position
- Shot accuracy (r=0.85) predicts goals better than total attempts (r=0.75)
- Defenders receive the most yellow cards despite midfielders committing more fouls
- Villarreal was the most efficient club, converting 15% of shots into goals
- Assists and goals have a near-perfect correlation (r=0.98); teamwork is key
- Forwards and midfielders account for 84% of all goal contributions

---

## Final Conclusion

The EDA process successfully told the story of UCL 2021/22 player performance. The analysis revealed that position is the strongest predictor of a player's statistical profile: forwards and midfielders dominate goal contributions, while defenders and goalkeepers play a completely different role. Efficiency matters more than volume; Villarreal proved that converting chances well is more important than creating many of them. Finally, teamwork is the true engine of attacking success, as clubs with more assists almost always score more goals.
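The position shares reported for Question 6 (goals plus assists, as a percentage of the total) can be recomputed along these lines. This is a minimal sketch on invented rows; it assumes the merged table has `goals` and `assists` columns alongside `position`.

```python
import pandas as pd

# Invented sample rows standing in for the merged player table.
players = pd.DataFrame({
    "position": ["Forward", "Forward", "Midfielder", "Defender", "Goalkeeper"],
    "goals": [5, 2, 2, 1, 0],
    "assists": [1, 2, 5, 2, 0],
})

# Goal contributions = goals + assists, per player.
players["contributions"] = players["goals"] + players["assists"]

# Sum per position, then express each sum as a percentage of the total.
by_position = players.groupby("position")["contributions"].sum()
share_pct = (100 * by_position / by_position.sum()).round(1)
```

Because the shares are computed from sums rather than means, positions with more players naturally carry more weight, which is worth keeping in mind when reading the 84% figure.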
---

## Challenges & Lessons Learned

**Challenges:** I was on military reserve duty during semester A and did not study Python properly. My main challenge was learning how to write code correctly before and during the project, with the help of Claude AI. Additionally, merging 8 files with different columns caused recurring errors that took time to resolve.

**Lessons Learned:** At the beginning I did not work in an organized and structured way, which made things increasingly difficult as I progressed, to the point where I had to start over. The most important lesson I learned: working in an organized and structured way, step by step, is the foundation of success.

---

## 📂 Project Files & Deliverables

| File | Description | Link |
|------|-------------|------|
| `attacking.csv` | Attacking stats | [View File](https://huggingface.co/datasets/yanivohayon1/ucl-2021-22/blob/main/attacking.csv) |
| `attempts.csv` | Shot attempts | [View File](https://huggingface.co/datasets/yanivohayon1/ucl-2021-22/blob/main/attempts.csv) |
| `defending.csv` | Defensive stats | [View File](https://huggingface.co/datasets/yanivohayon1/ucl-2021-22/blob/main/defending.csv) |
| `disciplinary.csv` | Cards & fouls | [View File](https://huggingface.co/datasets/yanivohayon1/ucl-2021-22/blob/main/disciplinary.csv) |
| `distributon.csv` | Pass stats | [View File](https://huggingface.co/datasets/yanivohayon1/ucl-2021-22/blob/main/distributon.csv) |
| `goalkeeping.csv` | Goalkeeper stats | [View File](https://huggingface.co/datasets/yanivohayon1/ucl-2021-22/blob/main/goalkeeping.csv) |
| `goals.csv` | Goal details | [View File](https://huggingface.co/datasets/yanivohayon1/ucl-2021-22/blob/main/goals.csv) |
| `key_stats.csv` | General stats | [View File](https://huggingface.co/datasets/yanivohayon1/ucl-2021-22/blob/main/key_stats.csv) |
| `notebook_1.ipynb` | Full EDA Notebook | [View Notebook](https://huggingface.co/datasets/yanivohayon1/ucl-2021-22/blob/main/notebook_1.ipynb) |

Provider:
yanivohayon1
Collected summary
Dataset introduction
Construction method
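In football data analysis, building a dataset usually means systematically integrating raw match statistics. This dataset comes from Kaggle and covers player performance in the 2021/22 UEFA Champions League season. The author merged eight separate CSV files, which record attacking, defending, disciplinary, distribution, and goalkeeping statistics, using player name, club, and position as the merge keys. The result is a master dataset of 751 unique players and 44 feature columns. During cleaning, missing goalkeeping values for non-goalkeepers were filled with 0, and the remaining numeric columns were filled with the per-position median, leaving no missing values.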
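The merge-and-fill procedure described above can be sketched as follows. This is not the author's notebook code: the three small tables are invented stand-ins for the eight CSV files, the helper names `merge_tables` and `fill_missing` are hypothetical, and the choice of an outer join is an assumption (the dataset card does not state the join type). Only the keys `player_name`, `club`, and `position` come from the card.

```python
from functools import reduce

import pandas as pd

KEYS = ["player_name", "club", "position"]  # merge keys named in the dataset card

# Toy stand-ins for three of the eight CSV files; all values are invented.
key_stats = pd.DataFrame({
    "player_name": ["A", "B", "C", "D"],
    "club": ["X", "X", "Y", "Y"],
    "position": ["Forward", "Goalkeeper", "Forward", "Defender"],
    "minutes_played": [900, 1080, 450, 720],
})
attempts = pd.DataFrame({
    "player_name": ["A", "B", "D"],
    "club": ["X", "X", "Y"],
    "position": ["Forward", "Goalkeeper", "Defender"],
    "total_attempts": [30.0, 0.0, 4.0],
})
goalkeeping = pd.DataFrame({
    "player_name": ["B"],
    "club": ["X"],
    "position": ["Goalkeeper"],
    "saved": [25.0],
})


def merge_tables(tables):
    """Outer-merge all tables on the shared player keys (join type is assumed)."""
    return reduce(lambda left, right: left.merge(right, on=KEYS, how="outer"), tables)


def fill_missing(df, gk_cols):
    """Fill goalkeeper columns with 0, other numeric columns with the per-position median."""
    out = df.copy()
    out[gk_cols] = out[gk_cols].fillna(0)
    other = [c for c in out.select_dtypes("number").columns if c not in gk_cols]
    for col in other:
        # transform("median") broadcasts each position's median back to its rows
        out[col] = out[col].fillna(out.groupby("position")[col].transform("median"))
    return out


master = fill_missing(merge_tables([key_stats, attempts, goalkeeping]), gk_cols=["saved"])
```

One caveat of the median fill: if every player in a position is missing a column, that position's median is itself missing and the gap remains, so a final check such as `master.isna().sum()` is still worthwhile.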
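Characteristics
The dataset is characterized by its multi-dimensional statistical coverage. It holds detailed records for 751 players, spanning metrics from attacking efficiency to defensive contribution, such as goals, assists, tackles, and pass accuracy. These features reflect individual performance and also show how strongly position shapes the statistical distributions: forwards lead the attacking metrics, while goalkeepers log the most minutes. The data is kept on its original scale, without normalization, so differences between metrics such as minutes played and goals remain directly visible and interpretable. Outliers, such as the high goal counts of elite players, were judged to be genuine performances rather than errors and were retained.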
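The outliers mentioned above were found with the IQR method. A minimal sketch of that rule, using the conventional 1.5 × IQR fences (the multiplier is an assumption; the dataset card does not state it) and an invented goals column:

```python
import pandas as pd


def iqr_outliers(series, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Invented goal counts: most players score 0 or 1, one scores 15.
goals = pd.Series([0, 0, 0, 0, 1, 1, 0, 2, 0, 15], name="goals")
mask = iqr_outliers(goals)
n_outliers = int(mask.sum())  # number of flagged players
```

When most players have zero goals, both quartiles sit near zero and the fences become very tight, so almost any scorer is flagged. That is consistent with the large outlier counts reported for goals and assists, and with the decision to keep those rows.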
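Usage
In sports science research, the dataset is a resource for exploring the relationship between player performance and position. Users can load the CSV files, merge them into the master dataset, and run exploratory data analysis to reveal how positions differ, for example by analyzing the correlation between shots on target and goals or by comparing clubs on attacking efficiency. The dataset also supports classification with position as the target variable, such as predicting a player's position, and studies of how teamwork relates to goal contributions. The accompanying EDA notebook gives beginners a reference for data cleaning, visualization, and correlation analysis, which helps them get started on football data analysis projects.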
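The two analyses named above, shot correlation and club efficiency, might look like this in pandas. The rows are invented and the column names `total_attempts` and `on_target` are assumptions about the merged table; the numbers produced here say nothing about the real r values.

```python
import pandas as pd

# Invented sample rows; column names are assumed, not confirmed.
players = pd.DataFrame({
    "club": ["X", "X", "Y", "Y", "Z"],
    "goals": [5, 1, 3, 0, 2],
    "total_attempts": [20, 10, 12, 8, 8],
    "on_target": [10, 3, 6, 1, 4],
})

# Pearson correlation of goals with each shot measure (Series.corr defaults to Pearson).
r_on_target = players["goals"].corr(players["on_target"])
r_attempts = players["goals"].corr(players["total_attempts"])

# Club efficiency: total goals divided by total shot attempts, per club.
clubs = players.groupby("club")[["goals", "total_attempts"]].sum()
clubs["goals_per_attempt"] = clubs["goals"] / clubs["total_attempts"]
most_efficient = clubs["goals_per_attempt"].idxmax()
```

Summing before dividing gives each club's overall conversion rate; averaging per-player rates instead would overweight players with very few attempts.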
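Background and challenges
Background overview
In sports analytics, integrating and mining football match statistics has become an important basis for evaluating player performance and refining tactics. The UCL 2021/22 dataset was compiled from data for the 2021/22 season of the UEFA Champions League, Europe's top club competition. It records 44 statistical fields for 751 players across attacking, defending, discipline, distribution, and goalkeeping, with the aim of revealing the distinct statistical profile of each position. Through systematic cleaning and exploratory analysis, it offers empirical evidence on how positional roles relate to team efficiency in modern football.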
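Current challenges
The dataset targets the problem of analyzing player performance and modeling positional roles. The core challenge is extracting position-specific patterns from heterogeneous statistics while coping with sparsity and differences in scale. During construction, merging the files was structurally difficult: the eight CSV files have different columns, which caused recurring merge errors. The goalkeeping fields were 92.9% missing and had to be filled based on position to keep the data complete. Finally, the metrics sit on very different scales (minutes played reach 1,230 while goals peak at 15), which is a potential obstacle for any later standardization or model building.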
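The EDA itself deliberately left the columns unscaled. If a later model did need comparable scales, min-max normalization is one common option; the sketch below uses invented values whose maxima match the ranges quoted above, and is not part of the original analysis.

```python
import pandas as pd

# Invented values; the maxima mirror the ranges quoted in the text.
stats = pd.DataFrame({
    "minutes_played": [90, 450, 810, 1230],
    "goals": [0, 1, 3, 15],
})

# Min-max scaling maps each column onto [0, 1] independently.
scaled = (stats - stats.min()) / (stats.max() - stats.min())
```

Min-max scaling is sensitive to extreme values, so a single 15-goal season compresses everyone else toward zero; that is one reason to think about outliers before choosing a scaler.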
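Common scenarios
Typical use cases
In football data analysis, the UCL 2021/22 dataset is a resource for exploring the link between player performance and position on the pitch. By combining attacking, defensive, and disciplinary statistics, it is commonly used to build player performance profiles and to show how roles differ across positions in the Champions League. Examples include analyzing the dominance of forwards and midfielders in goal contributions, or examining how defenders differ from other positions in fouls and yellow cards.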
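Derived related work
Work derived from this dataset includes machine-learning models for predicting player position, frameworks for comparing team efficiency, and multi-season performance trend studies. Examples are clustering to identify sub-types of player roles and regression models that predict goal contributions. Results of this kind are often published in sports engineering and statistics journals and encourage cross-disciplinary methods in sports analysis.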
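Recent research on the dataset
Latest research directions
In football data analysis, the UCL 2021/22 dataset supports position-based, multi-dimensional performance modeling. Research focuses on machine-learning techniques, such as clustering and predictive algorithms, to study the statistical profiles of players in different positions and the relationship between attacking efficiency and defensive strategy. Current directions include integrating spatio-temporal data to assess dynamic player performance and combining it with teamwork indicators, such as the strong correlation between assists and goals, to inform tactical decisions. These efforts provide data support for training and the transfer market in professional football and reflect the growing role of quantitative analysis in sports science.
The above content was collected and summarized by 遇见数据集.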