Football Events
Source: www.kaggle.com. Updated 2017-01-25; indexed 2025-01-08.
Download link: https://www.kaggle.com/secareanualin/football-events
Context
-------------
Most publicly available football (soccer) statistics are limited to aggregated data such as goals, shots, fouls and cards. When assessing performance or building predictive models, this simple aggregation, stripped of context, can be misleading. For example, a team that produced 10 shots on target from long range has a lower chance of scoring than a club that produced the same number of shots from inside the box. However, metrics derived from a simple count of shots will assess the two teams similarly.
A football game generates many more events than these, and it is both important and interesting to take into account the context in which those events occurred. This dataset should keep sports analytics enthusiasts awake for long hours, as the number of questions that can be asked is huge.
Content
-------
This dataset is the result of a very tiresome effort of web scraping and integrating different data sources. The central element is the text commentary. All the events were derived by reverse engineering the text commentary using regex. This way, I was able to derive 11 types of events, as well as the main player and secondary player involved in those events and many other statistics. If I have missed some useful information, you are welcome to extract it and share your findings. The dataset provides a granular view of 9,074 games, totaling 941,009 events, from the 5 biggest European football (soccer) leagues (England, Spain, Germany, Italy and France), from the 2011/2012 season to the 2016/2017 season as of 25.01.2017.
Some games played during these seasons are missing because I could not collect detailed data for them. Overall, more than 90% of the games played during these seasons have event data.
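To make the regex-based extraction described above concrete, here is a minimal sketch of how one kind of event might be pulled out of a commentary line. The sample sentences, the pattern, and the function name `parse_commentary` are assumptions made for illustration; they are not the patterns actually used to build the dataset, and real commentary wording varies by source.

```python
import re

# Hypothetical commentary line; the wording is an assumption, not taken from the dataset.
SAMPLE = (
    "Attempt missed. John Smith (Example FC) right footed shot "
    "from outside the box is too high. Assisted by Alex Jones."
)

# Hypothetical pattern: event label, main player, team in parentheses,
# then an optional "Assisted by <secondary player>." clause later in the line.
ATTEMPT_RE = re.compile(
    r"^(?P<event>Attempt (?:missed|saved|blocked))\.\s+"
    r"(?P<player>[^(]+?)\s+\((?P<team>[^)]+)\)"
    r"(?:.*?Assisted by (?P<player2>[^.]+)\.)?"
)


def parse_commentary(text):
    """Return a dict of extracted fields, or None if the line is not an attempt."""
    match = ATTEMPT_RE.match(text)
    if match is None:
        return None
    # Groups that did not participate in the match (e.g. no assist) come back as None.
    return match.groupdict()


print(parse_commentary(SAMPLE))
```

A full extraction pipeline would presumably need one such pattern per event type and per commentary source, plus fallbacks for lines that match nothing.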
The dataset is organized in 3 files:
- **events.csv** contains event data about each game. Text commentary was scraped from bbc.com, espn.com and onefootball.com.
- **ginf.csv** contains metadata and market odds about each game. Odds were collected from oddsportal.com.
- **dictionary.txt** contains a dictionary with the textual description of each categorical variable coded with integers.
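Because events and game metadata live in separate files, most analyses start by joining the two. The sketch below shows one way to do that with the Python standard library, using tiny in-memory stand-ins for the files. The column names (`id_odsp` as the shared game key, `is_goal`, `league`, `ht`, `at`) and all the values are assumptions for illustration; check the headers of the real files before relying on them.

```python
import csv
import io
from collections import Counter

# Tiny in-memory stand-ins for events.csv and ginf.csv.
# Column names and values are assumptions, not copied from the dataset.
EVENTS_CSV = """id_odsp,time,event_type,is_goal
g1,12,1,0
g1,34,1,1
g2,5,3,0
g2,77,1,1
g2,90,1,1
"""

GINF_CSV = """id_odsp,league,ht,at
g1,league_a,Home A,Away B
g2,league_b,Home C,Away D
"""


def goals_per_league(events_file, games_file):
    """Join events to game metadata on the game key and count goals per league."""
    # Index game metadata by the shared key so each event lookup is O(1).
    games = {row["id_odsp"]: row for row in csv.DictReader(games_file)}
    goals = Counter()
    for event in csv.DictReader(events_file):
        game = games.get(event["id_odsp"])
        # Skip events whose game has no metadata row.
        if game is not None and event["is_goal"] == "1":
            goals[game["league"]] += 1
    return dict(goals)


result = goals_per_league(io.StringIO(EVENTS_CSV), io.StringIO(GINF_CSV))
print(result)
```

With the real files, the `io.StringIO` objects would be replaced by file handles, for example `open("events.csv", newline="", encoding="utf-8")`. The integer codes in categorical columns can then be mapped to labels using dictionary.txt.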
Past Research
-------------
I have used this data to:
- create predictive models for football games in order to bet on football outcomes
- make visualizations about upcoming games
- build expected goals models and compare players
Inspiration
-----------
There are tons of interesting questions a sports enthusiast can answer with this dataset. For example:
- What is the value of a shot? In other words, what is the probability of a shot being a goal given its location, shooter, league, assist method, game state, number of players on the pitch, and time? Models that estimate this are known as expected goals (xG) models.
- When are teams more likely to score?
- Which teams are the best or sloppiest at holding the lead?
- Which teams or players make the best use of set pieces?
- In which leagues is the referee more likely to give a card?
- How do players compare when they shoot with their weak foot versus their strong foot? Or which players are ambidextrous?
- Identify different styles of play (shooting from long range vs shooting from the box, crossing the ball vs passing the ball, use of headers)
- Which teams have a bias for attacking on a particular flank?
And many many more...
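As a starting point for the first question in the list, the sketch below estimates the value of a shot as the historical conversion rate for its location, then sums those values to get a team's expected goals. This is a plain frequency estimate conditioned on location only, not a full xG model, and the shot records, location labels and function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical historical shots as (location label, is_goal) pairs.
# The labels and outcomes are invented for illustration.
HISTORY = [
    ("inside_box", 1), ("inside_box", 0), ("inside_box", 0), ("inside_box", 1),
    ("long_range", 0), ("long_range", 0), ("long_range", 0), ("long_range", 1),
]


def conversion_rates(shots):
    """Estimate P(goal | location) as goals divided by shots for each location."""
    totals = defaultdict(lambda: [0, 0])  # location -> [goals, shots]
    for location, is_goal in shots:
        totals[location][0] += is_goal
        totals[location][1] += 1
    return {loc: goals / count for loc, (goals, count) in totals.items()}


def expected_goals(shot_locations, rates):
    """Sum the per-shot scoring probabilities for a list of shot locations."""
    return sum(rates[loc] for loc in shot_locations)


rates = conversion_rates(HISTORY)
# Two teams with the same shot count but different shot locations.
team_long_range = expected_goals(["long_range"] * 10, rates)
team_inside_box = expected_goals(["inside_box"] * 10, rates)
print(rates, team_long_range, team_inside_box)
```

On this toy data the two teams take the same number of shots but end up with different expected goals, which is the gap that a plain shot count hides. A more realistic model would condition on the other factors listed above, such as shooter, assist method and game state.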
Provider: Kaggle
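Overview: The Football Events dataset is a detailed dataset containing more than 900,000 match events from the five major European football leagues, covering the 2011-2017 seasons. It provides rich event context and is well suited to in-depth football analysis and predictive modeling.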