FIFA 21 messy, raw dataset for cleaning/ exploring
Source: www.kaggle.com | Updated: 2020-10-22 | Indexed: 2025-03-25
Download link:
https://www.kaggle.com/yagunnersya/fifa-21-messy-raw-dataset-for-cleaning-exploring
Description:
### Context
Kaggle is well known for providing clean datasets that are ready for analysis and model building.
So here I present to you, for a change, a veeeeery messy and raw dataset of EA Sports' latest installment of their hit FIFA series, FIFA 21, which I scraped from sofifa.com.
### Content
One of the challenges of web scraping is unclean data, and that's natural, really. Different front-end developers write their HTML in their own way, and that makes the incoming data unpredictable.
You'll definitely learn a lot about data cleaning with this dataset.
### Acknowledgements
A huge round of applause for sofifa.com for providing this amazing data!
### Inspiration
1. Convert the height and weight columns to numeric form.
2. Remove the unnecessary newline characters from all columns that have them.
3. Based on the 'Joined' column, check which players have been playing at a club for more than 10 years!
4. 'Value', 'Wage' and 'Release Clause' are string columns. Convert them to numbers. For example, "M" in the 'Value' column means million, so multiply those row values by 1,000,000, and so on.
5. Some columns have 'star' characters. Strip the stars from those columns and make the columns numeric.
6. Which players are highly valuable but still underpaid (on low wages)? (hint: scatter plot between wage and value)
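The cleaning tasks above (1, 2, 4 and 5) can be sketched with a few small parsing helpers. This is only a minimal illustration, not the author's method: the raw formats it handles (heights such as `6'2"` or `188cm`, weights such as `159lbs` or `72kg`, money such as `€103.5M` or `€560K`, ratings such as `4 ★`) are assumptions, as is the "K" suffix meaning thousand, so check them against the actual file before relying on it. All function names are hypothetical.

```python
import re

LB_TO_KG = 0.45359237   # kilograms per pound
INCH_TO_CM = 2.54       # centimetres per inch
# "M" = million comes from the dataset description; "K" = thousand is an assumption.
MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000}


def strip_newlines(text: str) -> str:
    """Task 2: replace runs of newline characters with a space and trim the ends."""
    return re.sub(r"[\r\n]+", " ", text).strip()


def height_to_cm(raw: str) -> float:
    """Task 1: accept feet/inches (6'2") or centimetres (188cm), return centimetres."""
    raw = raw.strip()
    feet_inches = re.fullmatch(r"(\d+)'\s*(\d+)\"?", raw)
    if feet_inches:
        total_inches = int(feet_inches.group(1)) * 12 + int(feet_inches.group(2))
        return round(total_inches * INCH_TO_CM, 1)
    centimetres = re.fullmatch(r"(\d+(?:\.\d+)?)\s*cm", raw, flags=re.IGNORECASE)
    if centimetres:
        return float(centimetres.group(1))
    raise ValueError(f"unrecognised height: {raw!r}")


def weight_to_kg(raw: str) -> float:
    """Task 1: accept pounds (159lbs) or kilograms (72kg), return kilograms."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(kg|lbs?)", raw.strip(), flags=re.IGNORECASE)
    if not match:
        raise ValueError(f"unrecognised weight: {raw!r}")
    amount, unit = float(match.group(1)), match.group(2).lower()
    return round(amount * LB_TO_KG, 1) if unit.startswith("lb") else amount


def money_to_number(raw: str) -> int:
    """Task 4: drop the currency symbol and expand a trailing K or M suffix."""
    match = re.fullmatch(r"[^\d.]*(\d+(?:\.\d+)?)\s*([KM]?)", raw.strip())
    if not match:
        raise ValueError(f"unrecognised amount: {raw!r}")
    return round(float(match.group(1)) * MULTIPLIERS[match.group(2)])


def stars_to_int(raw: str) -> int:
    """Task 5: remove the star character and convert what is left to an integer."""
    return int(raw.replace("★", "").strip())
```

Each helper takes a single cell value, so with pandas it could be applied column by column, for example `df["Value"].map(money_to_number)`. Raising `ValueError` on unknown formats is a deliberate choice here: with scraped data it is usually safer to find out about an unexpected pattern than to coerce it silently.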
Ask more questions yourself!
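For the two analysis questions in the list (tasks 3 and 6), here is a rough sketch under stated assumptions. It assumes the 'Joined' column holds strings like `Jul 1, 2004`, uses the listing's update date (2020-10-22) as the reference date, and swaps the suggested scatter plot for a simple value-to-wage ratio so it runs without a plotting library. The three rows are made up purely for illustration and are not taken from the dataset; the function names are hypothetical.

```python
from datetime import date, datetime


def joined_more_than(joined: str, as_of: date, years: int = 10) -> bool:
    """Task 3: True if the join date is strictly earlier than `years` years before as_of."""
    # Assumed date format: abbreviated English month, day, four-digit year.
    start = datetime.strptime(joined.strip(), "%b %d, %Y").date()
    cutoff = (as_of.year - years, as_of.month, as_of.day)
    return (start.year, start.month, start.day) < cutoff


def value_per_wage(value: float, wage: float) -> float:
    """Task 6: market value per unit of wage; a high ratio hints at an underpaid player."""
    return value / wage if wage else float("inf")


# Made-up rows for illustration only, already converted to numbers.
players = [
    {"name": "Player A", "joined": "Jul 1, 2004", "value": 60_000_000, "wage": 500_000},
    {"name": "Player B", "joined": "Aug 15, 2018", "value": 40_000_000, "wage": 20_000},
    {"name": "Player C", "joined": "Jan 1, 2012", "value": 5_000_000, "wage": 50_000},
]

as_of = date(2020, 10, 22)  # reference date: when the dataset was last updated
long_serving = [p["name"] for p in players if joined_more_than(p["joined"], as_of)]
ranked = sorted(players, key=lambda p: value_per_wage(p["value"], p["wage"]), reverse=True)

print("More than 10 years at the club:", long_serving)
print("Highest value-to-wage ratio:", ranked[0]["name"])
```

The ratio is only a stand-in for the scatter plot hint: on a plot of wage against value, the same players would show up as points with high value and low wage.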
Hope it helps! :) If you like this dataset, please show your support by upvoting this dataset! Thanks! :)
Provider:
www.kaggle.com



