Multi-Season Formula 1 Lap Dataset with Race-Control-Based Safety Car Labels (2022-2025)
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/djr8rnjtjp
Description:
This dataset contains lap-level race data for the 2022-2025 Formula 1 seasons. Data for each year, race, and session were collected using FastF1, a public interface that provides access to timing and race information that is also publicly available on the web. Only official race sessions are included. The data for each event were downloaded separately, and all events were then merged into a single structured dataset. After invalid or incomplete laps were removed and missing values were cleaned, the final dataset contains 89,980 driver-lap observations. Each row represents one driver on one lap.
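The merge-and-clean step described above can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the field names (`year`, `event`, `driver`, `lap_number`, `lap_time_s`) are hypothetical, and the published dataset may use different column names and stricter validity rules.

```python
from typing import Iterable

# Hypothetical field names; the published dataset may use different ones.
REQUIRED_FIELDS = ("year", "event", "driver", "lap_number", "lap_time_s")


def merge_and_clean(per_event_laps: Iterable[list[dict]]) -> list[dict]:
    """Concatenate per-event lap records and drop rows with missing values."""
    merged = []
    for laps in per_event_laps:
        for lap in laps:
            # Keep a lap only if every required field is present and not None.
            if all(lap.get(field) is not None for field in REQUIRED_FIELDS):
                merged.append(lap)
    return merged


# Two toy events; the second lap of the first event has no lap time.
event_a = [
    {"year": 2022, "event": "A", "driver": "VER", "lap_number": 1, "lap_time_s": 95.2},
    {"year": 2022, "event": "A", "driver": "VER", "lap_number": 2, "lap_time_s": None},
]
event_b = [
    {"year": 2022, "event": "B", "driver": "HAM", "lap_number": 1, "lap_time_s": 88.7},
]

clean_laps = merge_and_clean([event_a, event_b])
print(len(clean_laps))
```

Each surviving record corresponds to one driver on one lap, matching the row definition given in the description.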
The dataset includes contextual and performance variables such as lap number, race progression, tire compound, tire life, driver position, and team ID, along with environmental variables (air temperature, track temperature, humidity, rainfall, and others). We applied basic feature engineering to make the data model-friendly. Safety Car (SC) and Virtual Safety Car (VSC) labels were constructed by parsing official race control messages and aligning them with lap-level records, which makes it possible to model rare events at the lap level. The dataset also supports forward-looking targets (for instance, whether a Safety Car is deployed within the next few laps) for probabilistic risk forecasting. Its structure allows experiments with event-level validation: entire Grand Prix events can be held out so that the training and test sets never share laps from the same event, which avoids leakage. The dataset is intended for academic research. It is a processed and structured derivative of public timing data obtained via the FastF1 interface.
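The forward-looking target and the event-level split mentioned above could look roughly like this. It is a sketch under stated assumptions: the per-lap SC flags are taken as already aligned and ordered by lap within a single race, the `horizon` parameter and the `year`/`event` field names are hypothetical, and the authors' actual label construction may differ.

```python
def forward_sc_target(sc_flags: list[int], horizon: int) -> list[int]:
    """For each lap, return 1 if any of the next `horizon` laps has an SC flag.

    `sc_flags` holds one 0/1 flag per lap of a single race, ordered by lap.
    The current lap itself is excluded from the look-ahead window.
    """
    return [
        int(any(sc_flags[i + 1 : i + 1 + horizon]))
        for i in range(len(sc_flags))
    ]


def split_by_event(rows: list[dict], test_events: set[tuple]) -> tuple[list[dict], list[dict]]:
    """Hold out whole events so no race contributes laps to both sets."""
    train, test = [], []
    for row in rows:
        key = (row["year"], row["event"])  # hypothetical field names
        (test if key in test_events else train).append(row)
    return train, test


# Safety Car on lap 4 of a six-lap toy race, two-lap look-ahead.
targets = forward_sc_target([0, 0, 0, 1, 0, 0], horizon=2)

rows = [
    {"year": 2022, "event": "A", "lap_number": 1},
    {"year": 2022, "event": "A", "lap_number": 2},
    {"year": 2023, "event": "B", "lap_number": 1},
]
train_rows, test_rows = split_by_event(rows, test_events={(2023, "B")})
```

Splitting on the `(year, event)` key rather than on individual laps is what keeps correlated laps from the same Grand Prix out of both sets at once.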
Created:
2026-02-16



