
MikCil/f1-team-radio

Source: Hugging Face. Updated 2026-03-29. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/MikCil/f1-team-radio
Resource description:
---
license: cc-by-4.0
task_categories:
- automatic-speech-recognition
- audio-classification
language:
- en
tags:
- f1
- formula-1
- formula-one
- team-radio
- motorsport
- racing
- speech
- audio
pretty_name: F1 Team Radio Transcriptions
size_categories:
- 10K<n<100K
---

# F1 Team Radio Dataset

A comprehensive dataset of Formula 1 team radio communications with transcriptions.

## Dataset Description

This dataset contains team radio audio clips from Formula 1 races along with their text transcriptions. Team radio communications are the real-time messages exchanged between F1 drivers and their pit wall engineers during race weekends.

## Dataset Statistics

| Metric | Value |
|--------|-------|
| Total audio clips | 14,681 |
| Grand Prix events | 149 |
| Unique drivers | 43 |
| Date range | 2018-03-25 to 2025-12-07 |

### Top Drivers by Message Count

| Driver ID | Messages |
|-----------|----------|
| LEWHAM01 | 1,685 |
| MAXVER01 | 1,494 |
| LANNOR01 | 1,137 |
| CARSAI01 | 898 |
| CHALEC01 | 754 |
| GEORUS01 | 717 |
| VALBOT01 | 686 |
| DANRIC01 | 673 |
| SERPER01 | 613 |
| PIEGAS01 | 557 |

## Data Fields

| Field | Type | Description |
|-------|------|-------------|
| `id` | `string` | Unique identifier for each radio message |
| `driver_id` | `string` | Driver code (e.g., `MAXVER01` for Max Verstappen) |
| `racing_number` | `string` | Driver's car number |
| `grand_prix` | `string` | Full Grand Prix name (e.g., "2024 Monaco Grand Prix") |
| `race_id` | `string` | Race identifier (e.g., `2024_Monaco_Grand_Prix`) |
| `session_date` | `string` | Date of the session (YYYY-MM-DD) |
| `message_timestamp` | `string` | UTC timestamp of the message |
| `audio` | `Audio` | Audio clip (MP3, resampled to 16 kHz) |
| `transcription` | `string` | Text transcription of the radio message |

## Driver ID Format

Driver IDs follow the official F1 format: **first 3 letters of first name + first 3 letters of surname + identifier number**.

Examples:
- `MAXVER01` → Max Verstappen
- `LEWHAM01` → Lewis Hamilton
- `CHALEC01` → Charles Leclerc
- `LANNOR01` → Lando Norris

## Usage

```python
from datasets import load_dataset

# Load the dataset
ds = load_dataset("MikCil/f1-team-radio", split="train")

# View a sample
print(ds[0])

# Filter by driver
verstappen = ds.filter(lambda x: x["driver_id"] == "MAXVER01")

# Filter by race (matches every Monaco Grand Prix, in any season)
monaco = ds.filter(lambda x: "Monaco" in x["grand_prix"])

# Filter by race and season using the exact name
monaco_2024 = ds.filter(lambda x: x["grand_prix"] == "2024 Monaco Grand Prix")
```

### Playing Audio

```python
from IPython.display import Audio as IPythonAudio

# Assumes `ds` was loaded as in the Usage example above
sample = ds[0]
IPythonAudio(
    sample["audio"]["array"],
    rate=sample["audio"]["sampling_rate"]
)
```

### Fine-tuning ASR Models

This dataset can be used to fine-tune speech recognition models on F1-specific vocabulary (driver names, technical terms, etc.).

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor
```

## Transcription Method

Audio files were transcribed using [Cohere Transcribe 03-2026](https://huggingface.co/CohereLabs/cohere-transcribe-03-2026), an efficient open-source automatic speech recognition model.

## License

This dataset is released under the [CC BY 4.0 License](https://creativecommons.org/licenses/by/4.0/).

## Citation

```bibtex
@dataset{f1_team_radio,
  author = {Michele Ciletti},
  title = {F1 Team Radio Dataset},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/datasets/MikCil/f1-team-radio}}
}
```

## Acknowledgments

- Formula 1 for the original broadcasts
- Cohere Labs for transcription
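
## Example: Parsing Driver and Race IDs

The string fields described under Data Fields and Driver ID Format can be split into parts with a few small helpers. The following is a minimal sketch, not part of the dataset or of the `datasets` library. It assumes that every `driver_id` matches the pattern seen in the listed examples (three letters, three letters, two digits, as in `MAXVER01`), that `race_id` is the `grand_prix` name with spaces replaced by underscores (as in the single example given), and that `session_date` is always `YYYY-MM-DD`. The names `parse_driver_id`, `race_id_from_name`, and `season_of` are hypothetical.

```python
import re
from datetime import date
from typing import NamedTuple

# Assumed pattern, inferred from the examples in the card (e.g. MAXVER01 for
# Max Verstappen): 3 letters of first name + 3 letters of surname + 2 digits.
DRIVER_ID_RE = re.compile(r"^([A-Z]{3})([A-Z]{3})(\d{2})$")


class DriverId(NamedTuple):
    first_name_prefix: str
    surname_prefix: str
    number: int


def parse_driver_id(driver_id: str) -> DriverId:
    """Split a driver ID such as 'MAXVER01' into its three parts."""
    match = DRIVER_ID_RE.match(driver_id)
    if match is None:
        # Fail loudly rather than guess if a row does not fit the assumption.
        raise ValueError(f"unexpected driver_id format: {driver_id!r}")
    first, last, num = match.groups()
    return DriverId(first, last, int(num))


def race_id_from_name(grand_prix: str) -> str:
    """Derive a race_id-style string from a full Grand Prix name (assumed rule)."""
    return grand_prix.replace(" ", "_")


def season_of(session_date: str) -> int:
    """Return the calendar year of a YYYY-MM-DD session date."""
    return date.fromisoformat(session_date).year


print(parse_driver_id("MAXVER01"))
print(race_id_from_name("2024 Monaco Grand Prix"))
print(season_of("2018-03-25"))
```

A helper like `season_of` could be passed to `ds.filter`, for example `ds.filter(lambda x: season_of(x["session_date"]) == 2024)`, to select a single season. If any row raises `ValueError` in `parse_driver_id`, the assumed pattern does not hold for the full dataset and should be revisited.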
Provider:
MikCil