MikCil/f1-team-radio
Source: Hugging Face. Updated 2026-03-29; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/MikCil/f1-team-radio
Resource description:
---
license: cc-by-4.0
task_categories:
- automatic-speech-recognition
- audio-classification
language:
- en
tags:
- f1
- formula-1
- formula-one
- team-radio
- motorsport
- racing
- speech
- audio
pretty_name: F1 Team Radio Transcriptions
size_categories:
- 10K<n<100K
---
# F1 Team Radio Dataset
A dataset of 14,681 Formula 1 team radio clips from 149 Grand Prix events (2018 to 2025), each paired with a text transcription.
## Dataset Description
This dataset contains team radio audio clips from Formula 1 races along with their text transcriptions. Team radio communications are the real-time messages exchanged between F1 drivers and their pit wall engineers during race weekends.
## Dataset Statistics
| Metric | Value |
|--------|-------|
| Total audio clips | 14,681 |
| Grand Prix events | 149 |
| Unique drivers | 43 |
| Date range | 2018-03-25 to 2025-12-07 |
### Top Drivers by Message Count
| Driver ID | Messages |
|-----------|----------|
| LEWHAM01 | 1,685 |
| MAXVER01 | 1,494 |
| LANNOR01 | 1,137 |
| CARSAI01 | 898 |
| CHALEC01 | 754 |
| GEORUS01 | 717 |
| VALBOT01 | 686 |
| DANRIC01 | 673 |
| SERPER01 | 613 |
| PIEGAS01 | 557 |
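A ranking like the one above can be recomputed from the `driver_id` column. The following is a minimal sketch, assuming the column values are available as an iterable of strings (for example from `ds["driver_id"]` after loading the dataset). The `top_drivers` helper and the short sample list are illustrative only and are not part of the dataset.

```python
from collections import Counter


def top_drivers(driver_ids, n=10):
    """Return the n most frequent driver IDs as (driver_id, count) pairs."""
    return Counter(driver_ids).most_common(n)


# Hypothetical stand-in for ds["driver_id"]; the real column has 14,681 entries.
sample_ids = ["LEWHAM01", "MAXVER01", "LEWHAM01", "LANNOR01", "LEWHAM01", "MAXVER01"]

for driver_id, count in top_drivers(sample_ids, n=2):
    print(f"{driver_id}: {count}")
```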
## Data Fields
| Field | Type | Description |
|-------|------|-------------|
| `id` | `string` | Unique identifier for each radio message |
| `driver_id` | `string` | Driver code (e.g., `MAXVER01` for Max Verstappen) |
| `racing_number` | `string` | Driver's car number |
| `grand_prix` | `string` | Full Grand Prix name (e.g., "2024 Monaco Grand Prix") |
| `race_id` | `string` | Race identifier (e.g., `2024_Monaco_Grand_Prix`) |
| `session_date` | `string` | Date of the session (YYYY-MM-DD) |
| `message_timestamp` | `string` | UTC timestamp of the message |
| `audio` | `Audio` | Audio clip (MP3, resampled to 16 kHz) |
| `transcription` | `string` | Text transcription of the radio message |
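Several of these fields are plain strings that encode structure. The sketch below shows one way to parse them, assuming `session_date` is always ISO `YYYY-MM-DD` and that `race_id` is the `grand_prix` name with spaces replaced by underscores. The second assumption is inferred only from the single example pair in the table and is not confirmed for every row. All helper names are hypothetical.

```python
from datetime import date


def parse_session_date(session_date: str) -> date:
    """Parse a YYYY-MM-DD session_date string into a date object."""
    return date.fromisoformat(session_date)


def race_id_from_name(grand_prix: str) -> str:
    """Assumed mapping: '2024 Monaco Grand Prix' -> '2024_Monaco_Grand_Prix'."""
    return "_".join(grand_prix.split())


def season_of(race_id: str) -> int:
    """Read the leading season year from a race_id such as '2024_Monaco_Grand_Prix'."""
    return int(race_id.split("_", 1)[0])


print(parse_session_date("2018-03-25").year)        # 2018
print(race_id_from_name("2024 Monaco Grand Prix"))  # 2024_Monaco_Grand_Prix
print(season_of("2024_Monaco_Grand_Prix"))          # 2024
```

Because ISO dates sort lexicographically, `session_date` strings can also be compared directly (for example `"2023-01-01" <= s <= "2023-12-31"`) without parsing.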
## Driver ID Format
Driver IDs follow the official F1 format: **first 3 letters of first name + first 3 letters of surname + identifier number**.
Examples:
- `MAXVER01` → Max Verstappen
- `LEWHAM01` → Lewis Hamilton
- `CHALEC01` → Charles Leclerc
- `LANNOR01` → Lando Norris
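Going by the examples above, an ID splits into three letters of the first name, three letters of the surname, and a two-digit number. The following is a minimal sketch of a parser built on that pattern. The regular expression is inferred from the four examples only, so IDs for drivers with very short or unusually spelled names may not fit it. `DriverId`, `parse_driver_id`, and `matches_name` are hypothetical helpers.

```python
import re
from typing import NamedTuple

# Pattern inferred from the examples: 3 letters + 3 letters + 2 digits.
DRIVER_ID_RE = re.compile(r"([A-Z]{3})([A-Z]{3})(\d{2})")


class DriverId(NamedTuple):
    first_name_prefix: str
    surname_prefix: str
    number: int


def parse_driver_id(driver_id: str) -> DriverId:
    """Split an ID such as 'MAXVER01' into its three parts."""
    match = DRIVER_ID_RE.fullmatch(driver_id)
    if match is None:
        raise ValueError(f"unexpected driver ID format: {driver_id!r}")
    first, last, number = match.groups()
    return DriverId(first, last, int(number))


def matches_name(driver_id: str, first_name: str, surname: str) -> bool:
    """Check whether an ID is consistent with a driver's first name and surname."""
    parsed = parse_driver_id(driver_id)
    return (
        parsed.first_name_prefix == first_name[:3].upper()
        and parsed.surname_prefix == surname[:3].upper()
    )


print(parse_driver_id("MAXVER01"))
print(matches_name("CHALEC01", "Charles", "Leclerc"))  # True
```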
## Usage
```python
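from datasets import load_dataset

# Load the dataset
ds = load_dataset("MikCil/f1-team-radio", split="train")

# View a sample
print(ds[0])

# Filter by driver (input_columns avoids decoding the audio of every row)
verstappen = ds.filter(lambda d: d == "MAXVER01", input_columns=["driver_id"])

# Filter by race (race_id pins the season; "Monaco" alone would match every year)
monaco_2024 = ds.filter(lambda r: r == "2024_Monaco_Grand_Prix", input_columns=["race_id"])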
```
### Playing Audio
```python
from IPython.display import Audio as IPythonAudio

sample = ds[0]
IPythonAudio(
    sample["audio"]["array"],
    rate=sample["audio"]["sampling_rate"],
)
```
### Fine-tuning ASR Models
This dataset can be used to fine-tune speech recognition models on F1-specific vocabulary (driver names, technical terms, etc.).
```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor
```
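When fine-tuning, it can help to hold out whole races for evaluation so that clips from the same race do not appear in both splits. The following is a minimal sketch of such a grouped split, assuming the `race_id` values are available as a list (for example from `list(ds["race_id"])`). `split_by_race` and its parameters are hypothetical helpers, not part of `datasets` or `transformers`, and the sample list is made up for illustration.

```python
import random


def split_by_race(race_ids, eval_fraction=0.1, seed=0):
    """Return (train_indices, eval_indices) with each race kept in one split."""
    races = sorted(set(race_ids))
    rng = random.Random(seed)
    rng.shuffle(races)
    # Hold out at least one race so the evaluation split is not empty.
    n_eval = max(1, round(len(races) * eval_fraction))
    eval_races = set(races[:n_eval])
    train_idx = [i for i, r in enumerate(race_ids) if r not in eval_races]
    eval_idx = [i for i, r in enumerate(race_ids) if r in eval_races]
    return train_idx, eval_idx


# Hypothetical stand-in for list(ds["race_id"]).
race_ids = (
    ["2024_Monaco_Grand_Prix"] * 3
    + ["2024_Italian_Grand_Prix"] * 2
    + ["2023_Monaco_Grand_Prix"] * 4
)
train_idx, eval_idx = split_by_race(race_ids, eval_fraction=0.34)
print(len(train_idx), len(eval_idx))
```

The resulting index lists can be passed to `ds.select(train_idx)` and `ds.select(eval_idx)` to obtain the two subsets.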
## Transcription Method
Audio files were transcribed using [Cohere Transcribe 03-2026](https://huggingface.co/CohereLabs/cohere-transcribe-03-2026), an efficient open-source automatic speech recognition model.
## License
This dataset is released under the [CC BY 4.0 License](https://creativecommons.org/licenses/by/4.0/).
## Citation
```bibtex
@dataset{f1_team_radio,
author = {Michele Ciletti},
title = {F1 Team Radio Dataset},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/datasets/MikCil/f1-team-radio}}
}
```
## Acknowledgments
- Formula 1 for the original broadcasts
- Cohere Labs for transcription
Provider:
MikCil



