bonalor/synthetic_maritime_radio_communication

Source: Hugging Face. Updated 2025-12-18. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/bonalor/synthetic_maritime_radio_communication
Resource summary:
MARTTS is an open-source synthetic speech corpus for evaluating and stress-testing Automatic Speech Recognition (ASR) systems that operate in maritime VHF radiotelephony environments. The dataset contains 240 realistic multi-speaker dialogues covering distress, urgency, search and rescue (SAR), and routine maritime traffic. The dialogues are produced from SMCP-compliant templates and LLM-based scenario generation, populated with AIS-derived ship names, MMSI identifiers, and positions, synthesized with the Chatterbox TTS model, and passed through a multi-stage radio post-processing pipeline. The pipeline emulates operational VHF conditions, including channel artifacts, background noise, environmental ship noise, dropouts, squelch clicks, and band-limiting. The dataset is intended for validating ASR systems under realistic maritime conditions where authentic data is scarce or sensitive. To our knowledge, it is the first publicly available synthetic dataset tailored to maritime distress communication.
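To make the kind of radio post-processing described above more concrete, here is a minimal sketch of three of the named degradations (band-limiting, background noise, and dropouts) applied to a list of audio samples. It is not the MARTTS pipeline: the function names, the 300 to 3000 Hz passband, the 10 dB signal-to-noise ratio, and the dropout parameters are all illustrative assumptions, and squelch clicks and ship noise are left out.

```python
import math
import random


def band_limit(samples, sample_rate, low_hz=300.0, high_hz=3000.0):
    """Crude band-pass: one-pole high-pass followed by one-pole low-pass.

    The 300-3000 Hz passband is an assumed voice-channel range,
    not a value taken from the dataset description.
    """
    dt = 1.0 / sample_rate
    rc_hp = 1.0 / (2.0 * math.pi * low_hz)
    rc_lp = 1.0 / (2.0 * math.pi * high_hz)
    a_hp = rc_hp / (rc_hp + dt)   # high-pass smoothing factor
    a_lp = dt / (rc_lp + dt)      # low-pass smoothing factor
    out = []
    hp_prev, x_prev, lp_prev = 0.0, 0.0, 0.0
    for x in samples:
        hp = a_hp * (hp_prev + x - x_prev)   # remove low frequencies
        hp_prev, x_prev = hp, x
        lp_prev += a_lp * (hp - lp_prev)     # remove high frequencies
        out.append(lp_prev)
    return out


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio."""
    if not samples:
        return []
    power = sum(s * s for s in samples) / len(samples)
    sigma = math.sqrt(power / (10.0 ** (snr_db / 10.0)))
    return [s + rng.gauss(0.0, sigma) for s in samples]


def add_dropouts(samples, sample_rate, count, max_ms, rng):
    """Zero out `count` random segments of up to `max_ms` milliseconds."""
    out = list(samples)
    max_len = max(1, int(sample_rate * max_ms / 1000))
    for _ in range(count):
        length = rng.randint(1, max_len)
        start = rng.randrange(0, max(1, len(out) - length))
        for i in range(start, min(len(out), start + length)):
            out[i] = 0.0
    return out


def degrade(samples, sample_rate, seed=0):
    """Hypothetical chain: band-limit, then noise, then dropouts."""
    rng = random.Random(seed)
    limited = band_limit(samples, sample_rate)
    noisy = add_noise(limited, snr_db=10.0, rng=rng)
    return add_dropouts(noisy, sample_rate, count=3, max_ms=50, rng=rng)


# Usage: degrade one second of a 1 kHz test tone sampled at 8 kHz.
rate = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / rate) for n in range(rate)]
degraded = degrade(tone, rate)
```

Dropouts are applied last so that the silenced segments stay exactly zero, which mimics a signal that is lost rather than merely masked by noise. A real pipeline would more likely use proper filter design and recorded noise beds than these one-pole filters and white noise.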
Provider:
bonalor