ATCO2 Project Data

Source: catalogue.elra.info, indexed 2025-03-22
Download link:
https://catalogue.elra.info/en-us/repository/browse/ELRA-S0484/
Description:
The ATCO2 project aims to develop a unique platform for collecting, organizing and pre-processing air-traffic control (voice communication) data from air space. This project has received funding from the Clean Sky 2 Joint Undertaking (JU) under grant agreement No 864702. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and the Clean Sky 2 JU members other than the Union.

The project collected the real-time voice communication between air-traffic controllers and pilots, available either directly through publicly accessible radio frequency channels or indirectly from air-navigation service providers (ANSPs). In addition to the voice communication data, contextual information is available in the form of metadata (i.e. surveillance data).

The dataset consists of two distinct packages:

- A corpus of ca. 4000 hours (untranscribed) of air-traffic control speech collected across different airports (Sion, Bern, Zurich, etc.) in .wav format for speech recognition. Speaker distribution is 90/10% between males and females, and the group contains native and non-native speakers of English. The raw data, also provided, consists of:
  Overall size of the dataset (measured after voice activity detection):
  - 5281 hours (English + non-English)
  - 4465 hours (English only)
  Overall raw size of audio files (sum of wav file lengths):
  - 6225 hours (English + non-English)
- A corpus of ca. 4 hours (transcribed) of air-traffic control speech collected across different airports (Sion, Bern, Zurich, etc.) in .wav format for speech recognition. Speaker distribution is 90/10% between males and females, and the group contains native and non-native speakers of English. This corpus has been manually transcribed and automatically annotated with orthographic information in XML format, with speaker noise information, SNR values and others. Ca. 1 hour of the annotation has been re-checked by a human.
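The "sum of wav file lengths" figure and its relation to the post-VAD figure can be reproduced in principle with a few lines of Python. The sketch below is not part of the ELRA distribution: the function names are hypothetical, the directory passed to `total_hours` is whatever folder the audio was unpacked into, and it assumes the files are plain PCM WAV readable by the standard-library `wave` module.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def total_hours(root):
    """Sum the durations of all .wav files under root, in hours."""
    seconds = sum(wav_duration_seconds(p) for p in Path(root).rglob("*.wav"))
    return seconds / 3600.0


def vad_retention(hours_after_vad, raw_hours):
    """Fraction of the raw audio that remains after voice activity detection."""
    return hours_after_vad / raw_hours


# Hour counts quoted in the description: 5281 h after VAD out of 6225 h raw.
print(f"Retained after VAD: {vad_retention(5281, 6225):.1%}")
```

Calling `total_hours` on the unpacked audio directory would give a number comparable to the raw total quoted in the description, provided the same set of files is counted.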

Provider:
ELRA (European Language Resources Association)
Collected summary
Dataset introduction
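Background and challenges
Background overview
ATCO2 Project Data is an air-traffic control speech dataset containing about 4000 hours of untranscribed and 4 hours of transcribed speech for speech recognition research. The dataset covers multiple languages and speaker backgrounds and comes with rich metadata.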
The summary above was collected and generated by 遇见数据集.