STOP
Source: arXiv | Updated: 2022-10-19 | Indexed: 2024-06-21
Download link:
https://github.com/facebookresearch/fairseq/tree/main/examples/audio_nlp/nlu
Description:
The STOP dataset, created by Meta AI, is described as the largest and most complex publicly available end-to-end spoken language understanding (SLU) dataset to date. It contains more than 236,000 audio files from 885 distinct speakers and supports complex queries and multi-level intent parsing. The audio was recorded through Amazon Mechanical Turk, and ASR systems were used for quality control. STOP is intended primarily for research on end-to-end SLU systems, particularly in low-resource and domain-adaptation settings, with the goal of addressing the error propagation and information loss found in traditional pipelines that run ASR followed by NLU.
Provider:
Meta AI
Created:
2022-06-29



