SLURP

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/speechbrain/speechbrain/tree/develop/recipes/SLURP


Resource overview:

SLURP is a publicly available, multi-domain dataset designed for end-to-end spoken language understanding (E2E-SLU). It contains audio recordings of single-turn interactions between users and a home assistant. To address inconsistent labeling, a cleaned subset named SLURP F was created; roughly 30% of the original data was discarded because of these inconsistencies. Compared with other spoken language understanding resources, SLURP is larger and more diverse. The target task is intent classification.
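One way to picture the label-consistency cleanup described above is the minimal sketch below. It is an illustration only, not the procedure used to build SLURP F: the record fields `text` and `intent`, the helper `split_by_label_consistency`, and the toy records are all hypothetical. The sketch assumes that a transcript is "inconsistent" when identical text (ignoring case and surrounding whitespace) carries more than one intent label.

```python
from collections import defaultdict


def split_by_label_consistency(records):
    """Split records into (consistent, dropped).

    A record is dropped when its normalized transcript appears
    with more than one distinct intent label in the collection.
    """
    intents_by_text = defaultdict(set)
    for rec in records:
        intents_by_text[rec["text"].strip().lower()].add(rec["intent"])

    consistent, dropped = [], []
    for rec in records:
        key = rec["text"].strip().lower()
        if len(intents_by_text[key]) == 1:
            consistent.append(rec)
        else:
            dropped.append(rec)
    return consistent, dropped


# Toy, made-up records (not taken from SLURP).
records = [
    {"text": "wake me up at seven", "intent": "alarm_set"},
    {"text": "Wake me up at seven", "intent": "alarm_set"},
    {"text": "play some jazz", "intent": "play_music"},
    {"text": "play some jazz", "intent": "play_radio"},
    {"text": "what is the weather", "intent": "weather_query"},
]

consistent, dropped = split_by_label_consistency(records)
discard_rate = len(dropped) / len(records)
print(f"kept {len(consistent)}, dropped {len(dropped)}, discard rate {discard_rate:.0%}")
```

Dropping every conflicting copy, rather than keeping a majority label, is the most conservative choice; it trades dataset size for label reliability, which matches the reported loss of a sizeable share of the data.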

Dataset introduction

Background overview

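SLURP is a dataset focused on spoken language understanding (SLU). The linked SpeechBrain repository provides several recipes for model training and evaluation, covering direct mapping, a tokenizer, and natural language understanding (NLU). It also reports detailed performance metrics, offers inference interfaces, supports models such as HuBERT, and lists training times and links to trained models.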

The content above was collected and summarized by 遇见数据集.
