ThBel/Utterly
Hugging Face · Updated 2025-12-16 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/ThBel/Utterly
Summary:
Utterly is a speech dataset derived from two source datasets, pipecat-ai/human_5_all and pipecat-ai/smart-turn-data-v3.1-train. It contains more than 7.1k recordings of complete and incomplete English utterances. Each recording carries turn-level annotations: a verbatim transcript generated with Whisper, an end-of-turn marker, and a speaker identifier (coming soon). The dataset is intended to support research on and development of speech and dialogue systems that model speech recognition and conversational turn-taking jointly, such as streaming ASR systems, semantic end-of-turn detectors, and real-time conversational agents. The language is English; the modalities are audio (speech, mono, sampled at 16 kHz) and text; the interaction type is human conversational speech. Each record contains an audio path, a transcript, a speaker ID, and a turn completion flag. Intended use cases include automatic speech recognition, semantic end-of-turn modeling, turn-taking and floor-control research in conversational AI, and voice assistants and dialogue systems that require low-latency response timing.
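The summary does not give exact column names. As a hedged sketch, the Python below models one record with hypothetical field names (`audio_path`, `transcript`, `speaker_id`, `is_complete`) and shows one way the turn completion flag could be turned into binary labels for a text-only semantic end-of-turn classifier. The helper names (`eot_examples`, `label_balance`, `duration_seconds`) are illustrative, and the two example records are invented, not taken from the dataset.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# The summary states mono audio sampled at 16 kHz.
SAMPLE_RATE_HZ = 16_000


@dataclass
class UtteranceRecord:
    # Hypothetical field names; check the real column names on the dataset page.
    audio_path: str
    transcript: str
    speaker_id: Optional[str]  # speaker identifiers are listed as "coming soon"
    is_complete: bool          # turn completion flag: True means the turn has ended


def duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Convert a mono sample count to a duration in seconds."""
    return num_samples / sample_rate


def eot_examples(records: Iterable[UtteranceRecord]) -> List[Tuple[str, int]]:
    """Map records to (transcript, label) pairs: 1 = complete turn, 0 = incomplete."""
    return [(r.transcript, int(r.is_complete)) for r in records]


def label_balance(pairs: List[Tuple[str, int]]) -> float:
    """Fraction of examples labelled as complete turns (0.0 for an empty list)."""
    if not pairs:
        return 0.0
    return sum(label for _, label in pairs) / len(pairs)


# Invented example records, for illustration only.
records = [
    UtteranceRecord("a.wav", "Book a table for", None, False),
    UtteranceRecord("b.wav", "Book a table for two.", None, True),
]

pairs = eot_examples(records)
print(pairs)                     # [('Book a table for', 0), ('Book a table for two.', 1)]
print(label_balance(pairs))      # 0.5
print(duration_seconds(24_000))  # 1.5
```

Checking the label balance before training matters because a corpus that mixes complete and incomplete utterances may not be evenly split, and a skewed split would call for class weighting or resampling.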
Provided by:
ThBel



