AISHELL-4 Multi-Channel Mandarin Meeting Speech Dataset

超神经 | Updated 2024-02-27 | Indexed 2024-05-15

Download link:

https://hyper.ai/cn/datasets/29375

Overview:

AISHELL-4 is a large, real-recorded Mandarin speech dataset collected with an 8-channel circular microphone array and intended for speech processing in meeting scenarios. It consists of 211 recorded meeting sessions, each with 4 to 8 speakers, for a total of 120 hours. The dataset aims to bridge advanced research on multi-speaker processing and practical application scenarios in three respects. Recorded in real meetings, AISHELL-4 offers realistic acoustics and rich natural speech characteristics of conversation, such as short pauses, speech overlap, quick speaker turns, and noise. Accurate transcriptions and speaker voice activity annotations are also provided for every meeting. This lets researchers explore different aspects of meeting processing, from individual tasks such as speech front-end processing, speech recognition, and speaker diarization to multimodal modeling and joint optimization of related tasks. The research team also released a PyTorch-based training and evaluation framework as a baseline system to support reproducible research in this field.
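As a rough illustration of how speaker voice activity annotations can be used to quantify speech overlap, the sketch below computes the fraction of speech time in which two or more speakers are active. It is a minimal sketch under stated assumptions, not part of the released baseline: the function name `overlap_ratio`, the speaker IDs, and the in-memory layout (a dict mapping each speaker to a list of `(start, end)` times in seconds) are hypothetical, and the example segments are made up rather than taken from the dataset. It also assumes that segments belonging to the same speaker do not overlap each other.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec), hypothetical layout


def overlap_ratio(activity: Dict[str, List[Segment]]) -> float:
    """Fraction of speech time during which two or more speakers are active."""
    # Turn every segment into two events: +1 at its start, -1 at its end.
    events: List[Tuple[float, int]] = []
    for segments in activity.values():
        for start, end in segments:
            if end <= start:
                raise ValueError(f"empty or reversed segment: {(start, end)}")
            events.append((start, 1))
            events.append((end, -1))

    # Sort by time; at equal times, ends (-1) come before starts (+1),
    # so back-to-back turns are not counted as overlap.
    events.sort(key=lambda e: (e[0], e[1]))

    speech = 0.0      # time with at least one active speaker
    overlapped = 0.0  # time with at least two active speakers
    active = 0        # number of speakers active just before the current event
    prev_time = 0.0
    for time, delta in events:
        span = time - prev_time
        if active >= 1:
            speech += span
        if active >= 2:
            overlapped += span
        active += delta
        prev_time = time

    return overlapped / speech if speech > 0 else 0.0


# Made-up example: spk2 overlaps spk1 during 3-4 s and 6-7 s.
example = {
    "spk1": [(0.0, 4.0), (6.0, 8.0)],
    "spk2": [(3.0, 7.0)],
}
print(f"{overlap_ratio(example):.2f}")  # prints 0.25 (2 s overlapped / 8 s speech)
```

The sweep over sorted start and end events keeps the computation at O(n log n) in the number of segments and avoids discretizing time into frames, which matters for long recordings.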

Created:

2024-02-07

Collected summary

Background overview

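AISHELL-4 is a large multi-channel Mandarin meeting speech dataset comprising 211 meeting recordings (120 hours in total), each with 4 to 8 speakers. It captures real acoustics and natural speech characteristics such as overlapping speech and noise, and is suited to research on speech processing and multimodal modeling. The dataset provides accurate transcriptions and speaker voice activity annotations, and comes with a PyTorch-based baseline framework.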

The summary above was collected and generated by 遇见数据集.