
AISHELL-4

OpenDataLab. Updated: 2026-05-24. Indexed: 2024-05-09.
Download link:
https://opendatalab.org.cn/OpenDataLab/Aishell4
Description:
AISHELL-4 is a large-scale, real-recorded Mandarin speech dataset collected with an 8-channel circular microphone array for speech processing in meeting scenarios. It consists of 211 recorded meeting sessions, each with 4 to 8 speakers, for a total of 120 hours. The dataset aims to bridge advanced research on multi-speaker processing and practical application scenarios in three respects. First, because the meetings are real recordings, AISHELL-4 provides realistic acoustics and rich natural speech characteristics of conversation, such as short pauses, speech overlap, quick speaker turns, and noise. Second, accurate transcriptions and speaker voice activity annotations are provided for each meeting. This lets researchers explore different aspects of meeting processing, from individual tasks such as speech front-end processing, speech recognition, and speaker diarization to related tasks such as multi-modality modeling and joint optimization. Third, a PyTorch-based training and evaluation framework is released as a baseline system to facilitate reproducible research in this field. The baseline system code and generated samples are also publicly available.
Provided by:
OpenDataLab
Created:
2023-06-25