AMI Meeting Transcription
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/BriansIDP/RTLM
Description:
This dataset comprises 80 hours of training data from 137 meetings, each with 3 to 5 speakers recorded on individual headset microphones. The corresponding reference transcriptions total approximately 900,000 words and are used to train the AMI language model. The AMI evaluation set serves as an unseen test set for evaluating that model. The core task is language modeling for automatic speech recognition (ASR).
Provider:
AMI Project



