hhoangphuoc/ami-av
Source: Hugging Face. Updated: 2025-04-10. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/hhoangphuoc/ami-av
Description:
This is a processed audio-visual dataset derived from the AMI Meeting Corpus. The recordings were segmented into sentence-level audio/video segments based on the individual [meeting_id]-[speaker_id] transcripts. The dataset is intended for audio-visual speech recognition (AVSR), particularly for spontaneous conversational speech. It contains 83,438 segments in total, each of which includes audio, video, or both. Audio segments are resampled to 16 kHz and stored in `.wav` format; video segments are resampled to 25 fps and stored in `.mp4` format.
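To make the stated rates concrete, the following is a minimal sketch of how a sentence-level segment's time span maps onto audio samples at 16 kHz and video frames at 25 fps. Only the two rates come from the description; the function name `segment_extent`, its arguments, and the returned keys are hypothetical and are not part of the dataset or its tooling.

```python
AUDIO_SAMPLE_RATE_HZ = 16_000  # audio rate stated in the description
VIDEO_FPS = 25                 # video frame rate stated in the description


def segment_extent(start_s: float, end_s: float) -> dict:
    """Map a segment's [start_s, end_s) span to sample and frame counts.

    Hypothetical helper for illustration; the dataset's own preprocessing
    may round boundaries differently.
    """
    if start_s < 0 or end_s < start_s:
        raise ValueError("expected 0 <= start_s <= end_s")

    # Convert boundary times to the nearest audio sample / video frame index.
    first_sample = round(start_s * AUDIO_SAMPLE_RATE_HZ)
    last_sample = round(end_s * AUDIO_SAMPLE_RATE_HZ)
    first_frame = round(start_s * VIDEO_FPS)
    last_frame = round(end_s * VIDEO_FPS)

    return {
        "num_samples": last_sample - first_sample,
        "num_frames": last_frame - first_frame,
        # Audio samples that fall within one video frame at these rates.
        "samples_per_frame": AUDIO_SAMPLE_RATE_HZ // VIDEO_FPS,
    }


extent = segment_extent(1.0, 3.0)
print(extent["num_samples"], extent["num_frames"], extent["samples_per_frame"])
```

Under these assumptions, a two-second segment spans 32,000 audio samples and 50 video frames, with 640 samples per frame, which is the alignment ratio an AVSR model would need when pairing the two streams.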
Provider:
hhoangphuoc



