Trelis/ami-2speaker-test

Hugging Face · Updated 2026-04-18 · Added 2026-05-10
Download link:
https://hf-mirror.com/datasets/Trelis/ami-2speaker-test
Description:
---
license: cc-by-4.0
task_categories:
- automatic-speech-recognition
language:
- en
tags:
- multi-speaker
- meeting
- ami
- benchmark
size_categories:
- n<1K
source_datasets:
- edinburghcstr/ami
---

# AMI 2-Speaker Test Set

**Need a voice model for your domain?** Trelis builds custom ASR, TTS, and voice agent pipelines for specialist verticals (legal, medical, finance, construction) and low-resource languages. [Enquire or book a consultation →](https://trelis.com/voice-ai-services/)

A 50-clip benchmark for 2-speaker overlapping speech recognition, derived from the test split of the [AMI Meeting Corpus](https://huggingface.co/datasets/edinburghcstr/ami). Each clip contains 8–28 seconds of real conversational meeting audio reconstructed as a 2-speaker virtual meeting, with a separate ground-truth transcript for each speaker.

## How it was built

1. Stream the test split of **AMI IHM (Individual Headset Microphone)**. Each meeting has 4 speakers with separate close-mic tracks and time-aligned utterances.
2. For each meeting, pick the **top 2 speakers by total talk time**.
3. Reconstruct each speaker's meeting-length audio track by placing their IHM utterances at their real `begin_time`/`end_time` positions.
4. Sum the two tracks into a 2-speaker "virtual meeting" with **real conversational rhythm, real overlap patterns, and real acoustic levels**. The other 2 speakers in each meeting are dropped.
5. Slide non-overlapping 8–28-second windows over the timeline, respecting utterance boundaries (a window never cuts an utterance mid-word), and keep only windows in which both picked speakers are present.
6. Apply deterministic text normalisation to the transcripts: collapse spelled-out acronyms (e.g. `"X. M. L." → "XML"`), add punctuation, apply sentence case, and preserve disfluencies.
7. Sample 50 windows with seed 42.

No additional noise, reverberation, or speaker remixing is applied; the acoustic content is as recorded in the original meetings.
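The build steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Trelis's published build script: the `Utterance` record, all helper names, the greedy window-packing rule, and the use of Python's `random.Random(42)` for step 7 are hypothetical, so it will not reproduce the exact 50 clips. Utterance audio is assumed to be 16 kHz mono float32 NumPy arrays, and step 6 (text normalisation) is omitted.

```python
import random
from dataclasses import dataclass

import numpy as np

SR = 16_000  # assumed sample rate, matching the 16 kHz clips in this dataset


@dataclass
class Utterance:
    """Hypothetical record for one time-aligned IHM utterance."""
    speaker: str
    begin: float       # seconds from meeting start (begin_time)
    end: float         # seconds from meeting start (end_time)
    audio: np.ndarray  # mono float32 samples for this utterance
    text: str


def top2_speakers(utts):
    """Step 2: the two speakers with the most total talk time."""
    talk = {}
    for u in utts:
        talk[u.speaker] = talk.get(u.speaker, 0.0) + (u.end - u.begin)
    return sorted(talk, key=talk.get, reverse=True)[:2]


def build_track(utts, speaker, n_samples):
    """Step 3: place one speaker's utterances at their real begin times."""
    track = np.zeros(n_samples, dtype=np.float32)
    for u in utts:
        if u.speaker != speaker:
            continue
        start = int(round(u.begin * SR))
        seg = u.audio[: max(0, n_samples - start)]  # drop samples past the end
        track[start:start + len(seg)] += seg
    return track


def candidate_windows(utts, min_s=8.0, max_s=28.0):
    """Step 5 (hypothetical greedy rule): non-overlapping windows whose
    edges never fall inside an utterance of either speaker."""
    # Merge overlapping utterances into blocks; edges may only sit in gaps.
    blocks = []  # each block is [start, end, set_of_speakers]
    for u in sorted(utts, key=lambda u: u.begin):
        if blocks and u.begin < blocks[-1][1]:
            blocks[-1][1] = max(blocks[-1][1], u.end)
            blocks[-1][2].add(u.speaker)
        else:
            blocks.append([u.begin, u.end, {u.speaker}])

    def keep(w):
        # Duration in range and both picked speakers present.
        return min_s <= w[1] - w[0] <= max_s and len(w[2]) == 2

    windows, cur = [], None
    for start, end, spk in blocks:
        if cur is not None and end - cur[0] > max_s:
            if keep(cur):  # adding this block would exceed max_s: close window
                windows.append((cur[0], cur[1]))
            cur = None
        if cur is None:
            cur = [start, end, set(spk)]
        else:
            cur[1], cur[2] = end, cur[2] | spk
    if cur is not None and keep(cur):
        windows.append((cur[0], cur[1]))
    return windows


def meeting_windows(utts):
    """Steps 2-5 for one meeting: returns (mixture, [(start_s, end_s), ...])."""
    spk1, spk2 = top2_speakers(utts)
    utts = [u for u in utts if u.speaker in (spk1, spk2)]  # other speakers dropped
    n = int(np.ceil(max(u.end for u in utts) * SR))
    mix = build_track(utts, spk1, n) + build_track(utts, spk2, n)  # step 4: plain sum
    return mix, candidate_windows(utts)


def sample_windows(all_windows, k=50, seed=42):
    """Step 7: seeded sample from the windows pooled across all meetings."""
    rng = random.Random(seed)
    return rng.sample(all_windows, min(k, len(all_windows)))
```

Merging overlapping utterances into blocks before packing guarantees that every window edge lands in a silent gap, which is one way to satisfy the utterance-boundary rule in step 5. A greedy packer like this can discard some usable audio compared with a search over all valid windows.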
## Schema

| Column | Description |
|---|---|
| `audio` | 16 kHz mono WAV, 8–28 s |
| `speaker1_text` / `speaker2_text` | Reference transcript per speaker (cased and punctuated) |
| `speaker1_target` / `speaker2_target` | Reference with Whisper-style timestamp tokens: `<\|start\|> text<\|end\|>` |
| `speaker1_start` / `speaker1_end` | Start/end time of speaker 1's speech within the clip (seconds) |
| `speaker2_start` / `speaker2_end` | Same for speaker 2 |
| `overlap_ratio` | Fraction of the shorter speaker's talk time that overlaps with the other speaker (0.0 = fully sequential, 1.0 = fully simultaneous) |
| `loudness_db` | Always 0.0 (no loudness manipulation) |

## Statistics

- 50 clips from 16 AMI test meetings (4 meeting groups × 4 sessions)
- Overlap ratios range from 0.0 to ~0.9, with a median of ~0.3
- Speaker pairs are fully disjoint from the AMI train/validation splits

## Baselines

[`Trelis/Chorus-v1`](https://huggingface.co/Trelis/Chorus-v1): mean CER of 9.35% and CMER of 9.00% across both speakers. Per-row predictions are available at [`Trelis/chorus-v1-ami-2speaker-test-preds`](https://huggingface.co/datasets/Trelis/chorus-v1-ami-2speaker-test-preds).

## Usage

```python
import io

import soundfile as sf
from datasets import Audio, load_dataset

ds = load_dataset("Trelis/ami-2speaker-test", split="train")

# If torchcodec is not installed, disable automatic decoding and read the raw bytes with soundfile.
ds = ds.cast_column("audio", Audio(decode=False))
row = ds[0]
waveform, sample_rate = sf.read(io.BytesIO(row["audio"]["bytes"]))
print(sample_rate, row["speaker1_text"], "|", row["speaker2_text"])
```

## License and attribution

Source audio and transcripts come from the [AMI Meeting Corpus](https://www.idiap.ch/en/dataset/ami), distributed under **CC-BY 4.0**. This derived test set inherits the same licence. If you use this set, please cite the AMI corpus:

```
@inproceedings{carletta2006ami,
  title={The AMI Meeting Corpus: A Pre-announcement},
  author={Carletta, Jean and others},
  booktitle={Machine Learning for Multimodal Interaction},
  year={2006}
}
```
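To make the `overlap_ratio` column and the CER baseline concrete, the functions below compute both under stated assumptions. `overlap_ratio` follows the schema's definition (overlapped time divided by the shorter speaker's talk time) and assumes per-speaker lists of `(start, end)` utterance intervals in seconds. The dataset itself stores only one `speaker*_start`/`speaker*_end` span per speaker, so passing those spans gives a coarser span-level value that may differ from the stored column. `cer` is a plain character-level Levenshtein CER with no text normalisation; the normalisation and averaging behind the reported 9.35% are not documented in this card, and CMER is not covered. All function names are hypothetical.

```python
def merge(intervals):
    """Union of (start, end) intervals, returned sorted and non-overlapping."""
    out = []
    for s, e in sorted(intervals):
        if out and s <= out[-1][1]:
            out[-1][1] = max(out[-1][1], e)
        else:
            out.append([s, e])
    return out


def total(intervals):
    """Total duration covered by non-overlapping intervals."""
    return sum(e - s for s, e in intervals)


def overlap_ratio(spk1, spk2):
    """Overlapped time divided by the shorter speaker's talk time."""
    a, b = merge(spk1), merge(spk2)
    i = j = 0
    both = 0.0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        both += max(0.0, hi - lo)
        # Advance whichever interval ends first (two-pointer sweep).
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    shorter = min(total(a), total(b))
    return both / shorter if shorter > 0 else 0.0


def cer(ref, hyp):
    """Character error rate: Levenshtein distance / reference length."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)
```

For example, `overlap_ratio([(0, 4)], [(2, 10)])` returns 0.5 (2 s of overlap over the shorter speaker's 4 s), and `cer("kitten", "sitting")` returns 0.5 (3 edits over 6 reference characters).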

Provided by:
Trelis