Trelis/ami-2speaker-test
Hugging Face · Updated 2026-04-18 · Indexed 2026-05-10
Download link:
https://hf-mirror.com/datasets/Trelis/ami-2speaker-test
---
license: cc-by-4.0
task_categories:
- automatic-speech-recognition
language:
- en
tags:
- multi-speaker
- meeting
- ami
- benchmark
size_categories:
- n<1K
source_datasets:
- edinburghcstr/ami
---
# AMI 2-Speaker Test Set
**Need a voice model for your domain?** Trelis builds custom ASR, TTS, and voice agent pipelines for specialist verticals (legal, medical, finance, construction) and low-resource languages. [Enquire or book a consultation →](https://trelis.com/voice-ai-services/)
A 50-clip benchmark for 2-speaker overlapping speech recognition, derived from the [AMI Meeting Corpus](https://huggingface.co/datasets/edinburghcstr/ami) test split.
Each clip is 8–28 seconds of real conversational meeting audio reconstructed as a 2-speaker virtual meeting, with separate ground-truth transcripts for each speaker.
## How it was built
1. Stream the **AMI IHM (Individual Headset Microphone)** test split — each meeting has 4 speakers with separate close-mic tracks and time-aligned utterances.
2. For each meeting, pick the **top-2 speakers by total talk time**.
3. Reconstruct each speaker's meeting-length audio track by placing their IHM utterances at their real `begin_time`/`end_time` positions.
4. Sum the two tracks → a 2-speaker "virtual meeting" with **real conversational rhythm, real overlap patterns, and real acoustic levels**. The other 2 speakers in each meeting are dropped.
5. Slide non-overlapping 8–28-second windows over the timeline, respecting utterance boundaries (a window cannot cut an utterance mid-word) and keeping only windows where both picked speakers are present.
6. Apply deterministic text normalisation to the transcripts (collapse spelled-out acronyms like `"X. M. L." → "XML"`, add punctuation, sentence-case, preserve disfluencies).
7. Sample 50 windows with seed 42.
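Steps 3 to 5 can be sketched with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' build code: the input formats (utterance start times paired with sample arrays, and `(begin, end, speaker)` tuples) and the helper names `build_track`, `mix_two_speakers` and `select_windows` are hypothetical. The greedy grouping is likewise only one possible way to realise "non-overlapping windows that respect utterance boundaries".

```python
import numpy as np

# Hypothetical helpers illustrating build steps 3-5; not the dataset's actual code.
SR = 16_000  # sample rate of the clips in this set (16 kHz)


def build_track(utterances, total_seconds, sr=SR):
    """Place one speaker's utterances on a silent, meeting-length timeline.

    utterances: list of (begin_time, samples) pairs; begin_time in seconds,
    samples a 1-D float array holding that utterance's IHM audio.
    """
    track = np.zeros(int(round(total_seconds * sr)), dtype=np.float32)
    for begin_time, samples in utterances:
        start = int(round(begin_time * sr))
        if start >= len(track):
            continue  # utterance starts after the end of the timeline
        stop = min(start + len(samples), len(track))
        track[start:stop] += samples[: stop - start]
    return track


def mix_two_speakers(utts_a, utts_b, total_seconds, sr=SR):
    """Plain sum of the two reconstructed tracks: no gain change, noise or reverb."""
    return build_track(utts_a, total_seconds, sr) + build_track(utts_b, total_seconds, sr)


def select_windows(utterances, min_len=8.0, max_len=28.0):
    """Greedy non-overlapping windows whose edges never fall inside an utterance.

    utterances: list of (begin, end, speaker) tuples in seconds, both speakers.
    Returns (start, end) windows that contain speech from two speakers.
    """
    # Merge overlapping utterances (from either speaker) into "islands";
    # cutting only between islands guarantees no utterance is split.
    islands = []  # entries: [start, end, set_of_speakers]
    for begin, end, spk in sorted(utterances):
        if islands and begin < islands[-1][1]:
            islands[-1][1] = max(islands[-1][1], end)
            islands[-1][2].add(spk)
        else:
            islands.append([begin, end, {spk}])

    windows, i = [], 0
    while i < len(islands):
        start, end, speakers = islands[i][0], islands[i][1], set(islands[i][2])
        j = i + 1
        # Extend the window island by island while it stays within max_len.
        while j < len(islands) and islands[j][1] - start <= max_len:
            end = islands[j][1]
            speakers |= islands[j][2]
            j += 1
        # Keep only windows of legal length in which both speakers talk.
        if min_len <= end - start <= max_len and len(speakers) == 2:
            windows.append((start, end))
        i = j
    return windows
```

Merging overlapping utterances into islands before windowing is what keeps window edges out of overlapped speech: a cut placed between two islands cannot land inside either speaker's utterance.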
No additional noise, reverberation, or speaker remixing is added — the acoustic content is as-recorded in the original meetings.
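Build steps 6 and 7 (text normalisation and seeded sampling) can be illustrated as follows. This is a sketch under assumptions: the regular expression and the `collapse_acronyms`, `normalise` and `sample_windows` helpers are hypothetical, punctuation restoration is not shown because the card does not say how it is done, and the card does not name the random generator that consumed seed 42, so `random.Random(42)` will not necessarily reproduce the published selection.

```python
import random
import re

# Hypothetical rule: two or more spelled-out capitals such as "X. M. L.".
# The card does not publish the exact pattern it uses.
SPELLED_ACRONYM = re.compile(r"\b(?:[A-Z]\. )+[A-Z]\.")


def collapse_acronyms(text):
    """Turn "X. M. L." into "XML" while leaving the rest of the text alone."""
    return SPELLED_ACRONYM.sub(lambda m: m.group(0).replace(". ", "").rstrip("."), text)


def normalise(text):
    """Collapse acronyms and upper-case the first character.

    Disfluencies such as "um" are deliberately left in place.
    """
    text = collapse_acronyms(text.strip())
    return text[:1].upper() + text[1:]


def sample_windows(windows, k=50, seed=42):
    """Deterministically pick k distinct windows using a fixed seed."""
    return random.Random(seed).sample(windows, k)
```

For example, `normalise("um so the T. V. remote")` gives `"Um so the TV remote"`: the filler is kept and the spelled-out letters are collapsed.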
## Schema
| Column | Description |
|---|---|
| `audio` | 16 kHz mono WAV, 8–28 s |
| `speaker1_text` / `speaker2_text` | Reference transcript per speaker (cased + punctuated) |
| `speaker1_target` / `speaker2_target` | Reference with Whisper-style timestamp tokens: `<\|start\|> text<\|end\|>` |
| `speaker1_start` / `speaker1_end` | Start/end time of speaker 1's speech within the clip (seconds) |
| `speaker2_start` / `speaker2_end` | Same for speaker 2 |
| `overlap_ratio` | Fraction of the shorter speaker's talk time that overlaps with the other (0.0 = sequential, 1.0 = full simultaneity) |
| `loudness_db` | Always 0.0 — no loudness manipulation |
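The `overlap_ratio` definition can be made concrete with a short sketch. It assumes each speaker's speech is available as a list of `(start, end)` intervals in seconds (for example, utterance boundaries within the clip); the card does not say whether the published values were computed from utterance intervals or from sample-level activity, and the helper names are hypothetical.

```python
def merge_intervals(intervals):
    """Union of (start, end) intervals, returned sorted and non-overlapping."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_ratio(speaker1, speaker2):
    """Overlapped time divided by the shorter speaker's total talk time."""
    a, b = merge_intervals(speaker1), merge_intervals(speaker2)
    overlap, i, j = 0.0, 0, 0
    while i < len(a) and j < len(b):
        # Length of the intersection of the current pair of intervals (0 if disjoint).
        overlap += max(0.0, min(a[i][1], b[j][1]) - max(a[i][0], b[j][0]))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    shorter = min(sum(e - s for s, e in a), sum(e - s for s, e in b))
    return overlap / shorter if shorter > 0 else 0.0
```

Normalising by the shorter speaker's talk time is what lets the ratio reach 1.0 when one speaker talks only while the other is also talking, regardless of how long the other speaker holds the floor.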
## Statistics
- 50 clips, from 16 AMI test meetings (4 meeting groups × 4 sessions)
- Overlap ratios range 0.0 to ~0.9, median ~0.3
- Speaker pairs are fully disjoint from the AMI train / validation splits
## Baselines
[`Trelis/Chorus-v1`](https://huggingface.co/Trelis/Chorus-v1): CER 9.35% / CMER 9.00% mean across both speakers. Per-row predictions at [`Trelis/chorus-v1-ami-2speaker-test-preds`](https://huggingface.co/datasets/Trelis/chorus-v1-ami-2speaker-test-preds).
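The card reports CER averaged over both speakers but does not spell out the scoring details. The sketch below assumes plain character-level Levenshtein distance on the cased, punctuated reference text with no further normalisation, averaged per speaker per clip rather than pooled over the whole set. The prediction keys `speaker1_pred` / `speaker2_pred` are hypothetical (check the actual column names in the predictions dataset), and CMER is not reproduced because its definition is not given here.

```python
def edit_distance(ref, hyp):
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Edit distance normalised by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def mean_two_speaker_cer(rows):
    """Average CER over speaker 1 and speaker 2 of every row.

    The "speaker{1,2}_pred" keys are hypothetical prediction columns.
    """
    scores = [cer(row[f"speaker{s}_text"], row[f"speaker{s}_pred"])
              for row in rows for s in (1, 2)]
    return sum(scores) / len(scores)
```

Pooling total edits over total reference characters instead of averaging per-speaker scores would weight longer transcripts more heavily, so the two choices can give different numbers on the same predictions.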
## Usage
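```python
import io

import soundfile as sf
from datasets import load_dataset, Audio

ds = load_dataset("Trelis/ami-2speaker-test", split="train")
# If torchcodec is not installed, disable decoding and read the raw WAV bytes with soundfile.
# With decode=False each audio entry is a dict with "bytes" and "path" keys.
ds = ds.cast_column("audio", Audio(decode=False))
row = ds[0]
waveform, sample_rate = sf.read(io.BytesIO(row["audio"]["bytes"]))
print(sample_rate, waveform.shape)
print("Speaker 1:", row["speaker1_text"])
```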
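Once a row is loaded, the `speaker1_target` / `speaker2_target` columns can be split back into timed segments. The card only shows the template `<|start|> text<|end|>`, so this sketch assumes the start and end markers are rendered as numeric seconds in Whisper-style tokens (for example `<|0.00|> Hello there.<|3.52|>`); verify against a real row before relying on it. The `parse_target` helper is hypothetical.

```python
import re

# Assumed rendering: "<|0.00|> Hello there.<|3.52|>", i.e. numeric seconds
# inside Whisper-style <|...|> tokens around each transcript segment.
SEGMENT = re.compile(r"<\|(\d+(?:\.\d+)?)\|>(.*?)<\|(\d+(?:\.\d+)?)\|>", re.S)


def parse_target(target):
    """Return (start_seconds, end_seconds, text) for every timestamped segment."""
    return [(float(s), float(e), text.strip()) for s, text, e in SEGMENT.findall(target)]
```

With a row loaded as above, `parse_target(row["speaker1_target"])` would return speaker 1's segments, provided the assumed token format holds.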
## License and attribution
Source audio and transcripts from the [AMI Meeting Corpus](https://www.idiap.ch/en/dataset/ami), distributed under **CC-BY 4.0**. This derived test set inherits the same licence. If you use this set, please cite the AMI corpus:
```
@inproceedings{carletta2006ami,
title={The AMI Meeting Corpus: A Pre-announcement},
author={Carletta, Jean and others},
booktitle={Machine Learning for Multimodal Interaction},
year={2006}
}
```
Provider: Trelis



