richiejp/aec-challenge-16k
Hugging Face, updated 2026-04-16, indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/richiejp/aec-challenge-16k
Description:
---
license: cc-by-4.0
task_categories:
- audio-classification
tags:
- speech
- acoustic-echo-cancellation
- aec-challenge
- icassp-2022
pretty_name: Microsoft AEC Challenge 16kHz (FLAC)
---
# Microsoft AEC Challenge 16kHz
[Microsoft AEC Challenge](https://github.com/microsoft/AEC-Challenge) dataset
converted from 16kHz WAV to **FLAC** (lossless compression) and packed into tar shards.
Source: the `datasets/` directory of the microsoft/AEC-Challenge Git LFS repo.
Covers all challenge years (2021, ICASSP 2022, ICASSP 2023).
## Structure
### Real recordings
Paired loopback (far-end reference) and microphone recordings from real devices.
- `real/` — 37,578 files, single playback real recordings
- `real_doubled/` — 10,531 files, double playback real recordings
Filenames preserve the GUID-based naming convention:
`{GUID}_{scenario}_{signal}.flac`
Scenarios: `farend_singletalk`, `farend_singletalk_with_movement`, `nearend_singletalk`,
`doubletalk`, `doubletalk_with_movement`, `sweep`
Signals: `lpb` (loopback/far-end reference), `mic` (microphone recording)
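
Because scenario names themselves contain underscores, splitting a filename on `_` alone is ambiguous. The following is a minimal sketch of how the naming convention above could be parsed and how `lpb`/`mic` files could be grouped per recording. The helper names `parse_name` and `pair_recordings` are hypothetical, not part of this dataset or of any library, and the sketch assumes every file follows the pattern exactly as listed above. Files that do not match are skipped.

```python
from collections import defaultdict
from pathlib import Path

# Scenario and signal names as listed in this README.
SCENARIOS = (
    "farend_singletalk",
    "farend_singletalk_with_movement",
    "nearend_singletalk",
    "doubletalk",
    "doubletalk_with_movement",
    "sweep",
)
SIGNALS = ("lpb", "mic")


def parse_name(filename):
    """Split '{GUID}_{scenario}_{signal}.flac' into (guid, scenario, signal).

    Hypothetical helper: returns None if the name does not fit the pattern.
    """
    stem = Path(filename).stem
    # The signal is the last underscore-separated token.
    head, sep, signal = stem.rpartition("_")
    if not sep or signal not in SIGNALS:
        return None
    # The scenario is whichever known name the remainder ends with.
    for scenario in SCENARIOS:
        suffix = "_" + scenario
        if head.endswith(suffix) and len(head) > len(suffix):
            return head[: -len(suffix)], scenario, signal
    return None


def pair_recordings(paths):
    """Group paths as {(guid, scenario): {signal: Path}}.

    A group may hold only one signal if its counterpart is missing.
    """
    groups = defaultdict(dict)
    for p in paths:
        parsed = parse_name(p)
        if parsed is None:
            continue  # not a recording that follows the convention
        guid, scenario, signal = parsed
        groups[(guid, scenario)][signal] = Path(p)
    return dict(groups)
```

For example, `pair_recordings(Path("/data/aec/real").rglob("*.flac"))` would yield one entry per recording, assuming the shards were extracted under `/data/aec/real`. Checking for both `"lpb"` and `"mic"` keys before use avoids assuming every recording has both signals.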
### Synthetic data (10,000 clips)
- `synthetic_echo/` — Echo signal component
- `synthetic_farend/` — Far-end reference signal
- `synthetic_nearend_mic/` — Mixed microphone signal (echo + near-end + noise)
- `synthetic_nearend_speech/` — Clean near-end speech
- `meta.csv` — Synthetic data metadata
### Test sets
- `test_set/` — Original test set (clean + noisy)
- `test_set_icassp2022/` — ICASSP 2022 test set
- `blind_test_set/` — Original blind test set
- `blind_test_set_icassp2022/` — ICASSP 2022 blind test set
- `blind_test_set_icassp2023/` — ICASSP 2023 blind test set
- `blind_test_set_interspeech2021/` — Interspeech 2021 blind test set
## Usage
```python
from huggingface_hub import snapshot_download
import tarfile
from pathlib import Path
# Download the dataset repository into a local directory
local = snapshot_download("richiejp/aec-challenge-16k", local_dir="/data/aec", repo_type="dataset")

# Extract every tar shard into the directory that contains it
for tar_path in sorted(Path(local).rglob("*.tar")):
    with tarfile.open(tar_path) as tf:
        tf.extractall(tar_path.parent)
```
```
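
After extraction, one way to sanity-check the result is to count the `.flac` files under each top-level directory and compare them with the figures in the Structure section (37,578 for `real/`, 10,531 for `real_doubled/`). The sketch below assumes the shards unpack into directories named as in that section, which this README does not state explicitly. `count_flac_by_dir` is a hypothetical helper name.

```python
from collections import Counter
from pathlib import Path


def count_flac_by_dir(root):
    """Count .flac files under each top-level directory of `root`.

    Hypothetical helper: files lying directly in `root` are ignored.
    """
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*.flac"):
        rel = path.relative_to(root)
        if len(rel.parts) < 2:
            continue  # file sits directly in root, not in a subdirectory
        counts[rel.parts[0]] += 1
    return counts
```

Calling `count_flac_by_dir("/data/aec")` returns a `Counter` keyed by directory name. A mismatch with the listed counts may simply mean the extracted layout differs from the assumption above.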
## Source
Original data from Microsoft's AEC Challenge:
- https://github.com/microsoft/AEC-Challenge
- License: CC-BY-4.0 (see original repo for details)
Provider:
richiejp



