WSYue-ASR-eval
Source: ModelScope community. Updated 2026-01-06, indexed 2025-09-06.
Download link:
https://modelscope.cn/datasets/ASLP-lab/WSYue-ASR-eval
Description:
# WSYue-ASR-eval: Cantonese ASR Benchmark
To address the unique linguistic characteristics of Cantonese in speech recognition, we propose **WSYue-ASR-eval**, a benchmark specifically designed for evaluating Cantonese ASR systems. It is tailored to assess model performance across diverse lengths, domains, and linguistic phenomena of Cantonese speech.
The test set annotations are provided by Beijing AISHELL Technology Co., Ltd.
Key features:
- Annotated through multiple rounds of manual labeling
- Includes rich tags such as text transcription, emotion, age, and gender
- Covers Cantonese-English code-switching and multi-domain conditions
- Enables comprehensive evaluation across varying speech lengths
## WSYue-ASR-eval Subsets
| Set | Duration | Speakers | Hours |
|-------|----------|----------|-------|
| Short | 0–10 s | 2861 | 9.46 |
| Long | 10–30 s | 838 | 1.97 |
Total: 11.4 hours, with diverse speakers and scenarios.
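As a rough illustration of how the two subsets partition utterances by length, the sketch below assigns a clip to a subset from its duration and re-derives the total from the table figures. The names `SUBSETS` and `subset_for` are hypothetical and not part of the dataset; treating a clip of exactly 10 s as Long is an assumption, since the table does not say which side of the boundary it falls on.

```python
# Subset figures copied from the table above; names here are illustrative only.
SUBSETS = {
    "Short": {"min_s": 0.0, "max_s": 10.0, "hours": 9.46},
    "Long": {"min_s": 10.0, "max_s": 30.0, "hours": 1.97},
}


def subset_for(duration_s: float) -> str:
    """Return the subset name for an utterance duration given in seconds.

    Assumption: a clip of exactly 10 s is counted as Long. The source
    does not specify how the boundary is handled.
    """
    if duration_s < 0.0 or duration_s > 30.0:
        raise ValueError(f"duration outside the 0-30 s range: {duration_s}")
    return "Short" if duration_s < 10.0 else "Long"


# Sum the per-subset hours to check the stated total.
total_hours = sum(info["hours"] for info in SUBSETS.values())

print(subset_for(3.2))       # Short
print(subset_for(17.5))      # Long
print(f"{total_hours:.2f}")  # 11.43
```

The summed figure, 11.43 hours, agrees with the stated total of 11.4 hours after rounding.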
Provider:
maas
Created:
2025-09-04



