HyPoradise
Source: arXiv, indexed 2025-09-30
Download link:
https://github.com/Hypotheses-Paradise/Hypo2Trans
Overview:
HyPoradise is a dataset for evaluating how well models correct automatic speech recognition (ASR) errors across diverse domains. It consists of eight subsets covering different types of spoken-language data: WSJ, ATIS, CHiME-4, Tedlium-3, CV-accent, SwitchBoard, LRS2, and CORAAL. Each subset provides both training and test data, so a model's performance can be assessed under a range of ASR conditions. The task is post-ASR error correction: given the output of an ASR system, produce the correct transcription.
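The description above does not name an evaluation metric. Word error rate (WER) is the usual metric for ASR transcripts, so one plausible way to score a correction model is to compare the WER of the ASR 1-best hypothesis with the WER of the corrected output. The following is a minimal sketch under that assumption; the record layout (`nbest`, `reference`) and the sentences are hypothetical toy values, not the actual HyPoradise file format.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between ref[:i-1] and hyp[:j] (previous DP row)
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    # Guard against division by zero for an empty reference.
    return prev[len(hyp)] / max(len(ref), 1)


# Hypothetical record: an N-best list from an ASR system plus the reference transcript.
record = {
    "nbest": [
        "show me flights from boston two denver",
        "show me flights from boston to denver",
    ],
    "reference": "show me flights from boston to denver",
}

# Stand-in for a correction model's output (hypothetical).
corrected = "show me flights from boston to denver"

baseline_wer = word_error_rate(record["reference"], record["nbest"][0])
corrected_wer = word_error_rate(record["reference"], corrected)
print(f"1-best WER: {baseline_wer:.3f}, corrected WER: {corrected_wer:.3f}")
```

A lower WER after correction than for the 1-best hypothesis indicates the model fixed recognition errors rather than introducing new ones.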