DATA2
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/espnet/espnet/tree/master
Resource description:
This dataset is designed for end-to-end spoken named entity recognition (NER). It contains 70,763 paired speech and text samples in which entity words or phrases are annotated. The samples are randomly split into 2,000 for validation and 2,000 for testing, with the remaining 66,763 used for training.
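
The random split described above can be illustrated with a minimal sketch. Only the sample counts (70,763 total, 2,000 validation, 2,000 test) come from the description; the function name `split_indices`, the seed, and the use of integer indices in place of real samples are assumptions made for illustration, and this will not reproduce the dataset's actual partition.

```python
import random

# Counts taken from the dataset description.
TOTAL = 70_763
N_VALID = 2_000
N_TEST = 2_000


def split_indices(total=TOTAL, n_valid=N_VALID, n_test=N_TEST, seed=0):
    """Shuffle sample indices and cut them into train/valid/test.

    The seed is a hypothetical choice; the seed used by the dataset
    authors is not stated in the description.
    """
    indices = list(range(total))
    random.Random(seed).shuffle(indices)  # local RNG, reproducible
    valid = indices[:n_valid]
    test = indices[n_valid:n_valid + n_test]
    train = indices[n_valid + n_test:]
    return train, valid, test


train, valid, test = split_indices()
print(len(train), len(valid), len(test))  # 66763 2000 2000
```

Using a local `random.Random(seed)` instance keeps the split reproducible without touching the global random state.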



