Switchboard Corpus with Word Importance Annotation
Source: arXiv. Updated: 2019-07-17. Indexed: 2024-06-21.
Download link:
http://latlab.ist.rit.edu/lrec2018
Description:
Switchboard Corpus with Word Importance Annotation is a dataset created by Rochester Institute of Technology. It provides manual annotations of word importance in conversational transcripts, in support of real-time speech recognition systems for people who are deaf or hard of hearing. The dataset contains roughly 25,000 annotated words, each paired with a numerical score indicating its importance within its conversational turn. The annotations were produced through detailed analysis and manual labeling of transcripts from the original Switchboard corpus. The dataset is used mainly for evaluating automatic speech recognition (ASR) systems, in particular for improving the readability and usefulness of ASR output text.
Provider:
Rochester Institute of Technology
Created:
2018-01-30



