Earnings-21
Source: arXiv. Updated: 2021-06-16. Indexed: 2024-06-21.
Download link:
https://github.com/revdotcom/speechdatasets/tree/master/earnings21
Description:
Earnings-21 is a dataset of 39 hours of financial earnings conference call recordings, created by Rev.com as a practical test benchmark for automatic speech recognition (ASR) systems. It contains 44 conference calls spanning 9 distinct financial industry sectors, and each recording comes with rich annotations covering punctuation, capitalization, and named entities. The recordings were transcribed through Rev.com's human transcription service, and entities were then labeled using an internal named entity recognition (NER) tool together with spaCy. Earnings-21 places particular emphasis on named entity recognition, which makes it suitable for evaluating ASR systems on real-world audio, especially in the financial domain.
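As a rough illustration of how a benchmark like this might be used, the sketch below scores an ASR hypothesis against a human reference transcript with word error rate (WER). The description above does not name a metric, so WER is an assumption here, as are the function name `word_error_rate`, the lowercase-and-split tokenization, and the two example sentences, which are invented and not taken from the dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: tokens are lowercased, whitespace-separated words. This
    ignores the capitalization that the dataset annotates, so it is a
    simplification rather than an official scoring procedure.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (rolling-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented example sentences (not from Earnings-21).
reference = "revenue grew nine percent in the third quarter"
hypothesis = "revenue grew five percent in third quarter"
print(word_error_rate(reference, hypothesis))  # one substitution + one deletion over 8 words
```

Because the dataset emphasizes named entities, a fuller evaluation would likely also report errors restricted to entity spans; that is not shown here.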
Provider:
Rev.com
Created:
2021-04-23



