
CSR-I (WSJ0) Complete

Source: Mendeley Data | Updated: 2024-01-31 | Indexed: 2024-06-28
Download link:
https://catalog.ldc.upenn.edu/LDC93S6A
Resource description:
Introduction

LDC93S6A - Complete CSR-I corpus
LDC93S6B - CSR-I Sennheiser speech
LDC93S6C - CSR-I other speech

During 1991, the DARPA Spoken Language Program initiated efforts to build a new corpus to support research on large-vocabulary Continuous Speech Recognition (CSR) systems.

The first two CSR Corpora consist primarily of read speech with texts drawn from a machine-readable corpus of Wall Street Journal news text and are thus often known as WSJ0 and WSJ1. (Later sections of the CSR set of corpora, however, will consist of read texts from other sources of North American business news and eventually from other news domains.)

The texts to be read were selected to fall within either a 5,000-word or a 20,000-word subset of the WSJ text corpus. (See the documentation for details.) Some spontaneous dictation is included in addition to the read speech. The dictation portion was collected using journalists who dictated hypothetical news articles.

Two microphones are used throughout: a close-talking Sennheiser HMD414 and a secondary microphone, which may vary. The corpora are thus offered in three configurations: the speech from the Sennheiser, the speech from the other microphone, and the speech from both. All three sets include all transcriptions, tests, documentation, etc.

In general, transcriptions of the speech, test data from ARPA evaluations, scores achieved by various speech recognition systems, and software used in scoring are included on separate discs from the waveform data.

Portions © 1987-1989 Dow Jones & Company, Inc., © 1992, 1993 Trustees of the University of Pennsylvania
Created:
2024-01-31
Collected summary
Dataset introduction
Background and challenges
Background overview
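CSR-I (WSJ0) Complete is an English speech dataset for research on large-vocabulary continuous speech recognition. It contains approximately 141 hours of speech recordings from 123 speakers reading excerpts from the Wall Street Journal, accompanied by complete orthographic transcriptions and language models. The data was recorded with two microphones. The audio is single-channel, 16-bit, 16 kHz, and the dataset is intended to support the development and evaluation of speech recognition systems.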
The above content was collected and summarized by 遇见数据集.
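The audio format stated in the summary (single-channel, 16-bit, 16 kHz) can be checked mechanically. The following is a minimal sketch, not part of the corpus tooling: it assumes the audio has already been converted to RIFF/WAV files, and the helper names `check_wav_format` and `uncompressed_bytes` are hypothetical. It uses only Python's standard `wave` module.

```python
import wave

# Format stated in the dataset summary: single-channel, 16-bit, 16 kHz.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_SAMPLE_RATE_HZ = 16_000


def check_wav_format(path):
    """Return (matches, duration_seconds) for the WAV file at `path`.

    Assumption: the file is RIFF/WAV. This hypothetical helper does not
    read the corpus's native distribution format.
    """
    with wave.open(path, "rb") as wav:
        matches = (
            wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getframerate() == EXPECTED_SAMPLE_RATE_HZ
        )
        duration = wav.getnframes() / wav.getframerate()
    return matches, duration


def uncompressed_bytes(hours):
    """Raw PCM size in bytes for `hours` of audio in the stated format."""
    return int(
        hours
        * 3600
        * EXPECTED_SAMPLE_RATE_HZ
        * EXPECTED_SAMPLE_WIDTH_BYTES
        * EXPECTED_CHANNELS
    )
```

As a rough size estimate under the same assumptions, `uncompressed_bytes(141)` gives 16,243,200,000 bytes, about 16.2 GB of raw PCM for the roughly 141 hours quoted in the summary. Whether that figure counts one microphone channel or both is not stated in the source.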