
CALLHOME American English Lexicon (PRONLEX)

DataCite Commons | Updated: 2021-07-01 | Indexed: 2024-07-13
Download link:
https://catalog.ldc.upenn.edu/LDC97L20
Resource description:
<h3>Introduction</h3><br>
<p>The <a href="../../../Catalog/docs/LDC97T14/index.html" rel="nofollow">CALLHOME English</a> collection includes a lexical component. The CALLHOME American English Lexicon was originally distributed under the name COMLEX Pronouncing Lexicon, or PRONLEX. Organizations that have already received PRONLEX do not need to purchase the CALLHOME American English Lexicon.</p><br>
<h3>Data</h3><br>
<p>The latest version of PRONLEX contains 90,988 lexical entries and covers WSJ30K, WSJ64K, Switchboard, and CALLHOME English. (WSJ30K and WSJ64K are word lists selected from several years of Wall Street Journal texts used in recent ARPA Continuous Speech Recognition corpora. Switchboard is a three-million-word corpus of telephone conversations on a variety of topics.)</p><br>
<p>The PRONLEX documentation describes the principles observed for word transcription. Predictable variation in pronunciation due to dialect or variable reduction is not notated in the lexicon itself; instead, the documentation notes systematic dialectal variants, which may be generated by rule. In addition, alternate pronunciations are given for words whose pronunciation varies by part of speech (e.g., abstrAct, Abstract) or in less systematic but salient ways (especially names). Classes of exceptions to the transcription principles, such as names, function words, and foreign words, are tagged.</p><br>
<p>Here is a sample <a href="desc/addenda/LDC97L20_eg.gif" rel="nofollow">page</a>. The transcripts and documentation (<a href="http://catalog.ldc.upenn.edu/LDC97T14" rel="nofollow">LDC97T14</a>) are available, as well as a corpus of telephone speech (<a href="http://catalog.ldc.upenn.edu/LDC97S42" rel="nofollow">LDC97S42</a>).</p><br>
<h3>Updates</h3><br>
<p>There are no updates at this time.</p><br>
Portions © 1994-1997 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2020-11-30