
Santa Barbara Corpus of Spoken American English Part IV

Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2005S25
Resource description:
<h3>Introduction</h3><br>
<p>Santa Barbara Corpus of Spoken American English Part IV was produced by the Linguistic Data Consortium (LDC) under catalog number LDC2005S25 and ISBN 1-58563-348-8.</p><br>
<p>Santa Barbara Corpus of Spoken American English Part IV is based on hundreds of recordings of natural speech from all over the United States, representing a wide variety of people of different regional origins, ages, occupations, and ethnic and social backgrounds. It reflects the many ways that people use language in their lives: conversation, gossip, arguments, on-the-job talk, card games, city council meetings, sales pitches, classroom lectures, political speeches, bedtime stories, sermons, weddings, and more.</p><br>
<p>The corpus was collected by the University of California, Santa Barbara Center for the Study of Discourse (Director: John W. Du Bois, UCSB). Authors: John W. Du Bois and Robert Englebretson. Associate Editors: Wallace L. Chafe (UCSB), Charles Meyer (UMass, Boston), and Sandra A. Thompson (UCSB).</p><br>
<p>For software and additional data resources, please refer to the following sites: <a href="http://www.talkbank.org/resources/software.html" rel="nofollow">TalkBank</a>, <a href="http://www.ucl.ac.uk/english-usage/ice/index.htm" rel="nofollow">International Corpus of English</a>.</p><br>
<p>Part I of the Santa Barbara Corpus of Spoken American English is available as <a href="http://catalog.ldc.upenn.edu/LDC2000S85" rel="nofollow">LDC2000S85</a>.</p><br>
<p>Part II of the Santa Barbara Corpus of Spoken American English is available as <a href="http://catalog.ldc.upenn.edu/LDC2003S06" rel="nofollow">LDC2003S06</a>.</p><br>
<p>Part III of the Santa Barbara Corpus of Spoken American English is available as <a href="http://catalog.ldc.upenn.edu/LDC2004S10" rel="nofollow">LDC2004S10</a>.</p><br>
<h3>Data</h3><br>
<p>The audio data consists of 14 WAV-format speech files, recorded as two-channel PCM at 22050 Hz. The speech files total 5.75 hours of audio (1.5 GB), representing over 58,000 words and over 6,000 unique words in the transcribed text.</p><br>
<h3>Samples</h3><br>
<p>For an example of this corpus, please examine this <a href="desc/addenda/LDC2005S25.wav" rel="nofollow">audio sample</a> and its <a href="desc/addenda/LDC2005S25.txt" rel="nofollow">transcript</a>.</p><br>
<h3>Acknowledgements</h3><br>
<p>The completion and release of this corpus were facilitated by funding from the TalkBank Project. TalkBank is an interdisciplinary research project funded by a five-year grant (BCS-998009, KDI, SBE) from the National Science Foundation to Carnegie Mellon University and the University of Pennsylvania.</p><br>
Portions © 2003 University of California, © 2003 Trustees of the University of Pennsylvania
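After obtaining the corpus, one way to check the audio figures quoted in the description (14 files, two channels, 22050 Hz, about 5.75 hours) is to read the WAV headers directly. The following is a minimal sketch using Python's standard `wave` module. The function names and the directory layout (a folder of `.wav` files) are assumptions made for illustration, not part of the LDC distribution.

```python
import wave
from pathlib import Path


def summarize_wav(path):
    """Return (channels, sample_rate_hz, duration_seconds) from a WAV header."""
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.getnframes()
    return channels, rate, frames / rate


def summarize_corpus(directory):
    """Summarize every .wav file in a directory.

    The flat directory of .wav files is a hypothetical layout; adjust the
    glob pattern to match how the corpus is actually unpacked.
    Returns (rows, total_hours), where each row is
    (file_name, channels, sample_rate_hz, duration_seconds).
    """
    rows = []
    total_seconds = 0.0
    for path in sorted(Path(directory).glob("*.wav")):
        channels, rate, seconds = summarize_wav(path)
        rows.append((path.name, channels, rate, seconds))
        total_seconds += seconds
    return rows, total_seconds / 3600.0
```

Calling `summarize_corpus` on the unpacked audio folder and comparing `len(rows)` and the returned hour total against the figures in the description gives a quick sanity check that the download is complete.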
Provider:
Linguistic Data Consortium
Created:
2020-11-30