Santa Barbara Corpus of Spoken American English Part IV

Name: Santa Barbara Corpus of Spoken American English Part IV
Creator: Linguistic Data Consortium
Published: 2021-07-01 16:17:24
License: No description available

DataCite Commons: updated 2021-07-01, indexed 2025-04-16

Download link:

https://catalog.ldc.upenn.edu/LDC2005S25

Resource description:

<h3>Introduction</h3><br>
<p>Santa Barbara Corpus of Spoken American English Part IV was produced by the Linguistic Data Consortium (LDC), catalog number LDC2005S25, ISBN 1-58563-348-8.</p><br>
<p>Santa Barbara Corpus of Spoken American English Part IV is based on hundreds of recordings of natural speech from all over the United States, representing a wide variety of people of different regional origins, ages, occupations, and ethnic and social backgrounds. It reflects the many ways that people use language in their lives: conversation, gossip, arguments, on-the-job talk, card games, city council meetings, sales pitches, classroom lectures, political speeches, bedtime stories, sermons, weddings, and more.</p><br>
<p>The corpus was collected by the University of California, Santa Barbara Center for the Study of Discourse. Director: John W. Du Bois (UCSB). Authors: John W. Du Bois and Robert Englebretson. Associate Editors: Wallace L. Chafe (UCSB), Charles Meyer (UMass, Boston), and Sandra A. Thompson (UCSB).</p><br>
<p>For software and additional data resources, please refer to the following sites: <a href="http://www.talkbank.org/resources/software.html" rel="nofollow">TalkBank</a>, <a href="http://www.ucl.ac.uk/english-usage/ice/index.htm" rel="nofollow">International Corpus of English</a>.</p><br>
<p>Part I of the Santa Barbara Corpus of Spoken American English is available as <a href="http://catalog.ldc.upenn.edu/LDC2000S85" rel="nofollow">LDC2000S85</a>.</p><br>
<p>Part II of the Santa Barbara Corpus of Spoken American English is available as <a href="http://catalog.ldc.upenn.edu/LDC2003S06" rel="nofollow">LDC2003S06</a>.</p><br>
<p>Part III of the Santa Barbara Corpus of Spoken American English is available as <a href="http://catalog.ldc.upenn.edu/LDC2004S10" rel="nofollow">LDC2004S10</a>.</p><br>
<h3>Data</h3><br>
<p>The audio data consists of 14 WAV-format speech files, recorded as two-channel PCM at 22,050 Hz. The speech files total 5.75 hours of audio (1.5 GB), representing over 58,000 words and over 6,000 unique words in the transcribed text.</p><br>
<h3>Samples</h3><br>
<p>For an example of this corpus, please examine this <a href="desc/addenda/LDC2005S25.wav" rel="nofollow">audio sample</a> and its <a href="desc/addenda/LDC2005S25.txt" rel="nofollow">transcript</a>.</p><br>
<h3>Acknowledgements</h3><br>
<p>The completion and release of this corpus were facilitated by funding from the TalkBank Project. TalkBank is an interdisciplinary research project funded by a five-year grant (BCS-998009, KDI, SBE) from the National Science Foundation to Carnegie Mellon University and the University of Pennsylvania.</p><br>
<p>Portions © 2003 University of California, © 2003 Trustees of the University of Pennsylvania</p>

Provider:

Linguistic Data Consortium

Created:

2020-11-30
