
Texas German Sample Corpus

Source: DataCite Commons. Updated: 2026-03-27. Indexed: 2026-05-05.
Download link:
https://dataverse.tdl.org/citation?persistentId=doi:10.18738/T8/IOX9ZA
Description:
The Texas German Sample Corpus (TGSC) is a collection of annotated transcripts of spoken Texas German (~13.5 hours, 75,000+ tokens). It was created to implement and test the language-tagging and normalization guidelines proposed in Blevins (2022). Texas German is a set of mixed-language contact varieties of German "spoken in Texas which have descended from the dialects of German brought to Texas in the 19th century" by German-speaking immigrants (Boas 2009: 34).

The TGSC consists of audio recordings from the Texas German Dialect Archive (TGDA, tgdp.org/dialect-archive) with the following annotation layers: original TGDA literary transcription, tokenization, language tags, normalization, standard German utterance translation, and the original TGDA word-for-word English translation.

By using the TGSC database, you agree to the "User Rights and Responsibilities" specified at https://tgdp.org/dialect-archive/.

Please cite the following works:
- For the TGSC: Blevins (2022). The language-tagging & orthographic normalization of spoken mixed-language data, with a focus on Texas German. https://hdl.handle.net/2152/116703
- For the TGDA / TGDP (the source of the TGSC material): Boas, Hans C., Marc Pierce, Karen Roesch, Guido Halder, and Hunter Weilbacher (2010). The Texas German Dialect Archive: A Multimedia Resource for Research, Teaching, and Outreach. Journal of Germanic Linguistics, 22(3), 277-296.
Provider:
Texas Data Repository
Created:
2022-08-09