TuGeBiC
Source: arXiv | Updated: 2022-05-02 | Indexed: 2024-06-21
Download link:
https://github.com/ozlemcek/TuGeBiC
Description:
TuGeBiC is a Turkish-German bilingual code-switching corpus created by the Institute for Natural Language Processing (IMS) at the University of Stuttgart. It comprises 25 files of spontaneous speech samples from Turkish-German bilingual speakers, recorded in Germany and Turkey in the early 1990s. The data were manually tokenized and normalized, and all proper names were replaced with pseudonyms to protect participants' privacy. The corpus is intended primarily for research on code-switching, in particular the patterns and frequency of language switching in bilingual communities. It is meant to give linguists, psychologists, and computational linguists a resource for studying switching behavior in bilingual communication and its effects on language processing.
Provider:
Institute for Natural Language Processing (IMS), University of Stuttgart
Created:
2022-05-02



