Corpus of English and Nigerian Pidgin Code-switching (CENCOS)
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7314015
Description:
This dataset was compiled during fieldwork in Nigeria in 2019. It features naturally occurring spoken conversations among educated speakers of English and Nigerian Pidgin in Nigeria, with very few conversations involving uneducated speakers. Nigeria is a multilingual nation with over 500 languages that are not mutually intelligible. English and Nigerian Pidgin serve as lingua francas that bridge the gap between speakers whose languages are mutually unintelligible. English is used in both formal and informal settings, whereas Nigerian Pidgin is used only in informal settings. Nigerian Pidgin was formerly regarded as the language of the uneducated in Nigeria. Over time, it has developed into a language spoken not only by the uneducated but also by the educated. This corpus was compiled to help understand how educated speakers who know both languages use them in interaction. The corpus consists of both sound and text files, but the sound files are not included here for data protection reasons. The sound files were manually transcribed into text, amounting to over 100,000 word tokens. The transcripts carry no annotation other than speaker labels. An Excel sheet with the speakers' basic information (gender, age, ethnic group and education status) is included. With these social factors available, the corpus is useful for investigating the use of English and Nigerian Pidgin in Nigeria.
Created:
2022-11-12



