Corpus of English and Nigerian Pidgin Code-switching (CENCOS)

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7314015
Description:
This dataset was compiled from fieldwork carried out in Nigeria in 2019. It features naturally occurring spoken conversations among educated speakers of English and Nigerian Pidgin in Nigeria, with very few conversations involving uneducated speakers. Nigeria is a multilingual nation with over 500 languages that are not mutually intelligible. English and Nigerian Pidgin serve as lingua francas that bridge the gap between speakers whose languages are mutually unintelligible. English is used in both formal and informal settings, whereas Nigerian Pidgin is used only in informal settings. Nigerian Pidgin was formerly regarded as the language of the uneducated in Nigeria. Over time, it has developed into a language spoken not only by the uneducated but also by the educated. The corpus was compiled to understand how educated speakers who know both languages use them in interaction. The corpus comprises both sound and text files, but the sound files are not included here for data protection reasons. The sound files were manually transcribed into text, amounting to over 100,000 word tokens. The transcripts contain no annotation other than speaker identification. An Excel sheet with the speakers' basic information, such as gender, age, ethnic group, and education status, is included. With these social factors available, the corpus is useful for investigating the use of English and Nigerian Pidgin in Nigeria.
Created:
2022-11-12