The CLES corpus of spontaneous L2 English

DataCite Commons, updated 2026-02-11, indexed 2026-05-04
Download link:
https://www.ortolang.fr/market/item/cles-spontaneous-english/v2
Resource description:
The CLES corpus of spontaneous L2 English comprises recordings of French university students taking part in a 10-minute role-play in which two or three candidates engage in an argumentative discussion on a contentious topic. Each candidate is given a separate role, and the objective of the role-play is to reach a final agreement by the end of the 10-minute oral exam. The recordings were made during the oral interaction task of the CLES English certification exam (CLES B2). Each participant is assessed by a professional rater on eight dimensions of oral production at the B2 level: positioning and negotiation skills, relevance and variety of arguments, interaction aptitude, fluency, phonetic accuracy, coherence, grammatical precision, and lexical diversity and appropriateness. Failing to meet any one of these criteria results in validation at the B1 level instead, or in no validation if proficiency falls below the B1 level. Candidates are thus ultimately classified as B2, B1, or non-validated.

Automatic Annotations of the Corpus

Each recording comes with a TextGrid file identifying the speakers. This annotation was produced automatically with the Pyannote Speaker Diarization Toolkit and then checked manually. In addition, the speech segments in the TextGrid file were automatically annotated with the Pause and Lexical Stress Processing Pipeline (PLSPP), which includes automatic speech recognition and word-level alignment (WhisperX), syllable nuclei detection (De Jong et al. 2021), part-of-speech tagging (spaCy), constituency parsing (Berkeley Neural Parser), pause position analysis, and lexical stress annotation of polysyllabic words.

Contents

The public version of the corpus currently comprises 10 hours of speech (128 speakers, 62 recordings). See the recordings.csv and speakers.csv metadata files for more information. More recordings may be added in the future.

Corpus Access

This corpus is available for academic research purposes only. Access is restricted because some of the exam papers are still in use for CLES exams. Please request access by contacting coordination-nationale@certification-cles.fr. Once you have access to the corpus, please refrain from sharing the data with other researchers.

Related Resources

A similar corpus with native speakers of English is released here.
A similar corpus with Japanese-L1 speakers of English is released here.
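The classification rule in the resource description (all eight B2 criteria met gives B2; any failure drops the candidate to B1, or to non-validated below the B1 level) can be expressed as a small decision function. This is a minimal sketch for analysing the metadata, not the official CLES scoring procedure: the criterion keys and the boolean `meets_b1_threshold` input are hypothetical, since the description does not say how the B1 threshold is determined or how the ratings are stored.

```python
# Hypothetical keys for the eight B2 criteria listed in the corpus description.
B2_CRITERIA = (
    "positioning_and_negotiation",
    "relevance_and_variety_of_arguments",
    "interaction",
    "fluency",
    "phonetic_accuracy",
    "coherence",
    "grammatical_precision",
    "lexical_diversity_and_appropriateness",
)


def cles_outcome(b2_criteria_met: dict[str, bool], meets_b1_threshold: bool) -> str:
    """Return "B2", "B1" or "non-validated" for one candidate.

    Assumption: `meets_b1_threshold` is supplied by the caller, because the
    description does not specify how the B1 threshold itself is assessed.
    """
    missing = [c for c in B2_CRITERIA if c not in b2_criteria_met]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    # Every B2 criterion must be met for a B2 validation.
    if all(b2_criteria_met[c] for c in B2_CRITERIA):
        return "B2"
    # Otherwise the candidate falls back to B1, or is not validated at all.
    return "B1" if meets_b1_threshold else "non-validated"
```

Requiring all eight keys up front makes a missing rating an error rather than a silent B1 downgrade.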
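The speaker TextGrid files described under Automatic Annotations can be read with the Python standard library alone, for example to compute how long each candidate speaks. The sketch below is a minimal illustration under stated assumptions: the files are in Praat's long (non-short) text format, encoded in UTF-8, with one IntervalTier per speaker in which non-empty intervals mark speech. The corpus description does not document the tier layout, so check these assumptions against the actual files. The function names `parse_intervals`, `speaking_time_per_tier` and `load_textgrid` are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches 'key = value' lines in a long-format Praat TextGrid.
_KEY_VALUE = re.compile(r'^\s*(xmin|xmax|text|name|class)\s*=\s*(.*?)\s*$')


def _unquote(value: str) -> str:
    """Strip surrounding quotes; Praat escapes an inner quote as two quotes."""
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        value = value[1:-1]
    return value.replace('""', '"')


def parse_intervals(textgrid: str) -> list[tuple[str, float, float, str]]:
    """Return (tier_name, start, end, label) for every interval of every IntervalTier.

    Assumption: long text format, one label per line (multi-line labels
    and the short TextGrid format are not handled).
    """
    intervals = []
    tier_name, tier_class = "", ""
    in_interval = False
    start = end = 0.0
    for line in textgrid.splitlines():
        stripped = line.strip()
        if stripped.startswith("item ["):
            # A new tier begins: its own xmin/xmax are not interval bounds.
            in_interval = False
            continue
        if stripped.startswith("intervals ["):
            in_interval = True
            continue
        match = _KEY_VALUE.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "class":
            tier_class = _unquote(value)
        elif key == "name":
            tier_name = _unquote(value)
        elif in_interval and key == "xmin":
            start = float(value)
        elif in_interval and key == "xmax":
            end = float(value)
        elif in_interval and key == "text" and tier_class == "IntervalTier":
            intervals.append((tier_name, start, end, _unquote(value)))
    return intervals


def speaking_time_per_tier(intervals) -> dict[str, float]:
    """Sum the duration of non-empty intervals per tier.

    Assumption: one tier per speaker, and a non-empty label marks speech.
    """
    totals: dict[str, float] = defaultdict(float)
    for tier, start, end, label in intervals:
        if label.strip():
            totals[tier] += end - start
    return dict(totals)


def load_textgrid(path, encoding: str = "utf-8"):
    """Read a TextGrid file from disk and parse its intervals."""
    return parse_intervals(Path(path).read_text(encoding=encoding))
```

Because totals are computed per tier, overlapping speech is counted once for each speaker involved. For anything beyond a quick check, an established parser such as the `praatio` or `textgrid` Python packages may be preferable, since they also handle the short TextGrid format.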
Provider:
ORTOLANG (Open Resources and TOols for LANGuage) - www.ortolang.fr
Created:
2026-02-11