The CLES corpus of spontaneous L2 English
Source: DataCite Commons. Updated: 2026-02-11. Indexed: 2026-05-04.
Download link:
https://www.ortolang.fr/market/item/cles-spontaneous-english/v2
Description:
The CLES corpus of spontaneous L2 English comprises recordings of
French university students taking part in a 10-minute role-play in which
two or three candidates hold an argumentative discussion on
contentious topics. Each candidate is given a separate role, and the
objective of the role-play is to reach a final agreement by the end of
the 10-minute oral exam. These recordings were made during the oral
interaction task of the CLES English certification exam
(CLES B2). Each participant is assessed by a professional rater on
eight dimensions related to oral production at the B2 level: positioning
and negotiation skills, relevance and variety of arguments, interaction
aptitude, fluency, phonetic accuracy, coherence, grammatical precision,
and lexical diversity and appropriateness. Failure to meet any of
these criteria results in a validation at the B1 level, or no validation
if proficiency falls below the threshold. Candidates are ultimately
classified as B2, B1, or non-validated based on their performance.

Automatic Annotations of the Corpus

Each recording comes with a TextGrid file with speaker identification. This annotation was done automatically using the Pyannote Speaker Diarization Toolkit, then manually checked.

Moreover, speech segments from the TextGrid file were automatically annotated using the Pause and Lexical Stress Processing Pipeline (PLSPP), which includes: automated speech recognition and word-level alignment (WhisperX), syllable nuclei detection (De Jong et al. 2021), part-of-speech tagging (spaCy), constituency analysis (Berkeley Neural Parser), pause position analysis, and lexical stress annotation of polysyllabic words.

Contents

The public version of the corpus currently comprises 10 hours of speech (128 speakers, 62 recordings). See the recordings.csv and speakers.csv metadata files for more information. More recordings may be added in the future.

Corpus Access

This corpus is available for academic research purposes only. Access is restricted because some of the exam papers are still in use for CLES exams. Please request access by contacting coordination-nationale@certification-cles.fr. Once you have access to the corpus, please do not share the data with other researchers.

Related Resources

A similar corpus with native speakers of English is released here.
A similar corpus with Japanese-L1 speakers of English is released here.
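
As a quick sanity check on the metadata files mentioned under Contents, the following minimal Python sketch lists the columns and counts the rows of recordings.csv and speakers.csv. It is only an illustration: the helper name summarize_metadata is hypothetical, and the sketch assumes the files are comma-delimited, UTF-8 encoded, and start with a header row, none of which is stated in this description. The column names are not documented here, so the code does not rely on any of them.

```python
import csv
from pathlib import Path


def summarize_metadata(csv_path):
    """Return (column_names, row_count) for one metadata CSV file.

    Assumption: comma-delimited, UTF-8, with a header row.
    """
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)  # counts data rows, header excluded
        columns = list(reader.fieldnames or [])
    return columns, row_count


if __name__ == "__main__":
    # File names are taken from the corpus description; their location
    # (the current directory) is an assumption.
    for name in ("recordings.csv", "speakers.csv"):
        if Path(name).exists():
            columns, row_count = summarize_metadata(name)
            print(f"{name}: {row_count} rows, columns = {columns}")
        else:
            print(f"{name}: not found in the current directory")
```

If each file holds one row per recording or per speaker (an assumption), the row counts for the current public version should come out near 62 and 128 respectively.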
Provider:
ORTOLANG (Open Resources and TOols for LANGuage) - www.ortolang.fr
Created:
2026-02-11



