NICT-JLE Corpus
Source: arXiv (2023-08-05); updated 2024-06-21
Download link:
https://github.com/lucyskidmore/nict-jle
Overview:
The NICT-JLE Corpus, created by Japan's National Institute of Information and Communications Technology (NICT), contains approximately 300 hours of English speaking-proficiency test recordings from 1,281 Japanese learners of English. The test covers tasks such as open-ended conversation, role-play and picture description. Each record carries HTML-style tags that mark edited words and other speech disfluencies, together with metadata on the learner's proficiency level, gender and nationality. The corpus is used mainly for research on disfluency detection in learner speech, and this release aims to support model development and comparison by providing standardized training and evaluation sets.
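Disfluency detection is commonly framed as token-level sequence labelling, so a typical preprocessing step is to turn the HTML-style tags into one label per token. The sketch below is illustrative only: it assumes hypothetical tag names `<e>` (edited words) and `<f>` (fillers) and a hypothetical label scheme `E`/`F`/`O`; the actual tag set and file format should be checked in the repository before reusing it.

```python
import re
from typing import List, Tuple

# Hypothetical tag-to-label mapping; the real corpus tag names may differ.
TAG_TO_LABEL = {"e": "E", "f": "F"}

# Matches simple opening or closing tags such as <e> or </e>.
TAG_RE = re.compile(r"<(/?)(\w+)>")


def current_label(stack: List[str]) -> str:
    """Return the label of the innermost known open tag, or "O" (fluent)."""
    for name in reversed(stack):
        if name in TAG_TO_LABEL:
            return TAG_TO_LABEL[name]
    return "O"


def tokens_with_labels(text: str) -> List[Tuple[str, str]]:
    """Convert an HTML-style tagged utterance into (token, label) pairs."""
    pairs: List[Tuple[str, str]] = []
    stack: List[str] = []  # currently open tags, innermost last
    pos = 0
    for match in TAG_RE.finditer(text):
        # Label the plain text between the previous tag and this one.
        for tok in text[pos:match.start()].split():
            pairs.append((tok, current_label(stack)))
        closing, name = match.group(1), match.group(2).lower()
        if closing:
            # Only pop when the closing tag matches the innermost open tag.
            if stack and stack[-1] == name:
                stack.pop()
        else:
            stack.append(name)
        pos = match.end()
    # Label any text after the final tag.
    for tok in text[pos:].split():
        pairs.append((tok, current_label(stack)))
    return pairs


if __name__ == "__main__":
    print(tokens_with_labels("I went <f>uh</f> <e>to the</e> to the park"))
```

For the example utterance, the filler "uh" is labelled `F`, the first "to the" (the edited words) is labelled `E`, and every other token is `O`. Tags not in the mapping are tracked but do not change the label, so unknown markup does not silently turn fluent words into disfluencies.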
Provided by:
University of Sheffield
Created:
2023-08-05



