
LATIC: A Non-native Pre-labelled Mandarin Chinese Validation Corpus for Automatic Speech Scoring and Evaluation Task

Source: DataCite Commons. Updated: 2021-05-25. Indexed: 2025-04-16.
Download link:
https://ieee-dataport.org/open-access/latic-non-native-pre-labelled-mandarin-chinese-validation-corpus-automatic-speech
Description:
LATIC focuses on non-native learners of Mandarin Chinese. It is an annotated non-native speech database for Chinese that is fully open-source and available online for any purpose. Application areas include automatic speech scoring and evaluation, as well as derived uses such as L2 teaching and the teaching of Chinese as a foreign language. We aim to provide a relatively small-scale, highly efficient training and validation dataset. To this end, four selected non-native Chinese speakers participated in the project; their mother tongues (L1s) are Russian, Korean, French, and Arabic. Each participant contributed a 1-hour test audio file (valid recording), for 4 hours of material in total. We intend to keep expanding the database in the future.
Provider:
IEEE DataPort
Created:
2021-05-25
Collected Summary
Dataset Introduction
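Background and Challenges
Background Overview
LATIC is an open-source, annotated Mandarin speech validation corpus for non-native learners of Chinese, intended mainly for automatic speech scoring and evaluation tasks. The dataset currently includes 4 participants with different native-language backgrounds (Russian, Korean, French, and Arabic), provides a total of 4 hours of speech material (2,579 audio samples, averaging 9-10 seconds each), and includes three levels of detailed annotation.
The content above was collected and summarized by 遇见数据集.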