LATIC: A Non-native Pre-labelled Mandarin Chinese Validation Corpus for Automatic Speech Scoring and Evaluation Task

Name: LATIC: A Non-native Pre-labelled Mandarin Chinese Validation Corpus for Automatic Speech Scoring and Evaluation Task
Creator: IEEE DataPort
Published: 2021-05-25 11:55:12
License: No description available

Source: DataCite Commons (updated 2021-05-25; indexed 2025-04-16)

Download link:

https://ieee-dataport.org/open-access/latic-non-native-pre-labelled-mandarin-chinese-validation-corpus-automatic-speech


Resource summary:

LATIC focuses on non-native learners of Mandarin Chinese. It is an annotated non-native Chinese speech database that is fully open source and freely available online for any purpose. Application areas include automatic speech scoring and evaluation, as well as related fields such as L2 teaching and Teaching Chinese as a Foreign Language. We aim to provide a relatively small-scale yet highly efficient dataset for training and validation. To this end, four non-native Chinese speakers took part in the project, with first languages (L1s) of Russian, Korean, French, and Arabic. Each participant contributed a 1-hour test recording (valid speech), giving 4 hours of material in total. We also intend to keep expanding the database in the future.

Provider:

IEEE DataPort

Created:

2021-05-25

Compiled summary

Dataset introduction

Background and challenges

Background overview

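LATIC is an open-source, annotated Mandarin Chinese speech validation corpus for non-native learners of Chinese, intended mainly for automatic speech scoring and evaluation tasks. It currently contains recordings from 4 participants with different native languages (Russian, Korean, French, and Arabic), totaling 4 hours of speech (2,579 audio samples with an average duration of 9 to 10 seconds), and includes detailed annotations at three levels.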

The above content was collected and summarized by 遇见数据集.