Common Voice

Name: Common Voice
Creator: Common Voice
License: No description available

Source: arXiv; indexed 2025-09-30

Download link:

https://commonvoice.mozilla.org/en/datasets

Resource description:

This dataset is a French spoken corpus of speech utterances paired with their transcriptions, intended for analyzing homophones and grammatical agreement in spoken French. It also includes dependency trees generated by spaCy, which are used to identify instances of specific syntactic templates. For the analysis, 1,000 audio-transcription pairs were extracted. The dataset's core task is homophone disambiguation in speech recognition.
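
As a rough illustration of what identifying instances of a syntactic template in a dependency tree might look like, the sketch below matches (child, head) pairs by part of speech and dependency label. It is not the dataset authors' method: the `Token` class, the `find_template` helper, and the hand-written parse are all hypothetical stand-ins for spaCy output, and the labels follow Universal Dependencies conventions only by assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for one token of a dependency parse."""
    text: str
    pos: str   # coarse part-of-speech tag, e.g. "NOUN"
    dep: str   # dependency label linking this token to its head
    head: int  # index of the head token within the sentence


def find_template(tokens, child_pos, dep, head_pos):
    """Return (child, head) text pairs matching a one-edge template.

    A match is a token tagged child_pos that attaches, via the
    dependency label dep, to a head token tagged head_pos.
    """
    matches = []
    for tok in tokens:
        head = tokens[tok.head]
        if tok.pos == child_pos and tok.dep == dep and head.pos == head_pos:
            matches.append((tok.text, head.text))
    return matches


# Hand-written parse of "Les chats noirs mangent" (an assumed example,
# not taken from the dataset). The root token points to itself.
sentence = [
    Token("Les", "DET", "det", 1),
    Token("chats", "NOUN", "nsubj", 3),
    Token("noirs", "ADJ", "amod", 1),
    Token("mangent", "VERB", "ROOT", 3),
]

# Two agreement-relevant templates: subject-verb and adjective-noun.
subject_verb = find_template(sentence, "NOUN", "nsubj", "VERB")
adj_noun = find_template(sentence, "ADJ", "amod", "NOUN")

print(subject_verb)
print(adj_noun)
```

Pairs found this way are where agreement homophones tend to matter: in this example, "chats" and "mangent" typically sound the same as their singular forms "chat" and "mange", so the transcription has to rely on syntactic context rather than the audio alone.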

Provider:

Common Voice
