speechocean762

Name: speechocean762
Creator: maas
Published: 2026-07-09 11:26:02
License: No description available

ModelScope community; updated 2026-07-09; indexed 2025-03-08

Download link:

https://modelscope.cn/datasets/midasheng/speechocean762


Resource description:

# speechocean762: A non-native English corpus for pronunciation scoring task

## Introduction

Pronunciation scoring is a crucial technology in computer-assisted language learning (CALL) systems. In a typical pronunciation scoring task, quality scores may be given at the phoneme, word, and sentence levels.

This corpus aims to provide a free public dataset for the pronunciation scoring task. Key features:

* It is available for free download for both commercial and non-commercial purposes.
* The speakers include both young children and adults.
* The manual annotations cover multiple aspects at the sentence, word, and phoneme levels.

This corpus consists of 5000 English sentences. All speakers are non-native, and their mother tongue is Mandarin. Half of the speakers are children and the other half are adults. Age and gender information is provided.

Five experts scored the recordings. To avoid subjective bias, each expert scored independently under the same metric.

## Uses

```python
>>> from datasets import load_dataset
>>> test_set = load_dataset("mispeech/speechocean762", split="test")
>>> len(test_set)
2500
>>> next(iter(test_set))
{'accuracy': 9,
 'completeness': 10.0,
 'fluency': 9,
 'prosodic': 9,
 'text': 'MARK IS GOING TO SEE ELEPHANT',
 'total': 9,
 'words': [{'accuracy': 10, 'phones': ['M', 'AA0', 'R', 'K'], 'phones-accuracy': [2.0, 2.0, 1.8, 2.0], 'stress': 10, 'text': 'MARK', 'total': 10, 'mispronunciations': []},
           {'accuracy': 10, 'phones': ['IH0', 'Z'], 'phones-accuracy': [2.0, 1.8], 'stress': 10, 'text': 'IS', 'total': 10, 'mispronunciations': []},
           {'accuracy': 10, 'phones': ['G', 'OW0', 'IH0', 'NG'], 'phones-accuracy': [2.0, 2.0, 2.0, 2.0], 'stress': 10, 'text': 'GOING', 'total': 10, 'mispronunciations': []},
           {'accuracy': 10, 'phones': ['T', 'UW0'], 'phones-accuracy': [2.0, 2.0], 'stress': 10, 'text': 'TO', 'total': 10, 'mispronunciations': []},
           {'accuracy': 10, 'phones': ['S', 'IY0'], 'phones-accuracy': [2.0, 2.0], 'stress': 10, 'text': 'SEE', 'total': 10, 'mispronunciations': []},
           {'accuracy': 10, 'phones': ['EH1', 'L', 'IH0', 'F', 'AH0', 'N', 'T'], 'phones-accuracy': [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0], 'stress': 10, 'text': 'ELEPHANT', 'total': 10, 'mispronunciations': []}],
 'speaker': '0003',
 'gender': 'm',
 'age': 6,
 'audio': {'path': '000030012.wav',
           'array': array([-0.00119019, -0.00500488, -0.00283813, ..., 0.00274658, 0. , 0.00125122]),
           'sampling_rate': 16000}}
```

## The scoring metric

The experts score at three levels: phoneme-level, word-level, and sentence-level.

### Sentence level

Score the accuracy, fluency, completeness, and prosody at the sentence level.

#### Accuracy

Score range: 0 - 10

* 9-10: The overall pronunciation of the sentence is excellent, with accurate phonology and no obvious pronunciation mistakes
* 7-8: The overall pronunciation of the sentence is good, with a few pronunciation mistakes
* 5-6: The overall pronunciation of the sentence is understandable, with many pronunciation mistakes and an accent, but the basic meaning can still be understood
* 3-4: Poor, clumsy, and rigid pronunciation of the sentence as a whole, with serious pronunciation mistakes
* 0-2: Extremely poor pronunciation; only one or two words are recognizable

#### Completeness

Score range: 0.0 - 1.0

The proportion of words with good pronunciation.

#### Fluency

Score range: 0 - 10

* 8-10: Fluent, without noticeable pauses or stammering
* 6-7: Fluent in general, with a few pauses, repetitions, and stammering
* 4-5: The speech is somewhat disfluent, with many pauses, repetitions, and stammering
* 0-3: Intermittent, very disfluent speech, with lots of pauses, repetitions, and stammering

#### Prosodic

Score range: 0 - 10

* 9-10: Correct intonation at a stable speaking speed, spoken with cadence, and native-like
* 7-8: Nearly correct intonation at a stable speaking speed, nearly smooth and coherent, but with a little stammering and a few pauses
* 5-6: Unstable speaking speed, frequent stammering and pauses, and a poor sense of rhythm
* 3-4: Unstable speaking speed, too fast or too slow, without a sense of rhythm
* 0-2: Poor intonation and lots of stammering and pauses; unable to read a complete sentence

### Word level

Score the accuracy and stress of each word's pronunciation.

#### Accuracy

Score range: 0 - 10

* 10: The pronunciation of the word is perfect
* 7-9: Most phones in this word are pronounced correctly but with an accent
* 4-6: Less than 30% of the phones in this word are mispronounced
* 2-3: More than 30% of the phones in this word are mispronounced, or the word is mispronounced as a different word (for example, the student pronounces "bag" as "bike")
* 1: The pronunciation is hard to distinguish
* 0: No voice

#### Stress

Score range: {5, 10}

* 10: The stress is correct, or the word is monosyllabic
* 5: The stress is wrong

### Phoneme level

Score the pronunciation goodness of each phoneme within the words.

Score range: 0-2

* 2: The pronunciation is correct
* 1: The pronunciation is right but has a heavy accent
* 0: The pronunciation is incorrect or the phone is missing

For phones with an accuracy score lower than 0.5, an extra "mispronunciations" field indicates the phoneme that was most likely pronounced in place of the canonical phone. An example:

```json
{
  "text": "LISA",
  "accuracy": 5,
  "phones": ["L", "IY1", "S", "AH0"],
  "phones-accuracy": [0.4, 2, 2, 1.2],
  "mispronunciations": [
    {
      "canonical-phone": "L",
      "index": 0,
      "pronounced-phone": "D"
    }
  ],
  "stress": 10,
  "total": 6
}
```

## Citation

Please cite our paper if you find this work useful:

```bibtex
@inproceedings{speechocean762,
  title={speechocean762: An Open-Source Non-native English Speech Corpus For Pronunciation Assessment},
  booktitle={Proc. Interspeech 2021},
  year={2021},
  author={Junbo Zhang and Zhiwen Zhang and Yongqing Wang and Zhiyong Yan and Qiong Song and Yukai Huang and Ke Li and Daniel Povey and Yujun Wang}
}
```
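
## Example: flagging low-scoring phones

The phoneme-level description says that phones scoring below 0.5 carry an entry in "mispronunciations". The following is a minimal sketch of how a word record in that format might be scanned for such phones. The helper name `low_scoring_phones` and the keys of the dictionaries it returns are illustrative assumptions, not part of the dataset or of any library; only the input field names come from the record format shown in this card.

```python
from typing import Any

# Threshold taken from the phoneme-level description in this card.
MISPRONUNCIATION_THRESHOLD = 0.5


def low_scoring_phones(
    word: dict[str, Any], threshold: float = MISPRONUNCIATION_THRESHOLD
) -> list[dict[str, Any]]:
    """Return one entry per phone whose accuracy is below `threshold`.

    Hypothetical helper: the output keys ("index", "canonical", "score",
    "pronounced") are chosen for this example only.
    """
    # Index the annotated mispronunciations by phone position.
    pronounced = {
        m["index"]: m["pronounced-phone"]
        for m in word.get("mispronunciations", [])
    }
    flagged = []
    for index, (phone, score) in enumerate(
        zip(word["phones"], word["phones-accuracy"])
    ):
        if score < threshold:
            flagged.append(
                {
                    "index": index,
                    "canonical": phone,
                    "score": score,
                    # None when the record has no annotation for this phone.
                    "pronounced": pronounced.get(index),
                }
            )
    return flagged


# The "LISA" word record from the phoneme-level example in this card.
lisa = {
    "text": "LISA",
    "accuracy": 5,
    "phones": ["L", "IY1", "S", "AH0"],
    "phones-accuracy": [0.4, 2, 2, 1.2],
    "mispronunciations": [
        {"canonical-phone": "L", "index": 0, "pronounced-phone": "D"}
    ],
    "stress": 10,
    "total": 6,
}

for item in low_scoring_phones(lisa):
    print(item)
# prints {'index': 0, 'canonical': 'L', 'score': 0.4, 'pronounced': 'D'}
```

The same function could be applied to each element of the `words` list of a loaded sample; phones scoring 0.5 or higher, such as "AH0" at 1.2 in the record above, are not flagged.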


Provider:

maas

Created:

2025-08-08
