
Mergen corpus

Source: Figshare. Updated 2021-01-28; indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Mergen_corpus/13655678
Resource description:
People involved
This corpus was created by Aidan Winberry in 2020 from a recording of Raisa Alekseevna Beldy reading the text of a Nanai fairy tale, "Mergen ningman". The recording was made by Vasily Kharitonov.

Annotation scheme
The information about Nanai phonemes is taken from (Ko & Yurn, 2011).

Coding scheme
The sounds and phonemes are represented by their IPA symbols in Unicode. The text of the fairy tale is provided in Cyrillic Nanai orthography. The Nanai writing system is nearly phonemic, so a Latin transcription layer would simply copy the phonemic tier in IPA. In several segments the sound is corrupted by background music; in these cases the annotation at the phonemic and phonetic levels is omitted.

Annotation quality
Annotations were made without consulting dictionaries. All phonemes and allophonic variants were marked by ear. Diphthongs are not thoroughly marked. The "Words" tier follows the text of the fairy tale, while the "Phonemes" and "Sounds" tiers represent what is actually said. Several utterances end with an ellipsis, which marks a corrected slip of the tongue.

Created:
2021-01-28