kavyamanohar/Pronunciation-dictionary-malayalam
Hugging Face · Updated 2024-07-20 · Indexed 2024-07-22
Download link:
https://hf-mirror.com/datasets/kavyamanohar/Pronunciation-dictionary-malayalam
Summary:
This is a collection of Malayalam words and their pronunciations transcribed in the International Phonetic Alphabet (IPA). The pronunciations were generated automatically with the Mlphon library. The dataset, curated by Kavya Manohar, is suitable for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems and for training Malayalam-to-IPA transliteration models. It covers several categories: common words, verbs, nouns, English loanwords, Sanskrit nouns, proper nouns, pronouns, person names, and place names. Because the transcriptions are generated automatically rather than curated by humans, they may contain errors. The dataset does not provide separate train and test splits.
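Since no train/test split is shipped, anyone training a transliteration model has to create one. The sketch below shows one way to do it, assuming the entries have already been loaded as a list of (word, ipa) pairs; the loading step and the helper name `split_entries` are hypothetical and not part of the dataset or of Mlphon. Each word is assigned to a split by hashing it, so the assignment is deterministic.

```python
import hashlib


def split_entries(entries, test_fraction=0.1):
    """Split (word, ipa) pairs into train and test lists deterministically.

    Hypothetical helper: assumes `entries` is an iterable of (word, ipa)
    tuples. A word's split depends only on its hash, so it is stable across
    runs and input orders, and duplicate words never straddle both splits.
    """
    train, test = [], []
    for word, ipa in entries:
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        # Map the first 8 bytes of the hash to a float in [0, 1).
        bucket = int.from_bytes(digest[:8], "big") / 2**64
        (test if bucket < test_fraction else train).append((word, ipa))
    return train, test
```

Hashing by word, rather than shuffling, keeps the split stable across reloads and guarantees that a word never appears on both sides.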
Provided by:
kavyamanohar
Summary of original information
Malayalam Pronunciation Dictionary
Dataset overview
- Name: Malayalam Pronunciation Dictionary
- Alias: Malayalam Phonetic Lexicon
- Language: Malayalam
- License: CC-BY-SA-4.0
- Size: 10K < n < 100K
- Task category: text2text-generation
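Dataset description
The dataset contains Malayalam words and their pronunciations in IPA (International Phonetic Alphabet) format. The pronunciations were generated automatically with the Mlphon Python library.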
Dataset structure
Words in the dataset are grouped into the following categories:
- commonwords
- verbs
- nouns
- english
- nouns_sanskrit
- proper_nouns
- pronouns
- person-names
- place-names
Data sources
- commonwords: taken from the Indic-NLP-Corpus and sorted by frequency.
- Other categories: taken from curated word collections in Mlmorph.
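Because `commonwords` is frequency-ordered while the other categories are curated lists, a combined lexicon might keep only the top N common words plus every curated entry. A minimal sketch, assuming each category has been loaded into a list of (word, ipa) pairs keyed by its category name; the card does not document the column names or loading method, and `build_lexicon` is a hypothetical helper.

```python
def build_lexicon(categories, top_common=None):
    """Merge per-category (word, ipa) lists into one word -> ipa mapping.

    Hypothetical helper: `categories` maps a category name such as
    "commonwords" or "verbs" to a list of (word, ipa) pairs. Since
    commonwords is frequency-ordered, its first `top_common` entries are
    the most frequent words. If a word occurs in several categories, the
    first pronunciation seen is kept and any differing one is reported.
    """
    lexicon = {}
    conflicts = []
    for name, pairs in categories.items():
        if name == "commonwords" and top_common is not None:
            pairs = pairs[:top_common]  # keep only the most frequent words
        for word, ipa in pairs:
            if word not in lexicon:
                lexicon[word] = ipa
            elif lexicon[word] != ipa:
                # Record (category, word, kept_ipa, rejected_ipa) for review.
                conflicts.append((name, word, lexicon[word], ipa))
    return lexicon, conflicts
```

Reporting conflicts instead of silently overwriting them makes it visible when the same word appears in more than one category with different transcriptions.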
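Use cases
- Directly usable in ASR and TTS systems that require an IPA-based pronunciation lexicon.
- Usable for training Malayalam-to-IPA transliteration models.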
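ASR and TTS toolkits usually expect each pronunciation as a sequence of phone symbols rather than a raw IPA string. The sketch below writes Kaldi-style `lexicon.txt` lines (a word followed by its whitespace-separated phones). The segmentation rule is an assumed heuristic, not Mlphon's documented phone inventory: a phone starts at each base character, combining marks (Unicode category Mn, such as the dental diacritic) and modifier letters (Lm, such as ː and ʰ) attach to the preceding phone, and a tie bar also pulls in the following character so that affricates such as t͡ʃ stay whole. The helper names are hypothetical.

```python
import unicodedata

# Combining double inverted breve (above) and double breve below.
TIE_BARS = {"\u0361", "\u035c"}


def ipa_to_phones(ipa):
    """Split an IPA string into phone tokens (assumed heuristic)."""
    phones = []
    join_next = False
    for ch in ipa:
        if ch.isspace():
            join_next = False
            continue
        cat = unicodedata.category(ch)
        if phones and (join_next or cat in ("Mn", "Lm")):
            # Diacritic, modifier letter, or the second half of a tied pair.
            phones[-1] += ch
        else:
            phones.append(ch)
        join_next = ch in TIE_BARS
    return phones


def lexicon_lines(lexicon):
    """Yield 'word phone1 phone2 ...' lines from a word -> ipa mapping."""
    for word, ipa in sorted(lexicon.items()):
        yield word + " " + " ".join(ipa_to_phones(ipa))
```

Check the resulting phone set against the inventory your toolkit expects. Stress marks, for example, are also modifier letters and would be attached to the wrong phone by this rule.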
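Limitations
- The IPA pronunciations are generated automatically by the Mlphon library and have not been manually proofread.
- Mlphon produces phonemic transcriptions, not phonetic ones.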
Citation
@ARTICLE{9877808, author={Manohar, Kavya and Jayan, A. R. and Rajan, Rajeev}, journal={IEEE Access}, title={Mlphon: A Multifunctional Grapheme-Phoneme Conversion Tool Using Finite State Transducers}, year={2022}, volume={10}, number={}, pages={97555-97575}, doi={10.1109/ACCESS.2022.3204403}}



