Orthography, phonology and morphology in the Arabic lexicon
Source: CESSDA. Updated: 2025-06-12. Indexed: 2024-08-10.
Download link:
https://datacatalogue.cessda.eu/detail?lang=en&q=3acba097c6542e81e92bf7b3035ab40060ecc9174df8023ea14456fce2d2aaae
Description:
Arabic script is essentially alphabetic; that is, it uses different characters according to how words are pronounced. However, much Arabic writing includes only the consonants, so a written word is often ambiguous: it could represent many different words or different forms of those words.
This project aims to apply a framework previously developed for mapping between spelling and pronunciation in European languages (English, Dutch, German and French) to define the relations between written and spoken forms in Modern Standard Arabic. It then aims to apply a set of probabilities, extracted from Arabic corpora, to determine which of the possible pronunciations of a particular written form is the most likely.
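The disambiguation step described above can be illustrated with a minimal Python sketch. It is not the project's framework: the consonant-only key "ktb", the three romanised readings, their counts, and the function names are all hypothetical values chosen for illustration, and the probabilities are plain relative frequencies rather than whatever model the project actually uses.

```python
from collections import Counter

# Hypothetical toy data: for each consonant-only written form, how often each
# vocalised reading was observed in some corpus. The counts are invented.
READING_COUNTS = {
    "ktb": Counter({"kataba": 60, "kutub": 30, "kutiba": 10}),
}


def reading_probabilities(written, counts=READING_COUNTS):
    """Return relative frequencies of each known reading of a written form."""
    readings = counts.get(written)
    if not readings:
        return {}
    total = sum(readings.values())
    return {reading: n / total for reading, n in readings.items()}


def most_likely_reading(written, counts=READING_COUNTS):
    """Return the highest-probability reading, or None if the form is unknown."""
    probs = reading_probabilities(written, counts)
    if not probs:
        return None
    return max(probs, key=probs.get)
```

Under these assumed counts, `most_likely_reading("ktb")` selects "kataba", and an unknown written form yields `None` instead of a guess.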
The resulting lexicon will be useful for a range of Arabic NLP (Natural Language Processing) applications, and the structure of the lexicon means that it will be possible to extend it to cover different varieties of Arabic.
Provider:
UK Data Service
Created:
2011-09-29



