
Maninkakan Lexicon

DataCite Commons | Updated 2021-07-01 | Indexed 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC2013L01
Resource description:
<h3>Introduction</h3><br> <p>Maninkakan Lexicon was developed by LDC and contains 5,834 entries of the Maninkakan language presented as a Maninkakan-English lexicon and a Maninkakan-French lexicon. It is the second publication in an ongoing LDC project to build an electronic dictionary of three Mandekan languages: Mawukakan, Maninkakan and Bambara. These are Eastern Manding languages in the Mande Group of the Niger-Congo language family. LDC released a Mawukakan Lexicon (<a href="http://catalog.ldc.upenn.edu/LDC2005L01" rel="nofollow">LDC2005L01</a>) in 2005 and a Bamanankan Lexicon (<a href="../../../LDC2016L01">LDC2016L01</a>) in 2016.</p><br> <p>There are approximately 3.5 million Maninkakan speakers in West Africa, mostly in Guinea and Mali, and also in Liberia, Senegal, Sierra Leone and Ivory Coast. The word <em>Maninkakan</em> is composed of three lexemes: (1) <em>Mande</em> or <em>Manden</em>, the name of the territory occupied by the people who speak the language, (2) the suffix <em>-ka</em>, which derives the name of an inhabitant of Mande or Manden, and (3) <em>kan</em>, which means language. Thus Maninkakan is the language of the people who live in Mande/Manden. Mandekan, Mandenkan, Maninka and Malinke are all used to refer to the language of the inhabitants of Mande/Manden.</p><br> <p>Meghan Glenn served as an editor for the French and English parts of this Lexicon. More information about the work of LDC in the languages of West Africa and the challenges those languages present for language resource development can be found <a href="https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/west-african-languages.pdf" rel="nofollow">here</a>.</p><br> <h3>Data</h3><br> <p>Maninkakan is written using Latin script, Arabic script and the <a href="http://www.omniglot.com/writing/nko.htm" rel="nofollow">NKo</a> alphabet. 
This lexicon is presented using a Latin-based transcription system because the Latin alphabet is familiar to the majority of Mandekan language speakers and because it is expected to facilitate the work of researchers interested in this resource.</p><br> <p>The dictionary is provided in two formats, Toolbox and XML. <a href="http://www-01.sil.org/computIng/toolbox/" rel="nofollow">Toolbox</a> is a version of the widely used SIL <a href="http://www-01.sil.org/computing/shoebox/" rel="nofollow">Shoebox</a> program adapted to display Unicode. Toolbox can be downloaded for free from&nbsp;<a href="https://software.sil.org/toolbox">https://software.sil.org/toolbox</a>. The Toolbox files are provided in two fonts, Arial and Doulos SIL. The Arial files should display using the Arial font, which is standard on most operating systems. <a href="http://scripts.sil.org/doulossilfont" rel="nofollow">Doulos SIL</a>, available as a free download, is a robust font that should display all characters without issue. Users should launch Toolbox using the *.prj files in the Arial or Doulous_SIL folders.</p><br> <p>The lexicon is presented in Unicode Normalization Form D (NFD), canonical decomposition. This means that every precomposed character is split into its base character followed by its combining marks, so an accented letter is stored as more than one code point. See the following link for more information on <a href="http://unicode.org/reports/tr15/" rel="nofollow">Unicode normalization forms</a>.</p><br> <p>The XML-formatted lexicon was generated by Toolbox, and a DTD is included.</p><br> <h3>Samples</h3><br> <p>Please view this <a href="desc/addenda/LDC2013L01.jpg" rel="nofollow">XML sample</a>.</p><br> <h3>Updates</h3><br> <p>None at this time.</p><br> Portions © 2013 Trustees of the University of Pennsylvania
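Because the lexicon is stored in Unicode Normalization Form D, a search string typed with precomposed characters may not match an entry byte for byte. The following is a minimal sketch, not part of the LDC release, of how a user might normalize text before comparing it against lexicon entries. The function names `to_nfd` and `matches_entry` are hypothetical, and the sample strings are generic accented letters rather than actual lexicon entries.

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Return text in Unicode Normalization Form D (canonical decomposition)."""
    return unicodedata.normalize("NFD", text)


def matches_entry(query: str, entry: str) -> bool:
    """Compare a query with a lexicon entry after normalizing both to NFD.

    Hypothetical helper: the entry is assumed to be a headword string
    already read from the Toolbox or XML files.
    """
    return to_nfd(query) == to_nfd(entry)


# A precomposed "e with acute" (U+00E9) decomposes into "e" + U+0301.
composed = "\u00e9"
decomposed = to_nfd(composed)

print([hex(ord(ch)) for ch in composed])
print([hex(ord(ch)) for ch in decomposed])
print(composed == decomposed)                 # raw comparison of the two forms
print(matches_entry(composed, decomposed))    # comparison after normalization
```

Normalizing both sides, rather than only the query, keeps the comparison correct even if some input has already been recomposed by an editor or a database.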
Provider:
Linguistic Data Consortium
Created:
2020-11-30