Buckwalter Arabic Morphological Analyzer Version 2.0
Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2004L02
Resource description:
<h3>Introduction</h3><br>
<p>This file contains documentation on the Buckwalter Arabic Morphological Analyzer Version 2.0.</p><br>
<h3>Data</h3><br>
<p>The data consists primarily of three Arabic-English lexicon files: prefixes (299 entries), suffixes (618 entries), and stems (82158 entries representing 38600 lemmas). The lexicons are supplemented by three morphological compatibility tables that control which prefix-stem combinations (1648 entries), stem-suffix combinations (1285 entries), and prefix-suffix combinations (598 entries) are allowed. The actual code for morphological analysis and POS tagging is contained in a Perl script. The documentation consists of a readme file with a description of the lexicon files, the morphological compatibility tables, the morphology analysis algorithm, a summary of stem morphological categories, and a table with the author's Arabic transliteration system.</p><br>
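<p>How three lexicons and three compatibility tables can work together is easiest to see in a small sketch. The Python below is an illustration only, not the distributed Perl script: every lexicon entry, category name (such as <code>PREF_CONJ</code> or <code>VERB_PERF</code>), and table row is a made-up toy value, and the function <code>analyze</code> is a hypothetical name. It assumes the general approach of trying every prefix + stem + suffix split of a transliterated word and keeping only the splits whose categories pass all three pairwise checks.</p><br>

```python
from typing import Dict, List, Set, Tuple

Entry = Tuple[str, str]  # (morphological category, English gloss)

# Toy lexicons keyed by transliterated surface form (all entries hypothetical,
# not copied from the distributed lexicon files).
PREFIXES: Dict[str, List[Entry]] = {
    "": [("PREF_NONE", "")],
    "w": [("PREF_CONJ", "and")],
    "Al": [("PREF_DET", "the")],
}
STEMS: Dict[str, List[Entry]] = {
    "ktAb": [("NOUN", "book")],
    "ktb": [("VERB_PERF", "write")],
}
SUFFIXES: Dict[str, List[Entry]] = {
    "": [("SUFF_NONE", "")],
    "t": [("SUFF_PERF_1S", "I")],
}

# Toy compatibility tables: a pair of categories is allowed only if listed.
PREFIX_STEM: Set[Tuple[str, str]] = {
    ("PREF_NONE", "NOUN"), ("PREF_CONJ", "NOUN"), ("PREF_DET", "NOUN"),
    ("PREF_NONE", "VERB_PERF"), ("PREF_CONJ", "VERB_PERF"),
}
STEM_SUFFIX: Set[Tuple[str, str]] = {
    ("NOUN", "SUFF_NONE"), ("VERB_PERF", "SUFF_PERF_1S"),
}
PREFIX_SUFFIX: Set[Tuple[str, str]] = {
    ("PREF_NONE", "SUFF_NONE"), ("PREF_CONJ", "SUFF_NONE"),
    ("PREF_DET", "SUFF_NONE"), ("PREF_NONE", "SUFF_PERF_1S"),
    ("PREF_CONJ", "SUFF_PERF_1S"), ("PREF_DET", "SUFF_PERF_1S"),
}


def analyze(word: str) -> List[Tuple[str, str, str, str]]:
    """Return (prefix, stem, suffix, gloss) for every compatible split."""
    results = []
    n = len(word)
    for i in range(n + 1):          # end of the prefix
        for j in range(i, n + 1):   # end of the stem
            prefix, stem, suffix = word[:i], word[i:j], word[j:]
            for pcat, pgloss in PREFIXES.get(prefix, []):
                for scat, sgloss in STEMS.get(stem, []):
                    for xcat, xgloss in SUFFIXES.get(suffix, []):
                        # All three pairwise checks must succeed.
                        if ((pcat, scat) in PREFIX_STEM
                                and (scat, xcat) in STEM_SUFFIX
                                and (pcat, xcat) in PREFIX_SUFFIX):
                            gloss = " ".join(
                                g for g in (pgloss, sgloss, xgloss) if g
                            )
                            results.append((prefix, stem, suffix, gloss))
    return results


if __name__ == "__main__":
    for w in ("wktAb", "ktbt", "Alktbt"):
        print(w, analyze(w))
```

<p>In this toy setup, <code>Alktbt</code> splits into three known pieces but yields no analysis, because the pair (<code>PREF_DET</code>, <code>VERB_PERF</code>) is missing from the prefix-stem table. Sets of category pairs make each compatibility check a constant-time lookup.</p><br>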
<h3>Samples</h3><br>
<p>To see an example of the analyzer's output, please examine this <a href="desc/addenda/LDC2004T27.xml" rel="nofollow">sample</a>.</p><br>
<h3>Additional Licensing Instructions</h3><br>
<p>This 'members-only' corpus is available to current members, who can request the data at the listed reduced-license fee. Contact <a href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a> for information about becoming a member.</p><br>
Provider:
Linguistic Data Consortium
Created:
2020-11-30



