
Buckwalter Arabic Morphological Analyzer Version 1.0

DataCite Commons | Updated 2021-07-01 | Indexed 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC2002L49
Description:
<h3>Introduction</h3><br> <p>Buckwalter Arabic Morphological Analyzer Version 1.0 is used for annotating Arabic text with part-of-speech tags.</p><br>
<h3>Data</h3><br> <p>The data consists primarily of three Arabic-English lexicon files: prefixes (299 entries), suffixes (618 entries), and stems (82,158 entries representing 38,600 lemmas). The lexicons are supplemented by three morphological compatibility tables that control prefix-stem combinations (1,648 entries), stem-suffix combinations (1,285 entries), and prefix-suffix combinations (598 entries). The code for morphological analysis and POS tagging is contained in a Perl script. The documentation consists of a readme file that describes the lexicon files, the morphological compatibility tables, and the morphology analysis algorithm, and that includes a summary of stem morphological categories and a table of the author's Arabic transliteration system.</p><br>
<h3>Updates</h3><br> <p>Six files in the data were named with letter case that differed from the names used in the documentation and the script, which caused the analyzer to crash on case-sensitive systems. This problem has been fixed, and the corrected version of the analyzer is now available for download.</p><br>
<h3>Licensing</h3><br> <p>Buckwalter Arabic Morphological Analyzer Version 1.0 is released under the <a href="https://www.gnu.org/licenses/old-licenses/gpl-2.0-faq.html" rel="nofollow">GNU General Public License version 2</a>. Organizations interested in licensing the lexicon and/or morphological analyzer for commercial use should contact: QAMUS LLC, 448 South 48th St., Philadelphia, PA 19143, ATTN: Tim Buckwalter, email: info@qamus.org</p><br>
<h3>Note</h3><br> <p>This corpus is free of charge as a web download; to obtain the data, submit a request to ldc@ldc.upenn.edu. There is a $100 charge if the data is requested on CD-ROM.</p><br> Portions © 2002 QAMUS LLC (www.qamus.org), © 2002 Trustees of the University of Pennsylvania
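The way three lexicons and three compatibility tables can work together is easier to see in a small example. The Python sketch below is only an illustration and is not the distributed Perl script. It assumes that each lexicon entry carries a morphological category and that each compatibility table is a set of allowed category pairs. Every entry, every category name (such as Pref-Wa or NSuff-u), and the function name analyze are hypothetical placeholders, not values taken from the LDC files.

```python
from typing import Dict, List, Set, Tuple

# Toy lexicons mapping a surface form to its morphological categories.
# ASSUMPTION: all entries and category names are invented for illustration.
PREFIXES: Dict[str, List[str]] = {"": ["Pref-0"], "wa": ["Pref-Wa"]}
STEMS: Dict[str, List[str]] = {"kitAb": ["N"], "katab": ["PV"]}
SUFFIXES: Dict[str, List[str]] = {"": ["Suff-0"], "a": ["PVSuff-a"], "u": ["NSuff-u"]}

# Toy compatibility tables: each one is a set of allowed category pairs.
PREFIX_STEM: Set[Tuple[str, str]] = {
    ("Pref-0", "N"), ("Pref-0", "PV"), ("Pref-Wa", "N"), ("Pref-Wa", "PV"),
}
STEM_SUFFIX: Set[Tuple[str, str]] = {
    ("N", "Suff-0"), ("N", "NSuff-u"), ("PV", "PVSuff-a"),
}
PREFIX_SUFFIX: Set[Tuple[str, str]] = {
    ("Pref-0", "Suff-0"), ("Pref-0", "NSuff-u"), ("Pref-0", "PVSuff-a"),
    ("Pref-Wa", "Suff-0"), ("Pref-Wa", "NSuff-u"), ("Pref-Wa", "PVSuff-a"),
}


def analyze(word: str) -> List[Tuple[str, str, str]]:
    """Return every (prefix, stem, suffix) split that passes all three tables."""
    results: List[Tuple[str, str, str]] = []
    n = len(word)
    for i in range(n):                 # prefix is word[:i]; may be empty
        for j in range(i + 1, n + 1):  # stem is word[i:j]; never empty
            prefix, stem, suffix = word[:i], word[i:j], word[j:]
            for p_cat in PREFIXES.get(prefix, []):
                for s_cat in STEMS.get(stem, []):
                    for x_cat in SUFFIXES.get(suffix, []):
                        # A split is valid only if all three pairings are allowed.
                        if ((p_cat, s_cat) in PREFIX_STEM
                                and (s_cat, x_cat) in STEM_SUFFIX
                                and (p_cat, x_cat) in PREFIX_SUFFIX):
                            results.append((prefix, stem, suffix))
    return results


print(analyze("wakitAbu"))  # noun stem with a compatible prefix and suffix
print(analyze("kitAba"))    # noun stem with a verb-only suffix is rejected
```

In this toy setup the second call returns an empty list: all three pieces exist in the lexicons, but the stem-suffix table does not allow that pairing. Looking up pieces and checking pairs separately in this way keeps the lexicons small, since valid combinations are not listed word by word.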
Provider:
Linguistic Data Consortium
Created:
2020-11-30