Buckwalter Arabic Morphological Analyzer Version 2.0

Source: DataCite Commons. Updated: 2021-07-01. Indexed: 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2004L02
Description:
<h3>Introduction</h3><br> <p>This file contains documentation on the Buckwalter Arabic Morphological Analyzer Version 2.0.</p><br> <h3>Data</h3><br> <p>The data consists primarily of three Arabic-English lexicon files: prefixes (299 entries), suffixes (618 entries), and stems (82158 entries representing 38600 lemmas). The lexicons are supplemented by three morphological compatibility tables that control which prefix-stem combinations (1648 entries), stem-suffix combinations (1285 entries), and prefix-suffix combinations (598 entries) are permitted. The actual code for morphological analysis and POS tagging is contained in a Perl script. The documentation consists of a readme file with a description of the lexicon files, the morphological compatibility tables, the morphological analysis algorithm, a summary of stem morphological categories, and a table with the author's Arabic transliteration system.</p><br> <h3>Samples</h3><br> <p>To see an example of the analyzer's output, please examine this <a href="desc/addenda/LDC2004T27.xml" rel="nofollow">sample</a>.</p><br> <h3>Additional Licensing Instructions</h3><br> <p>This 'members-only' corpus is available to current members, who can request the data at the listed reduced-license fee. Contact&nbsp;<a href="mailto:ldc@ldc.upenn.edu">ldc@ldc.upenn.edu</a>&nbsp;for information about becoming a member.</p>
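The description above says that three lexicons and three compatibility tables together determine which prefix-stem-suffix combinations are accepted. The following is a minimal Python sketch of that general idea, not the distributed Perl script. Every lexicon entry, category name (`Pref-Wa`, `PV`, `NSuff-u`, and so on), and the `analyze` function are made-up assumptions for illustration; the real files and their formats are described in the readme that ships with the data.

```python
from typing import Dict, List, Set, Tuple

# Toy lexicons (hypothetical): surface form -> morphological categories.
PREFIXES: Dict[str, List[str]] = {"": ["Pref-0"], "wa": ["Pref-Wa"]}
STEMS: Dict[str, List[str]] = {"ktb": ["PV"], "kitAb": ["N"]}
SUFFIXES: Dict[str, List[str]] = {"": ["Suff-0"], "a": ["PVSuff-a"], "u": ["NSuff-u"]}

# Toy compatibility tables (hypothetical): permitted category pairs.
PREFIX_STEM: Set[Tuple[str, str]] = {
    ("Pref-0", "PV"), ("Pref-0", "N"), ("Pref-Wa", "PV"), ("Pref-Wa", "N"),
}
STEM_SUFFIX: Set[Tuple[str, str]] = {
    ("PV", "Suff-0"), ("PV", "PVSuff-a"), ("N", "Suff-0"), ("N", "NSuff-u"),
}
PREFIX_SUFFIX: Set[Tuple[str, str]] = {
    ("Pref-0", "Suff-0"), ("Pref-0", "PVSuff-a"), ("Pref-0", "NSuff-u"),
    ("Pref-Wa", "Suff-0"), ("Pref-Wa", "PVSuff-a"), ("Pref-Wa", "NSuff-u"),
}


def analyze(word: str) -> List[Tuple[str, str, str]]:
    """Return every (prefix, stem, suffix) split that passes all three tables."""
    results: List[Tuple[str, str, str]] = []
    n = len(word)
    for i in range(n + 1):               # i marks the end of the prefix
        prefix = word[:i]
        if prefix not in PREFIXES:
            continue
        for j in range(i + 1, n + 1):    # j marks the end of a non-empty stem
            stem, suffix = word[i:j], word[j:]
            if stem not in STEMS or suffix not in SUFFIXES:
                continue
            for p_cat in PREFIXES[prefix]:
                for s_cat in STEMS[stem]:
                    for x_cat in SUFFIXES[suffix]:
                        # A split is kept only if all three pairings are listed.
                        if ((p_cat, s_cat) in PREFIX_STEM
                                and (s_cat, x_cat) in STEM_SUFFIX
                                and (p_cat, x_cat) in PREFIX_SUFFIX):
                            results.append((prefix, stem, suffix))
    return results


print(analyze("waktba"))  # split accepted by the toy tables
print(analyze("ktbu"))    # rejected: toy stem-suffix table has no (PV, NSuff-u)
```

In this sketch the tables act purely as filters: a segmentation found in the lexicons is still discarded if any one of the three category pairings is missing, which is why the toy word `ktbu` yields no analysis.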
Provider:
Linguistic Data Consortium
Created:
2020-11-30