
Lessons learned from Automatic Indexing Projects regarding to Persian Language Specifications

IFLA Repository, updated 2026-03-02, indexed 2026-05-16
Download link:
https://repository.ifla.org/items/2bb4e5c1-bb10-4494-9736-1b0e0439d536
Resource description:
Reading and writing Persian pose difficulties that stem from specific features of the language. This paper examines the experiences, lessons, and outcomes of automatic indexing projects for Persian-language documents in order to propose effective solutions for improving their indexing and retrieval. The most important problems that Persian language and script raise for automatic indexing are: selecting appropriate keywords; building a vocabulary; semantic, verb, and word-sense ambiguity in sentences; spaces and pseudo-spaces in Persian script; isolated and cursive writing; the morphology of Persian; and typographical and spelling errors. The solutions proposed to facilitate automatic indexing of Persian texts are: removing stop words; pre-processing characters and script; identifying word boundaries; equalizing different spellings; automatic stemming; weighting and scoring words; detecting phrasal verbs and compound phrases; spellchecking through morphological or even syntactic spellcheckers and the design of a correction-and-suggestion system; and developing an infrastructure database for Persian language and script usage.
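Several of the pre-processing steps named in the description (character normalization, pseudo-space handling, word-boundary detection, stop-word removal) can be illustrated with a minimal sketch. This is not the paper's method: the function names `normalize` and `index_terms`, the two affix rules, and the six-word stop list are all assumptions chosen for illustration, and a real system would need far larger rule sets and lexicons.

```python
import re

ZWNJ = "\u200C"  # zero-width non-joiner, the Persian "pseudo-space"

# Arabic code points often typed in place of their Persian counterparts.
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC YEH -> FARSI YEH
    "\u0643": "\u06A9",  # ARABIC KAF -> KEHEH
})

# Tiny illustrative stop list (an assumption, not taken from the paper).
STOP_WORDS = {"و", "در", "به", "از", "که", "را"}


def normalize(text: str) -> str:
    """Equalize spellings and repair common space/pseudo-space errors."""
    text = text.translate(CHAR_MAP)
    # Drop tatweel (kashida) and Arabic short-vowel diacritics.
    text = re.sub("[\u0640\u064B-\u0652]", "", text)
    # A run of whitespace/ZWNJ containing real whitespace becomes one space.
    text = re.sub(r"[\s\u200C]*\s[\s\u200C]*", " ", text).strip()
    # Any remaining run of ZWNJ collapses to a single ZWNJ.
    text = re.sub(r"\u200C{2,}", ZWNJ, text)
    # Heuristic: attach the verbal prefix "mi"/"nemi" written with a space.
    text = re.sub(r"(?<!\w)(\u0646?\u0645\u06CC) (?=\w)", r"\1" + ZWNJ, text)
    # Heuristic: attach the plural suffix "ha"/"haye" written with a space.
    text = re.sub(r"(?<=\w) (\u0647\u0627\u06CC?)(?!\w)", ZWNJ + r"\1", text)
    return text


def index_terms(text: str) -> list[str]:
    """Tokenize normalized text and drop stop words."""
    tokens = re.findall(r"[\w\u200C]+", normalize(text))
    tokens = [t.strip(ZWNJ) for t in tokens]
    return [t for t in tokens if t and t not in STOP_WORDS]
```

Calling `index_terms` on a sentence typed with Arabic yeh and kaf and with plain spaces before the plural suffix yields tokens in which the affixes are joined to their stems by a ZWNJ, so that variant spellings of the same word map to one index entry. The affix rules are deliberately naive: for example, they would also attach a free-standing word that happens to be spelled like the prefix.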
Provider:
International Federation of Library Associations and Institutions
Created:
2025-09-24