
Lists of stopwords, polarity shifters and AnAwords of Bosnian language

NIAID Data Ecosystem, indexed 2026-05-01
Download link:
https://zenodo.org/record/8021149
Description:
The dataset comprises three word lists for the Bosnian language: stopwords, polarity shifters, and AnAwords (the last split across two files). Stopwords are the words on a stop list, which are deliberately filtered out ("stopped") when processing natural language text. They are typically common, frequently occurring words considered to carry little or no significance in determining the meaning or context of a text. AnAwords are words that function primarily as intensifiers and diminishers, often appearing as adverbs of manner and adjectives. The compilation is based on a categorization into six sublists: maximizers, boosters, approximators, relative intensifiers, diminishers, and minimizers. The list is stored in two separate files, one for intensifiers and one for diminishers. Polarity shifters are content words, such as verbs, nouns, and adjectives, that affect the polarity of a phrase by inverting or weakening it.
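
As a rough illustration of how lists like these are typically used, the following Python sketch filters stopwords from a token sequence and then applies intensifiers, diminishers, and polarity shifters in a toy sentiment score. It assumes each list is a plain-text file with one entry per line, which the description does not confirm. The helper names, the numeric weights, the sentiment lexicon, and the sample Bosnian entries are all hypothetical placeholders and are not taken from the dataset.

```python
import io


def load_wordlist(lines):
    """Build a lowercase set from an iterable of lines (assumed: one entry per line)."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Drop every token whose lowercase form is in the stop list."""
    return [t for t in tokens if t.lower() not in stopwords]


def score(tokens, lexicon, intensifiers, diminishers, shifters):
    """Toy scoring: modifiers rescale the next sentiment-bearing word.

    The weights 1.5, 0.5 and -1.0 are arbitrary illustrative values.
    Shifters are modeled as pure inversion here, although the description
    says they may also weaken polarity.
    """
    total = 0.0
    weight = 1.0
    for tok in tokens:
        t = tok.lower()
        if t in intensifiers:
            weight *= 1.5
        elif t in diminishers:
            weight *= 0.5
        elif t in shifters:
            weight *= -1.0
        elif t in lexicon:
            total += weight * lexicon[t]
            weight = 1.0  # reset once the modifier has been consumed
    return total


# Placeholder entries standing in for the real files.
stopwords = load_wordlist(io.StringIO("i\nje\n\nU\n"))
intensifiers = {"veoma"}
diminishers = {"malo"}
shifters = {"nedostatak"}
lexicon = {"dobar": 1.0}  # hypothetical sentiment lexicon, not part of the dataset

tokens = remove_stopwords("Film je veoma dobar".split(), stopwords)
result = score(tokens, lexicon, intensifiers, diminishers, shifters)
```

With real data, `load_wordlist` could be passed an open file handle in place of the `io.StringIO` stand-in. Lowercasing at both load time and lookup time keeps matching case-insensitive without altering the original tokens.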

Created:
2023-12-17