MOL - Multilingual Offensive Lexicon
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/7787172
Description:
MOL (Multilingual Offensive Lexicon) is a specialized lexicon for abusive language detection in low-resource languages. It consists of 1,000 explicit and implicit terms and expressions with pejorative connotations, manually identified by a specialist and annotated with contextual information by three different experts, achieving high inter-annotator agreement (73% Kappa).
Each term and expression in MOL is assigned one of two classes: context-dependent offensiveness or context-independent offensiveness. For example, the term "hypocrite" is classified as context-independent, since it is mostly found in pejorative contexts of use. The term "worm", on the other hand, is classified as context-dependent because it may be found in both pejorative and non-pejorative contexts of use, such as "Politicians are like society worms" and "Reduces virus, worm, and unwanted access threats".
Terms that showed strong potential to indicate hate speech targets were also annotated. For instance, "slut" and "Jews from hell" may indicate sexist and antisemitic comments, respectively. Finally, these terms and expressions, originally written in Portuguese, were manually translated by native speakers into English, Spanish, German, French, and Turkish.
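As a rough illustration of how the two classes might be used, the following Python sketch models a lexicon entry and flags texts whose only matches are context-dependent terms. The field names (`term`, `context_dependent`, `target`), the helper functions, and the naive plural matching are all assumptions made for this example; the listing does not describe the actual file layout of the Zenodo release, and only the two example terms whose class is stated above are included.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LexiconEntry:
    # Hypothetical schema: the real MOL column names may differ.
    term: str
    context_dependent: bool        # True = offensive only in some contexts
    target: Optional[str] = None   # optional hate speech target annotation


# Only the two terms whose class is given in the description.
SAMPLE_ENTRIES = [
    LexiconEntry("hypocrite", context_dependent=False),
    LexiconEntry("worm", context_dependent=True),
]


def find_hits(text: str, entries: List[LexiconEntry]) -> List[LexiconEntry]:
    """Return the entries whose term occurs in the text as a whole word.

    Matching is case-insensitive and tolerates a trailing "s"; this is a
    deliberately naive stand-in for real morphological handling.
    """
    hits = []
    for entry in entries:
        pattern = r"\b" + re.escape(entry.term) + r"s?\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(entry)
    return hits


def needs_context_review(text: str, entries: List[LexiconEntry]) -> bool:
    """True when the text matches the lexicon only through context-dependent terms."""
    hits = find_hits(text, entries)
    return bool(hits) and all(hit.context_dependent for hit in hits)
```

Under these assumptions, both "Politicians are like society worms" and "Reduces virus, worm, and unwanted access threats" match only the context-dependent entry, so a lexicon lookup alone cannot separate them and a downstream classifier or human reviewer would have to decide.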
Created:
2024-07-21



