MaNeCo Corpus
Source: arXiv | Updated: 2020-08-14 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2008.06222v1
Description:
The MaNeCo Corpus is a large dataset created by the Institute of Linguistics and Language Technology at the University of Malta. It contains all user comments posted to the Times of Malta, Malta's largest online newspaper, between 2008 and 2017: more than 2.5 million comments in total. It includes both published comments and comments removed by the newspaper's moderators, which often contain strong hate speech. The corpus was built to support the study of how hate speech is characterized and expressed in online comments, and thereby the development of automatic hate speech detection. It covers multiple languages and language varieties, reflecting Malta's multicultural and multilingual society.
Provider:
Institute of Linguistics and Language Technology, University of Malta
Created:
2020-08-14



