Ethiopic Twitter Dataset for Amharic (ETD-AM)

Name: Ethiopic Twitter Dataset for Amharic (ETD-AM)
Creator: Language Technology Group, Department of Informatics, MIN Faculty, Universität Hamburg
Published: 2019-12-10 07:18:13
License: No description available

Source: arXiv. Updated: 2019-12-10. Indexed: 2024-08-06.

Download link:

http://arxiv.org/abs/1912.04419v1


Description:

This study presents the Ethiopic Twitter Dataset for Amharic (ETD-AM), the first Amharic dataset built for identifying abusive speech. The tweets have been collected since 2014 and are written in the Fidel (Ethiopic) script; language identification is applied so that the content is predominantly Amharic. The dataset is used to analyze how abusive speech is distributed and how it trends over time, and to compare abusive content in social media with that in a general reference corpus. Its main application is social media content analysis, in particular studying the usage and trends of abusive speech in Amharic social media and how such speech can be managed and countered through technology, legislation, and public participation.
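The description mentions Fidel-script tweets filtered by language identification but gives no implementation details. As a rough illustration only, the script-level part of such a filter could be sketched as below. The function names and the 0.5 threshold are hypothetical and are not taken from the ETD-AM paper. A script check like this cannot tell Amharic apart from other languages written in Fidel (for example Tigrinya), so a real pipeline would still need a proper language identifier on top of it.

```python
# Illustrative sketch: keep tweets whose letters are mostly Ethiopic (Fidel).
# All names and the default threshold are assumptions, not the authors' method.

# Unicode blocks that contain Ethiopic characters.
ETHIOPIC_RANGES = (
    (0x1200, 0x137F),  # Ethiopic
    (0x1380, 0x139F),  # Ethiopic Supplement
    (0x2D80, 0x2DDF),  # Ethiopic Extended
    (0xAB00, 0xAB2F),  # Ethiopic Extended-A
)


def is_ethiopic(ch):
    """Return True if the character's code point lies in an Ethiopic block."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ETHIOPIC_RANGES)


def ethiopic_ratio(text):
    """Fraction of alphabetic characters in `text` that are Ethiopic."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(is_ethiopic(ch) for ch in letters) / len(letters)


def filter_fidel(tweets, min_ratio=0.5):
    """Keep tweets whose Ethiopic letter ratio is at least `min_ratio`."""
    return [t for t in tweets if ethiopic_ratio(t) >= min_ratio]
```

Counting only alphabetic characters keeps digits, whitespace, and punctuation from diluting the ratio, which matters for short tweets that mix Fidel text with URLs or hashtags.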

Provided by:

Language Technology Group, Department of Informatics, MIN Faculty, Universität Hamburg

Created:

2019-12-10
