Hate Speech in Chilean Twitter

Name: Hate Speech in Chilean Twitter
Creator: IEEE Dataport
License: No description available

Indexed from ieee-dataport.org on 2025-03-22

Download link:

https://ieee-dataport.org/documents/hate-speech-chilean-twitter

Resource description:

In the last few years, several organizations have expressed concern over the increasing use of hateful speech, or hate speech for short. The concept refers to forms of expression or audio-visual content that encourage discrimination or violence against individuals or groups solely on the basis of their gender, sexual orientation, ethnicity, religion or nationality. Being able to monitor this phenomenon in a timely manner can help societies and their governments prevent tensions, crimes and conflicts that endanger not only the most fundamental democratic values but also order, stability and social peace.

The rapid mass adoption of social platforms has made them one of the main media people use today for creating and sharing information. Consequently, social media platforms such as Twitter, Instagram and Facebook are the stage on which hate speech is mostly propagated today. Sadly, the great reach of these platforms, their public nature, the social dynamics perpetuated in them and the absence of an explicit regulatory framework only increase the magnitude of this phenomenon. Mining such conversations, such as Tweets, to build a dataset can provide a data resource for interdisciplinary research on interests, views and opinions, and can help in creating tools that further our understanding of the social dynamics related to hate speech propagation.

The dataset is compliant with Twitter's privacy policy, developer agreement and guidelines for content redistribution, and with the FAIR principles (Findability, Accessibility, Interoperability and Reusability) for scientific data management.

Data Description

The dataset comprises a total of 4,547 Tweet IDs and author IDs about hate speech related to the Chilean dialect or news, posted on Twitter from 2020 to July 2022, and 6,542 Tweet IDs related to the context of the classified tweets.

tweets_train.csv - Train set.
public_test_data.csv - Test set.
referenced_tweets_data.csv - Referenced Tweets data.

The train set includes 2,255 examples labeled in 5 classes: "Odio", "Mujeres", "Comunidad LGBTQ+", "Comunidades Migrantes" and "Pueblos Originarios". Each class takes values from 0 to 1, where 0 stands for false (the tweet does not contain the class) and 1 stands for true.

The train set examples include the following columns:

tweet_id
author_id
conversation_id: Tuple containing the tweet IDs (from the file referenced_tweets_data.csv) that the labeled tweet references. The IDs are ordered so that the first position holds the tweet referenced by the labeled tweet, the second position holds the tweet referenced by the tweet in the first position, and so on.

The dataset contains only Tweet and author IDs, in compliance with the terms and conditions set out in Twitter's privacy policy, developer agreement and guidelines for content redistribution. The Tweet IDs need to be hydrated to be used. To hydrate this dataset, the Hydrator application may be used (link to download and a step-by-step tutorial on how to use Hydrator).
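
The column layout described above can be read with a short script. The following is only a minimal sketch under several assumptions that the dataset page does not confirm: that the five class names are used verbatim as column headers, that conversation_id is stored as a Python-style tuple literal such as "(2001, 2002)", and that the inline SAMPLE rows (invented here purely for illustration) resemble the real file. The helper names parse_chain, load_rows and ids_to_hydrate are hypothetical. IDs are kept as strings so that large Tweet IDs are not rounded by numeric conversion.

```python
import ast
import csv
import io

# Assumption: the five class names appear verbatim as CSV column headers.
LABELS = ["Odio", "Mujeres", "Comunidad LGBTQ+",
          "Comunidades Migrantes", "Pueblos Originarios"]

# Invented sample rows for illustration only; not taken from the dataset.
SAMPLE = '''tweet_id,author_id,conversation_id,Odio,Mujeres,Comunidad LGBTQ+,Comunidades Migrantes,Pueblos Originarios
1001,501,"(2001, 2002)",1,0,0,1,0
1002,502,"()",0,0,0,0,0
'''


def parse_chain(raw):
    """Turn a conversation_id cell into an ordered list of tweet-ID strings.

    Assumes the cell holds a tuple literal; a bare single ID is also accepted.
    """
    raw = (raw or "").strip()
    if not raw:
        return []
    value = ast.literal_eval(raw)  # safe parsing of literals only
    if not isinstance(value, (tuple, list)):
        value = (value,)
    return [str(v) for v in value]


def load_rows(handle):
    """Read labeled examples from an open CSV file handle."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append({
            "tweet_id": rec["tweet_id"],
            "author_id": rec["author_id"],
            # Position 0 is the tweet referenced by the labeled tweet,
            # position 1 is referenced by position 0, and so on.
            "chain": parse_chain(rec.get("conversation_id", "")),
            "labels": {name: float(rec[name]) for name in LABELS if name in rec},
        })
    return rows


def ids_to_hydrate(rows):
    """Collect every labeled and referenced tweet ID once, in first-seen order."""
    seen = {}
    for row in rows:
        for tweet_id in [row["tweet_id"], *row["chain"]]:
            seen.setdefault(tweet_id, None)
    return list(seen)


rows = load_rows(io.StringIO(SAMPLE))
print(ids_to_hydrate(rows))
```

For the real files, the io.StringIO(SAMPLE) argument would be replaced with an open handle on tweets_train.csv, and the resulting ID list could be written one ID per line for use with a hydration tool.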

Provider:

IEEE Dataport
