Chilean Twitter Hate Speech Dataset
Source: NIAID Data Ecosystem. Indexed: 2026-05-02
Download link:
https://zenodo.org/record/14619077
Description:
The dataset comprises 4,547 tweet IDs and author IDs for tweets about hate speech, related to Chilean dialect or news, that were posted on Twitter from 2020 to July 2022, and 6,542 tweet IDs for the tweets that form the context of the classified tweets.
tweets_data.csv - Whole dataset.
referenced_tweets_data.csv - Referenced Tweets data
tweets_data.parquet.gzip - Whole dataset.
referenced_tweets_data.parquet.gzip - Referenced Tweets data
The training set includes 4,547 examples labeled in 5 classes: "Odio", "Mujeres", "Comunidad LGBTQ+", "Comunidades Migrantes", "Pueblos Originarios", with values from 0 to 3 indicating the number of annotators who assigned the tweet to that class.
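Since each label stores a vote count rather than a binary flag, a common first step is to collapse the counts into 0/1 labels. The following is a minimal sketch, assuming three annotators per tweet (inferred from the 0 to 3 range) and a majority threshold of two votes. The helper name `binarize_votes` and the `min_votes` parameter are hypothetical and not part of the dataset.

```python
# Label names as listed in the dataset description.
LABEL_COLUMNS = [
    "Odio",
    "Mujeres",
    "Comunidad LGBTQ+",
    "Comunidades Migrantes",
    "Pueblos Originarios",
]


def binarize_votes(row, min_votes=2):
    """Map each label's vote count (0-3) to 1 if at least `min_votes`
    annotators chose it, else 0.

    `row` is a dict-like record (for example one row from csv.DictReader).
    Labels missing from the row are skipped rather than guessed.
    """
    labels = {}
    for name in LABEL_COLUMNS:
        if name in row:
            # Counts may arrive as strings such as "2" or "2.0".
            votes = int(float(row[name]))
            labels[name] = int(votes >= min_votes)
    return labels
```

Setting `min_votes=3` keeps only unanimous labels, and `min_votes=1` keeps any tweet flagged by at least one annotator. Which threshold is appropriate depends on the downstream task.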
The Whole dataset includes the following columns:
tweet_id: Tweet identifier. (Anonymized)
author_id: Author identifier. (Anonymized)
conversation_id: Tuple containing the tweet_id values (from the file referenced_tweets_data.csv) of the tweets that the labeled tweet references. The IDs are ordered so that the first position holds the tweet referenced by the labeled tweet, the second position holds the tweet referenced by the tweet in the first position, and so on.
text: full text.
Odio: Hate classification votes.
Mujeres: Women classification votes.
Comunidades Migrantes: Immigrant communities classification votes.
Pueblos Originarios: Indigenous peoples classification votes.
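When the whole dataset is read from CSV, the conversation_id tuple will arrive as text and needs to be parsed back into a sequence of IDs. The sketch below assumes the tuple is serialized as a Python-style literal such as "(11, 22, 33)". The description does not state the actual serialization, so this should be checked against the file. The helper `parse_conversation_id` is hypothetical.

```python
import ast


def parse_conversation_id(raw):
    """Return the referenced tweet ids as a tuple, nearest reference first.

    Assumes `raw` is a Python-style literal, e.g. "(11, 22, 33)".
    An empty or missing value yields an empty tuple.
    """
    if raw is None or str(raw).strip() == "":
        return ()
    value = ast.literal_eval(str(raw).strip())
    # A single id might be stored as a bare value instead of a 1-tuple.
    if isinstance(value, (int, str)):
        return (value,)
    return tuple(value)
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for untrusted file contents. It raises `ValueError` or `SyntaxError` if the field uses a different format.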
The Referenced Tweets data has the following columns:
tweet_id: Tweet identifier. (Anonymized)
author_id: Author identifier. (Anonymized)
conversation_id: Contains either an ID number identifying the next referenced tweet in the Referenced Tweets data, or 0 to indicate that the tweet is the last referenced tweet.
text: full text.
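The conversation_id column of the Referenced Tweets data works like a linked list: each row points to the next referenced tweet, and 0 marks the end. A minimal sketch of walking that chain follows, assuming the rows have already been loaded into a dict that maps tweet_id to conversation_id. The function name `reference_chain`, the `next_by_id` mapping, and the `max_hops` guard are hypothetical.

```python
def reference_chain(first_id, next_by_id, max_hops=1000):
    """Follow conversation_id pointers starting at `first_id`.

    `next_by_id` maps a referenced tweet's tweet_id to its conversation_id
    (the next referenced tweet, or 0 for the end of the chain).
    Returns the visited tweet ids in order.
    """
    chain = []
    seen = set()
    current = first_id
    while current != 0 and current in next_by_id:
        # Stop on cycles or unexpectedly long chains instead of looping forever.
        if current in seen or len(chain) >= max_hops:
            break
        chain.append(current)
        seen.add(current)
        current = next_by_id[current]
    return chain
```

For example, with `next_by_id = {11: 22, 22: 33, 33: 0}`, calling `reference_chain(11, next_by_id)` returns `[11, 22, 33]`. IDs that do not appear in the mapping end the walk early.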
Created:
2025-01-15



