thai-toxicity-tweet

Name: thai-toxicity-tweet
Creator: OpenDataLab
License: No description available

Indexed on OpenXLab: 2026-04-18

Download link:

https://openxlab.org.cn/datasets/OpenDataLab/thai-toxicity-tweet

Description:

Thai Toxicity Tweet Corpus contains 3,300 tweets (506 of which have missing text), annotated by humans following guidelines that include a 44-word dictionary. The author obtained 2,027 toxic and 1,273 non-toxic tweets, each labeled by three annotators. The corpus analysis indicates that tweets containing toxic words are not always toxic; a tweet is more likely to be toxic if its toxic words are used in their original meaning. Annotation disagreements arise primarily from sarcasm, uncertainty over whether a target exists, and word-sense ambiguity.

Notes from the data cleaner: the data was added to huggingface/datasets in December 2020. By that time, 506 of the tweets were no longer publicly available; these are denoted by TWEET-NOT-FOUND in tweet-text. The processing steps can be found in this PR.
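Because unavailable tweets keep their rows but carry the TWEET-NOT-FOUND placeholder, they usually need to be filtered out before training or counting labels. The following is a minimal Python sketch under stated assumptions: each record is a plain dict, the text key is written `tweet_text` (the page calls the field tweet-text), and the label key `is_toxic` with 1 = toxic and 0 = non-toxic is hypothetical. Adjust both keys to the actual schema.

```python
from collections import Counter

# Placeholder for tweets that were no longer publicly available
# (string taken from the dataset notes).
MISSING_PLACEHOLDER = "TWEET-NOT-FOUND"


def drop_missing(records, text_key="tweet_text"):
    """Keep only records whose text is not the missing-tweet placeholder.

    `text_key` is an assumed field name; change it to match the real schema.
    """
    return [r for r in records if r[text_key] != MISSING_PLACEHOLDER]


def label_counts(records, label_key="is_toxic"):
    """Count records per label value (hypothetical encoding: 1 = toxic, 0 = non-toxic)."""
    return Counter(r[label_key] for r in records)


if __name__ == "__main__":
    # Hypothetical toy records, not real corpus data.
    sample = [
        {"tweet_text": "example text A", "is_toxic": 1},
        {"tweet_text": MISSING_PLACEHOLDER, "is_toxic": 0},
        {"tweet_text": "example text B", "is_toxic": 0},
    ]
    kept = drop_missing(sample)
    print(len(kept))                             # 2
    print(sorted(label_counts(kept).items()))    # [(0, 1), (1, 1)]
```

Since 506 of the 3,300 tweets are placeholders, at most 2,794 rows carry actual tweet text after filtering.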

Provider:

OpenDataLab

Created:

2023-12-07