
ConvoTox: A Public Conversation Toxicity Dataset

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14653410
Description:
Online social media has become increasingly popular in recent years because it is easy to access and makes it easy to connect with others. One of its main draws is anonymity, which lets users share their thoughts and opinions without fear of judgment or retribution. That same anonymity makes social media prone to harmful content, which requires moderation to ensure responsible and productive use. Several methods based on artificial intelligence have been used to detect harmful content, but conversational and contextual analysis of hate speech remains understudied: most promising approaches analyze a single text at a time rather than the conversation surrounding it. To address this gap, we present ConvoTox, a large-scale dataset designed for studying toxicity in conversational settings. ConvoTox comprises over 1 million responses collected from the top 100 posts across 8 Reddit communities that allow profanity, and includes both the posts and their corresponding comment threads. The dataset is organized in a tree-based structure, enabling researchers to analyze user behavior and interaction patterns in context. We also propose a candidate metric based on tree aggregation to support further analysis. Initial insights reveal that toxicity in comments often propagates through conversational threads and that the immediate context significantly influences the tone of a response. By providing a rich resource for conversation-focused toxicity analysis, ConvoTox aims to support advances in understanding and mitigating harmful online behavior.
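The tree-based organization and the idea of aggregating over a thread can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Node` fields, the per-comment `toxicity` scores, and the mean-over-subtree aggregation are hypothetical and are not the ConvoTox schema or the metric proposed by the authors.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """One post or comment. Field names are hypothetical, not the ConvoTox schema."""
    text: str
    toxicity: float  # assumed per-text score in [0, 1] from some classifier
    replies: List["Node"] = field(default_factory=list)


def subtree_stats(node: Node) -> Tuple[float, int]:
    """Return (sum of toxicity, number of nodes) for the subtree rooted at node."""
    total, count = node.toxicity, 1
    for reply in node.replies:
        reply_total, reply_count = subtree_stats(reply)
        total += reply_total
        count += reply_count
    return total, count


def mean_subtree_toxicity(node: Node) -> float:
    """One possible tree aggregation: average toxicity over a whole thread."""
    total, count = subtree_stats(node)
    return total / count


def parent_child_pairs(node: Node) -> Iterator[Tuple[float, float]]:
    """Yield (parent score, reply score) for every reply edge in the thread."""
    for reply in node.replies:
        yield node.toxicity, reply.toxicity
        yield from parent_child_pairs(reply)


# Toy thread with made-up scores, for illustration only.
thread = Node("post", 0.0, [
    Node("comment 1", 0.75, [
        Node("reply 1a", 1.0),
        Node("reply 1b", 0.5),
    ]),
    Node("comment 2", 0.25),
])

print(mean_subtree_toxicity(thread))
print(list(parent_child_pairs(thread)))
```

The parent/reply pairs are one simple way to examine how immediate context relates to the tone of a response. The recursion is adequate for a sketch, but very deep threads may call for an iterative traversal.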
Created:
2025-01-15