agamgoy/balanced_upvotes_downvotes_data

Name: agamgoy/balanced_upvotes_downvotes_data
Creator: agamgoy
Published: 2024-09-10 16:06:54
License: No description available

Hugging Face | Updated 2024-09-10 | Indexed 2024-12-14

Download link:

https://hf-mirror.com/datasets/agamgoy/balanced_upvotes_downvotes_data

Description:

This dataset contains Reddit comment data. Each record carries the comment's unique identifier (id), subreddit identifier (subreddit_id), creation times (created and created_utc), comment text (body), author (author), parent comment identifier (parent_id), score (score), whether the comment is stickied (stickied), subreddit name (subreddit), author flair CSS class (author_flair_css_class), filter identifier (filter), link identifier (link_id), whether the comment was removed (removed), author flair text (author_flair_text), and whether the comment is distinguished (distinguished). The dataset also includes toxicity score fields for the comment text (such as TOXICITY, SEVERE_TOXICITY, IDENTITY_ATTACK, INSULT, PROFANITY, THREAT, and SEXUALLY_EXPLICIT) and social-behavior fields (such as laughter, gratitude, donation, politeness, support, and agreement). It provides a training split of 2,658,104 samples, totaling 2,453,880,546 bytes.
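As a rough illustration of how the fields described above might be used, here is a minimal Python sketch. It assumes the Hugging Face `datasets` package is installed and that the repository loads under the ID shown in the download link. The helper names (`stream_train_split`, `vote_label`, `count_labels`, `average_bytes_per_sample`) are hypothetical. Reading a positive `score` as net upvotes and a negative one as net downvotes is also an assumption, not something the listing states.

```python
from typing import Any, Dict, Iterable, Mapping

DATASET_ID = "agamgoy/balanced_upvotes_downvotes_data"
NUM_TRAIN_SAMPLES = 2_658_104   # sample count stated in the description
TOTAL_BYTES = 2_453_880_546     # total size stated in the description


def stream_train_split():
    """Iterate over the train split lazily instead of downloading it all first."""
    # Imported here so the helpers below work without the `datasets` package.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split="train", streaming=True)


def vote_label(row: Mapping[str, Any]) -> str:
    """Label one comment by the sign of its `score` field (an assumed reading)."""
    score = row["score"]
    if score > 0:
        return "upvoted"
    if score < 0:
        return "downvoted"
    return "neutral"


def count_labels(rows: Iterable[Mapping[str, Any]]) -> Dict[str, int]:
    """Count how many rows fall under each vote label."""
    counts = {"upvoted": 0, "downvoted": 0, "neutral": 0}
    for row in rows:
        counts[vote_label(row)] += 1
    return counts


def average_bytes_per_sample() -> float:
    """Average record size implied by the stated totals."""
    return TOTAL_BYTES / NUM_TRAIN_SAMPLES
```

To check the label balance on a small slice without fetching the whole split, one could pass `itertools.islice(stream_train_split(), 1000)` to `count_labels`.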

Provided by:

agamgoy