agamgoy/balanced_upvotes_downvotes_data
Source: Hugging Face · Updated: 2024-09-10 · Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/agamgoy/balanced_upvotes_downvotes_data
Summary:
This dataset contains Reddit comment data. Its fields include the unique comment identifier (id), subreddit identifier (subreddit_id), creation time (created and created_utc), comment text (body), author (author), parent comment identifier (parent_id), score (score), whether the comment is stickied (stickied), subreddit name (subreddit), author flair CSS class (author_flair_css_class), filter identifier (filter), link identifier (link_id), whether the comment was removed (removed), author flair text (author_flair_text), and whether the comment is distinguished (distinguished). It also includes several toxicity score fields (such as TOXICITY, SEVERE_TOXICITY, IDENTITY_ATTACK, INSULT, PROFANITY, THREAT, and SEXUALLY_EXPLICIT) and fields describing social behaviors (such as laughter, gratitude, donation, politeness, support, and agreement). The dataset has a single training split of 2,658,104 samples, totaling 2,453,880,546 bytes (about 2.45 GB).
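The fields above can be consumed with a few lines of Python. This is a minimal sketch, assuming the Hugging Face `datasets` package is installed and that the field names match this card; `load_dataset(..., split="train", streaming=True)` is a real `datasets` call, while the helpers `vote_label`, `max_toxicity`, and `summarize`, the sign-of-`score` labelling rule, and the 0.5 toxicity threshold are hypothetical illustrations, not how the dataset was built or balanced. Toxicity values are assumed to be floats (possibly missing).

```python
from collections import Counter
from typing import Any, Iterable, Iterator, Mapping

DATASET_ID = "agamgoy/balanced_upvotes_downvotes_data"

# Toxicity score fields listed on this card.
TOXICITY_FIELDS = (
    "TOXICITY", "SEVERE_TOXICITY", "IDENTITY_ATTACK", "INSULT",
    "PROFANITY", "THREAT", "SEXUALLY_EXPLICIT",
)


def stream_train_split() -> Iterator[Mapping[str, Any]]:
    """Stream the train split instead of downloading all ~2.45 GB up front.

    Needs the third-party `datasets` package and network access.
    """
    from datasets import load_dataset  # lazy import: helpers below work offline

    return iter(load_dataset(DATASET_ID, split="train", streaming=True))


def vote_label(record: Mapping[str, Any]) -> str:
    """Hypothetical rule: score > 0 -> 'up', score < 0 -> 'down', else 'neutral'."""
    score = record.get("score") or 0  # treat a missing score as 0
    if score > 0:
        return "up"
    if score < 0:
        return "down"
    return "neutral"


def max_toxicity(record: Mapping[str, Any]) -> float:
    """Largest toxicity field on a record; missing or None values count as 0.0."""
    return max(float(record.get(f) or 0.0) for f in TOXICITY_FIELDS)


def summarize(records: Iterable[Mapping[str, Any]], threshold: float = 0.5) -> dict:
    """Count vote labels and how many records reach a (hypothetical) toxicity threshold."""
    labels: Counter = Counter()
    toxic = 0
    for rec in records:
        labels[vote_label(rec)] += 1
        if max_toxicity(rec) >= threshold:
            toxic += 1
    return {"labels": dict(labels), "toxic": toxic}
```

For a quick look at real data, `summarize(itertools.islice(stream_train_split(), 1000))` inspects only the first 1,000 streamed comments; streaming is chosen here because the full split is large.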
Provided by:
agamgoy



