Reddit Social Group Mentions (Reddit-SGM)

DataCite Commons · Updated: 2026-01-30 · Indexed: 2026-05-07
Download link:
https://darus.uni-stuttgart.de/citation?persistentId=doi:10.18419/DARUS-4557
Resource description:
<p><strong>Dataset Name:</strong> Reddit Social Group Mentions (Reddit-SGM). The dataset consists of Reddit comments in which mentions of social groups have been annotated by three human annotators.</p>
<p><strong>Corpus Size:</strong> 2,040 comments</p>
<p><strong>Source:</strong> <a href="https://doi.org/10.1609/icwsm.v16i1.19377">Reddit Politosphere Dataset (DOI)</a></p>
<p><strong>Subreddits Included:</strong></p>
<ul>
<li><em>r/politics</em></li>
<li><em>r/worldpolitics</em></li>
<li><em>r/ukpolitics</em></li>
<li><em>r/Economics</em></li>
<li><em>r/Libertarian</em></li>
</ul>
<p><strong>Focus:</strong> Analysis of social group mentions in political discussions on Reddit</p>
<p><strong>Key Fields:</strong></p>
<ul>
<li><strong>body_cleaned_id:</strong> Unique identifier of each comment, taken from the Reddit Politosphere Dataset</li>
<li><strong>segment_id:</strong> Unique identifier of each annotated mention within a comment</li>
<li><strong>subreddit:</strong> Name of the subreddit in which the comment was posted</li>
<li><strong>year:</strong> Year in which the comment was published</li>
<li><strong>body_cleaned:</strong> Cleaned text of the comment</li>
</ul>
<p><strong>Annotation Fields:</strong> Information about the annotations provided by the annotators</p>
<ul>
<li><strong>annotations1, 2, 3:</strong> Annotations of the <em>body_cleaned</em> field by annotators 1, 2, and 3, respectively</li>
<li><strong>vote_segments:</strong> All segments (mentions) marked by the annotators. Segments are separated by commas, and the number after <q>---</q> is the index of the segment's starting position in the comment</li>
<li><strong>vote_counts:</strong> Number of votes each segment (mention) received, listed in the same order as the segments in <em>vote_segments</em> and separated by commas</li>
<li><strong>segment:</strong> A single mention of a social group within the comment</li>
<li><strong>count:</strong> Total vote count for the segment; the number after <q>---</q> indicates the segment's starting index in the comment</li>
</ul>
<p><strong>Additional Fields for Describing Label Variation:</strong> Information capturing the label variation among the annotators' annotations</p>
<ul>
<li><strong>disagreements:</strong> Indicates whether the annotators disagreed on the annotation</li>
<li><strong>reason_disagreement:</strong> Category of disagreement, one of the following:
<ul>
<li><em>Referential ambiguity (RA)</em></li>
<li><em>Metonymy (M)</em></li>
<li><em>Adjective &amp; Description (A). Note: this category is denoted A&amp;D in the paper; in the dataset it is written without the ampersand to avoid potential issues.</em></li>
<li><em>Determiner (D)</em></li>
<li><em>Plural Noun (P)</em></li>
<li><em>Individual (I)</em></li>
<li><em>Annotation Error (AE)</em></li>
</ul>
</li>
<li><strong>type_socialgroup:</strong> Social group category, one of the following:
<ul>
<li><em>Intimate Group (IG)</em></li>
<li><em>Organized Group (OG)</em></li>
<li><em>Aggregate Group (AG)</em></li>
</ul>
</li>
<li><strong>segment_belongs_to:</strong> Indicates whether the segment is part of a broader segment</li>
</ul>
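The vote fields pack several values into single strings. The following is a minimal Python sketch of how they could be unpacked, assuming that items in vote_segments have the form `<segment>---<start>` and are separated by commas (optionally followed by whitespace), that vote_counts is a comma-separated list of integers, and that start indices are 0-based character offsets into body_cleaned. These assumptions are not confirmed by the dataset description. The names `Mention`, `parse_votes`, and `offset_matches` are hypothetical helpers, not part of the dataset release; the two dictionaries only restate the category codes listed in the description.

```python
import re
from dataclasses import dataclass

# Category codes as listed in the dataset description.
DISAGREEMENT_REASONS = {
    "RA": "Referential ambiguity",
    "M": "Metonymy",
    "A": "Adjective & Description",  # written "A&D" in the paper
    "D": "Determiner",
    "P": "Plural Noun",
    "I": "Individual",
    "AE": "Annotation Error",
}

SOCIAL_GROUP_TYPES = {
    "IG": "Intimate Group",
    "OG": "Organized Group",
    "AG": "Aggregate Group",
}

# One "<segment>---<start index>" item (assumed format). The lazy group
# lets a segment contain commas, as long as it ends in "---" plus digits.
_ITEM = re.compile(r"\s*(.+?)---(\d+)\s*(?:,|$)")


@dataclass
class Mention:
    segment: str
    start: int
    votes: int


def parse_votes(vote_segments: str, vote_counts: str) -> list[Mention]:
    """Pair each voted segment with its start index and its vote count."""
    items = [(m.group(1), int(m.group(2))) for m in _ITEM.finditer(vote_segments)]
    # int() tolerates surrounding whitespace, e.g. int(" 3") == 3.
    counts = [int(c) for c in vote_counts.split(",") if c.strip()]
    if len(items) != len(counts):
        raise ValueError(f"{len(items)} segments but {len(counts)} vote counts")
    return [Mention(seg, start, n) for (seg, start), n in zip(items, counts)]


def offset_matches(body: str, m: Mention) -> bool:
    """Check an (assumed 0-based) character offset against the comment text."""
    return body[m.start:m.start + len(m.segment)] == m.segment
```

For example, `parse_votes("The rich---0, the working class---35", "3, 2")` would yield two `Mention` objects with 3 and 2 votes. The lazy regular expression is used instead of a plain `split(",")` so that a mention containing a comma stays intact. If the released files turn out to use 1-based offsets or a different separator, `offset_matches` and the pattern would need adjusting.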
Provider:
DaRUS
Created:
2024-10-30