Reddit Social Group Mentions (Reddit-SGM)
DataCite Commons | Updated 2026-01-30 | Indexed 2026-05-07
Download link:
https://darus.uni-stuttgart.de/citation?persistentId=doi:10.18419/DARUS-4557
Resource description:
<p><strong>Dataset Name:</strong> Reddit Social Group Mentions (Reddit-SGM) is a dataset based on Reddit comments and contains mentions of social groups annotated by three different human annotators.</p>
<p><strong>Corpus Size:</strong> 2,040 comments</p>
<p><strong>Source:</strong> <a href="https://doi.org/10.1609/icwsm.v16i1.19377">Reddit Politosphere Dataset (DOI)</a></p>
<p><strong>Subreddits Included:</strong></p>
<ul>
<li><em>r/politics</em></li>
<li><em>r/worldpolitics</em></li>
<li><em>r/ukpolitics</em></li>
<li><em>r/Economics</em></li>
<li><em>r/Libertarian</em></li>
</ul>
<p><strong>Focus:</strong> Analysis of social group mentions within political discussions on Reddit</p>
<p><strong>Key Fields:</strong></p>
<ul>
<li><strong>body_cleaned_id:</strong> A unique identifier for each comment, originally sourced from the Reddit Politosphere Dataset</li>
<li><strong>segment_id:</strong> A unique identifier for annotated mentions within a comment</li>
<li><strong>subreddit:</strong> Name of the subreddit where the comment was posted</li>
<li><strong>year:</strong> Year of the comment's publication</li>
<li><strong>body_cleaned:</strong> Cleaned text content of the comment</li>
</ul>
<p><strong>Annotation Fields:</strong> Information related to the annotations provided by annotators</p>
<ul>
<li><strong>annotations1, 2, 3:</strong> Annotations provided by annotators 1, 2, and 3 for the <em>body_cleaned</em> field</li>
<li><strong>vote_segments:</strong> Segments (mentions) annotated by all annotators. Segments are separated by commas, and the number after <q>---</q> is the index of the segment's starting position in the comment</li>
<li><strong>vote_counts:</strong> Number of votes received by each segment (mention). Votes are separated by commas and listed in the same order as the corresponding segments in <em>vote_segments</em></li>
<li><strong>segment:</strong> Mention of a social group within the comment</li>
<li><strong>count:</strong> Total vote count for each segment. The number after <q>---</q> indicates the starting index of the segment in the comment</li>
</ul>
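<p>The pairing between <em>vote_segments</em> and <em>vote_counts</em> can be sketched in Python as follows. This is a minimal sketch based only on the field descriptions above: it assumes each entry has the form <q>segment text---start index</q>, and the sample strings, the <code>Mention</code> type, and the <code>parse_votes</code> helper are hypothetical and not part of the dataset.</p>

```python
import re
from typing import NamedTuple


class Mention(NamedTuple):
    """One annotated segment with its start offset and vote count."""
    text: str
    start: int
    votes: int


# Assumed entry layout: "<segment text>---<start index>", entries joined by commas.
# Matching on the "---<digits>" suffix keeps a comma inside a segment from
# being mistaken for an entry separator.
_ENTRY = re.compile(r"\s*(.+?)---(\d+)\s*(?:,|$)")


def parse_votes(vote_segments: str, vote_counts: str) -> list[Mention]:
    """Pair each segment in vote_segments with its count in vote_counts."""
    entries = [(m.group(1), int(m.group(2))) for m in _ENTRY.finditer(vote_segments)]
    counts = [int(c) for c in vote_counts.split(",") if c.strip()]
    if len(entries) != len(counts):
        raise ValueError(
            f"{len(entries)} segments but {len(counts)} vote counts"
        )
    return [Mention(text, start, votes) for (text, start), votes in zip(entries, counts)]


# Hypothetical values, for illustration only.
mentions = parse_votes("the poor---12,immigrants---40", "3,2")

# Keep only segments that at least two of the three annotators marked.
majority = [m for m in mentions if m.votes >= 2]
```

<p>Raising an error on a length mismatch, rather than silently truncating, makes rows that do not follow the assumed layout easy to spot.</p>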
<p><strong>Additional Fields Describing Label Variation:</strong> Information capturing label variation in the annotations provided by annotators</p>
<ul>
<li><strong>disagreements:</strong> Indicator of whether annotators disagreed on the annotation</li>
<li><strong>reason_disagreement:</strong> Category of disagreement, from the following categories:
<ul>
<li><em>Referential ambiguity (RA)</em></li>
<li><em>Metonymy (M)</em></li>
<li><em>Adjective &amp; Description (A). Note: this category is denoted A&amp;D in the paper; in the dataset it is written without the ampersand to avoid issues.</em></li>
<li><em>Determiner (D)</em></li>
<li><em>Plural Noun (P)</em></li>
<li><em>Individual (I)</em></li>
<li><em>Annotation Error (AE)</em></li>
</ul>
</li>
<li><strong>type_socialgroup:</strong> Social group category, which can be:
<ul>
<li><em>Intimate Group (IG)</em></li>
<li><em>Organized Group (OG)</em></li>
<li><em>Aggregate Group (AG)</em></li>
</ul>
</li>
<li><strong>segment_belongs_to:</strong> Indicates if the segment is part of a broader segment</li>
</ul>
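<p>For analysis it may help to expand the abbreviations listed above into their full category names. The lookup tables below are copied from the lists in this description. Whether the dataset stores the abbreviation or the full name in <em>reason_disagreement</em> and <em>type_socialgroup</em> is an assumption here, and <code>expand_code</code> is a hypothetical helper.</p>

```python
# Abbreviation -> category name, as listed in the dataset description.
REASON_DISAGREEMENT = {
    "RA": "Referential ambiguity",
    "M": "Metonymy",
    "A": "Adjective & Description",  # "A&D" in the paper
    "D": "Determiner",
    "P": "Plural Noun",
    "I": "Individual",
    "AE": "Annotation Error",
}

SOCIAL_GROUP_TYPE = {
    "IG": "Intimate Group",
    "OG": "Organized Group",
    "AG": "Aggregate Group",
}


def expand_code(code: str, table: dict[str, str]) -> str:
    """Return the full category name, or the input unchanged if it is not a known code."""
    return table.get(code.strip().upper(), code)


print(expand_code("RA", REASON_DISAGREEMENT))
print(expand_code("AG", SOCIAL_GROUP_TYPE))
```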
Provider:
DaRUS
Created:
2024-10-30



