SFU Opinion and Comments Corpus
Source: DataCite Commons. Updated: 2024-04-15. Indexed: 2024-07-13.
Download link:
https://summit.sfu.ca/item/35643
Description:
The SFU Opinion and Comments Corpus (SOCC) is a corpus for the analysis of online news comments. Our corpus contains comments and the articles from which the comments originated. The articles are all opinion articles, not hard news articles. The corpus is larger than any other currently available comment corpus and was collected with attention to preserving reply structures and other metadata. In addition to the raw corpus, we also present annotations for four different phenomena: constructiveness, toxicity, negation and its scope, and appraisal. The data is divided into two main parts: raw data and annotated data. The raw data contains three CSV files: gnm_articles.csv, gnm_comments.csv, and gnm_comment_threads.csv. The annotated data contains annotations for constructiveness, negation, and appraisal. Details of the different corpora and how to use them are on the following GitHub page: https://github.com/sfu-discourse-lab/SOCC/blob/master/README.md
Provider:
Simon Fraser University
Created:
2018-01-26
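
The description mentions that the raw comments preserve reply structures. As a minimal sketch of how such a file might be explored, the Python below groups comments by article and maps each comment to its direct replies. The column names (`article_id`, `comment_id`, `parent_id`, `comment_text`) and the sample rows are hypothetical, chosen for illustration only; the actual headers are documented in the SOCC README and may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names, chosen for illustration only; check the SOCC
# README for the real headers before adapting this sketch.
ARTICLE_KEY = "article_id"
COMMENT_KEY = "comment_id"
PARENT_KEY = "parent_id"


def read_rows(handle):
    """Read a CSV file object into a list of dicts keyed by header name."""
    return list(csv.DictReader(handle))


def comments_by_article(comment_rows):
    """Group comment rows under the article they were posted on."""
    grouped = defaultdict(list)
    for row in comment_rows:
        grouped[row[ARTICLE_KEY]].append(row)
    return dict(grouped)


def replies_by_parent(comment_rows):
    """Map each parent comment id to the ids of its direct replies."""
    replies = defaultdict(list)
    for row in comment_rows:
        parent = row[PARENT_KEY]
        if parent:  # an empty parent field marks a top-level comment
            replies[parent].append(row[COMMENT_KEY])
    return dict(replies)


# Tiny made-up sample standing in for gnm_comments.csv (not real corpus data).
SAMPLE = """article_id,comment_id,parent_id,comment_text
a1,c1,,First comment
a1,c2,c1,Reply to the first comment
a2,c3,,Comment on another article
"""

rows = read_rows(io.StringIO(SAMPLE))
grouped = comments_by_article(rows)
threads = replies_by_parent(rows)
print({article: len(items) for article, items in grouped.items()})
print(threads)
```

To try this on the downloaded data, one would pass an opened file instead of the in-memory sample, for example `open("gnm_comments.csv", newline="", encoding="utf-8")` (the encoding is an assumption), and set the three key constants to the real column names.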
Collected summary
Dataset introduction

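Background and challenges
Background overview
The SFU Opinion and Comments Corpus (SOCC) is a large-scale corpus for the analysis of online news comments. It contains opinion articles and their associated comments, and it preserves the reply structure and metadata of the comments. Besides the raw data (CSV files of articles, comments, and threads), the dataset includes annotations for four linguistic phenomena: constructiveness, toxicity, negation, and appraisal. It is suited to research areas such as news discourse and sentiment analysis.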
The summary above was collected and generated by 遇见数据集.



