Blog Credibility Corpus
Source: arXiv | Updated: 2021-06-21 | Indexed: 2024-06-21
Download link:
https://github.com/serenpa/Blog-Credibility-Corpus
Description:
Blog Credibility Corpus is a dataset for assessing the credibility of blog articles written by software practitioners, created by a research team in the Department of Computing and Mathematics at Manchester Metropolitan University. It contains 234 articles from a single software practitioner's blog, totalling 19,996 sentences annotated for argumentation and evidence. Three annotators labelled the articles against a set of clearly defined annotation criteria. The corpus is intended mainly to support future research on automatically detecting the credibility of grey literature, particularly in software engineering, where it can help identify high-quality practitioner blog content.
Provider:
Department of Computing and Mathematics, Manchester Metropolitan University
Created:
2021-06-21



