FS150T-Corpus
Source: arXiv | Updated: 2023-07-15 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2205.11472v2
Resource description:
FS150T-Corpus is a dataset for argument mining created by the Ubiquitous Knowledge Processing Lab (UKP Lab) in the Department of Computer Science at Technische Universität Darmstadt. It contains 21,600 samples covering 150 controversial topics, with 144 samples per topic. The topics span domains such as politics, technology, and economics; the aim is to improve model robustness by increasing sample diversity. To build the corpus, the research team indexed CommonCrawl data with ElasticSearch and collected crowdsourced annotations through Amazon Mechanical Turk. The dataset is intended for argument mining research, in particular the question of how carefully designed training samples and pre-trained models can improve task performance.
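The stated figures are internally consistent (150 topics × 144 samples = 21,600), and the same balance can be checked on a loaded copy of the data. The following is only a minimal sketch: the listing does not describe the file format or field names, so the per-sample topic labels are replaced here by a synthetic list, and `is_balanced` is a hypothetical helper rather than part of any released tooling.

```python
from collections import Counter

# Figures stated in the dataset description.
NUM_TOPICS = 150
SAMPLES_PER_TOPIC = 144
TOTAL_SAMPLES = 21_600


def is_balanced(topics, num_topics=NUM_TOPICS, per_topic=SAMPLES_PER_TOPIC):
    """Return True if exactly `num_topics` topics occur, each `per_topic` times."""
    counts = Counter(topics)
    return len(counts) == num_topics and all(n == per_topic for n in counts.values())


# Synthetic stand-in for the per-sample topic labels (assumption: NOT real data).
synthetic_topics = [
    f"topic_{i}" for i in range(NUM_TOPICS) for _ in range(SAMPLES_PER_TOPIC)
]

# The description's totals agree with each other.
assert NUM_TOPICS * SAMPLES_PER_TOPIC == TOTAL_SAMPLES

print(len(synthetic_topics), is_balanced(synthetic_topics))
```

With the real corpus, the synthetic list would be replaced by the topic label of each sample, whatever that field is actually called in the released files.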
Provided by:
Ubiquitous Knowledge Processing Lab (UKP Lab), Department of Computer Science, Technische Universität Darmstadt
Created:
2022-05-24



