Argumentation sentences 1.0

Source: DataCite Commons. Updated: 2026-05-03. Indexed: 2024-07-13.
Download link:
https://spraakbanken.gu.se/resurser/argumentation-sentences
Resource description:
I. IDENTIFYING INFORMATION
Title*: Argumentation sentences
Subtitle: A translated corpus for classifying sentence stance in relation to a topic.
Created by*: Anna Lindahl (anna.lindahl@svenska.gu.se)
Publisher(s)*: Språkbanken Text (sb-info@svenska.gu.se)
Link(s) / permanent identifier(s)*: https://spraakbanken.gu.se/en/resources/superlim
License(s)*: CC BY 4.0
Abstract*: Argumentation sentences is a translated corpus for the task of identifying stance in relation to a topic. It consists of sentences labeled with pro, con or non in relation to one of six topics. The original dataset [1] is available at https://github.com/trtm/AURC. The test set consists of manually corrected translations; the training set is machine translated.
Funded by*: Vinnova (grant no. 2021-04165)
Cite as:
Related datasets: Part of the SuperLim collection (https://spraakbanken.gu.se/en/resources/superlim)

II. USAGE
Key applications: Machine learning, argumentation mining, stance classification
Intended task(s)/usage(s): Evaluate models on the following task: given a sentence and a topic, determine whether the sentence is for, against or neutral in relation to the topic.
Recommended evaluation measures: Krippendorff's alpha (the official SuperLim measure), MCC, F
Dataset function(s): Training, testing
Recommended split(s): Train, dev, test (provided)

III. DATA
Primary data*: Text
Language*: Swedish
Dataset in numbers*: 5265 sentences split over 6 topics; 3450 train, 750 dev and 1065 test
Nature of the content*: Topics: Abortion, Death penalty, Nuclear power, Marijuana legalization, Minimum wage, Cloning. Each topic has a set of associated sentences, labeled with pro, con or non in relation to the topic.
Format*: JSONL with the following keys:
    sentence_id = the id for each sentence
    topic = the topic for each sentence
    label = the label for each sentence; one of pro, con or non
    sentence = the sentence itself
  Tab-separated with 4 columns, in the same order: sentence id, topic, label (pro, con or non), sentence
Data source(s)*: The original data comes from the AURC dataset [1] (https://github.com/trtm/AURC). For this corpus, only the in-domain topics were used.
Data collection method(s)*: Collected from the Common Crawl archive. See [1].
Data selection and filtering*: A subset of the original data; only the in-domain topics are used.
Data preprocessing*: Sentences were machine translated. The test set was then manually corrected.
Data labeling*: The sentences are labeled with pro, con or non, signifying their stance in relation to a topic.
Annotator characteristics:

IV. ETHICS AND CAVEATS
Ethical considerations:
Things to watch out for:

V. ABOUT DOCUMENTATION
Data last updated*: 20221215
Which changes have been made, compared to the previous version*: First version
Access to previous versions:
This document created*: 20221215 by Anna Lindahl
This document last updated*: 20220203 by Anna Lindahl
Where to look for further details:
Documentation template version*: v1.1

VI. OTHER
Related projects:
References:
[1] Trautmann, D., Daxenberger, J., Stab, C., Schütze, H., & Gurevych, I. (2020, April). Fine-grained argument unit recognition and classification. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 05, pp. 9048-9056).
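
The JSONL layout described under Format can be read with the Python standard library alone. The following is a minimal sketch, not an official loader. It assumes each line is one JSON object carrying the four documented keys. The sample records, the helper names (read_jsonl, label_counts_by_topic), the topic spelling and the value types (for example an integer sentence_id) are invented for illustration and are not taken from the actual files.

```python
import io
import json
from collections import Counter

# Keys and labels as listed in the dataset documentation.
REQUIRED_KEYS = ("sentence_id", "topic", "label", "sentence")
VALID_LABELS = {"pro", "con", "non"}


def read_jsonl(stream):
    """Yield one record per non-empty line, checking the documented keys."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing keys {missing}")
        if record["label"] not in VALID_LABELS:
            raise ValueError(
                f"line {line_no}: unexpected label {record['label']!r}"
            )
        yield record


def label_counts_by_topic(records):
    """Count how many sentences carry each (topic, label) pair."""
    counts = Counter()
    for record in records:
        counts[(record["topic"], record["label"])] += 1
    return counts


# Hypothetical sample lines; the real files may use other ids and spellings.
SAMPLE = "\n".join(
    json.dumps(row, ensure_ascii=False)
    for row in [
        {"sentence_id": 1, "topic": "Nuclear power", "label": "pro",
         "sentence": "Exempelmening ett."},
        {"sentence_id": 2, "topic": "Nuclear power", "label": "con",
         "sentence": "Exempelmening två."},
        {"sentence_id": 3, "topic": "Cloning", "label": "non",
         "sentence": "Exempelmening tre."},
    ]
)

records = list(read_jsonl(io.StringIO(SAMPLE)))
counts = label_counts_by_topic(records)
```

To read a downloaded split instead of the in-memory sample, pass a file opened with encoding="utf-8" to read_jsonl. The file name depends on the release and is not specified here.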
Provider:
Språkbanken Text
Created:
2024-06-18