
mfromm/AMSR

Source: Hugging Face. Updated: 2023-04-12. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/mfromm/AMSR
Description:
---
license: openrail
task_categories:
- text-classification
language:
- en
tags:
- argument-mining
- argument-identification
pretty_name: AMSR
size_categories:
- 1K<n<10K
---

# Argument Mining in Scientific Reviews (AMSR)

We release a new dataset of peer reviews from different computer science conferences with annotated arguments, called AMSR (**A**rgument **M**ining in **S**cientific **R**eviews).

## 1. Raw Data

`conferences_raw/` contains a directory for each conference we scraped (e.g., [iclr20](./data/iclr20)). Each conference directory comprises multiple `*.json` files, where every file contains the information belonging to a single paper, such as the title, the abstract, the submission date and the reviews. The reviews are stored in a list called `"review_content"`.

## 2. Cleaned Data

`conferences_cleaned/` contains reviews and papers where we removed all unwanted character sequences from the reviews. For details on the preprocessing steps, please refer to our paper "Argument Mining Driven Analysis of Peer-Reviews".

## 3. Annotated Data

`conferences_annotated/` contains `sentence_level` and `token_level` data of 77 reviews, each annotated by 3 annotators. We have three labels:

- PRO: Arguments supporting the acceptance of the paper.
- CON: Arguments opposing the acceptance of the paper.
- NON: Non-argumentative sentences/tokens which have no influence on the acceptance of the paper.

Based on these labels, we define three tasks:

- Argumentation Detection: A binary classification of whether a text span is an argument. The classes are denoted by ARG and NON, where ARG is the union of the PRO and CON classes.
- Stance Detection: A binary classification of whether an argumentative text span supports or opposes the paper's acceptance. The model is trained and evaluated only on argumentative PRO and CON text spans.
- Joint Detection: A multi-class classification between the classes PRO, CON and NON, i.e. the combination of argumentation and stance detection.

## 4. Generalization across Conferences

`conferences_annotated_generalization/` contains `token_level` data of 77 reviews, split differently than in section 3. We studied the model’s generalization to peer reviews for papers from other (sub)domains. To this end, we reduce the test set to only contain reviews from the GI’20 conference. The focus of the GI’20 conference is Computer Graphics and Human-Computer Interaction, while the other conferences are focused on Representation Learning, AI and Medical Imaging. We consider GI’20 a subdomain since all conferences are from the domain of computer science.

- NO-GI: The original training dataset with all sentences from reviews of GI’20 removed.
- ALL: A resampling of the original training dataset of the same size as NO-GI, with sentences from all conferences.

## 5. Jupyter Notebook

`ReviewStat` is a Jupyter notebook that shows interesting statistics of the raw dataset.
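
The NO-GI and ALL training sets described under "Generalization across Conferences" can be illustrated with a minimal sketch. It is not the authors' code: the `conference` field, the `"gi20"` identifier, the function names, and the choice to sample without replacement are all assumptions made for illustration.

```python
import random


def make_no_gi(train, held_out="gi20"):
    """Drop every training example that comes from the held-out conference.

    Assumes each example is a dict with a hypothetical "conference" key.
    """
    return [ex for ex in train if ex["conference"] != held_out]


def make_all(train, size, seed=0):
    """Draw `size` examples from the full training set (all conferences).

    Sampling without replacement and the fixed seed are assumptions of this
    sketch; the dataset card only says ALL is a resampling of the same size
    as NO-GI.
    """
    rng = random.Random(seed)
    return rng.sample(train, size)
```

Building `ALL` with `make_all(train, len(make_no_gi(train)))` keeps the two training sets the same size, so a difference in test performance on GI’20 is not explained by the amount of training data alone.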
Provided by:
mfromm
Summary of original information

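Argument Mining in Scientific Reviews (AMSR) dataset overview

Basic information

  • Name: AMSR (Argument Mining in Scientific Reviews)
  • Language: English (en)
  • License: openrail
  • Task category: text classification
  • Tags: argument mining, argument identification
  • Dataset size: 1K<n<10K

Dataset contents

  1. Raw data: review data from several computer science conferences, stored as JSON. Each file holds the information for a single paper, such as the title, abstract, submission date and reviews.
  2. Cleaned data: reviews with unwanted character sequences removed. For the preprocessing steps, see the paper "Argument Mining Driven Analysis of Peer-Reviews".
  3. Annotated data: sentence-level and token-level data for 77 reviews, each annotated by 3 annotators with three labels: PRO (supports acceptance of the paper), CON (opposes acceptance of the paper) and NON (non-argumentative content).
  4. Cross-conference generalization data: used to study how well the model generalizes to reviews of papers from other (sub)domains, with the test set restricted to reviews from the GI’20 conference.
  5. Jupyter notebook: ReviewStat, which shows interesting statistics of the raw dataset.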
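
As a rough illustration of the raw data layout, the sketch below walks one conference directory and counts the entries of the `"review_content"` list in each paper file. Only the `"review_content"` name comes from the dataset description; the function name and the fallback for files without that key are assumptions, and the structure of the individual review entries is not specified in the description, so the sketch does not inspect them.

```python
import json
from pathlib import Path


def count_reviews(conference_dir):
    """Return {file name: number of reviews} for every *.json file in a directory."""
    counts = {}
    for path in sorted(Path(conference_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            paper = json.load(handle)
        # "review_content" is the list name given in the dataset description.
        # Treating a missing key as "no reviews" is an assumption of this sketch.
        counts[path.name] = len(paper.get("review_content", []))
    return counts
```

Assuming a local copy of the dataset, a call might look like `count_reviews("conferences_raw/iclr20")`; the exact path depends on where the files were downloaded.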

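Task description

  • Argumentation detection: a binary classification of whether a text span is an argument, with the classes ARG (the union of PRO and CON) and NON.
  • Stance detection: a binary classification of whether an argumentative text span supports or opposes acceptance of the paper.
  • Joint detection: a multi-class classification that combines argumentation and stance detection, with the classes PRO, CON and NON.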
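
The relationship between the three tasks and the PRO/CON/NON labels can be sketched as follows. This is an illustrative mapping under the assumption that examples are available as `(text, label)` pairs; the function names and the pair format are hypothetical and not taken from the dataset files.

```python
LABELS = ("PRO", "CON", "NON")


def to_argumentation(label):
    """Argumentation detection: PRO and CON collapse into ARG, NON stays NON."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return "NON" if label == "NON" else "ARG"


def build_task_data(examples):
    """Derive the label sets for the three tasks from (text, label) pairs."""
    argumentation = [(text, to_argumentation(label)) for text, label in examples]
    # Stance detection only uses argumentative spans, so NON is filtered out.
    stance = [(text, label) for text, label in examples if label != "NON"]
    # Joint detection keeps the original three-way labels unchanged.
    joint = list(examples)
    return {"argumentation": argumentation, "stance": stance, "joint": joint}
```

The stance subset is smaller than the other two because it excludes every NON span, which matches the statement that the stance model is trained and evaluated only on PRO and CON spans.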