NEUDM/arts

Name: NEUDM/arts
Creator: NEUDM
Published: 2023-05-23 17:30:10
License: No description available

Hugging Face, updated 2023-05-23, indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/NEUDM/arts

Resource description:

> The datasets described here belong to the ABSA (Aspect-Based Sentiment Analysis) field. The basic task is to extract from a sentence the aspect terms, the aspect categories (i.e., term categories), the sentiment polarity of each term in its context, and the opinion words expressed about that term. Different datasets extract different parts of this information, as stated in the "instruction" key of each jsonl file. I have converted them into a generation task in which the model must generate the extraction results in a fixed format.

#### Example: one record from the jsonl files of the acos dataset

```
{
  "task_type": "generation",
  "dataset": "acos",
  "input": ["the computer has difficulty switching between tablet and computer ."],
  "output": "[['computer', 'laptop usability', 'negative', 'difficulty']]",
  "situation": "none",
  "label": "",
  "extra": "",
  "instruction": " Task: Extracting aspect terms and their corresponding aspect categories, sentiment polarities, and opinion words. Input: A sentence Output: A list of 4-tuples, where each tuple contains the extracted aspect term, its aspect category, sentiment polarity, and opinion words (if any). Supplement: \"Null\" means that there is no occurrence in the sentence. Example: Sentence: \"Also it's not a true SSD drive in there but eMMC, which makes a difference.\" Output: [['SSD drive', 'hard_disc operation_performance', 'negative', 'NULL']]' "
}
```

> The label and extra fields are left empty here. The instruction follows the string template shown above and includes one worked example for one-shot prompting. Across the ABSA datasets (absa-quad, acos, arts, aste-data-v2, mams, semeval-2014, semeval-2015, semeval-2016, towe) the instruction template is the same, with minor differences in wording; in some datasets the instruction text also differs between records of the same dataset.

#### Original dataset

- Data: [link](https://github.com/zhijing-jin/ARTS_TestSet)
- Paper: [Tasty Burgers, Soggy Fries: Probing Aspect Robustness in Aspect-Based Sentiment Analysis](https://arxiv.org/pdf/2009.07964.pdf)
- Notes: The original dataset consists of JSON data from two domains, laptop and restaurant. For this conversion I merged the data from both domains and split it into train, validation, and test sets. The dataset was proposed to test model robustness, so papers that cite it mostly train on data from one domain and test on the other domain.

#### Current SOTA

*Results are taken from [this paper](https://arxiv.org/abs/2303.02846).*

- Metric: macro-averaged F1
- SOTA model: CVIB
- Trained on other-domain data, tested on restaurant: **70.29**
- Trained and tested on restaurant: **82.03**
- Trained on other-domain data, tested on laptop: **69.39**
- Trained and tested on laptop: **77.53**
- Paper: [Reducing Spurious Correlations for Aspect-Based Sentiment Analysis with Variational Information Bottleneck and Contrastive Learning](https://arxiv.org/pdf/2303.02846.pdf)
- Notes: This paper is one of the works citing the original ARTS paper that I found through a [Google Scholar](https://scholar.google.com/scholar?as_ylo=2023&q=ABSA+ARTS&hl=zh-CN&as_sdt=0,5) search. After comparing several 2023 papers, I selected the one with the best reported metric, together with its model.

Provided by:

NEUDM

Summary of original information

Dataset overview

Dataset type

The dataset belongs to the ABSA (Aspect-Based Sentiment Analysis) field.

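Dataset contents

This dataset belongs to a family of ABSA datasets converted in the same way: acos, absa-quad, arts, aste-data-v2, mams, semeval-2014, semeval-2015, semeval-2016, and towe.
Each is stored as jsonl files; depending on the dataset, a record covers some or all of aspect terms, aspect categories, sentiment polarities, and opinion words.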
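Because the data is stored as jsonl (one JSON object per line), a minimal loader can be sketched with the Python standard library alone. The helper name `load_jsonl` and the file name in the usage comment are hypothetical; check the Hugging Face repository for the actual split files.

```python
import json
from typing import Any, Dict, List


def load_jsonl(path: str) -> List[Dict[str, Any]]:
    """Read a jsonl file: one JSON object per non-empty line (hypothetical helper)."""
    records: List[Dict[str, Any]] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at end of file
                records.append(json.loads(line))
    return records


# Hypothetical file name; the real split files may be named differently.
# records = load_jsonl("train.jsonl")
# print(records[0]["instruction"])
```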

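Dataset structure

Each record contains the following fields:
- task_type: the task type, here "generation".
- dataset: the dataset name.
- input: the input sentence, stored as a list (one string in the example record).
- output: the extraction result, stored as a string; in the acos example it is a list of 4-tuples (aspect term, aspect category, sentiment polarity, opinion words).
- situation, label, extra: present in the example record; label and extra are left empty.
- instruction: the task description, detailing what to extract from the sentence and in what format.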
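In the example record, `output` is a string holding a Python-style list literal with single-quoted strings, not a JSON array, so `json.loads` would reject it. A minimal sketch, assuming every `output` (and every model generation being scored) follows that literal syntax, parses it safely with `ast.literal_eval`; the helper name `parse_output` is hypothetical.

```python
import ast
from typing import List, Tuple


def parse_output(output: str) -> List[Tuple[str, ...]]:
    """Turn an output string such as "[['a', 'b']]" into a list of tuples.

    ast.literal_eval only evaluates literals, so unlike eval() it is safe on
    untrusted text. Malformed strings (e.g. a generation that breaks the
    format) raise an exception, which is mapped to an empty list here.
    """
    try:
        value = ast.literal_eval(output)
    except (ValueError, SyntaxError, TypeError):
        return []
    if not isinstance(value, list):
        return []
    # Keep only list/tuple items; anything else does not match the format.
    return [tuple(item) for item in value if isinstance(item, (list, tuple))]


record = {
    "input": ["the computer has difficulty switching between tablet and computer ."],
    "output": "[['computer', 'laptop usability', 'negative', 'difficulty']]",
}
print(parse_output(record["output"]))
# [('computer', 'laptop usability', 'negative', 'difficulty')]
```

Returning an empty list on malformed text is a design choice for scoring generations: a badly formatted prediction simply counts as extracting nothing, rather than crashing the evaluation loop.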

Intended use

Testing model robustness: models are typically trained on data from one domain and tested on data from the other domain.

Dataset source

The original dataset consists of JSON data from two domains, laptop and restaurant; the data was merged and split into train, validation, and test sets.

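Current SOTA model

Model: CVIB
Metric: macro-averaged F1
Results:
- Trained on other-domain data, tested on restaurant: macro-averaged F1 70.29
- Trained and tested on restaurant: macro-averaged F1 82.03
- Trained on other-domain data, tested on laptop: macro-averaged F1 69.39
- Trained and tested on laptop: macro-averaged F1 77.53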
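Macro-averaged F1 computes F1 separately for each class and then takes the unweighted mean, so a rare polarity counts as much as a frequent one. The following is a minimal pure-Python sketch over sentiment-polarity labels; the labels `positive`, `negative`, `neutral` are an assumption (common in ABSA), and the evaluation script behind the CVIB numbers may differ in details such as which classes are included.

```python
from typing import Iterable, List, Optional, Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Optional[Iterable[str]] = None) -> float:
    """Unweighted mean of per-class F1 scores.

    By default the class set is every label seen in gold or pred; a class
    with no true positives gets F1 = 0.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    classes: List[str] = sorted(set(labels) if labels is not None else set(gold) | set(pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores) if f1_scores else 0.0


gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "negative", "positive", "positive"]
print(round(macro_f1(gold, pred), 4))
# 0.6  (per-class F1: negative 1.0, neutral 0.0, positive 0.8)
```

With the default class set this should agree with `sklearn.metrics.f1_score(gold, pred, average="macro")`; the pure-Python version just makes the per-class averaging explicit.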
