HiTZ/Multilingual-Opinion-Target-Extraction

Name: HiTZ/Multilingual-Opinion-Target-Extraction
Creator: HiTZ
Published: 2023-11-22 13:32:07
License: apache-2.0

Source: Hugging Face. Updated 2023-11-22. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/HiTZ/Multilingual-Opinion-Target-Extraction

Resource description:

---
arxiv: 2210.12623
paperswithcode_id: aspect-based-sentiment-analysis
license: apache-2.0
configs:
- config_name: en
  data_files:
  - split: train
    path: en.ote.train.json
  - split: test
    path: en.ote.test.json
- config_name: es
  data_files:
  - split: train
    path: es.ote.train.json
  - split: test
    path: es.ote.test.json
- config_name: fr
  data_files:
  - split: train
    path: fr.ote.train.json
  - split: test
    path: fr.ote.test.json
- config_name: ru
  data_files:
  - split: train
    path: ru.ote.train.json
  - split: test
    path: ru.ote.test.json
- config_name: tr
  data_files:
  - split: train
    path: tr.ote.train.json
task_categories:
- token-classification
language:
- en
- fr
- es
- ru
- tr
tags:
- opinion
- target
- absa
- aspect
- sentiment analysis
pretty_name: Multilingual Opinion Target Extraction
size_categories:
- 1K<n<10K
---

This repository contains the English '[SemEval-2014 Task 4: Aspect Based Sentiment Analysis](https://aclanthology.org/S14-2004/)' dataset, translated with DeepL into Spanish, French, Russian, and Turkish. The **labels have been manually projected**. For more details, read this paper: [Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings](https://arxiv.org/abs/2210.12623).

**Intended Usage**: Since the datasets are parallel across languages, they are ideal for evaluating annotation projection algorithms, such as [T-Projection](https://arxiv.org/abs/2212.10548).

# Label Dictionary

```python
{
    "O": 0,
    "B-TARGET": 1,
    "I-TARGET": 2
}
```

# Citation

If you use this data, please cite the following papers:

```bibtex
@inproceedings{garcia-ferrero-etal-2022-model,
    title = "Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings",
    author = "Garc{\'\i}a-Ferrero, Iker and Agerri, Rodrigo and Rigau, German",
    editor = "Goldberg, Yoav and Kozareva, Zornitsa and Zhang, Yue",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.478",
    doi = "10.18653/v1/2022.findings-emnlp.478",
    pages = "6403--6416",
    abstract = "Zero-resource cross-lingual transfer approaches aim to apply supervised models from a source language to unlabelled target languages. In this paper we perform an in-depth study of the two main techniques employed so far for cross-lingual zero-resource sequence labelling, based either on data or model transfer. Although previous research has proposed translation and annotation projection (data-based cross-lingual transfer) as an effective technique for cross-lingual sequence labelling, in this paper we experimentally demonstrate that high capacity multilingual language models applied in a zero-shot (model-based cross-lingual transfer) setting consistently outperform data-based cross-lingual transfer approaches. A detailed analysis of our results suggests that this might be due to important differences in language use. More specifically, machine translation often generates a textual signal which is different to what the models are exposed to when using gold standard data, which affects both the fine-tuning and evaluation processes. Our results also indicate that data-based cross-lingual transfer approaches remain a competitive option when high-capacity multilingual language models are not available.",
}

@inproceedings{pontiki-etal-2014-semeval,
    title = "{S}em{E}val-2014 Task 4: Aspect Based Sentiment Analysis",
    author = "Pontiki, Maria and Galanis, Dimitris and Pavlopoulos, John and Papageorgiou, Harris and Androutsopoulos, Ion and Manandhar, Suresh",
    editor = "Nakov, Preslav and Zesch, Torsten",
    booktitle = "Proceedings of the 8th International Workshop on Semantic Evaluation ({S}em{E}val 2014)",
    month = aug,
    year = "2014",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/S14-2004",
    doi = "10.3115/v1/S14-2004",
    pages = "27--35",
}
```
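Because the card suggests using the parallel data to evaluate annotation projection, a span-level comparison between the manually projected (gold) labels and an automatically projected sequence may be a useful starting point. The following is a minimal sketch, assuming exact-match scoring over BIO-encoded label ids; the function names `spans` and `span_f1` and the toy sequences are hypothetical and do not come from the dataset or the cited papers, which may score differently.

```python
# Label ids follow the card's label dictionary: O=0, B-TARGET=1, I-TARGET=2.

def spans(label_ids):
    """Return the set of (start, end) target spans in one BIO-encoded sentence."""
    found, start = set(), None
    # A sentinel O (0) is appended so a span reaching the last token is closed.
    for i, lid in enumerate(list(label_ids) + [0]):
        if start is not None and lid != 2:
            # Anything other than I-TARGET ends the currently open span.
            found.add((start, i))
            start = None
        if lid == 1 or (lid == 2 and start is None):
            # B-TARGET opens a span; a stray I-TARGET is leniently treated as a start.
            start = i
    return found


def span_f1(gold, pred):
    """Exact-match span precision, recall and F1 over parallel lists of sentences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = spans(g), spans(p)
        tp += len(gs & ps)
        n_gold += len(gs)
        n_pred += len(ps)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example (made up): the projection truncates the first target span.
gold = [[0, 1, 2, 0], [1, 0, 0]]
pred = [[0, 1, 0, 0], [1, 0, 0]]
print(span_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Exact matching penalises a projected span that is off by a single token as a full miss; a token-level or partial-overlap score would be a more forgiving alternative.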

Provided by:

HiTZ

Original information summary

Dataset overview

Basic information

arxiv: 2210.12623
paperswithcode_id: aspect-based-sentiment-analysis
license: apache-2.0
pretty_name: Multilingual Opinion Target Extraction
size_categories: 1K<n<10K
task_categories: token-classification
language: en, fr, es, ru, tr
tags: opinion, target, absa, aspect, sentiment analysis

Data file configuration

config_name: en
- train: en.ote.train.json
- test: en.ote.test.json
config_name: es
- train: es.ote.train.json
- test: es.ote.test.json
config_name: fr
- train: fr.ote.train.json
- test: fr.ote.test.json
config_name: ru
- train: ru.ote.train.json
- test: ru.ote.test.json
config_name: tr
- train: tr.ote.train.json
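The configuration list above maps each language to its files, and the Turkish config lists a train file only. A minimal sketch of how one config might be loaded with the Hugging Face `datasets` library follows; the helper names `available_splits` and `load_config` are hypothetical, and the column names inside the JSON files are not stated in this listing, so they should be inspected after loading.

```python
# Repository id and file names are copied from the listing above.
REPO_ID = "HiTZ/Multilingual-Opinion-Target-Extraction"

SPLIT_FILES = {
    "en": {"train": "en.ote.train.json", "test": "en.ote.test.json"},
    "es": {"train": "es.ote.train.json", "test": "es.ote.test.json"},
    "fr": {"train": "fr.ote.train.json", "test": "fr.ote.test.json"},
    "ru": {"train": "ru.ote.train.json", "test": "ru.ote.test.json"},
    "tr": {"train": "tr.ote.train.json"},  # no test file is listed for Turkish
}


def available_splits(config):
    """Return the split names listed for a language config, sorted alphabetically."""
    return sorted(SPLIT_FILES[config])


def load_config(config):
    """Download one language config from the Hugging Face Hub (needs network access)."""
    if config not in SPLIT_FILES:
        raise ValueError(f"unknown config: {config!r}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset
    return load_dataset(REPO_ID, config)


print(available_splits("tr"))  # ['train']
```

Calling `load_config("en")` should return a `DatasetDict` with the splits listed for English. Code that iterates over every language needs to guard against the missing Turkish test split, for example by checking `available_splits` first.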

Label dictionary

{ "O": 0, "B-TARGET": 1, "I-TARGET": 2 }
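The three ids form a BIO scheme: `B-TARGET` marks the first token of an opinion target, `I-TARGET` continues it, and `O` is everything else. The sketch below shows one way to turn a sequence of label ids back into target phrases. The function name `extract_targets` and the example sentence are hypothetical and are not taken from the dataset.

```python
LABEL2ID = {"O": 0, "B-TARGET": 1, "I-TARGET": 2}
ID2LABEL = {idx: name for name, idx in LABEL2ID.items()}


def extract_targets(tokens, label_ids):
    """Return (start, end, text) for each target span; `end` is exclusive."""
    spans, start = [], None
    for i, label_id in enumerate(label_ids):
        tag = ID2LABEL[label_id]
        if tag == "B-TARGET":
            if start is not None:
                spans.append((start, i))  # close the previous span first
            start = i
        elif tag == "I-TARGET":
            if start is None:
                start = i  # lenient: a stray I-TARGET opens a span
        else:  # "O"
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(label_ids)))  # span runs to the end of the sentence
    return [(s, e, " ".join(tokens[s:e])) for s, e in spans]


# Made-up example sentence with two opinion targets.
tokens = ["The", "pasta", "carbonara", "was", "great", "but", "service", "was", "slow"]
label_ids = [0, 1, 2, 0, 0, 0, 1, 0, 0]
print(extract_targets(tokens, label_ids))
# [(1, 3, 'pasta carbonara'), (6, 7, 'service')]
```

Treating an `I-TARGET` without a preceding `B-TARGET` as a span start is a design choice for robustness against noisy predictions; a strict decoder would discard such tokens instead.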
