
HiTZ/Multilingual-Opinion-Target-Extraction

Source: Hugging Face. Updated: 2023-11-22. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/HiTZ/Multilingual-Opinion-Target-Extraction
Resource description:
---
arxiv: 2210.12623
paperswithcode_id: aspect-based-sentiment-analysis
license: apache-2.0
configs:
- config_name: en
  data_files:
  - split: train
    path: en.ote.train.json
  - split: test
    path: en.ote.test.json
- config_name: es
  data_files:
  - split: train
    path: es.ote.train.json
  - split: test
    path: es.ote.test.json
- config_name: fr
  data_files:
  - split: train
    path: fr.ote.train.json
  - split: test
    path: fr.ote.test.json
- config_name: ru
  data_files:
  - split: train
    path: ru.ote.train.json
  - split: test
    path: ru.ote.test.json
- config_name: tr
  data_files:
  - split: train
    path: tr.ote.train.json
task_categories:
- token-classification
language:
- en
- fr
- es
- ru
- tr
tags:
- opinion
- target
- absa
- aspect
- sentiment analysis
pretty_name: Multilingual Opinion Target Extraction
size_categories:
- 1K<n<10K
---

This repository contains the English '[SemEval-2014 Task 4: Aspect Based Sentiment Analysis](https://aclanthology.org/S14-2004/)' dataset, translated with DeepL into Spanish, French, Russian, and Turkish. The **labels have been manually projected**. For more details, read this paper: [Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings](https://arxiv.org/abs/2210.12623).

**Intended Usage**: Since the datasets are parallel across languages, they are ideal for evaluating annotation projection algorithms, such as [T-Projection](https://arxiv.org/abs/2212.10548).

# Label Dictionary

```python
{
    "O": 0,
    "B-TARGET": 1,
    "I-TARGET": 2
}
```

# Citation

If you use this data, please cite the following papers:

```bibtex
@inproceedings{garcia-ferrero-etal-2022-model,
    title = "Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings",
    author = "Garc{\'\i}a-Ferrero, Iker and Agerri, Rodrigo and Rigau, German",
    editor = "Goldberg, Yoav and Kozareva, Zornitsa and Zhang, Yue",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-emnlp.478",
    doi = "10.18653/v1/2022.findings-emnlp.478",
    pages = "6403--6416",
    abstract = "Zero-resource cross-lingual transfer approaches aim to apply supervised models from a source language to unlabelled target languages. In this paper we perform an in-depth study of the two main techniques employed so far for cross-lingual zero-resource sequence labelling, based either on data or model transfer. Although previous research has proposed translation and annotation projection (data-based cross-lingual transfer) as an effective technique for cross-lingual sequence labelling, in this paper we experimentally demonstrate that high capacity multilingual language models applied in a zero-shot (model-based cross-lingual transfer) setting consistently outperform data-based cross-lingual transfer approaches. A detailed analysis of our results suggests that this might be due to important differences in language use. More specifically, machine translation often generates a textual signal which is different to what the models are exposed to when using gold standard data, which affects both the fine-tuning and evaluation processes. Our results also indicate that data-based cross-lingual transfer approaches remain a competitive option when high-capacity multilingual language models are not available.",
}

@inproceedings{pontiki-etal-2014-semeval,
    title = "{S}em{E}val-2014 Task 4: Aspect Based Sentiment Analysis",
    author = "Pontiki, Maria and Galanis, Dimitris and Pavlopoulos, John and Papageorgiou, Harris and Androutsopoulos, Ion and Manandhar, Suresh",
    editor = "Nakov, Preslav and Zesch, Torsten",
    booktitle = "Proceedings of the 8th International Workshop on Semantic Evaluation ({S}em{E}val 2014)",
    month = aug,
    year = "2014",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/S14-2004",
    doi = "10.3115/v1/S14-2004",
    pages = "27--35",
}
```
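Because the languages are parallel and share one label scheme, one plausible way to score an annotation projection algorithm is to compare the target spans it produces against the manually projected gold labels. The following is a minimal sketch of that idea, not the evaluation script used in the paper: the helper names `extract_targets` and `span_f1` are hypothetical, the exact-match span F1 is an assumed metric, and the example sentence is made up for illustration.

```python
from typing import List, Tuple

# Label dictionary as given in the dataset card.
LABEL2ID = {"O": 0, "B-TARGET": 1, "I-TARGET": 2}

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def extract_targets(label_ids: List[int]) -> List[Span]:
    """Collect opinion-target spans from a sequence of BIO label IDs."""
    spans: List[Span] = []
    start = None
    for i, label in enumerate(label_ids):
        if label == LABEL2ID["B-TARGET"]:
            if start is not None:  # a new target begins right after another
                spans.append((start, i))
            start = i
        elif label == LABEL2ID["I-TARGET"]:
            if start is None:  # lenient: a stray I- tag opens a span
                start = i
        else:  # "O" closes any open span
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(label_ids)))
    return spans


def span_f1(gold: List[int], pred: List[int]) -> float:
    """Exact-match span F1 between gold and predicted label IDs (assumed metric)."""
    gold_spans = set(extract_targets(gold))
    pred_spans = set(extract_targets(pred))
    if not gold_spans and not pred_spans:
        return 1.0
    true_pos = len(gold_spans & pred_spans)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_spans)
    recall = true_pos / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Made-up example: "The sushi rolls were great but service was slow"
gold = [0, 1, 2, 0, 0, 0, 1, 0, 0]  # targets: "sushi rolls", "service"
pred = [0, 1, 2, 0, 0, 0, 0, 0, 0]  # a projection that missed "service"
gold_spans = extract_targets(gold)
score = span_f1(gold, pred)
```

Here the projection recovers one of the two gold targets, so precision is 1.0 and recall is 0.5.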
Provider:
HiTZ
Summary of original information

Dataset overview

Basic information

  • arxiv: 2210.12623
  • paperswithcode_id: aspect-based-sentiment-analysis
  • license: apache-2.0
  • pretty_name: Multilingual Opinion Target Extraction
  • size_categories: 1K<n<10K
  • task_categories: token-classification
  • language: en, fr, es, ru, tr
  • tags: opinion, target, absa, aspect, sentiment analysis

Data file configuration

  • config_name: en
    • train: en.ote.train.json
    • test: en.ote.test.json
  • config_name: es
    • train: es.ote.train.json
    • test: es.ote.test.json
  • config_name: fr
    • train: fr.ote.train.json
    • test: fr.ote.test.json
  • config_name: ru
    • train: ru.ote.train.json
    • test: ru.ote.test.json
  • config_name: tr
    • train: tr.ote.train.json
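
The file names above follow a regular `<config>.ote.<split>.json` pattern, and the Turkish config lists only a training split. The sketch below rebuilds that mapping and shows one way the configs might be loaded with the Hugging Face `datasets` library. The helper names `data_files` and `load_config` are hypothetical, and `load_config` assumes the `datasets` package is installed and the Hub is reachable.

```python
from typing import Dict, List

REPO_ID = "HiTZ/Multilingual-Opinion-Target-Extraction"

# Splits listed for each config in the dataset card; "tr" has no test split.
SPLITS: Dict[str, List[str]] = {
    "en": ["train", "test"],
    "es": ["train", "test"],
    "fr": ["train", "test"],
    "ru": ["train", "test"],
    "tr": ["train"],
}


def data_files(config: str) -> Dict[str, str]:
    """Return the split -> file name mapping for one language config."""
    return {split: f"{config}.ote.{split}.json" for split in SPLITS[config]}


def load_config(config: str):
    """Load one language config from the Hub (needs `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the mapping works offline

    return load_dataset(REPO_ID, config)


turkish_files = data_files("tr")
english_files = data_files("en")
```

Calling `load_config("es")` would be expected to return the Spanish train and test splits, while code that iterates over all configs should not assume a `test` split exists for `tr`.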

Label dictionary

  • "O": 0
  • "B-TARGET": 1
  • "I-TARGET": 2
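
Token-classification code usually needs the inverse mapping as well, and a quick well-formedness check on decoded tags can catch projection mistakes. The following is a small sketch under those assumptions. The names `ID2LABEL`, `decode`, and `is_valid_bio` are hypothetical, and the strict rule that `I-TARGET` may not directly follow `O` is a common BIO convention rather than something the dataset card states.

```python
from typing import Dict, List

# Label dictionary as given in the dataset card.
LABEL2ID: Dict[str, int] = {"O": 0, "B-TARGET": 1, "I-TARGET": 2}

# Inverse mapping, e.g. for a model config's id-to-label table.
ID2LABEL: Dict[int, str] = {idx: label for label, idx in LABEL2ID.items()}


def decode(label_ids: List[int]) -> List[str]:
    """Turn integer label IDs back into BIO tag strings."""
    return [ID2LABEL[idx] for idx in label_ids]


def is_valid_bio(tags: List[str]) -> bool:
    """Check that every I-TARGET continues a span opened by B-TARGET."""
    previous = "O"
    for tag in tags:
        if tag == "I-TARGET" and previous == "O":
            return False  # continuation tag with no open span
        previous = tag
    return True


tags = decode([0, 1, 2, 0])
```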
