
DGurgurov/danish_sa

Source: Hugging Face. Updated 2024-05-30; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/DGurgurov/danish_sa
Summary:
This dataset contains the Danish sentiment analysis data provided by Isbister et al. (2021).
Provider:
DGurgurov
Summary of original information

Dataset overview

Dataset name

Sentiment Analysis Data for the Danish Language

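Dataset description

This dataset contains the sentiment analysis data released by Isbister et al. in 2021.

Data structure

The data was used in a project on improving word embeddings with graph knowledge for low-resource languages.

Language

Danish (da)

Task category

Text classification

License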

MIT
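
Loading example

The dataset can be fetched by the repository ID shown at the top of this page. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository loads with its default configuration. This page does not state the split or column names, so the code only reports what it finds. The helper names `load_danish_sa` and `describe` are hypothetical and chosen for illustration.

```python
# Repository ID as listed on this page.
DATASET_ID = "DGurgurov/danish_sa"


def load_danish_sa(repo_id: str = DATASET_ID):
    """Download the dataset from the Hugging Face Hub and return it.

    Assumption: the repository loads with the default configuration,
    so load_dataset returns a mapping of split name to split.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id)


def describe(dataset) -> list[str]:
    """Return one summary line per split: name, row count, column names."""
    return [
        f"{name}: {split.num_rows} rows, columns={split.column_names}"
        for name, split in dataset.items()
    ]
```

With network access, `print("\n".join(describe(load_danish_sa())))` lists the available splits and their columns. Check them before assuming any particular text or label field.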

Citation information

@inproceedings{isbister-etal-2021-stop,
    title = "Should we Stop Training More Monolingual Models, and Simply Use Machine Translation Instead?",
    author = "Isbister, Tim and Carlsson, Fredrik and Sahlgren, Magnus",
    editor = "Dobnik, Simon and {\O}vrelid, Lilja",
    booktitle = "Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = may # " 31--2 " # jun,
    year = "2021",
    address = "Reykjavik, Iceland (Online)",
    publisher = {Link{\"o}ping University Electronic Press, Sweden},
    url = "https://aclanthology.org/2021.nodalida-main.42",
    pages = "385--390",
    abstract = "Most work in NLP makes the assumption that it is desirable to develop solutions in the native language in question. There is consequently a strong trend towards building native language models even for low-resource languages. This paper questions this development, and explores the idea of simply translating the data into English, thereby enabling the use of pretrained, and large-scale, English language models. We demonstrate empirically that a large English language model coupled with modern machine translation outperforms native language models in most Scandinavian languages. The exception to this is Finnish, which we assume is due to inferior translation quality. Our results suggest that machine translation is a mature technology, which raises a serious counter-argument for training native language models for low-resource languages. This paper therefore strives to make a provocative but important point. As English language models are improving at an unprecedented pace, which in turn improves machine translation, it is from an empirical and environmental stand-point more effective to translate data from low-resource languages into English, than to build language models for such languages.",
}
