
AStitchInLanguageModels

Source: arXiv (2021-09-10) · Updated: 2024-06-21
Download link:
https://github.com/H-TayyarMadabushi/AStitchInLanguageModels
Resource description:
The AStitchInLanguageModels dataset was created by the Department of Computer Science at the University of Sheffield. It contains 4,558 English examples and 1,872 Portuguese examples, for a total of 6,430 entries. The dataset focuses on the idiomaticity of multi-word expressions (MWEs). It provides naturally occurring sentences together with their surrounding context. It also provides a fine-grained classification of MWE usages, with four categories: compositional meaning, idiomatic meaning, proper noun, and "meta usage". The dataset is designed to evaluate how well language models detect and represent idiomaticity, with a particular focus on zero-shot, one-shot, and few-shot learning scenarios. Its intended applications include classification and sequence-to-sequence tasks such as sentiment analysis and machine translation. It is meant to help address the limitations of language models in handling idiomatic MWEs.
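The following sketch is an illustration only. It models the four usage categories and shows one plausible way to build zero-, one-, and few-shot splits. It assumes that a k-shot setting puts at most k training examples of each test MWE into the training set, so that k = 0 means the test MWEs are never seen during training. The `Example` record, the `Usage` enum, and `k_shot_split` are hypothetical names. They do not reflect the repository's actual file format or the paper's exact split procedure.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Usage(Enum):
    # The four fine-grained categories named in the dataset description.
    COMPOSITIONAL = "compositional"
    IDIOMATIC = "idiomatic"
    PROPER_NOUN = "proper_noun"
    META_USAGE = "meta_usage"


@dataclass(frozen=True)
class Example:
    # Hypothetical record layout (assumption, not the repository's format).
    mwe: str
    sentence: str
    language: str  # e.g. "EN" or "PT"
    label: Usage


def k_shot_split(examples, test_mwes, k):
    """Split examples into (train, test) for an assumed k-shot setting.

    The training set gets every example of MWEs outside test_mwes. It also
    gets the first k examples of each MWE in test_mwes. The remaining
    examples of those MWEs form the test set. With k=0 (zero-shot), no
    test MWE appears in training.
    """
    by_mwe = defaultdict(list)
    for ex in examples:
        by_mwe[ex.mwe].append(ex)

    train, test = [], []
    for mwe, group in by_mwe.items():
        if mwe in test_mwes:
            train.extend(group[:k])  # at most k "shots" per test MWE
            test.extend(group[k:])
        else:
            train.extend(group)
    return train, test


if __name__ == "__main__":
    data = [
        Example("big fish", "He is a big fish in the art world.", "EN", Usage.IDIOMATIC),
        Example("big fish", "They caught a big fish off the pier.", "EN", Usage.COMPOSITIONAL),
        Example("night owl", "My sister is a night owl.", "EN", Usage.IDIOMATIC),
    ]
    for k in (0, 1):
        train, test = k_shot_split(data, {"big fish"}, k)
        print(f"k={k}: {len(train)} train, {len(test)} test")
```

Grouping by MWE rather than by sentence is the key design choice here. A zero-shot evaluation is only meaningful if no sentence containing a test MWE leaks into training.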
Provided by:
Department of Computer Science, University of Sheffield
Created:
2021-09-10