
PAWS-X

Source: arXiv | Updated: 2019-08-31 | Indexed: 2024-06-21
Download link:
https://github.com/google-research-datasets/paws
Description:
The PAWS-X dataset, developed by the Google Research team, contains 23,659 human-translated cross-lingual adversarial sentence pairs in six languages: French, Spanish, German, Chinese, Japanese, and Korean. The pairs have high lexical overlap, so a model must rely on sentence structure and context rather than shared vocabulary to judge them correctly. Its construction combined human translation with neural machine translation to maintain data quality. PAWS-X is used mainly to advance multilingual research, particularly on semantic understanding and context sensitivity.
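To make "high lexical overlap" concrete, here is a minimal sketch that scores a sentence pair by the Jaccard overlap of its word sets. The helper name `token_overlap`, the whitespace tokenization, and the example pair are assumptions made for illustration; the pair is not read from the dataset files, and this is not the procedure used to build PAWS-X.

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase whitespace-token sets of two sentences."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty sentences are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Illustrative pair (hypothetical, not loaded from the dataset):
# the same words in a different order describe a different route.
s1 = "Flights from New York to Florida"
s2 = "Flights from Florida to New York"

overlap = token_overlap(s1, s2)
print(f"overlap = {overlap:.2f}")
```

Because both sentences use exactly the same words, a bag-of-words score cannot tell them apart even though their meanings differ. Pairs of this kind are what push a model to attend to word order and structure.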
Provider:
Google Research, Mountain View
Created:
2019-08-31