
ZurichNLP/paws-x-italian

Source: Hugging Face. Updated: 2025-12-09. Indexed: 2026-01-03.
Download link:
https://hf-mirror.com/datasets/ZurichNLP/paws-x-italian
Description:
---
language:
- it
tags:
- paraphrase
- italian
- synthetic
dataset_info:
  features:
  - name: id
    dtype: int32
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': '0'
          '1': '1'
  splits:
  - name: train
    num_bytes: 12714604
    num_examples: 49401
  - name: test
    num_bytes: 509309
    num_examples: 1977
  - name: validation
    num_bytes: 511699
    num_examples: 2000
---

[![Paper](https://img.shields.io/badge/📄%20Paper-arXiv%3A2512.07538-B31B1B.svg)](https://arxiv.org/pdf/2512.07538)

# PAWS-X Italian Paraphrase Dataset

This dataset is a machine-translated Italian version of the English PAWS-X dataset. The original PAWS-X dataset (Yang et al. 2019) is a multilingual version of PAWS (Zhang et al. 2019) for paraphrase identification.

## Dataset Structure

### Data Fields

- `id`: Integer ID of the example (`int32`)
- `sentence1`: First sentence in the pair
- `sentence2`: Second sentence in the pair
- `label`:
  - 0: Non-paraphrases
  - 1: Paraphrases

### Data Splits

The dataset is split into:

- Training set (49,401 examples)
- Validation set (2,000 examples)
- Test set (1,977 examples)

This dataset was created as part of the _SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents_ project:

```bibtex
@misc{wastl2025swissgovrsdhumanannotatedcrosslingualbenchmark,
  title={SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents},
  author={Michelle Wastl and Jannis Vamvas and Rico Sennrich},
  year={2025},
  eprint={2512.07538},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.07538},
}
```
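As a rough illustration of how the fields and splits described in the card might be used, the sketch below loads the dataset with the Hugging Face `datasets` library and compares the split sizes with those in the card metadata. It assumes the repository `ZurichNLP/paws-x-italian` loads with a plain `load_dataset` call and no configuration name, which the card does not state. The helper names (`load_paws_x_italian`, `check_split_sizes`, `split_fractions`) and the readable label strings in `LABEL_NAMES` are hypothetical and chosen only for this example.

```python
from typing import Dict

# Split sizes as listed in the dataset card metadata.
EXPECTED_SPLIT_SIZES: Dict[str, int] = {
    "train": 49401,
    "test": 1977,
    "validation": 2000,
}

# Readable names for the two classes (the card itself names them '0' and '1').
LABEL_NAMES: Dict[int, str] = {0: "non-paraphrase", 1: "paraphrase"}


def load_paws_x_italian():
    """Load all splits from the Hugging Face Hub (requires network access).

    Assumption: the repository loads without a configuration name.
    """
    # Imported lazily so the helpers below work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("ZurichNLP/paws-x-italian")


def check_split_sizes(actual: Dict[str, int]) -> Dict[str, bool]:
    """Report, per split, whether the observed size matches the card metadata."""
    return {
        name: actual.get(name) == size
        for name, size in EXPECTED_SPLIT_SIZES.items()
    }


def split_fractions() -> Dict[str, float]:
    """Share of all examples that falls into each split, per the card metadata."""
    total = sum(EXPECTED_SPLIT_SIZES.values())
    return {name: size / total for name, size in EXPECTED_SPLIT_SIZES.items()}
```

With network access, a typical session would call `dataset = load_paws_x_italian()`, pass `{name: split.num_rows for name, split in dataset.items()}` to `check_split_sizes`, and read a pair via `dataset["train"][0]`, whose `sentence1`, `sentence2`, and `label` keys correspond to the fields listed in the card.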
Provider:
ZurichNLP