ZurichNLP/paws-x-italian

Name: ZurichNLP/paws-x-italian
Creator: ZurichNLP
Published: 2025-12-09 08:26:18
License: No description available

Source: Hugging Face. Updated 2025-12-09. Indexed 2026-01-03.

Download link:

https://hf-mirror.com/datasets/ZurichNLP/paws-x-italian

Resource description:

---
language:
- it
tags:
- paraphrase
- italian
- synthetic
dataset_info:
  features:
  - name: id
    dtype: int32
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': '0'
          '1': '1'
  splits:
  - name: train
    num_bytes: 12714604
    num_examples: 49401
  - name: test
    num_bytes: 509309
    num_examples: 1977
  - name: validation
    num_bytes: 511699
    num_examples: 2000
---

[![Paper](https://img.shields.io/badge/📄%20Paper-arXiv%3A2512.07538-B31B1B.svg)](https://arxiv.org/pdf/2512.07538)

# PAWS-X Italian Paraphrase Dataset

This dataset is a machine-translated Italian version of the English PAWS-X dataset. The original PAWS-X dataset (Yang et al. 2019) is a multilingual version of PAWS (Zhang et al. 2019) for paraphrase identification.

## Dataset Structure

### Data Fields

- `id`: Integer identifier (`int32`)
- `sentence1`: First sentence in the pair
- `sentence2`: Second sentence in the pair
- `label`:
  - 0: Non-paraphrases
  - 1: Paraphrases

### Data Splits

The dataset is split into:

- Training set: 49,401 examples
- Validation set: 2,000 examples
- Test set: 1,977 examples

This dataset was created as part of the _SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents_ project:

```bibtex
@misc{wastl2025swissgovrsdhumanannotatedcrosslingualbenchmark,
  title={SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents},
  author={Michelle Wastl and Jannis Vamvas and Rico Sennrich},
  year={2025},
  eprint={2512.07538},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.07538},
}
```
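## Example Usage

A minimal sketch of how the fields declared in the dataset metadata (`id`, `sentence1`, `sentence2`, `label`) might be consumed. It assumes the Hugging Face `datasets` library is installed and that the repository id from the page title resolves on the Hub. The helper names `load_split` and `label_distribution` are hypothetical, and the Italian sentences are invented for illustration; they are not rows from the dataset.

```python
from collections import Counter


def load_split(split="validation"):
    """Load one split from the Hub (needs `pip install datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the rest runs offline

    return load_dataset("ZurichNLP/paws-x-italian", split=split)


def label_distribution(rows):
    """Count non-paraphrase (0) and paraphrase (1) pairs in an iterable of rows."""
    return Counter(row["label"] for row in rows)


# Hypothetical rows that follow the documented schema (not taken from the dataset).
sample_rows = [
    {"id": 1, "sentence1": "Il fiume scorre da nord a sud.",
     "sentence2": "Il fiume scorre da nord verso sud.", "label": 1},
    {"id": 2, "sentence1": "Marco ha battuto Luca in finale.",
     "sentence2": "Luca ha battuto Marco in finale.", "label": 0},
    {"id": 3, "sentence1": "La scuola fu fondata nel 1900 a Roma.",
     "sentence2": "Nel 1900 la scuola fu fondata a Roma.", "label": 1},
]

counts = label_distribution(sample_rows)
print(f"paraphrases: {counts[1]}, non-paraphrases: {counts[0]}")
```

Because each row of a loaded split is a plain mapping with the same keys, `label_distribution(load_split("test"))` should work unchanged on the real data.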

Provider:

ZurichNLP
