CATIE-AQ/paws-x_fr_prompt_paraphrase_detection

Name: CATIE-AQ/paws-x_fr_prompt_paraphrase_detection
Creator: CATIE-AQ
Published: 2025-02-10 15:37:15
License: no description available

Source: Hugging Face. Updated 2025-02-10. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/CATIE-AQ/paws-x_fr_prompt_paraphrase_detection

Resource overview:

paws-x_fr_prompt_paraphrase_detection is a subset specifically designed for French paraphrase detection tasks, containing 1,174,822 entries. This dataset is built based on the original paws-x dataset (French part) and uses a series of prompts to construct input and target columns to match the format of the xP3 dataset. The dataset is split into training, validation, and test sets, containing 1,086,822, 44,000, and 44,000 samples respectively.
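The prompt-based construction described above can be illustrated with a minimal sketch. The two French templates, the "Oui"/"Non" answers, and the column names `inputs` and `targets` are assumptions made for illustration only; the actual prompts used by DFP are not listed on this page.

```python
# Hypothetical prompt templates; the real DFP prompts are not shown in this page.
PROMPTS = [
    'Phrase 1 : "{s1}"\nPhrase 2 : "{s2}"\n'
    "La phrase 2 est-elle une paraphrase de la phrase 1 ?",
    'Les phrases "{s1}" et "{s2}" ont-elles le même sens ?',
]

# Assumed mapping from the binary paws-x label to a textual target.
TARGETS = {0: "Non", 1: "Oui"}


def build_rows(sentence1, sentence2, label):
    """Return one prompted row per template for a single sentence pair."""
    return [
        {
            "inputs": prompt.format(s1=sentence1, s2=sentence2),
            "targets": TARGETS[label],
        }
        for prompt in PROMPTS
    ]


rows = build_rows("Il pleut à Bordeaux.", "À Bordeaux, il pleut.", 1)
```

Each source pair yields as many rows as there are templates, which would explain why the prompted dataset is much larger than the sentence-pair data it is built from.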

Provided by:

CATIE-AQ

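Summary of original information

Dataset overview

paws-x_fr_prompt_paraphrase_detection is a dataset dedicated to French paraphrase detection, containing 1,174,822 rows. It is a subset of the Dataset of French Prompts (DFP); the original data come from the French part of the paws-x dataset.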

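Dataset characteristics

Language: French
License: other
Size: 1M<n<10M
Task category: text classification
Tags: paraphrase detection, DFP, French prompts
Multilinguality: monolingual
Source dataset: paws-x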

Dataset structure

Training set: 1,086,822 samples
Validation set: 44,000 samples
Test set: 44,000 samples
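As a quick sanity check, the three split sizes can be summed and compared with the stated total of 1,174,822 rows. The split names `train`, `validation`, and `test` are assumed here by convention and are not confirmed by this page.

```python
# Split sizes as stated in the listing; the key names are assumed.
SPLIT_SIZES = {"train": 1_086_822, "validation": 44_000, "test": 44_000}

# Total row count across the three splits.
total = sum(SPLIT_SIZES.values())

# Percentage share of each split, rounded to two decimals.
shares = {name: round(100 * n / total, 2) for name, n in SPLIT_SIZES.items()}

print(total)
print(shares)
```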

Usage

Python:

    from datasets import load_dataset
    dataset = load_dataset("CATIE-AQ/paws-x_fr_prompt_paraphrase_detection")
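With more than a million rows, it may be convenient to inspect a few examples without downloading everything. The sketch below assumes the `datasets` library's streaming mode and a split named `validation`; the helper names `load_split` and `preview` are hypothetical and introduced only for this example.

```python
from itertools import islice

DATASET_ID = "CATIE-AQ/paws-x_fr_prompt_paraphrase_detection"


def load_split(split="validation", streaming=True):
    """Open one split; with streaming=True rows are fetched lazily."""
    # Imported here so the preview helper can be used without the library.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split, streaming=streaming)


def preview(rows, n=3):
    """Return the first n rows of any iterable of row dictionaries."""
    return list(islice(rows, n))


# Example (requires network access and the `datasets` package):
#     for row in preview(load_split()):
#         print(row)
```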

Citation

Original data

@InProceedings{pawsx2019emnlp,
title = {{PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification}},
author = {Yang, Yinfei and Zhang, Yuan and Tar, Chris and Baldridge, Jason},
booktitle = {Proc. of EMNLP},
year = {2019}
}

This dataset

@misc {centre_aquitain_des_technologies_de_linformation_et_electroniques_2023,
author = { {Centre Aquitain des Technologies de l'Information et Electroniques} },
title = { DFP (Revision 1d24c09) },
year = 2023,
url = { https://huggingface.co/datasets/CATIE-AQ/DFP },
doi = { 10.57967/hf/1200 },
publisher = { Hugging Face }
}

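License notes

The dataset may be freely used for any purpose, but Google LLC should be acknowledged as the data source when it is used. The dataset is provided "as is", without any warranty, express or implied. Google accepts no liability for any direct or indirect damages that may result from use of the dataset.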
