mteb/PAC

Name: mteb/PAC
Creator: mteb
Published: 2025-05-07 20:08:56
License: not specified in the listing metadata (the description states cc-by-nc-sa-4.0)

Source: Hugging Face. Updated 2025-05-07; indexed 2025-05-31.

Download link:

https://hf-mirror.com/datasets/mteb/PAC


Description:

PAC (Polish Paraphrase Corpus) is a monolingual Polish text dataset for text classification. It covers texts from the legal and written domains and is split into training, test, and validation sets. It is licensed under cc-by-nc-sa-4.0. According to the dataset statistics, the test set has 3,453 samples and the training set has 4,284 samples; the average text length is reported as about 185.28 characters for both. The labels fall into two classes.
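The split sizes and average text length quoted above could be checked locally along the following lines. This is a minimal sketch, not an official loader: it assumes the Hugging Face `datasets` package is installed, that the Hub repository ID is `mteb/PAC` (taken from the listing name), and that the text column is called `text`, which the listing does not confirm. The helper names `average_text_length` and `load_pac_split` are hypothetical.

```python
def average_text_length(texts):
    """Return the mean length, in characters, of a sequence of strings."""
    lengths = [len(t) for t in texts]
    if not lengths:
        raise ValueError("no texts given")
    return sum(lengths) / len(lengths)


def load_pac_split(split="test", repo_id="mteb/PAC"):
    """Download one split from the Hugging Face Hub (needs network access).

    The repository ID is an assumption based on the listing name.
    """
    # Imported lazily so average_text_length works without `datasets`.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split)


# Example, left commented out because it downloads data:
# test = load_pac_split("test")
# print(len(test))                           # number of samples in the split
# print(average_text_length(test["text"]))   # "text" column name is assumed
```

If the column names differ, inspecting `test.column_names` after loading should show which field holds the text.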

Provider:

mteb
