
mteb/PAC

Source: Hugging Face · Updated 2025-05-07 · Indexed 2025-05-31
Download link:
https://hf-mirror.com/datasets/mteb/PAC
Description:

PAC (Polish Paraphrase Corpus) is a Polish-language dataset for text classification. It contains texts from the legal and written domains and is split into training, test, and validation sets. The dataset is monolingual and licensed under cc-by-nc-sa-4.0. According to the dataset statistics, the test set has 3,453 samples and the training set has 4,284 samples, and both report an average text length of about 185.28 characters. Labels fall into two classes.
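The per-split figures above (sample count, average length in characters, number of label classes) can be reproduced with a small helper. The sketch below is a minimal illustration, assuming each record is a dict with a `text` string and a `label` value; those field names and the helper name `split_stats` are hypothetical and not confirmed by this card. The toy records are invented for demonstration and are not taken from PAC.

```python
from collections import Counter
from statistics import mean


def split_stats(records):
    """Summarize one split: sample count, mean character length, label counts.

    Hypothetical helper. Assumes each record is a dict with a "text" string
    and a "label" value; these field names are an assumption.
    """
    texts = [r["text"] for r in records]
    return {
        "num_samples": len(texts),
        # Mean length in characters (len() counts Unicode code points);
        # an empty split reports 0.0 instead of raising StatisticsError.
        "avg_text_length": round(mean(len(t) for t in texts), 2) if texts else 0.0,
        # Distribution of labels; a binary dataset should show two keys.
        "label_counts": dict(Counter(r["label"] for r in records)),
    }


# Hypothetical toy records, NOT taken from PAC.
toy_split = [
    {"text": "abc", "label": 0},
    {"text": "abcdef", "label": 1},
    {"text": "ab", "label": 1},
]
print(split_stats(toy_split))
```

To run this on the real data, one option (also an assumption about the column names) is to load the dataset with the Hugging Face `datasets` library, e.g. `load_dataset("mteb/PAC")`, and pass a split such as `ds["test"]`, since iterating a `Dataset` yields one dict per row.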
Provided by:
mteb