alighasemi/fa-paraphrase
Source: Hugging Face | Updated: 2022-12-07 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/alighasemi/fa-paraphrase
Resource summary:
---
Tasks:
- Text2Text Generation
Fine-Grained Tasks:
- paraphrase
- query-paraphrasing
Languages:
- Persian
Multilinguality:
- monolingual
- fa
- fa-IR
Sizes:
- n>1M
dataset_info:
  features:
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  splits:
  - name: train
    num_bytes: 139373682.4
    num_examples: 881408
  - name: test
    num_bytes: 17421710.3
    num_examples: 110176
  - name: validation
    num_bytes: 17421710.3
    num_examples: 110176
  download_size: 98032993
  dataset_size: 174217103.00000003
---
# Dataset Card for "fa-paraphrase"
This dataset contains over 1.1 million rows. Each row holds a pair of Farsi (Persian) sentences that are paraphrases of each other. It was built from the following source datasets:
* [tapaco](https://huggingface.co/datasets/tapaco)
* [kaggle](https://www.kaggle.com/datasets/armannikkhah/persian-paraphrase-dataset)
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
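
Since the card tags the dataset for Text2Text Generation, one natural use is to turn each row into a source/target pair. The following is a minimal sketch, not the author's method: the helper names `to_text2text` and `load_pairs` and the `bidirectional` flag are hypothetical, and `load_pairs` assumes the Hugging Face `datasets` library is installed and the repository is reachable under the ID shown in the download link.

```python
from typing import Dict, List


def to_text2text(row: Dict[str, str], bidirectional: bool = False) -> List[Dict[str, str]]:
    """Turn one {sentence1, sentence2} row into source/target examples.

    Hypothetical helper for illustration; the field names come from the
    dataset card, the output keys "source" and "target" are an assumption.
    """
    examples = [{"source": row["sentence1"], "target": row["sentence2"]}]
    if bidirectional:
        # Paraphrase is symmetric, so the reversed pair is also a valid example.
        examples.append({"source": row["sentence2"], "target": row["sentence1"]})
    return examples


def load_pairs(split: str = "train"):
    """Load one split (train, test, or validation) from the Hugging Face Hub.

    Assumes the `datasets` library is installed and network access is
    available; imported lazily so to_text2text works without it.
    """
    from datasets import load_dataset

    return load_dataset("alighasemi/fa-paraphrase", split=split)
```

Calling `load_pairs("validation")` and passing each returned row to `to_text2text` would yield one training example per row, or two with `bidirectional=True`.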
Provider:
alighasemi
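Summary of original information
Dataset overview
Basic information
- Task type: Text2Text Generation
- Fine-grained tasks:
  - paraphrase
  - query-paraphrasing
- Language: Persian
- Multilinguality: monolingual, fa, fa-IR
- Dataset size: n>1M
Dataset structure
- Features:
  - sentence1: string
  - sentence2: string
- Splits:
  - Train:
    - Size: 139373682.4 bytes
    - Examples: 881408
  - Test:
    - Size: 17421710.3 bytes
    - Examples: 110176
  - Validation:
    - Size: 17421710.3 bytes
    - Examples: 110176
- Download size: 98032993 bytes
- Total dataset size: 174217103.00000003 bytes
Dataset contents
- Contains over 1.1 million rows; each row holds a pair of Persian sentences that are paraphrases of each other.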
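
The split figures above can be cross-checked with a little arithmetic. The sketch below uses only numbers stated in the metadata. It shows that the three splits sum to 1,101,760 rows in an exact 80/10/10 ratio, and that each per-split byte count equals the total dataset size scaled by that ratio. Reading the fractional byte counts as proportional estimates rather than measured sizes is an inference from these numbers, not something the card states.

```python
from fractions import Fraction

# Row counts and sizes copied from the dataset metadata.
num_examples = {"train": 881408, "test": 110176, "validation": 110176}
num_bytes = {"train": "139373682.4", "test": "17421710.3", "validation": "17421710.3"}
dataset_size = 174217103  # reported as 174217103.00000003, a float rounding artifact

total_rows = sum(num_examples.values())  # 1101760

# Exact share of rows in each split: 4/5, 1/10, 1/10.
share = {name: Fraction(n, total_rows) for name, n in num_examples.items()}

# Check whether each split's byte count equals dataset_size times its row share.
bytes_are_proportional = all(
    Fraction(num_bytes[name]) == dataset_size * share[name] for name in num_examples
)

for name in num_examples:
    print(f"{name}: {num_examples[name]} rows, share {share[name]}")
print("total rows:", total_rows)
print("byte counts proportional to row share:", bytes_are_proportional)
```

Using `Fraction` instead of floats keeps the comparison exact, which matters here because the metadata itself already carries a floating-point artifact in `dataset_size`.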



