el2e10/aya-parapharse-punjabi

Name: el2e10/aya-parapharse-punjabi
Creator: el2e10
Published: 2024-01-26 14:13:19
License: no description provided

Hugging Face | Updated: 2024-01-26 | Indexed: 2024-03-04

Download link:

https://hf-mirror.com/datasets/el2e10/aya-parapharse-punjabi

Resource description:

---
language:
- pa
license: cc
size_categories:
- n<1K
source_datasets:
- extended|ai4bharat/IndicXParaphrase
task_categories:
- text-generation
pretty_name: Aya Paraphrase Punjabi
dataset_info:
  features:
  - name: inputs
    dtype: string
  - name: targets
    dtype: string
  - name: template_lang
    dtype: string
  - name: template_id
    dtype: int64
  splits:
  - name: train
    num_bytes: 629535
    num_examples: 1001
  download_size: 230066
  dataset_size: 629535
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

### Description

This dataset is derived from an existing dataset created by AI4Bharat. We used the Punjabi split of AI4Bharat's [IndicXParaphrase](https://huggingface.co/datasets/ai4bharat/IndicXParaphrase) dataset to create this instruction-style dataset.

It was created as part of the [Aya Open Science Initiative](https://sites.google.com/cohere.com/aya-en/home) from Cohere For AI.

IndicXParaphrase is a multilingual, n-way parallel dataset for paraphrase detection in 10 Indic languages. The original dataset (IndicXParaphrase) was made available under the CC0 license.

### Template

The following templates (Punjabi) were used to convert the original dataset:

```
#Template 1
prompt: ਵੱਖ-ਵੱਖ ਸ਼ਬਦਾਂ ਦੀ ਵਰਤੋਂ ਕਰਕੇ ਹੇਠਾਂ ਦਿੱਤੇ ਵਾਕ ਨੂੰ ਲਿਖੋ: "{original_sentence}"
completion: {paraphrased_sentence}
```

```
#Template 2
prompt: ਨਿਮਨਲਿਖਤ ਵਾਕ ਨੂੰ ਵੱਖਰੇ ਤਰੀਕੇ ਨਾਲ ਦੁਬਾਰਾ ਲਿਖੋ: "{original_sentence}"
completion: {paraphrased_sentence}
```

```
#Template 3
prompt: ਹੇਠਾਂ ਦਿੱਤੇ ਵਾਕ ਨੂੰ ਸਮਝਾਓ: "{original_sentence}"
completion: {paraphrased_sentence}
```

### Acknowledgement

Thank you, Amarjit, for helping with the preparation of this dataset by providing the Punjabi translation of the above-mentioned English prompts.

Provider:

el2e10

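Summary of original information

Dataset overview

Basic information

Language: Punjabi (pa)
License: cc
Size category: n<1K
Source dataset: extended from ai4bharat/IndicXParaphrase
Task category: text generation
Dataset name: Aya Paraphrase Punjabi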

Dataset structure

Features:
- inputs: string
- targets: string
- template_lang: string
- template_id: 64-bit integer (int64)
Splits:
- train:
  - Size in bytes: 629535
  - Number of examples: 1001
Download size: 230066 bytes
Dataset size: 629535 bytes
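
The feature list above can be turned into a small row check. The following is a minimal sketch, not part of the original dataset tooling: the names `EXPECTED_FEATURES` and `validate_row` are hypothetical, and it assumes each row arrives as a plain Python dict with `int64` values surfaced as Python `int`.

```python
# Expected fields and Python types, taken from the feature list above.
# Assumption: int64 values are exposed as plain Python ints.
EXPECTED_FEATURES = {
    "inputs": str,
    "targets": str,
    "template_lang": str,
    "template_id": int,
}


def validate_row(row):
    """Return a list of problems found in one row (empty list means OK)."""
    problems = []
    for name, expected_type in EXPECTED_FEATURES.items():
        if name not in row:
            problems.append(f"missing field: {name}")
            continue
        value = row[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems
```

Calling `validate_row` on every row after loading gives a quick sanity check that the data matches the declared schema before it is used for training.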

Configuration

Config name: default
Data files:
- train: path data/train-*
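
Given the single `default` config and `train` split listed above, loading could look roughly like the sketch below. It assumes the third-party Hugging Face `datasets` package is installed and that the repository is reachable; the wrapper name `load_train_split` is hypothetical.

```python
def load_train_split(repo_id="el2e10/aya-parapharse-punjabi"):
    """Load the train split of the default config from the Hugging Face Hub.

    Assumption: the `datasets` package is installed and network access
    is available. The import is kept local so that defining this helper
    does not require the package.
    """
    from datasets import load_dataset

    return load_dataset(repo_id, split="train")
```

A typical use would be `train = load_train_split()` followed by inspecting `train[0]`, which should expose the `inputs`, `targets`, `template_lang`, and `template_id` fields.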

Templates

Template 1:
- Prompt: ਵੱਖ-ਵੱਖ ਸ਼ਬਦਾਂ ਦੀ ਵਰਤੋਂ ਕਰਕੇ ਹੇਠਾਂ ਦਿੱਤੇ ਵਾਕ ਨੂੰ ਲਿਖੋ: "{original_sentence}"
- Completion: {paraphrased_sentence}
Template 2:
- Prompt: ਨਿਮਨਲਿਖਤ ਵਾਕ ਨੂੰ ਵੱਖਰੇ ਤਰੀਕੇ ਨਾਲ ਦੁਬਾਰਾ ਲਿਖੋ: "{original_sentence}"
- Completion: {paraphrased_sentence}
Template 3:
- Prompt: ਹੇਠਾਂ ਦਿੱਤੇ ਵਾਕ ਨੂੰ ਸਮਝਾਓ: "{original_sentence}"
- Completion: {paraphrased_sentence}
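
To illustrate how a sentence pair could be turned into an instruction-style row with these templates, here is a minimal sketch. It is not the authors' conversion script. The names `TEMPLATES` and `build_example` are hypothetical, and three things are assumptions: that the prompt maps to `inputs` and the completion to `targets`, that `template_id` is 1-based, and that `template_lang` holds the value `"pa"`.

```python
# Prompt templates copied from the list above, keyed by template number.
TEMPLATES = {
    1: 'ਵੱਖ-ਵੱਖ ਸ਼ਬਦਾਂ ਦੀ ਵਰਤੋਂ ਕਰਕੇ ਹੇਠਾਂ ਦਿੱਤੇ ਵਾਕ ਨੂੰ ਲਿਖੋ: "{original_sentence}"',
    2: 'ਨਿਮਨਲਿਖਤ ਵਾਕ ਨੂੰ ਵੱਖਰੇ ਤਰੀਕੇ ਨਾਲ ਦੁਬਾਰਾ ਲਿਖੋ: "{original_sentence}"',
    3: 'ਹੇਠਾਂ ਦਿੱਤੇ ਵਾਕ ਨੂੰ ਸਮਝਾਓ: "{original_sentence}"',
}


def build_example(original_sentence, paraphrased_sentence, template_id):
    """Fill one template and return a row shaped like the dataset features."""
    prompt = TEMPLATES[template_id].format(original_sentence=original_sentence)
    return {
        "inputs": prompt,                 # assumed mapping: prompt -> inputs
        "targets": paraphrased_sentence,  # assumed mapping: completion -> targets
        "template_lang": "pa",            # assumed value, not confirmed by the card
        "template_id": template_id,       # assumed to be 1-based
    }
```

For example, `build_example(sentence, paraphrase, 2)` wraps `sentence` in the second prompt and leaves `paraphrase` untouched as the target.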
