el2e10/aya-paraphrase
Hugging Face. Updated 2024-02-04; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/el2e10/aya-paraphrase
Resource description:
---
license: cc
task_categories:
- text-generation
language:
- ml
- gu
- mr
- hi
- pa
- bn
pretty_name: Aya Paraphrase
size_categories:
- 1K<n<10K
configs:
- config_name: default
  data_files:
  - split: mal
    path: data/mal.parquet
  - split: ben
    path: data/ben.parquet
  - split: guj
    path: data/guj.parquet
  - split: hin
    path: data/hin.parquet
  - split: mar
    path: data/mar.parquet
  - split: pan
    path: data/pan.parquet
---
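The `configs` block above exposes one split per language. A minimal sketch of loading a single split with the Hugging Face `datasets` library follows; the `load_split` helper and the `SPLIT_TO_LANGUAGE` mapping are illustrative names that are not part of the dataset, and the column layout of the returned data is not documented in this card, so it is not assumed here.

```python
# Split names are taken from the card's `configs` block; the two-letter
# codes are the matching entries from its `language` list.
SPLIT_TO_LANGUAGE = {
    "mal": "ml",  # Malayalam
    "ben": "bn",  # Bengali
    "guj": "gu",  # Gujarati
    "hin": "hi",  # Hindi
    "mar": "mr",  # Marathi
    "pan": "pa",  # Punjabi
}


def load_split(split):
    """Load one language split of el2e10/aya-paraphrase (hypothetical helper)."""
    if split not in SPLIT_TO_LANGUAGE:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(SPLIT_TO_LANGUAGE)}"
        )
    # Imported lazily so the mapping above is usable without the library.
    # Requires `pip install datasets` and network access.
    from datasets import load_dataset

    return load_dataset("el2e10/aya-paraphrase", split=split)


# Example usage (downloads data/mal.parquet from the Hub):
#   malayalam = load_split("mal")
#   print(malayalam)
```

Printing the returned object is a quick way to inspect the actual column names before relying on them.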
### Description
This dataset is derived from AI4Bharat's [IndicXParaphrase](https://huggingface.co/datasets/ai4bharat/IndicXParaphrase) dataset, which we converted into an instruction-style dataset.
It was created as part of the [Aya Open Science Initiative](https://sites.google.com/cohere.com/aya-en/home) from Cohere For AI.
IndicXParaphrase is a multilingual, n-way parallel dataset for paraphrase detection in 10 Indic languages. The original dataset (IndicXParaphrase) was made available under the CC0 license.
### Template
The following templates were used to convert the original dataset:
```
#Template 1
prompt:
Write the following sentence using different words: "{original_sentence}"
completion:
{paraphrased_sentence}
```
```
#Template 2
prompt:
Rewrite the following sentence in different way: "{original_sentence}"
completion:
{paraphrased_sentence}
```
```
#Template 3
prompt:
Paraphrase the following sentence:: "{original_sentence}"
completion:
{paraphrased_sentence}
```
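As a rough illustration of how these templates turn a sentence pair into an instruction record, here is a minimal sketch. The template strings are copied verbatim from above; the function name `to_instruction_pair`, the output keys, and the random choice of template are assumptions, since the card does not say how a template was picked for each example. The released splits use translated prompts, so the English strings here only illustrate the structure.

```python
import random

# English templates copied verbatim from the card (including their
# original wording and punctuation).
TEMPLATES = [
    'Write the following sentence using different words: "{original_sentence}"',
    'Rewrite the following sentence in different way: "{original_sentence}"',
    'Paraphrase the following sentence:: "{original_sentence}"',
]


def to_instruction_pair(original_sentence, paraphrased_sentence,
                        template_index=None, rng=None):
    """Build a prompt/completion record from one paraphrase pair.

    If template_index is None, a template is chosen at random
    (an assumption; the card does not describe the selection rule).
    """
    if template_index is None:
        rng = rng or random.Random()
        template_index = rng.randrange(len(TEMPLATES))
    prompt = TEMPLATES[template_index].format(original_sentence=original_sentence)
    return {"prompt": prompt, "completion": paraphrased_sentence}


# Example with made-up sentences (not taken from the dataset):
example = to_instruction_pair(
    "The weather is nice today.",
    "It is a pleasant day.",
    template_index=0,
)
```

Passing a seeded `random.Random` as `rng` makes the template choice reproducible.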
### Acknowledgement
Thank you to Jay Patel for providing the Gujarati translations, Amarjit for the Punjabi translations,
Yogesh Haribhau Kulkarni for the Marathi translations,
Ganesh Jagadeesan for the Hindi translations, and Tahmid Hossain for the Bengali translations of the above-mentioned English prompts.
Provided by:
el2e10
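Summary of original information
Dataset overview
Basic information
- License: cc
- Task category: text generation
- Languages: Malayalam (ml), Gujarati (gu), Marathi (mr), Hindi (hi), Punjabi (pa), Bengali (bn)
- Dataset name: Aya Paraphrase
- Dataset size: 1K<n<10K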
Configuration
- Config name: default
- Data files:
  - split: mal, path: data/mal.parquet
  - split: ben, path: data/ben.parquet
  - split: guj, path: data/guj.parquet
  - split: hin, path: data/hin.parquet
  - split: mar, path: data/mar.parquet
  - split: pan, path: data/pan.parquet
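Description
This instruction-style dataset is derived from AI4Bharat's existing IndicXParaphrase dataset. IndicXParaphrase is a multilingual, n-way parallel dataset for paraphrase detection in 10 Indic languages.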
Templates
The following templates were used to convert the original dataset:
- Template 1
  prompt: Write the following sentence using different words: "{original_sentence}"
  completion: {paraphrased_sentence}
- Template 2
  prompt: Rewrite the following sentence in different way: "{original_sentence}"
  completion: {paraphrased_sentence}
- Template 3
  prompt: Paraphrase the following sentence:: "{original_sentence}"
  completion: {paraphrased_sentence}
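Acknowledgements
Thanks to Jay Patel for the Gujarati translations, Amarjit for the Punjabi translations, Yogesh Haribhau Kulkarni for the Marathi translations, Ganesh Jagadeesan for the Hindi translations, and Tahmid Hossain for the Bengali translations.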



