el2e10/aya-paraphrase

Name: el2e10/aya-paraphrase
Creator: el2e10
Published: 2024-02-04 10:15:11
License: No description available

Source: Hugging Face; updated 2024-02-04; indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/el2e10/aya-paraphrase


Resource overview:

---
license: cc
task_categories:
- text-generation
language:
- ml
- gu
- mr
- hi
- pa
- bn
pretty_name: Aya Paraphrase
size_categories:
- 1K<n<10K
configs:
- config_name: default
  data_files:
  - split: mal
    path: data/mal.parquet
  - split: ben
    path: data/ben.parquet
  - split: guj
    path: data/guj.parquet
  - split: hin
    path: data/hin.parquet
  - split: mar
    path: data/mar.parquet
  - split: pan
    path: data/pan.parquet
---

### Description

This dataset is derived from an existing dataset made by AI4Bharat. We used AI4Bharat's [IndicXParaphrase](https://huggingface.co/datasets/ai4bharat/IndicXParaphrase) dataset to create this instruction-style dataset. It was created as part of the [Aya Open Science Initiative](https://sites.google.com/cohere.com/aya-en/home) from Cohere For AI.

IndicXParaphrase is a multilingual, n-way parallel dataset for paraphrase detection in 10 Indic languages. The original dataset (IndicXParaphrase) was made available under the CC0 license.

### Template

The following templates were used to convert the original dataset:

```
#Template 1
prompt:
Write the following sentence using different words: "{original_sentence}"

completion:
{paraphrased_sentence}
```

```
#Template 2
prompt:
Rewrite the following sentence in different way: "{original_sentence}"

completion:
{paraphrased_sentence}
```

```
#Template 3
prompt:
Paraphrase the following sentence:: "{original_sentence}"

completion:
{paraphrased_sentence}
```

### Acknowledgement

Thank you to Jay Patel for providing the Gujarati translations, Amarjit for the Punjabi translations, Yogesh Haribhau Kulkarni for the Marathi translations, Ganesh Jagadeesan for the Hindi translations, and Tahmid Hossain for the Bengali translations of the English prompts above.

Provider:

el2e10

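Summary of original information

Dataset overview

Basic information

License: cc
Task category: text generation
Languages: Malayalam (ml), Gujarati (gu), Marathi (mr), Hindi (hi), Punjabi (pa), Bengali (bn)
Dataset name: Aya Paraphrase
Dataset size: 1K<n<10K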

Configuration

Config name: default
Data files:
- split: mal, path: data/mal.parquet
- split: ben, path: data/ben.parquet
- split: guj, path: data/guj.parquet
- split: hin, path: data/hin.parquet
- split: mar, path: data/mar.parquet
- split: pan, path: data/pan.parquet
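
The split layout above maps each language code to one Parquet file. The following is a minimal sketch of how one might load a single split, assuming the Hugging Face `datasets` library is installed and the repository is reachable. The helper names `split_path` and `load_split` are hypothetical and exist only for illustration.

```python
# Split names and repository ID as listed in the dataset configuration.
SPLITS = ["mal", "ben", "guj", "hin", "mar", "pan"]
REPO_ID = "el2e10/aya-paraphrase"


def split_path(split: str) -> str:
    """Return the Parquet path for a split (hypothetical helper)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return f"data/{split}.parquet"


def load_split(split: str):
    """Load one language split from the Hub (requires network access)."""
    # Imported lazily so the path helper works without `datasets` installed.
    from datasets import load_dataset

    split_path(split)  # validates the split name before any download
    return load_dataset(REPO_ID, split=split)


if __name__ == "__main__":
    # Print the file each split is expected to come from.
    for name in SPLITS:
        print(name, "->", split_path(name))
```

Calling `load_split("hin")` would then request only the Hindi split, rather than downloading all six languages.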

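Description

This dataset is derived from AI4Bharat's existing IndicXParaphrase dataset and recasts it as an instruction-style dataset. IndicXParaphrase is a multilingual, n-way parallel dataset for paraphrase detection in 10 Indic languages.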

Templates

The following templates were used to convert the original dataset:

#Template 1
prompt: Write the following sentence using different words: "{original_sentence}"

completion: {paraphrased_sentence}

#Template 2
prompt: Rewrite the following sentence in different way: "{original_sentence}"

completion: {paraphrased_sentence}

#Template 3
prompt: Paraphrase the following sentence:: "{original_sentence}"

completion: {paraphrased_sentence}
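
To make the conversion concrete, here is a rough sketch of how a sentence pair could be turned into a prompt/completion record with these templates. The source does not say how a template was assigned to each pair, so the random choice below is an assumption, as is the function name `to_instruction`. The sketch also uses the English template text, whereas the acknowledgements indicate the prompts were translated into each target language.

```python
import random

# Template strings copied verbatim from the dataset card.
TEMPLATES = [
    'Write the following sentence using different words: "{original_sentence}"',
    'Rewrite the following sentence in different way: "{original_sentence}"',
    'Paraphrase the following sentence:: "{original_sentence}"',
]


def to_instruction(original_sentence: str,
                   paraphrased_sentence: str,
                   rng: random.Random) -> dict:
    """Build one prompt/completion record (hypothetical helper).

    Assumption: a template is picked uniformly at random per pair.
    """
    template = rng.choice(TEMPLATES)
    return {
        "prompt": template.format(original_sentence=original_sentence),
        "completion": paraphrased_sentence,
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs pick the same template
    record = to_instruction("The weather is nice today.",
                            "Today the weather is pleasant.",
                            rng)
    print(record["prompt"])
    print(record["completion"])
```

Passing the random generator in explicitly keeps the template choice reproducible, which helps when regenerating the same records later.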

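Acknowledgements

Thanks to Jay Patel for providing the Gujarati translations, Amarjit for the Punjabi translations, Yogesh Haribhau Kulkarni for the Marathi translations, Ganesh Jagadeesan for the Hindi translations, and Tahmid Hossain for the Bengali translations.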
