el2e10/aya-paraphrase-marathi

Name: el2e10/aya-paraphrase-marathi
Creator: el2e10
Published: 2024-01-26 14:13:43
License: No description available

Hugging Face, updated 2024-01-26, indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/el2e10/aya-paraphrase-marathi

Resource overview:

---
language:
- mr
license: cc
size_categories:
- n<1K
source_datasets:
- extended|ai4bharat/IndicXParaphrase
task_categories:
- text-generation
pretty_name: Aya Paraphrase Marathi
dataset_info:
  features:
  - name: inputs
    dtype: string
  - name: targets
    dtype: string
  - name: template_lang
    dtype: string
  - name: template_id
    dtype: int64
  splits:
  - name: train
    num_bytes: 683937
    num_examples: 1001
  download_size: 245473
  dataset_size: 683937
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

### Description

This dataset is derived from an existing dataset made by AI4Bharat. We used AI4Bharat's [IndicXParaphrase](https://huggingface.co/datasets/ai4bharat/IndicXParaphrase) dataset to create this instruction-style dataset, drawing on the Malayalam split of that dataset. It was created as part of the [Aya Open Science Initiative](https://sites.google.com/cohere.com/aya-en/home) from Cohere For AI.

IndicXParaphrase is a multilingual, n-way parallel dataset for paraphrase detection in 10 Indic languages. The original dataset (IndicXParaphrase) was made available under the CC0 license.

### Template

The following templates (Marathi) were used to convert the original dataset:

```
#Template 1
prompt: खालील वाक्य दुसरे-भिन्न शब्द वापरून लिहा: "{original_sentence}"
completion: {paraphrased_sentence}
```

```
#Template 2
prompt: खालील वाक्य वेगळ्या प्रकारे पुन्हा लिहा: "{original_sentence}"
completion: {paraphrased_sentence}
```

```
#Template 3
prompt: खालील वाक्य दुसरे शब्द वापरून रूपांतरित-अनुवादित करा: "{original_sentence}"
completion: {paraphrased_sentence}
```

### Acknowledgement

Thank you, Yogesh Haribhau Kulkarni, for helping prepare this dataset by providing the Marathi translations of the English prompts mentioned above.

Provider:

el2e10

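Summary of original information

Dataset overview

Basic information

Language: Marathi (mr)
License: cc
Size category: n<1K
Source dataset: extended from ai4bharat/IndicXParaphrase
Task category: text generation
Dataset name: Aya Paraphrase Marathi

Dataset structure

Features:
- inputs: string
- targets: string
- template_lang: string
- template_id: integer (int64)
Splits:
- train:
  - Size in bytes: 683937
  - Number of examples: 1001
Download size: 245473 bytes
Dataset size: 683937 bytes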
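The feature list above can be turned into a quick conformance check for a single row. The following is only a sketch: `EXPECTED_FEATURES` and `validate_record` are hypothetical names introduced here, and the mapping of `string` to `str` and `int64` to `int` is an assumption about how rows appear once loaded into Python.

```python
from typing import Any, List, Mapping

# Column names and Python types mirroring the listed features.
# Assumption: string -> str and int64 -> int after loading into Python.
EXPECTED_FEATURES = {
    "inputs": str,
    "targets": str,
    "template_lang": str,
    "template_id": int,
}


def validate_record(record: Mapping[str, Any]) -> List[str]:
    """Return a list of problems found in one row; an empty list means it conforms."""
    problems = []
    for name, expected_type in EXPECTED_FEATURES.items():
        if name not in record:
            problems.append(f"missing column: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(
                f"{name}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
    for name in record:
        if name not in EXPECTED_FEATURES:
            problems.append(f"unexpected column: {name}")
    return problems
```

A row that passes returns an empty list, so the function can be used as a filter or as an assertion in a data-loading test.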

Configuration

Config name: default
- Data files:
  - Split: train
  - Path: data/train-*
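With a single `default` config and a single `train` split, loading is straightforward. The sketch below assumes the Hugging Face `datasets` package is installed and that the repository is still reachable under the ID shown on this page; `load_train_split` and `summarize` are hypothetical helper names, not part of the dataset.

```python
REPO_ID = "el2e10/aya-paraphrase-marathi"
CONFIG_NAME = "default"
SPLIT = "train"


def load_train_split():
    """Download (or reuse the local cache of) the train split from the Hub."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name=CONFIG_NAME, split=SPLIT)


def summarize(dataset) -> dict:
    """Collect the row count and column names of a loaded split."""
    return {"num_rows": dataset.num_rows, "columns": list(dataset.column_names)}
```

Calling `summarize(load_train_split())` needs network access. If the hosted files still match the metadata listed here, it should report 1001 rows and the four columns `inputs`, `targets`, `template_lang`, and `template_id`.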

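Description

This dataset is derived from AI4Bharat's existing IndicXParaphrase dataset and was built to provide an instruction-style Marathi dataset. It uses the Malayalam split of IndicXParaphrase.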

Templates

The following templates were used to convert the original dataset:

Template 1

prompt: खालील वाक्य दुसरे-भिन्न शब्द वापरून लिहा: "{original_sentence}"

completion: {paraphrased_sentence}

Template 2

prompt: खालील वाक्य वेगळ्या प्रकारे पुन्हा लिहा: "{original_sentence}"

completion: {paraphrased_sentence}

Template 3

prompt: खालील वाक्य दुसरे शब्द वापरून रूपांतरित-अनुवादित करा: "{original_sentence}"

completion: {paraphrased_sentence}
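To make the conversion concrete, here is a minimal sketch of how one sentence pair could be turned into an instruction-style row using the three templates. It is not the authors' script: `TEMPLATES` and `build_example` are hypothetical names, the numbering 1 to 3 simply mirrors the template labels, and the default `template_lang` value of `"mr"` is an assumption, since the page does not show what the column actually contains.

```python
# The three Marathi prompt templates listed above, keyed by template number.
# Assumption: keys 1-3 mirror the "Template 1/2/3" labels; the source does not
# state which values the template_id column actually holds.
TEMPLATES = {
    1: 'खालील वाक्य दुसरे-भिन्न शब्द वापरून लिहा: "{original_sentence}"',
    2: 'खालील वाक्य वेगळ्या प्रकारे पुन्हा लिहा: "{original_sentence}"',
    3: 'खालील वाक्य दुसरे शब्द वापरून रूपांतरित-अनुवादित करा: "{original_sentence}"',
}


def build_example(original_sentence, paraphrased_sentence, template_id, template_lang="mr"):
    """Fill one template and return a row with the dataset's four columns."""
    if template_id not in TEMPLATES:
        raise ValueError(f"unknown template_id: {template_id}")
    # Only the template is parsed by str.format, so braces inside the
    # sentence itself are inserted verbatim.
    prompt = TEMPLATES[template_id].format(original_sentence=original_sentence)
    return {
        "inputs": prompt,
        "targets": paraphrased_sentence,
        "template_lang": template_lang,  # assumed value, see lead-in
        "template_id": template_id,
    }
```

The prompt becomes `inputs` and the paraphrase becomes `targets`, matching the prompt/completion pairing of the templates. How the original rows were assigned to a particular template is not described on this page, so the caller chooses `template_id` here.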

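Acknowledgement

Thanks to Yogesh Haribhau Kulkarni for providing the Marathi translations of the English prompts mentioned above during the preparation of this dataset.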
