jayshah5696/alpaca-small-gujarati

Name: jayshah5696/alpaca-small-gujarati
Creator: jayshah5696
Published: 2023-12-25 21:40:06
License: cc-by-nc-4.0

Source: Hugging Face. Updated 2023-12-25; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/jayshah5696/alpaca-small-gujarati


Description:

---
license: cc-by-nc-4.0
---

Original data source: [https://huggingface.co/datasets/tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca)

Used Google Translate API to translate the dataset into Gujarati.

### Data Instances

An example of "train" looks as follows:

```json
{
    "instruction": "Identify the odd one out.",
    "input": "Twitter, Instagram, Telegram",
    "output": "Telegram",
    "text": "Below is an instruction that describes a task...",
    "gujarati_instruction": "વિષમને ઓળખો.",
    "gujarati_input": "ટ્વિટર, ઇન્સ્ટાગ્રામ, ટેલિગ્રામ",
    "gujarati_output": "ટેલિગ્રામ"
}
```

### Data Fields

The data fields are as follows:

* `instruction`: describes the task the model should perform. Each of the 52K instructions is unique.
* `input`: optional context or input for the task. For example, when the instruction is "Summarize the following article", the input is the article. Around 40% of the examples have an input.
* `output`: the answer to the instruction as generated by `text-davinci-003`.
* `text`: the `instruction`, `input` and `output` formatted with the [prompt template](https://github.com/tatsu-lab/stanford_alpaca#data-release) used by the authors for fine-tuning their models.
* `gujarati_instruction`: Gujarati translation of the instruction
* `gujarati_input`: Gujarati translation of the input
* `gujarati_output`: Gujarati translation of the output

### Data Splits

|               | train |
|---------------|------:|
| alpaca        |    88 |
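
The card provides a formatted `text` field only for the English columns. As a minimal sketch, the Gujarati columns could be combined in the same way. The two template strings below follow the prompt format published in the stanford_alpaca repository linked above. Reusing the English template header around Gujarati fields is an assumption, not something the card specifies, and `build_gujarati_text` is a hypothetical helper name.

```python
# Alpaca-style prompt templates (format from the stanford_alpaca repository).
# Assumption: the English header is kept as-is around the Gujarati fields.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_gujarati_text(record: dict) -> str:
    """Hypothetical helper: format the Gujarati fields of one record."""
    instruction = record["gujarati_instruction"]
    context = record.get("gujarati_input") or ""
    output = record["gujarati_output"]
    # Records without an input use the shorter template.
    if context.strip():
        prompt = PROMPT_WITH_INPUT.format(instruction=instruction, input=context)
    else:
        prompt = PROMPT_NO_INPUT.format(instruction=instruction)
    return prompt + output


# The sample record shown under "Data Instances" (English "text" omitted).
example = {
    "instruction": "Identify the odd one out.",
    "input": "Twitter, Instagram, Telegram",
    "output": "Telegram",
    "gujarati_instruction": "વિષમને ઓળખો.",
    "gujarati_input": "ટ્વિટર, ઇન્સ્ટાગ્રામ, ટેલિગ્રામ",
    "gujarati_output": "ટેલિગ્રામ",
}

gujarati_text = build_gujarati_text(example)
print(gujarati_text)
```

Because the translations come from Google Translate, it may be worth spot-checking a few formatted records before using them for fine-tuning.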

Provider:

jayshah5696

