DavidLanz/alpaca-tw-input-output-52k
Source: Hugging Face. Updated 2023-09-25; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/DavidLanz/alpaca-tw-input-output-52k
Resource description:
---
task_categories:
- text-generation
- conversational
- question-answering
language:
- en
size_categories:
- 10K<n<100K
license: cc-by-4.0
tags:
- gpt3
- alpaca
- instruction-finetuning
---
# Dataset Card for "alpaca-tw-input-output-52k"
This dataset contains English instruction-following data generated by GPT-3.5 using Alpaca prompts, intended for fine-tuning LLMs.
The dataset was originally shared in this repository: https://github.com/ntunlplab/traditional-chinese-alpaca. This is just a wrapper for compatibility with the Hugging Face `datasets` library.
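
Because this card is a wrapper for the `datasets` library, loading it might look like the sketch below. The repository id is taken from the download link above; the helper name `load_alpaca_tw` is hypothetical, and the `"train"` split name is an assumption that this card does not confirm.

```python
def load_alpaca_tw(split: str = "train"):
    """Load the dataset from the Hugging Face Hub (needs network access).

    The default split name "train" is an assumption, not stated in the card.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("DavidLanz/alpaca-tw-input-output-52k", split=split)


# Example usage (downloads the data on the first call):
# ds = load_alpaca_tw()
# print(ds[0]["instruction"])
```

The lazy import keeps the helper cheap to define; nothing is downloaded until the function is actually called.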
## Dataset structure
It contains 52K instruction-following examples generated by GPT-3.5 using the same prompts as in Alpaca.
The dataset has the same format as Alpaca data, except the output is generated by GPT-3.5:
- `instruction`: `str`, describes the task the model should perform. Each of the 52K instructions is unique.
- `input`: `str`, optional context or input for the task.
- `output`: `str`, the answer to the instruction as generated by `GPT-3.5`.
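
As a rough illustration of how the three fields fit together, the sketch below checks a record against this schema and renders it into a prompt. The templates follow the wording commonly used by the Stanford Alpaca project; this card does not say which template was used, so treat them as an assumption. The helper names (`validate_record`, `build_prompt`) are hypothetical, and an empty string is assumed to mean "no input".

```python
REQUIRED_FIELDS = ("instruction", "input", "output")

# Assumed templates (Stanford Alpaca style); not confirmed by this card.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def validate_record(record: dict) -> None:
    """Raise ValueError unless every schema field is present and is a str."""
    for field in REQUIRED_FIELDS:
        if not isinstance(record.get(field), str):
            raise ValueError(f"field {field!r} must be a str")


def build_prompt(record: dict) -> str:
    """Render one record into a prompt; `output` is the training target."""
    validate_record(record)
    # Assumption: an empty or whitespace-only `input` means no extra context.
    has_input = bool(record["input"].strip())
    template = PROMPT_WITH_INPUT if has_input else PROMPT_NO_INPUT
    return template.format(instruction=record["instruction"], input=record["input"])


# The example record shown in this card.
EXAMPLE = {
    "instruction": "辨識那一個平台與其他不同。",
    "input": "Twitter, Instagram, Telegram",
    "output": "在Twitter、Instagram和Telegram之間,Telegram是與其他兩者最不同的平台。",
}

if __name__ == "__main__":
    print(build_prompt(EXAMPLE))
```

For supervised fine-tuning, the `output` field would typically be appended after the rendered prompt as the completion to learn.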
## Difference with the original Alpaca dataset
The original Alpaca dataset used text-davinci-003 to complete the prompts. This dataset uses those same prompts but generates the completions with GPT-3.5. As a result, the responses are generally of higher quality and greater length. Here is an example:
#### Example from Alpaca-GPT3:
```python
{
'instruction': '辨識那一個平台與其他不同。',
'input': 'Twitter, Instagram, Telegram',
'output': '在Twitter、Instagram和Telegram之間,Telegram是與其他兩者最不同的平台。'
}
```
## Licensing Information
The dataset is available under the [Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/legalcode) license.
Provider:
DavidLanz
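Summary of original information
Dataset card "alpaca-tw-input-output-52k"
Overview
This dataset contains English instruction-following data generated by GPT-3.5 using Alpaca prompts, intended for fine-tuning LLMs.
Dataset structure
The dataset contains 52K instruction-following examples generated by GPT-3.5 using the same prompts as Alpaca. The format matches the Alpaca data, except that the outputs are generated by GPT-3.5:
- instruction: str, describes the task the model should perform. Each of the 52K instructions is unique.
- input: str, optional context or input for the task.
- output: str, the answer to the instruction as generated by GPT-3.5.
Difference from the original Alpaca dataset
The original Alpaca dataset used text-davinci-003 to complete the prompts. This dataset uses the same prompts but generates the completions with GPT-3.5. As a result, the responses are generally of higher quality and greater length.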



