walkernet/test
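Dataset Card for "alpaca-gpt4"
Dataset Description
This dataset contains English instruction-following data generated by GPT-4 from the Alpaca prompts. It is intended for fine-tuning large language models (LLMs).
Dataset Structure
The dataset contains 52K instruction-following examples generated by GPT-4 with the same prompts as Alpaca. The format is identical to the Alpaca data, except that the output is generated by GPT-4: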
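instruction: str, describes the task the model should perform. Each of the 52K instructions is unique.
input: str, optional context or input for the task.
output: str, the answer to the instruction, generated by GPT-4.
text: str, the previous fields concatenated together, preceded by the same prompt that Alpaca uses.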
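The relationship between the text field and the other three fields can be sketched in Python as follows. This is a minimal illustration, not the script used to build the dataset: the helper name build_text is hypothetical, and both templates (including the "### " section markers and the shorter wording used when input is empty) are assumptions taken from the publicly known Alpaca prompt format.

```python
# Sketch: rebuild the `text` field from `instruction`, `input`, and `output`.
# ASSUMPTION: the two templates below follow the public Alpaca prompt format;
# they are not copied from this dataset's build script.

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{output}"
)


def build_text(record: dict) -> str:
    """Return the prompt plus response for one record (hypothetical helper)."""
    # `input` is optional, so an empty or missing value selects the short template.
    template = PROMPT_WITH_INPUT if record.get("input") else PROMPT_NO_INPUT
    return template.format(
        instruction=record["instruction"],
        input=record.get("input", ""),
        output=record["output"],
    )


if __name__ == "__main__":
    example = {
        "instruction": "Identify the odd one out.",
        "input": "Twitter, Instagram, Telegram",
        "output": "Telegram",
    }
    print(build_text(example))
```

Selecting the template by whether input is empty keeps records without context from carrying an empty Input section.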
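Differences from the original Alpaca dataset
The original Alpaca dataset used text-davinci-003 to complete the prompts. This dataset uses the same prompts but generates the completions with GPT-4. As a result, the responses are generally of higher quality and longer.
Alpaca-GPT4 example: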
{'instruction': 'Identify the odd one out.',
 'input': 'Twitter, Instagram, Telegram',
 'output': 'The odd one out is Telegram. Twitter and Instagram are social media platforms mainly for sharing information, images and videos while Telegram is a cloud-based instant messaging and voice-over-IP service.',
 'text': 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

   ### Instruction:
   Identify the odd one out.

   ### Input:
   Twitter, Instagram, Telegram

   ### Response:
   The odd one out is Telegram. Twitter and Instagram are social media platforms mainly for sharing information, images and videos while Telegram is a cloud-based instant messaging and voice-over-IP service.'}
Original Alpaca example:
{'instruction': 'Identify the odd one out.',
 'input': 'Twitter, Instagram, Telegram',
 'output': 'Telegram',
 'text': 'Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

   ### Instruction:
   Identify the odd one out.

   ### Input:
   Twitter, Instagram, Telegram

   ### Response:
   Telegram'}
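The length difference between the two examples can be checked with a few lines of Python. This is only a sketch on the single pair of outputs shown here: word_count is a hypothetical helper that splits on whitespace, and one pair of records does not by itself establish the general trend described in this card.

```python
# Sketch: compare the length of the two example outputs shown in this card.
# ASSUMPTION: "length" is measured as whitespace-separated words, which is
# a rough proxy and not a metric defined by the dataset authors.

def word_count(text: str) -> int:
    """Count whitespace-separated tokens (hypothetical helper)."""
    return len(text.split())


# Outputs copied from the two example records.
gpt4_output = (
    "The odd one out is Telegram. Twitter and Instagram are social media "
    "platforms mainly for sharing information, images and videos while "
    "Telegram is a cloud-based instant messaging and voice-over-IP service."
)
davinci_output = "Telegram"

lengths = {
    "alpaca-gpt4": word_count(gpt4_output),
    "alpaca": word_count(davinci_output),
}

if __name__ == "__main__":
    for name, n in lengths.items():
        print(f"{name}: {n} words")
```

Applying the same count to the output column of both full datasets would be the natural way to test the claim at scale.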



