whitefox44/AlpacaGPT3.5Customized

Name: whitefox44/AlpacaGPT3.5Customized
Creator: whitefox44
Published: 2023-04-02 18:17:21
License: No description provided

Source: Hugging Face. Updated 2023-04-02. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/whitefox44/AlpacaGPT3.5Customized

Resource description:

---
license: apache-2.0
---

# Alpaca-like Model Training Dataset Generated from GPT-3.5

This dataset contains 56,000+ samples generated by GPT-3.5, with extreme content filtering removed and replaced with relevant generated outputs. It is designed specifically for training Alpaca-like models. Each sample includes an instruction, an optional input text, and an output text.

## Dataset Description

The dataset is a collection of diverse tasks and prompts suited to training and fine-tuning Alpaca-like language models. It covers a wide variety of tasks such as text summarization, question answering, translation, and more. The dataset was generated using GPT-3.5, which provides high-quality samples.

### Data Format

Each sample in the dataset is represented as a JSON object with the following structure:

```json
{
  "instruction": "string",
  "input": "string",
  "output": "string"
}
```

- `instruction`: A text string that provides the instruction for the task. It can be a question, a prompt, or a description of the desired output.
- `input`: An optional text string that provides the input for the task. It can be a text passage, a set of questions, or any other information relevant to the task. If there is no specific input, this field is empty.
- `output`: A text string that provides the output generated by GPT-3.5 in response to the instruction and input.

## Dataset Usage

This dataset can be used for various natural language understanding and generation tasks, including but not limited to:

- Training and fine-tuning Alpaca-like language models.
- Evaluating the performance of pre-trained models on diverse tasks.
- Exploring the capabilities and limitations of GPT-3.5 generated samples.
- Conducting research on transfer learning and few-shot learning in NLP.

Free for all. If you want to follow me, you can find me at @WoodnBarrlGame on Twitter.

Provider:

whitefox44

Summary of Original Information

Alpaca-like Model Training Dataset Generated from GPT-3.5

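Overview

This dataset contains more than 56,000 samples generated by GPT-3.5, with extreme content filtering removed and replaced with relevant generated outputs. It is designed specifically for training Alpaca-like models. Each sample includes an instruction, an optional input text, and an output text.

Dataset Description

The dataset contains diverse tasks and prompts suited to training and fine-tuning Alpaca-like language models. Task types include text summarization, question answering, translation, and more. The dataset was generated using GPT-3.5, which provides high-quality samples.

Data Format

Each sample is represented as a JSON object with the following structure:

{ "instruction": "string", "input": "string", "output": "string" }

instruction: The task instruction. It can be a question, a prompt, or a description of the desired output.
input: An optional input text. It can be a text passage, a set of questions, or other task-related information. If there is no specific input, this field is empty.
output: The output text generated by GPT-3.5 in response to the instruction and input.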
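
As a minimal sketch of how the three fields above could be checked before training, the following parses one JSON-encoded sample and verifies that each field is present and is a string. The helper name `parse_sample` and the example record are hypothetical and are not taken from the dataset; the card does not say how the samples are stored on disk, so one JSON object per string is assumed here.

```python
import json

# Field names as documented in the data format above.
REQUIRED_FIELDS = ("instruction", "input", "output")


def parse_sample(raw: str) -> dict:
    """Parse one JSON-encoded sample and check the three string fields.

    Hypothetical helper: the dataset card does not prescribe a validator.
    """
    sample = json.loads(raw)
    if not isinstance(sample, dict):
        raise ValueError("sample must be a JSON object")
    for field in REQUIRED_FIELDS:
        if not isinstance(sample.get(field), str):
            raise ValueError(f"field {field!r} is missing or not a string")
    return sample


# Made-up record for illustration only; it does not come from the dataset.
raw = '{"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"}'
sample = parse_sample(raw)

# An empty "input" means the instruction stands on its own.
has_input = bool(sample["input"].strip())
```

Treating an empty `input` as "no input" follows the field description above; anything stricter, such as rejecting empty instructions, would be an extra assumption.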

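Dataset Usage

Training and fine-tuning Alpaca-like language models.
Evaluating the performance of pre-trained models on diverse tasks.
Exploring the capabilities and limitations of GPT-3.5 generated samples.
Conducting research on transfer learning and few-shot learning in NLP.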
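
For the first use listed above, a sample usually has to be turned into a prompt and a target string. The sketch below assumes the prompt template popularized by the Stanford Alpaca project; the dataset card itself does not specify a template, so the wording of the two templates and the helper name `build_training_text` are assumptions rather than part of this dataset.

```python
# Templates in the style of the Stanford Alpaca project (an assumption:
# this dataset card does not prescribe any particular prompt format).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_training_text(sample: dict) -> tuple[str, str]:
    """Return (prompt, target) for one sample.

    Hypothetical helper: picks the no-input template when "input" is empty.
    """
    template = PROMPT_WITH_INPUT if sample["input"].strip() else PROMPT_NO_INPUT
    prompt = template.format(
        instruction=sample["instruction"], input=sample["input"]
    )
    return prompt, sample["output"]


# Made-up record for illustration only.
prompt, target = build_training_text(
    {"instruction": "Summarize the text.", "input": "Some passage.", "output": "A summary."}
)
```

Keeping the prompt and target separate makes it straightforward to compute the loss only on the target tokens, which is a common choice in instruction fine-tuning but not something the card requires.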

License

This dataset is released under the Apache-2.0 license.
