whitefox44/AlpacaGPT3.5Customized
Source: Hugging Face. Updated 2023-04-02. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/whitefox44/AlpacaGPT3.5Customized
Resource description:
---
license: apache-2.0
---
# Alpaca-like Model Training Dataset Generated from GPT-3.5
This dataset contains 56,000+ samples generated with GPT-3.5. Outputs affected by extreme content filtering have been removed and replaced with relevant generated outputs. The dataset is designed specifically for training Alpaca-like models. Each sample includes an instruction, an optional input text, and an output text.
## Dataset Description
The dataset is a collection of diverse tasks and prompts that are suited for training and fine-tuning Alpaca-like language models. It includes a wide variety of tasks such as text summarization, question-answering, translation, and more. The dataset has been generated using GPT-3.5, which provides high-quality samples.
### Data Format
Each sample in the dataset is represented as a JSON object with the following structure:
```json
{
  "instruction": "string",
  "input": "string",
  "output": "string"
}
```
- `instruction`: A text string that provides the instruction for the task. It can be a question, a prompt, or a description of the desired output.
- `input`: An optional text string that provides the input for the task. It can be a text passage, a set of questions, or any other relevant information for the task. If there is no specific input, this field is empty.
- `output`: A text string that provides the output generated by GPT-3.5 in response to the instruction and input.
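As a minimal sketch of the structure described above, the following Python checks that a JSON-encoded sample carries the three string fields and reports whether the optional input is present. The helper names `parse_sample` and `has_input` and the example record are hypothetical illustrations, not part of the dataset or its tooling.

```python
import json

# Field names as documented in the data format description.
REQUIRED_FIELDS = ("instruction", "input", "output")


def parse_sample(raw: str) -> dict:
    """Parse one JSON-encoded sample and check it has the documented fields."""
    sample = json.loads(raw)
    if not isinstance(sample, dict):
        raise ValueError("sample must be a JSON object")
    for field in REQUIRED_FIELDS:
        if field not in sample:
            raise ValueError(f"missing field: {field}")
        if not isinstance(sample[field], str):
            raise ValueError(f"field {field!r} must be a string")
    return sample


def has_input(sample: dict) -> bool:
    """Return True when the optional input field carries text."""
    return bool(sample["input"].strip())


# Made-up record for illustration; it is not taken from the dataset.
example = parse_sample(
    '{"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"}'
)
print(has_input(example))
```

Treating a whitespace-only `input` as empty is a design choice of this sketch; the dataset card only states that the field is empty when there is no specific input.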
## Dataset Usage
This dataset can be used for various natural language understanding and generation tasks, including but not limited to:
- Training and fine-tuning Alpaca-like language models.
- Evaluating the performance of pre-trained models on diverse tasks.
- Exploring the capabilities and limitations of GPT-3.5 generated samples.
- Conducting research on transfer learning and few-shot learning in NLP.
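For the training and evaluation uses listed above, one possible preparation step is to turn each sample into a prompt and hold out a fraction for evaluation. The sketch below is an assumption-laden illustration: the prompt wording follows the style of the Stanford Alpaca project, which this dataset card does not specify, and `format_prompt` and `split_samples` are hypothetical helper names.

```python
import random

# Prompt templates in the style of the Stanford Alpaca project.
# Assumption: the dataset card does not prescribe any template.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def format_prompt(sample: dict) -> str:
    """Build a prompt, omitting the input section when the input is empty."""
    template = PROMPT_WITH_INPUT if sample["input"].strip() else PROMPT_NO_INPUT
    return template.format(instruction=sample["instruction"], input=sample["input"])


def split_samples(samples, eval_fraction=0.1, seed=0):
    """Shuffle a copy of the samples and return (train, evaluation) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]
```

During fine-tuning, the `output` field would typically serve as the target text that follows the prompt. The fixed seed keeps the split reproducible across runs.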
Free for all.
If you want to follow me, you can find me on Twitter at @WoodnBarrlGame.
Provider:
whitefox44
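Summary of original information
Alpaca-like Model Training Dataset Generated from GPT-3.5
Overview
This dataset contains more than 56,000 samples generated by GPT-3.5, with extreme content filtering removed and replaced by relevant generated outputs. It is designed for training Alpaca-like models. Each sample includes an instruction, an optional input text, and an output text.
Dataset description
The dataset contains diverse tasks and prompts suited to training and fine-tuning Alpaca-like language models. Task types include text summarization, question answering, translation, and more. The dataset was generated using GPT-3.5, which provides high-quality samples.
Data format
Each sample is represented as a JSON object with the following structure: `{ "instruction": "string", "input": "string", "output": "string" }`
- instruction: The task instruction. It can be a question, a prompt, or a description of the desired output.
- input: Optional input text. It can be a text passage, a set of questions, or other task-related information. If there is no specific input, this field is empty.
- output: The output text generated by GPT-3.5 in response to the instruction and input.
Dataset uses
- Training and fine-tuning Alpaca-like language models.
- Evaluating the performance of pre-trained models on diverse tasks.
- Exploring the capabilities and limitations of GPT-3.5 generated samples.
- Conducting research on transfer learning and few-shot learning in NLP.
License
This dataset is released under the Apache-2.0 license.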



