
whitefox44/AlpacaGPT3.5Customized

Hugging Face · Updated 2023-04-02 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/whitefox44/AlpacaGPT3.5Customized
Resource summary:
---
license: apache-2.0
---

# Alpaca-like Model Training Dataset Generated from GPT-3.5

This dataset contains 56,000+ samples generated with GPT-3.5, with extreme content filtered out and replaced with relevant generated outputs. It is specifically designed for training Alpaca-like models. Each sample includes an instruction, an optional input text, and an output text.

## Dataset Description

The dataset is a collection of diverse tasks and prompts suited to training and fine-tuning Alpaca-like language models. It covers a wide variety of tasks, such as text summarization, question answering, and translation. The samples were generated with GPT-3.5, which produces high-quality outputs.

### Data Format

Each sample in the dataset is represented as a JSON object with the following structure:

```json
{
  "instruction": "string",
  "input": "string",
  "output": "string"
}
```

- instruction: A text string that gives the instruction for the task. It can be a question, a prompt, or a description of the desired output.
- input: An optional text string that provides the input for the task. It can be a text passage, a set of questions, or any other information relevant to the task. If there is no specific input, this field is empty.
- output: A text string containing the output generated by GPT-3.5 in response to the instruction and input.

## Dataset Usage

This dataset can be used for various natural language understanding and generation tasks, including but not limited to:

- Training and fine-tuning Alpaca-like language models.
- Evaluating the performance of pre-trained models on diverse tasks.
- Exploring the capabilities and limitations of GPT-3.5-generated samples.
- Conducting research on transfer learning and few-shot learning in NLP.

Free for all. If you want to follow me, you can find me at @WoodnBarrlGame on Twitter.
Provider:
whitefox44
Summary of Original Information

Alpaca-like Model Training Dataset Generated from GPT-3.5

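Overview

This dataset contains 56,000+ samples generated by GPT-3.5, with extreme content filtered out and replaced with relevant generated outputs. It is designed specifically for training Alpaca-like models. Each sample includes an instruction, an optional input text, and an output text.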

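Dataset Description

The dataset contains diverse tasks and prompts suitable for training and fine-tuning Alpaca-like language models. Task types include text summarization, question answering, translation, and more. The dataset was generated with GPT-3.5 and provides high-quality samples.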

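Data Format

Each sample is represented as a JSON object with the following structure: { "instruction": "string", "input": "string", "output": "string" }

  • instruction: The task instruction; it can be a question, a prompt, or a description of the desired output.
  • input: Optional input text; it can be a text passage, a set of questions, or other task-related information. If there is no specific input, this field is empty.
  • output: The output text generated by GPT-3.5 in response to the instruction and input.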
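As a minimal sketch of how records in this format might be checked before training, the Python snippet below parses records and verifies the three documented string fields, allowing an empty `input`. It assumes the records are stored one JSON object per line (JSON Lines), which the dataset card does not state; the rule that `instruction` and `output` must be non-empty and the helper names `validate_record`, `is_valid`, and `parse_jsonl` are hypothetical additions of this sketch.

```python
import json
from typing import Iterable, Iterator

# The three fields documented on the dataset card; each is expected to be a string.
REQUIRED_FIELDS = ("instruction", "input", "output")


def validate_record(record: dict) -> dict:
    """Raise if a record does not match the documented schema.

    `input` may be an empty string (no specific input). Requiring a
    non-empty `instruction` and `output` is an extra assumption of this sketch.
    """
    for field in REQUIRED_FIELDS:
        if field not in record:
            raise ValueError(f"missing field: {field!r}")
        if not isinstance(record[field], str):
            raise TypeError(f"field {field!r} must be a string")
    if not record["instruction"].strip() or not record["output"].strip():
        raise ValueError("instruction and output must be non-empty")
    return record


def is_valid(record: dict) -> bool:
    """Boolean wrapper around validate_record, handy for filtering."""
    try:
        validate_record(record)
    except (ValueError, TypeError):
        return False
    return True


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-blank line (JSON Lines layout is assumed)."""
    for line in lines:
        if line.strip():
            yield validate_record(json.loads(line))


# Hypothetical sample lines for illustration; they are not taken from the dataset.
sample_lines = [
    '{"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour"}',
    '{"instruction": "Name a primary color.", "input": "", "output": "Red"}',
]
records = list(parse_jsonl(sample_lines))
print(len(records))  # 2
```

`is_valid` can be used with a list comprehension to drop malformed records instead of stopping at the first error.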

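Dataset Usage

  • Training and fine-tuning Alpaca-like language models.
  • Evaluating the performance of pre-trained models on diverse tasks.
  • Exploring the capabilities and limitations of GPT-3.5-generated samples.
  • Conducting research on transfer learning and few-shot learning in NLP.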
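For the first use case, each instruction/input/output sample is usually rendered into a single prompt string before fine-tuning. The sketch below assumes the prompt templates popularized by the Stanford Alpaca project; the dataset card does not prescribe any template, and the names `Sample`, `build_prompt`, and `to_training_pair` are hypothetical. It selects the no-input template whenever `input` is empty.

```python
from typing import TypedDict


class Sample(TypedDict):
    instruction: str
    input: str
    output: str


# Templates popularized by the Stanford Alpaca project. The dataset card does
# not specify a template, so these strings are an assumption of this sketch.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(sample: Sample) -> str:
    """Pick a template depending on whether the optional input is empty."""
    if sample["input"].strip():
        return PROMPT_WITH_INPUT.format(
            instruction=sample["instruction"], input=sample["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=sample["instruction"])


def to_training_pair(sample: Sample) -> tuple[str, str]:
    """Return (prompt, target) so the response can be handled separately."""
    return build_prompt(sample), sample["output"]


# Hypothetical example record, not taken from the dataset.
example: Sample = {"instruction": "Name a primary color.", "input": "", "output": "Red"}
prompt, target = to_training_pair(example)
print(prompt + target)
```

Keeping the target separate from the prompt makes it straightforward to compute the loss only on the response tokens, a common choice in Alpaca-style fine-tuning.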

License

This dataset is licensed under the Apache-2.0 license.
