jondurbin/airoboros-gpt4-1.1

Name: jondurbin/airoboros-gpt4-1.1
Creator: jondurbin
Published: 2023-06-22 15:00:56
License: no description available

Hugging Face | Updated 2023-06-22 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/jondurbin/airoboros-gpt4-1.1

Resource description:

---
license: cc-by-nc-4.0
---

The data was generated by gpt-4 and is therefore subject to the OpenAI ToS. The tool used to generate the data, [airoboros](https://github.com/jondurbin/airoboros), is apache-2 licensed.

Specific areas of focus for this training data:

* trivia
* math
* nonsensical math
* coding
* closed context question answering
* closed context question answering, with multiple contexts to choose from as confounding factors
* writing
* multiple choice

This largely overlaps with the original [dataset](https://huggingface.co/datasets/jondurbin/airoboros-gpt4), but with a few extras:

* fixed contextual entries that were missing closing tags (e.g. "ENDINPUT", "ENDINSTRUCTION", etc.)
* fixed an issue where source information was provided even if not asked for (the model always tried to provide source info)
* added some questions that were unrelated to the provided context, to train the model to say when it can't provide an answer
* added several new contextual instructions, including some in FAQ style, to hopefully prevent questions in the context from breaking the inference
* hundreds more coding samples, focusing primarily on python, java, javascript, c/c++, and golang

### Usage and License Notices

All airoboros models and datasets are intended and licensed for research use only. I've used the 'cc-nc-4.0' license, but really it is subject to a custom/special license because:

- the base model is LLaMa, which has its own special research license
- the dataset(s) were generated with OpenAI (gpt-4 and/or gpt-3.5-turbo), which has a clause saying the data can't be used to create models to compete with openai

So, to reiterate: this model (and datasets) cannot be used commercially.

Provided by:

jondurbin

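Summary of original information

Dataset overview

Data source

The data was generated by gpt-4 and is subject to the OpenAI Terms of Service.
The data generation tool, airoboros, is licensed under Apache-2.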

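Data content

Main areas of focus:
- trivia
- math
- nonsensical math
- coding
- closed-context question answering
- closed-context question answering with multiple contexts as confounding factors
- writing
- multiple choice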

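Dataset features

Largely overlaps with the original dataset jondurbin/airoboros-gpt4, with the following additions:
- Fixed contextual entries that were missing closing tags (e.g. "ENDINPUT", "ENDINSTRUCTION").
- Fixed an issue where the model provided source information even when not asked.
- Added some questions unrelated to the provided context, to train the model to recognize when it cannot provide an answer.
- Added new contextual instructions, including FAQ-style ones, to prevent questions in the context from breaking inference.
- Added hundreds of coding samples, mainly in Python, Java, JavaScript, C/C++, and Golang.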
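
The closing-tag fix in the list above can be illustrated with a small check. This is only a sketch, not the author's tooling: the dataset card names the closing tags "ENDINPUT" and "ENDINSTRUCTION", while the opening tags `BEGININPUT` and `BEGININSTRUCTION`, the helper name `missing_closing_tags`, and the sample strings are assumptions made for illustration.

```python
import re
from typing import List

# Assumed delimiter pairs: only the closing tags "ENDINPUT" and
# "ENDINSTRUCTION" are named in the dataset card; the "BEGIN..." openers
# are an assumption made for this illustration.
TAG_PAIRS = [
    ("BEGININPUT", "ENDINPUT"),
    ("BEGININSTRUCTION", "ENDINSTRUCTION"),
]


def missing_closing_tags(text: str) -> List[str]:
    """Return each closing tag that occurs fewer times than its opener."""
    missing = []
    for opener, closer in TAG_PAIRS:
        opens = len(re.findall(rf"\b{opener}\b", text))
        closes = len(re.findall(rf"\b{closer}\b", text))
        if closes < opens:
            missing.append(closer)
    return missing


# Hypothetical entries, written only to exercise the check.
well_formed = (
    "BEGININPUT\nsome context\nENDINPUT\n"
    "BEGININSTRUCTION\nWhat does the context say?\nENDINSTRUCTION"
)
broken = (
    "BEGININPUT\nsome context\n"
    "BEGININSTRUCTION\nWhat does the context say?\nENDINSTRUCTION"
)

print(missing_closing_tags(well_formed))
print(missing_closing_tags(broken))
```

The check only counts tags, so it would not catch tags that are present but in the wrong order; a stricter validator would need to track nesting.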

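Usage license

The dataset and models are for research use only, under the cc-nc-4.0 license.
Because the base model LLaMa has its own special research license, and the data was generated with OpenAI's gpt-4 and/or gpt-3.5-turbo, the dataset is in practice subject to special license terms that prohibit commercial use.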
