
jondurbin/airoboros-gpt4-1.1

Source: Hugging Face | Updated: 2023-06-22 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/jondurbin/airoboros-gpt4-1.1
Resource description:
---
license: cc-by-nc-4.0
---

The data was generated by gpt-4, and therefore is subject to OpenAI ToS. The tool used to generate the data, [airoboros](https://github.com/jondurbin/airoboros), is apache-2.

Specific areas of focus for this training data:

* trivia
* math
* nonsensical math
* coding
* closed context question answering
* closed context question answering, with multiple contexts to choose from as confounding factors
* writing
* multiple choice

This is largely an overlap of the original [dataset](https://huggingface.co/datasets/jondurbin/airoboros-gpt4), but with a few extras:

* fixed contextual entries that were missing closing tags (e.g. "ENDINPUT", "ENDINSTRUCTION", etc.)
* fixed an issue where source information was provided, even if not asked (the model always tried to provide source info)
* added some questions that were unrelated to the provided context, to train the model to say when it can't provide an answer
* added several new contextual instructions, including some with FAQ style to hopefully prevent questions in the context from breaking the inference
* hundreds more coding samples, focusing primarily on python, java, javascript, c/c++, and golang

### Usage and License Notices

All airoboros models and datasets are intended and licensed for research use only. I've used the 'cc-nc-4.0' license, but really it is subject to a custom/special license because:

- the base model is LLaMA, which has its own special research license
- the dataset(s) were generated with OpenAI (gpt-4 and/or gpt-3.5-turbo), which has a clause saying the data can't be used to create models to compete with OpenAI

So, to reiterate: this model (and datasets) cannot be used commercially.
Provider:
jondurbin
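Summary of original information

Dataset overview

Data source

  • The data was generated by gpt-4 and is subject to the OpenAI Terms of Service.
  • The data was generated with the airoboros tool, which is licensed under Apache-2.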

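Data content

  • Main areas of focus:
    • Trivia
    • Math
    • Nonsensical math
    • Coding
    • Closed-context question answering
    • Closed-context question answering with multiple contexts to choose from as confounding factors
    • Writing
    • Multiple choice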

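Dataset characteristics

  • Largely overlaps with the original dataset jondurbin/airoboros-gpt4, with the following additions:
    • Fixed contextual entries that were missing closing tags (e.g. "ENDINPUT", "ENDINSTRUCTION").
    • Fixed an issue where the model provided source information even when it was not asked to.
    • Added some questions unrelated to the provided context, to train the model to say when it cannot provide an answer.
    • Added several new contextual instructions, including some in FAQ style, to help prevent questions inside the context from breaking inference.
    • Added hundreds more coding samples, focusing primarily on Python, Java, JavaScript, C/C++, and Golang.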
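
The closing-tag fix listed above can be illustrated with a small check. This is only a sketch and not the author's tooling: the source names just the closing tags "ENDINPUT" and "ENDINSTRUCTION", so the opening tags `BEGININPUT` and `BEGININSTRUCTION`, the helper name `unbalanced_tags`, and the two sample prompts are all assumptions made for illustration.

```python
import re

# Assumed tag pairs. Only the closing tags appear in the dataset description;
# the BEGIN* counterparts are a hypothetical convention for this sketch.
TAG_PAIRS = [
    ("BEGININPUT", "ENDINPUT"),
    ("BEGININSTRUCTION", "ENDINSTRUCTION"),
]


def unbalanced_tags(text):
    """Return the closing tags whose count differs from their opening tag's count."""
    problems = []
    for open_tag, close_tag in TAG_PAIRS:
        opens = len(re.findall(rf"\b{open_tag}\b", text))
        closes = len(re.findall(rf"\b{close_tag}\b", text))
        if opens != closes:
            problems.append(close_tag)
    return problems


# Hypothetical sample prompts, not taken from the dataset.
good = (
    "BEGININPUT\nThe sky is blue.\nENDINPUT\n"
    "BEGININSTRUCTION\nWhat color is the sky?\nENDINSTRUCTION"
)
bad = (
    "BEGININPUT\nThe sky is blue.\n"
    "BEGININSTRUCTION\nWhat color is the sky?\nENDINSTRUCTION"
)

print(unbalanced_tags(good))
print(unbalanced_tags(bad))
```

Counting tags rather than parsing nesting keeps the check simple. It flags entries with a missing closing tag but would not catch tags that appear in the wrong order.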

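Usage license

  • The datasets and models are intended for research use only, under the 'cc-nc-4.0' license.
  • Because the base model, LLaMA, has its own special research license, and the data was generated with OpenAI's gpt-4 and/or gpt-3.5-turbo, the dataset is in practice subject to special license terms that prohibit commercial use.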