five

airoboros-3.2

收藏
魔搭社区2026-01-09 更新2025-02-15 收录
下载链接:
https://modelscope.cn/datasets/jondurbin/airoboros-3.2
下载链接
链接失效反馈
官方服务:
资源简介:
## Overview This dataset is a continuation of the [airoboros-3.1](https://hf.co/datasets/jondurbin/airoboros-3.1) with the following changes: * MathJSON has been removed for the time-being, because it seems to confuse the models at times, causing more problems than it's worth. The mathjson dataset can be found [here](https://huggingface.co/datasets/jondurbin/mathjson-alpha) * The de-censorship data has been re-added, to ensure a non-DPO SFT model using this dataset is relatively uncensored. * ~11k instructions from [slimorca](https://huggingface.co/datasets/Open-Orca/SlimOrca) where extended to have an additional, follow-up turn to enhance multi-turn capabilities. ## Format The format is now in ShareGPT format, to better accomodate the OS ecosystem fine-tuning tooling. ## Usage restriction To use this data, you must acknowledge/agree to the following: - a small sampling of the data contained within is "toxic"/"harmful", and contains profanity and other types of sensitive content - none of the content or views contained in the dataset necessarily align with my personal beliefs or opinions, they are simply text generated by LLMs without a great amount of validation - you are able to use the dataset lawfully, particularly in locations with less-than-free speech laws - you, and you alone are responsible for having downloaded and used the dataset, and I am completely indemnified from any and all liabilities Also note that the data was generated primarily with gpt-4, and therefore may have some strings attached to the OpenAI terms of service.

## 概述 本数据集为[airoboros-3.1](https://hf.co/datasets/jondurbin/airoboros-3.1)的后续迭代版本,更新内容如下: * 暂时移除了MathJSON模块,因其时常会对模型造成干扰,弊大于利。MathJSON数据集可于[此处](https://huggingface.co/datasets/jondurbin/mathjson-alpha)获取。 * 重新加入了去内容审查(de-censorship)数据,以确保使用该数据集训练的非直接偏好优化(DPO)监督微调(SFT)模型具备相对无审查的输出能力。 * 从[slimorca](https://huggingface.co/datasets/Open-Orca/SlimOrca)中选取约1.1万条指令,并为其新增一轮后续对话,以增强模型的多轮对话能力。 ## 格式说明 本数据集现已采用ShareGPT格式,以更好适配开源生态下的微调工具链。 ## 使用限制 使用本数据集前,您需同意以下条款: - 本数据集包含少量“有毒”/“有害”内容,涉及亵渎性语言及其他敏感信息; - 数据集所载内容或观点均不代表笔者个人立场,其仅为大语言模型(LLM)生成的未经充分校验的文本; - 您需确保在符合当地法律法规的前提下使用本数据集,尤其在言论自由受限的地区; - 数据集的下载与使用责任由您本人承担,笔者不对任何相关法律责任承担连带责任。 另需注意,本数据集主要由GPT-4生成,因此需遵守OpenAI服务条款的相关约束。
提供机构:
maas
创建时间:
2025-08-29
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作