airoboros-3.2

Name: airoboros-3.2
Creator: maas
Published: 2026-01-09 15:56:33
License: 暂无描述

魔搭社区2026-01-09 更新2025-02-15 收录

下载链接：

https://modelscope.cn/datasets/jondurbin/airoboros-3.2

下载链接

链接失效反馈

官方服务：

资源简介：

## Overview This dataset is a continuation of the [airoboros-3.1](https://hf.co/datasets/jondurbin/airoboros-3.1) with the following changes: * MathJSON has been removed for the time-being, because it seems to confuse the models at times, causing more problems than it's worth. The mathjson dataset can be found [here](https://huggingface.co/datasets/jondurbin/mathjson-alpha) * The de-censorship data has been re-added, to ensure a non-DPO SFT model using this dataset is relatively uncensored. * ~11k instructions from [slimorca](https://huggingface.co/datasets/Open-Orca/SlimOrca) where extended to have an additional, follow-up turn to enhance multi-turn capabilities. ## Format The format is now in ShareGPT format, to better accomodate the OS ecosystem fine-tuning tooling. ## Usage restriction To use this data, you must acknowledge/agree to the following: - a small sampling of the data contained within is "toxic"/"harmful", and contains profanity and other types of sensitive content - none of the content or views contained in the dataset necessarily align with my personal beliefs or opinions, they are simply text generated by LLMs without a great amount of validation - you are able to use the dataset lawfully, particularly in locations with less-than-free speech laws - you, and you alone are responsible for having downloaded and used the dataset, and I am completely indemnified from any and all liabilities Also note that the data was generated primarily with gpt-4, and therefore may have some strings attached to the OpenAI terms of service.

## 概述本数据集为[airoboros-3.1](https://hf.co/datasets/jondurbin/airoboros-3.1)的后续迭代版本，更新内容如下： * 暂时移除了MathJSON模块，因其时常会对模型造成干扰，弊大于利。MathJSON数据集可于[此处](https://huggingface.co/datasets/jondurbin/mathjson-alpha)获取。 * 重新加入了去内容审查（de-censorship）数据，以确保使用该数据集训练的非直接偏好优化（DPO）监督微调（SFT）模型具备相对无审查的输出能力。 * 从[slimorca](https://huggingface.co/datasets/Open-Orca/SlimOrca)中选取约1.1万条指令，并为其新增一轮后续对话，以增强模型的多轮对话能力。 ## 格式说明本数据集现已采用ShareGPT格式，以更好适配开源生态下的微调工具链。 ## 使用限制使用本数据集前，您需同意以下条款： - 本数据集包含少量“有毒”/“有害”内容，涉及亵渎性语言及其他敏感信息； - 数据集所载内容或观点均不代表笔者个人立场，其仅为大语言模型（LLM）生成的未经充分校验的文本； - 您需确保在符合当地法律法规的前提下使用本数据集，尤其在言论自由受限的地区； - 数据集的下载与使用责任由您本人承担，笔者不对任何相关法律责任承担连带责任。另需注意，本数据集主要由GPT-4生成，因此需遵守OpenAI服务条款的相关约束。

提供机构：

maas

创建时间：

2025-08-29

5,000+

优质数据集

54 个

任务类型

进入经典数据集