airoboros-3.0
ModelScope Community · Updated 2025-12-05 · Indexed 2025-12-06
Download link:
https://modelscope.cn/datasets/jondurbin/airoboros-3.0
Description:
## Overview
This dataset is a continuation of the airoboros datasets, with two main new contributions:
* MathJSON - math questions prefixed with __"Create a MathJSON solution to the following:"__, whose responses contain a JSON expression between __`<mathjson>`__ and __`</mathjson>`__ tags; the JSON can be parsed and passed to a deterministic library to perform the calculations.
* An anonymously contributed role-play (RP) dataset to improve multi-turn coherence.
Some of the MathJSON data was adapted from https://huggingface.co/datasets/meta-math/MetaMathQA
## Format
The dataset now uses the ShareGPT format, to better accommodate the fine-tuning tooling of the open-source ecosystem.
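A minimal sketch of what a ShareGPT-style record looks like, and how the MathJSON examples could be picked out by their prompt prefix. The `conversations` / `from` / `value` layout and the `human` / `gpt` role names follow the common ShareGPT convention and are an assumption here, not something this card specifies; the sample records and the helpers `is_mathjson` and `to_messages` are hypothetical.

```python
# Prompt prefix that marks MathJSON examples (taken from this dataset card).
MATHJSON_PREFIX = "Create a MathJSON solution to the following:"

# Hypothetical records in the common ShareGPT layout: a "conversations" list of
# turns, each with a "from" role and a "value" string. The exact fields used in
# airoboros-3.0 are an assumption; check the actual data files.
example_records = [
    {
        "conversations": [
            {"from": "human", "value": MATHJSON_PREFIX + " What is 2 to the power of 10?"},
            {"from": "gpt", "value": 'Solution as MathJSON:\n<mathjson>\n["Power", 2, 10]\n</mathjson>'},
        ]
    },
    {
        "conversations": [
            {"from": "human", "value": "Continue the story."},
            {"from": "gpt", "value": "The door creaked open..."},
        ]
    },
]


def is_mathjson(record: dict) -> bool:
    """Return True if the first human turn starts with the MathJSON prefix."""
    for turn in record.get("conversations", []):
        if turn.get("from") == "human":
            return turn.get("value", "").startswith(MATHJSON_PREFIX)
    return False


def to_messages(record: dict) -> list:
    """Map ShareGPT roles onto role/content messages (hypothetical helper)."""
    role_map = {"system": "system", "human": "user", "gpt": "assistant"}
    return [
        {"role": role_map[turn["from"]], "content": turn["value"]}
        for turn in record["conversations"]
    ]


mathjson_records = [r for r in example_records if is_mathjson(r)]
print(len(mathjson_records))  # 1
```

The role mapping is only needed when a training tool expects role/content messages; some tools consume ShareGPT records directly.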
## MathJSON
As we all know, math is really difficult for LLMs, especially floating-point operations, trigonometric functions, factorials, and multi-digit numbers.
Instead of sending millions and millions of examples to the LLM for fine-tuning, I've created a new category, `mathjson`, which simply presents the solution to the problem as a JSON expression that can be evaluated with traditional, deterministic computation.
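To illustrate what "evaluated with traditional computation" means, here is a minimal sketch of a recursive evaluator for a handful of MathJSON operators (`Add`, `Subtract`, `Negate`, `Multiply`, `Divide`, `Power`, `Sqrt`, `Sin`, `Cos`, `Factorial`) and the symbols `Pi` and `ExponentialE`. It is a hypothetical stand-in, not the CortexJS Compute Engine or the author's `mathjson.py`, and it does not handle the rest of the MathJSON format (for example, object-form numbers).

```python
import math

# Hypothetical minimal evaluator for a small subset of MathJSON.
# Not the CortexJS Compute Engine and not the author's mathjson.py.
SYMBOLS = {"Pi": math.pi, "ExponentialE": math.e}

OPERATORS = {
    "Add": lambda args: sum(args),
    "Subtract": lambda args: args[0] - args[1],
    "Negate": lambda args: -args[0],
    "Multiply": lambda args: math.prod(args),
    "Divide": lambda args: args[0] / args[1],
    "Power": lambda args: args[0] ** args[1],
    "Sqrt": lambda args: math.sqrt(args[0]),
    "Sin": lambda args: math.sin(args[0]),
    "Cos": lambda args: math.cos(args[0]),
    # math.factorial needs an int, so the argument is truncated first.
    "Factorial": lambda args: math.factorial(int(args[0])),
}


def evaluate(expr):
    """Recursively evaluate a parsed MathJSON expression to a Python number."""
    # Plain JSON numbers evaluate to themselves.
    if isinstance(expr, (int, float)):
        return expr
    # Strings are treated as named constants such as "Pi".
    if isinstance(expr, str):
        if expr in SYMBOLS:
            return SYMBOLS[expr]
        raise ValueError(f"unknown symbol: {expr}")
    # Lists are function calls: [operator, arg1, arg2, ...].
    if isinstance(expr, list) and expr and isinstance(expr[0], str):
        head, *args = expr
        if head not in OPERATORS:
            raise ValueError(f"unsupported operator: {head}")
        return OPERATORS[head]([evaluate(arg) for arg in args])
    raise ValueError(f"malformed expression: {expr!r}")


# The circle-area expression used in the example in this section.
area = evaluate(["Multiply", "Pi", ["Power", 17.2456, 2]])
print(round(area, 2))  # 934.34
```

The arithmetic happens in ordinary floating point, so the model only has to get the structure of the expression right; the circle-area expression evaluates to about 934.34 (cm², since the radius is used in centimeters as given).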
The dataset has roughly 4k samples, as a proof of concept. This can be expanded fairly easily, now that the fine-tuned models have a basic understanding of how to represent the problems.
For example:
```
Create a MathJSON solution to the following: Calculate the area of a circle with radius of 17.2456cm. Include your reasoning. [/INST]
```
```
The area (A) of a circle is given by the formula A = πr², where r is the radius of the circle. Since the radius is given in centimeters, we need to convert it to meters first. We know that 1 meter equals 100 centimeters. Therefore, we can represent the radius as: r = 17.2456 cm * 100/1.
Once we have the radius in meters, we can calculate the area of the circle using the formula A = πr².
Solution as MathJSON:
<mathjson>
[
"Multiply",
"Pi",
[
"Power",
17.2456,
2
]
]
</mathjson>
```
To evaluate a response, extract the JSON string between the `<mathjson>` and `</mathjson>` tags and evaluate it with https://cortexjs.io/compute-engine/, or with my ultra-hacky, limited-functionality Python version of it: https://github.com/jondurbin/airoboros/blob/mathjson/airoboros/mathjson.py
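The extraction step can be sketched with the Python standard library alone: a non-greedy regular expression captures whatever sits between the tags, and `json.loads` turns it into nested Python lists. The function name `extract_mathjson` is hypothetical; the parsed result would then be handed to the Compute Engine, to `mathjson.py`, or to another evaluator.

```python
import json
import re

# Captures the payload between <mathjson> and </mathjson>; re.DOTALL lets the
# match span the multiple lines used in the example response.
MATHJSON_RE = re.compile(r"<mathjson>\s*(.*?)\s*</mathjson>", re.DOTALL)


def extract_mathjson(response: str):
    """Return the parsed MathJSON expression from a model response, or None."""
    match = MATHJSON_RE.search(response)
    if match is None:
        return None
    return json.loads(match.group(1))


response = """Solution as MathJSON:
<mathjson>
[
"Multiply",
"Pi",
[
"Power",
17.2456,
2
]
]
</mathjson>"""

expr = extract_mathjson(response)
print(expr)  # ['Multiply', 'Pi', ['Power', 17.2456, 2]]
```

Returning `None` when no tags are present makes it easy to skip responses in which the model did not produce a MathJSON block; malformed JSON inside the tags still raises `json.JSONDecodeError`.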
Provider:
maas
Created:
2025-08-29



