airoboros-3.0

Source: ModelScope community. Updated 2025-12-05, indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/jondurbin/airoboros-3.0
Description:
## Overview

This dataset is a continuation of the airoboros datasets, with two main new contributions:

* MathJSON: math questions, prefixed with __"Create a MathJSON solution to the following:"__, whose responses output JSON between __`<mathjson>`__ and __`</mathjson>`__ tags. This JSON can be parsed and passed to a deterministic library to perform the calculations.
* An anonymously contributed role-play (RP) dataset to improve multi-turn coherence.

Some of the MathJSON data was adapted from https://huggingface.co/datasets/meta-math/MetaMathQA

## Format

The dataset now uses the ShareGPT format, to better accommodate the fine-tuning tooling of the open-source (OS) ecosystem.

## MathJSON

As we all know, math is really difficult for LLMs, especially floating-point operations, trig functions, factorials, and numbers with several digits.

Instead of sending millions and millions of examples to the LLM for fine-tuning, I've created a new category, `mathjson`, which simply presents the solution to the problem as a JSON string that can be evaluated by traditional computation.

The dataset has roughly 4k samples, as a proof of concept. This can be expanded fairly easily, now that the fine-tuned models have a basic understanding of how to represent the problems.

For example:

```
Create a MathJSON solution to the following: Calculate the area of a circle with radius of 17.2456cm. Include your reasoning. [/INST]
```

```
The area (A) of a circle is given by the formula A = πr², where r is the radius of the circle. Since the radius is given in centimeters, we need to convert it to meters first. We know that 1 meter equals 100 centimeters. Therefore, we can represent the radius as: r = 17.2456 cm * 100/1. Once we have the radius in meters, we can calculate the area of the circle using the formula A = πr².
Solution as MathJSON:
<mathjson>
[
  "Multiply",
  "Pi",
  [
    "Power",
    17.2456,
    2
  ]
]
</mathjson>
```

Simply extract the JSON string between the mathjson tags, and evaluate it with https://cortexjs.io/compute-engine/ or my ultra-hacky, limited-functionality Python version of it: https://github.com/jondurbin/airoboros/blob/mathjson/airoboros/mathjson.py
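A minimal Python sketch of the extract-and-evaluate step, assuming responses contain a single `<mathjson>...</mathjson>` block. It is not the author's `mathjson.py` and not the Cortex Compute Engine API: the regex, the hypothetical `extract_mathjson` and `evaluate` helpers, and the small operator table (`Add`, `Subtract`, `Multiply`, `Divide`, `Power`, `Sqrt`, `Sin`, `Cos`, `Factorial`, plus the `Pi` and `ExponentialE` symbols) are illustrative assumptions sized to cover the example above. Real responses may use operators this table lacks.

```python
import json
import math
import re
from typing import Any, Callable, Dict, List

# Captures everything between the tags; DOTALL lets the JSON span multiple lines.
MATHJSON_RE = re.compile(r"<mathjson>(.*?)</mathjson>", re.DOTALL)

# Hypothetical, deliberately small symbol table (not the full MathJSON standard library).
CONSTANTS: Dict[str, float] = {"Pi": math.pi, "ExponentialE": math.e}

# Each operator receives its already-evaluated arguments as a list of floats.
OPERATORS: Dict[str, Callable[[List[float]], float]] = {
    "Add": lambda a: math.fsum(a),
    "Subtract": lambda a: a[0] - a[1],
    "Multiply": lambda a: math.prod(a),
    "Divide": lambda a: a[0] / a[1],
    "Power": lambda a: a[0] ** a[1],
    "Sqrt": lambda a: math.sqrt(a[0]),
    "Sin": lambda a: math.sin(a[0]),
    "Cos": lambda a: math.cos(a[0]),
    "Factorial": lambda a: float(math.factorial(int(a[0]))),
}


def extract_mathjson(text: str) -> Any:
    """Parse the JSON payload of the first <mathjson>...</mathjson> block in text."""
    match = MATHJSON_RE.search(text)
    if match is None:
        raise ValueError("no <mathjson> block found")
    return json.loads(match.group(1))


def evaluate(expr: Any) -> float:
    """Recursively evaluate numbers, known symbols, and [operator, *args] lists."""
    if isinstance(expr, (int, float)):
        return float(expr)
    if isinstance(expr, str):
        if expr not in CONSTANTS:
            raise ValueError(f"unknown symbol: {expr!r}")
        return CONSTANTS[expr]
    if isinstance(expr, list) and expr and isinstance(expr[0], str):
        head, *args = expr
        if head not in OPERATORS:
            raise ValueError(f"unsupported operator: {head!r}")
        return OPERATORS[head]([evaluate(arg) for arg in args])
    raise ValueError(f"malformed MathJSON expression: {expr!r}")


# The response portion of the sample above.
response = """Solution as MathJSON:
<mathjson>
[
  "Multiply",
  "Pi",
  [
    "Power",
    17.2456,
    2
  ]
]
</mathjson>"""

area = evaluate(extract_mathjson(response))
print(f"{area:.2f}")  # 934.34
```

Dispatching through a fixed operator table, rather than calling Python's `eval`, keeps model output from executing arbitrary code: anything outside the table fails loudly instead of being guessed. Because the expression squares 17.2456 directly, the result is in cm² (about 934.34 cm²); the meter conversion mentioned in the sample's reasoning does not appear in the MathJSON.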

## 概述 本数据集是airoboros数据集的延续,主要新增了两项贡献: * **数学JSON(MathJSON)**:该类别包含数学问题,以`"Create a MathJSON solution to the following:"`作为前缀,随后会在`<mathjson>`与`</mathjson>`标签之间输出一段JSON字符串,该字符串可被解析并传入确定性计算库以执行运算。 * **匿名贡献的角色扮演(RP, Role-Playing)数据集**:用于提升多轮对话的连贯性。 本数据集部分数学JSON数据改编自https://huggingface.co/datasets/meta-math/MetaMathQA ## 格式 本数据集现已采用ShareGPT格式,以更好适配开源生态的微调工具链。 ## 数学JSON 众所周知,数学问题对大语言模型(Large Language Model, LLM)而言极具挑战性,尤其是浮点运算、三角函数、阶乘以及多位数计算场景。 相较于向大语言模型提供百万级训练样本进行微调,本工作新增了`mathjson`类别,仅需将问题的求解方案以可通过传统计算引擎求值的JSON字符串形式呈现即可。 作为概念验证,本数据集目前包含约4000条样本。鉴于经微调的模型已基本掌握问题的表示方式,后续可轻松扩展数据集规模。 示例如下: 为以下问题生成数学JSON求解方案:计算半径为17.2456厘米的圆的面积。请包含推理过程。[/INST] 圆的面积(A)可由公式 $A = pi r^2$ 计算得出,其中$r$为圆的半径。由于题目中半径单位为厘米,我们首先需要将其转换为米。已知1米等于100厘米,因此半径可表示为:$r = 17.2456 ext{ cm} imes 100/1$。 获取以米为单位的半径后,我们可通过公式 $A = pi r^2$ 计算圆的面积。 数学JSON形式的求解方案: <mathjson> [ "Multiply", "Pi", [ "Power", 17.2456, 2 ] ] </mathjson> 仅需提取`<mathjson>`与`</mathjson>`标签之间的JSON字符串,即可通过https://cortexjs.io/compute-engine/ 或作者开发的简易版有限功能Python实现(https://github.com/jondurbin/airoboros/blob/mathjson/airoboros/mathjson.py)进行求值。
Provider: maas
Created: 2025-08-29