MU-NLPC/Calc-svamp
Source: Hugging Face. Updated 2023-10-30; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/MU-NLPC/Calc-svamp
Resource description:
---
language:
- en
license: mit
size_categories:
- n<1K
task_categories:
- text-generation
tags:
- math world problems
- math
- arithmetics
dataset_info:
- config_name: default
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: chain
    dtype: string
  - name: result
    dtype: string
  - name: result_float
    dtype: float64
  - name: equation
    dtype: string
  - name: problem_type
    dtype: string
  splits:
  - name: test
    num_bytes: 335744
    num_examples: 1000
  download_size: 116449
  dataset_size: 335744
- config_name: original-splits
  features:
  - name: id
    dtype: string
  - name: question
    dtype: string
  - name: chain
    dtype: string
  - name: result
    dtype: string
  - name: result_float
    dtype: float64
  - name: equation
    dtype: string
  - name: problem_type
    dtype: string
  splits:
  - name: test
    num_bytes: 335744
    num_examples: 1000
  download_size: 116449
  dataset_size: 335744
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
- config_name: original-splits
  data_files:
  - split: test
    path: original-splits/test-*
---
# Dataset Card for Calc-SVAMP
## Summary
The dataset is a collection of simple math word problems focused on arithmetic. It is derived from <https://github.com/arkilpatel/SVAMP/>.
The main addition in this dataset variant is the `chain` column. It was created by converting the solution to a simple HTML-like language that can be easily parsed (e.g. by BeautifulSoup). The data contains 3 types of tags:
- gadget: A tag whose content is intended to be evaluated by calling an external tool (a sympy-based calculator in this case)
- output: An output of the external tool
- result: The final answer to the mathematical problem (a number)
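As a rough illustration of how the three tags can be read back, the sketch below uses Python's standard-library `html.parser` in place of BeautifulSoup so that it has no third-party dependencies. The `example_chain` string is a made-up value, and the `id="calculator"` attribute on the gadget tag is an assumption about the markup, not something stated in this card.

```python
from html.parser import HTMLParser


class ChainParser(HTMLParser):
    """Collects (tag, text) pairs for gadget, output and result tags."""

    TAGS = {"gadget", "output", "result"}

    def __init__(self):
        super().__init__()
        self.items = []    # finished (tag, text) pairs, in document order
        self._open = None  # tag currently being read, if any
        self._buf = []     # text chunks seen inside the open tag

    def handle_starttag(self, tag, attrs):
        if tag in self.TAGS:
            self._open, self._buf = tag, []

    def handle_data(self, data):
        if self._open is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._open:
            self.items.append((tag, "".join(self._buf).strip()))
            self._open = None


def parse_chain(chain):
    """Return the tagged steps of a chain string as a list of (tag, text)."""
    parser = ChainParser()
    parser.feed(chain)
    parser.close()
    return parser.items


# Hypothetical chain string, written for illustration only.
example_chain = (
    '<gadget id="calculator">8 - 3</gadget>\n'
    "<output>5</output>\n\n"
    "<result>5</result>"
)
steps = parse_chain(example_chain)
print(steps)  # [('gadget', '8 - 3'), ('output', '5'), ('result', '5')]
```

Text that appears outside the three tags is ignored by this sketch; real `chain` values may need extra handling.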
## Supported Tasks
This variant of the dataset is intended for training Chain-of-Thought reasoning models able to use external tools to enhance the factuality of their responses.
This dataset presents in-context scenarios where models can outsource the computations in the reasoning chain to a calculator.
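The interaction described above can be sketched as a single step of a generation loop: when the generated text ends with a closed gadget tag, the tool is called and its answer is appended as an output tag. This is only an illustrative sketch. `toy_calculator` and `answer_gadget` are hypothetical names, the toy calculator handles a single binary operation and stands in for the sympy-based calculator, and the exact whitespace around the tags is an assumption.

```python
import operator
import re

# A gadget tag that closes at the very end of the generated text.
_LAST_GADGET = re.compile(r"<gadget[^>]*>([^<]*)</gadget>\s*$")
# One binary operation on two decimal numbers, e.g. "8 - 3".
_BINARY = re.compile(r"\s*(-?\d+(?:\.\d+)?)\s*([-+*/])\s*(-?\d+(?:\.\d+)?)\s*")
_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def toy_calculator(expression):
    """Stand-in for the real tool: evaluates one binary operation."""
    match = _BINARY.fullmatch(expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    left, symbol, right = match.groups()
    value = _OPS[symbol](float(left), float(right))
    # Print whole numbers without a trailing ".0".
    return str(int(value)) if value.is_integer() else str(value)


def answer_gadget(generated, calculator=toy_calculator):
    """If the text ends with a closed gadget tag, append the tool output."""
    match = _LAST_GADGET.search(generated)
    if match is None:
        return generated  # no pending tool call
    return f"{generated}\n<output>{calculator(match.group(1))}</output>\n"


# Hypothetical partial generation that stops right after a tool call.
partial = 'The remaining count is <gadget id="calculator">8 - 3</gadget>'
completed = answer_gadget(partial)
print(completed)
```

Passing the calculator in as a callable keeps the loop independent of the tool, so a sympy-based evaluator could be swapped in without touching `answer_gadget`.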
## Construction process
We created the dataset by converting the **equation** attribute in the original dataset to a sequence (chain) of calculations, with the final one being the result of the math problem.
We also performed in-dataset and cross-dataset data-leak detection within the [Calc-X collection](https://huggingface.co/collections/MU-NLPC/calc-x-652fee9a6b838fd820055483).
However, for SVAMP specifically, we detected no data leaks and filtered no data.
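To make the conversion from a nested equation to a chain more concrete, here is a minimal sketch that unrolls an expression innermost-first into gadget/output steps. It is not the authors' conversion code: it parses the expression with Python's `ast` module, supports only `+`, `-`, `*` and `/` on numeric literals, and both the equation string and the output formatting are assumptions chosen for illustration.

```python
import ast
import operator

_OPS = {
    ast.Add: ("+", operator.add),
    ast.Sub: ("-", operator.sub),
    ast.Mult: ("*", operator.mul),
    ast.Div: ("/", operator.truediv),
}


def fmt(value):
    """Print whole floats without a trailing '.0'."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


def equation_to_chain(equation):
    """Unroll a nested expression into gadget/output steps plus a result tag."""
    lines = []

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            # Visit operands first so inner calculations are emitted earlier.
            left, right = walk(node.left), walk(node.right)
            symbol, func = _OPS[type(node.op)]
            value = func(left, right)
            lines.append(
                f'<gadget id="calculator">{fmt(left)} {symbol} {fmt(right)}</gadget>'
            )
            lines.append(f"<output>{fmt(value)}</output>")
            return value
        raise ValueError(f"unsupported equation: {equation!r}")

    final = walk(ast.parse(equation.strip(), mode="eval").body)
    lines.append(f"<result>{fmt(final)}</result>")
    return "\n".join(lines)


# Hypothetical equation in a fully parenthesized style.
chain = equation_to_chain("( ( 7.0 + 3.0 ) * 2.0 )")
print(chain)
```

Each binary operation becomes one calculator call, and later calls reuse the values computed by earlier ones, which mirrors the "sequence of calculations" described above.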
## Content and data splits
The dataset contains the same data instances as the original dataset, except for the correction of an inconsistency between `equation` and `answer` in one data instance.
To the best of our knowledge, the original dataset does not contain an official train-test split. We treat the whole dataset as a testing benchmark.
## Attributes
- **id**: problem id from the original dataset
- **question**: the question to be answered
- **chain**: series of simple operations (derived from `equation`) that leads to the solution
- **result**: the result (number) as a string
- **result_float**: the result converted to a floating-point number
- **equation**: a nested expression that evaluates to the correct result
- **problem_type**: a category of the problem
Attributes **id**, **question**, **chain**, and **result** are present in all datasets in the [Calc-X collection](https://huggingface.co/collections/MU-NLPC/calc-x-652fee9a6b838fd820055483).
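The attributes listed above can be loaded and sanity-checked along the following lines. This is a sketch under assumptions: `load_calc_svamp` relies on the third-party `datasets` package and network access, `check_row` is a hypothetical helper, the `sample` row is invented, and the check assumes that `result` is a string that `float()` can parse.

```python
import math

DATASET_ID = "MU-NLPC/Calc-svamp"  # repository name as given on this page
EXPECTED_COLUMNS = (
    "id", "question", "chain", "result", "result_float", "equation", "problem_type",
)


def load_calc_svamp(config="default"):
    """Download the test split (requires `datasets` and network access)."""
    from datasets import load_dataset  # lazy import keeps check_row usable offline

    return load_dataset(DATASET_ID, config, split="test")


def check_row(row):
    """Return a list of problems found in one row; an empty list means none."""
    problems = [f"missing column: {name}" for name in EXPECTED_COLUMNS if name not in row]
    if problems:
        return problems
    try:
        parsed = float(row["result"])
    except ValueError:
        return [f"result is not numeric: {row['result']!r}"]
    if not math.isclose(parsed, row["result_float"], rel_tol=1e-9):
        problems.append("result and result_float disagree")
    return problems


# Hypothetical row for illustration; real values may be formatted differently.
sample = {
    "id": "example-0",
    "question": "There were 8 apples and 3 were eaten. How many are left?",
    "chain": '<gadget id="calculator">8 - 3</gadget>\n<output>5</output>\n\n<result>5</result>',
    "result": "5",
    "result_float": 5.0,
    "equation": "( 8.0 - 3.0 )",
    "problem_type": "Subtraction",
}
print(check_row(sample))  # []
```

With the dataset downloaded, the same check could be applied to every row, for example `[check_row(r) for r in load_calc_svamp()]`.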
## Related work
This dataset was created as part of a larger effort to train models capable of using a calculator during inference, which we call Calcformers.
- [**Calc-X collection**](https://huggingface.co/collections/MU-NLPC/calc-x-652fee9a6b838fd820055483) - datasets for training Calcformers
- [**Calcformers collection**](https://huggingface.co/collections/MU-NLPC/calcformers-65367392badc497807b3caf5) - calculator-using models we trained and published on HF
- [**Calc-X and Calcformers paper**](https://arxiv.org/abs/2305.15017)
- [**Calc-X and Calcformers repo**](https://github.com/prompteus/calc-x)
Here are links to the original dataset:
- [**original SVAMP dataset and repo**](https://github.com/arkilpatel/SVAMP/)
- [**original SVAMP paper**](https://www.semanticscholar.org/paper/Are-NLP-Models-really-able-to-Solve-Simple-Math-Patel-Bhattamishra/13c4e5a6122f3fa2663f63e49537091da6532f35)
## Licence
MIT, consistent with the original source dataset linked above.
## Cite
If you use this version of the dataset in research, please cite the original [SVAMP paper](https://www.semanticscholar.org/paper/Are-NLP-Models-really-able-to-Solve-Simple-Math-Patel-Bhattamishra/13c4e5a6122f3fa2663f63e49537091da6532f35) and the [Calc-X paper](https://arxiv.org/abs/2305.15017) as follows:
```bibtex
@inproceedings{kadlcik-etal-2023-soft,
title = "Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems",
author = "Marek Kadlčík and Michal Štefánik and Ondřej Sotolář and Vlastimil Martinek",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Main track",
month = dec,
year = "2023",
address = "Singapore, Singapore",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/2305.15017",
}
```
The Calc-SVAMP dataset is a collection of simple math word problems focused on arithmetic operations. It is derived from the original SVAMP dataset and adds a `chain` column, which represents the solution in a simple HTML-like language. The dataset is designed for Chain-of-Thought reasoning models that use external tools such as calculators to improve the accuracy of their responses. Its attributes are id, question, chain, result, result_float, equation, and problem_type. It has no train-test split and is treated as a testing benchmark. The dataset is part of a larger effort to train models capable of using a calculator during inference, called Calcformers.
Provided by:
MU-NLPC
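Summary of original information
Dataset overview
Basic information
- Language: English
- License: MIT
- Size category: n<1K
- Task category: text generation
- Tags: math world problems, math, arithmetics
Dataset configurations
Default configuration
- Config name: default
- Features:
  - id: string
  - question: string
  - chain: string
  - result: string
  - result_float: float64
  - equation: string
  - problem_type: string
- Splits:
  - test:
    - Bytes: 335744
    - Examples: 1000
- Download size: 116449
- Dataset size: 335744
Original-splits configuration
- Config name: original-splits
- Features:
  - id: string
  - question: string
  - chain: string
  - result: string
  - result_float: float64
  - equation: string
  - problem_type: string
- Splits:
  - test:
    - Bytes: 335744
    - Examples: 1000
- Download size: 116449
- Dataset size: 335744
Data file configuration
- Default configuration:
  - Data files:
    - Split: test
    - Path: data/test-*
- Original-splits configuration:
  - Data files:
    - Split: test
    - Path: original-splits/test-*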
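Dataset content
Dataset overview
The dataset is a collection of simple math word problems focused on arithmetic. The main addition is the chain column, created by converting the solution to a simple HTML-like language that is easy to parse (e.g. with BeautifulSoup). The data contains three types of tags:
- gadget: a tag whose content is intended to be evaluated by calling an external tool (a sympy-based calculator in this case)
- output: the output of the external tool
- result: the final answer to the math problem (a number)
Supported tasks
This variant of the dataset is intended for training chain-of-thought reasoning models that can use external tools to improve the factuality of their responses. It presents scenarios in which a model can outsource the computations in its reasoning chain to a calculator.
Construction process
The dataset was created by converting the equation attribute of the original dataset into a sequence (chain) of calculations, the last of which is the result of the math problem. Data-leak detection was also performed; no leaks were detected in SVAMP and no data was filtered.
Content and data splits
The dataset contains the same data instances as the original dataset, except that an inconsistency between equation and answer was corrected in one data instance. To the best of our knowledge, the original dataset has no official train-test split, so the whole dataset is treated as a test benchmark.
Attributes
- id: problem ID from the original dataset
- question: the question to be answered
- chain: a series of simple operations (derived from equation) that leads to the solution
- result: the result (a number) as a string
- result_float: the result converted to a floating-point number
- equation: a nested expression that evaluates to the correct result
- problem_type: the category of the problem
The id, question, chain, and result attributes are present in all datasets in the Calc-X collection.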
Collected summary
Dataset introduction

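Construction
The Calc-SVAMP dataset is derived from the SVAMP project and focuses on simple arithmetic word problems. It was built by converting the equation attribute of the original dataset into a sequence of calculation steps that ends with the solution to the problem. The steps are marked up in an HTML-like language, which makes them easy to parse and to pass to an external tool. Data-leak detection was run within the Calc-X collection; for SVAMP, no leaks were found and no data was filtered.
Characteristics
The distinguishing feature of Calc-SVAMP is its `chain` column, which expresses each solution in an easy-to-parse HTML-like language so that models can call an external tool during reasoning. The dataset contains 1000 test instances, each with a question, a calculation chain, the result and its floating-point form, the equation, and the problem type, giving chain-of-thought reasoning models in-context scenarios for tool use.
Usage
Calc-SVAMP is intended mainly for chain-of-thought models that use external tools, such as a calculator, to strengthen their reasoning. By parsing the calculation steps in the `chain` column, researchers can simulate how a model calls an external tool while solving a math problem. Because the dataset is treated as a test benchmark, it is suited to evaluating models on arithmetic word problems, particularly where exact computation and logical reasoning are required.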
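Background and challenges
Background overview
Calc-SVAMP was created by the MU-NLPC team in 2023 to support the development of arithmetic reasoning models. It is derived from the SVAMP project and focuses on simple math word problems, in particular arithmetic operations. Its main contribution is the `chain` column, which converts each solution into an easy-to-parse HTML-like language so that models can call an external tool (such as a SymPy-based calculator) during reasoning. The dataset is part of the Calc-X collection, which is aimed at training reasoning models that can use a calculator (Calcformers), and it supports research on the interaction between arithmetic reasoning and symbolic systems.
Current challenges
The challenges around Calc-SVAMP fall into two areas. First, at the level of the task itself, the dataset aims to improve model performance on arithmetic reasoning, but making sure a model calls the external tool correctly during multi-step reasoning remains difficult. Second, during construction, the equations of the original dataset had to be converted into parseable calculation chains while keeping the data consistent and avoiding data leaks. In addition, because the original dataset has no official train-test split, evaluating model performance effectively is a challenge of its own.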
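Common scenarios
Typical use cases
Calc-SVAMP is used in natural language processing to train and evaluate chain-of-thought models, particularly on math word problems. It pairs each problem with its calculation chain, so that a model can call an external tool (such as a calculator) during reasoning to make its answers more accurate. This setting is most relevant for word problems that require exact computation.
Practical applications
In practice, Calc-SVAMP could be used to develop intelligent tutoring systems that help students solve math problems. By combining the problems with their calculation chains, such a system could present detailed solution steps and help students understand how a problem is solved. The dataset could also be used to develop assistants that help users with everyday math problems.
Derived and related work
Calc-SVAMP is tied to related work on chain-of-thought reasoning and calculator-assisted reasoning. It belongs to the Calc-X collection, which was created as part of the effort to train Calcformers, models that can use a calculator during inference. This line of work offers tools and methods for solving math problems with the help of external symbolic systems.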
The content above was collected and summarized by 遇见数据集.



