UDACA/MMA-with-Text
收藏Hugging Face2024-03-18 更新2024-06-11 收录
下载链接:
https://hf-mirror.com/datasets/UDACA/MMA-with-Text
下载链接
链接失效反馈官方服务:
资源简介:
---
license: cc-by-4.0
language:
- en
---
# Isa-MMA: Isabelle Components of the Multilingual Mathematical Autoformalization (MMA) Dataset
## Dataset Description
Jiang et. al. published a paper entitled [*Multilingual Mathematical Autoformalization*](https://arxiv.org/abs/2311.03755) that included datasets in Isabelle and Lean. This dataset is the combination of the Isabelle test and Isabelle val datasets merged into one. In addition, this dataset contains formal and informal stastements that have been cleaned of cues. For example, the original datasets had cues within statements for training (e.g.every informal statement ended with "Translate the statement in natural language to Isabelle).
Instead, these have been removed and I have added a text column to match that of the Isa-AF dataset ("Informal statement is: [informal statement value] Formal statement is: [formal statement value]")
## Data Fields
- `input`: The words "Statement in natural language:" followed by a statement in natural language.
- `output`: The statement from the `input` column formalized in Isabelle.
许可证:CC BY 4.0
语言:
- 英语
# Isa-MMA:多语言数学自动形式化(MMA)数据集的Isabelle组件
## 数据集说明
Jiang等人发表了题为《多语言数学自动形式化》的论文(https://arxiv.org/abs/2311.03755),该论文包含了Isabelle与Lean格式的数据集。本数据集为原数据集中Isabelle测试集与Isabelle验证集的合并集合。此外,本数据集包含已清除提示词的形式化与非形式化语句。例如,原始数据集的训练语句中带有提示词(如每条非形式化语句均以“将该自然语言语句翻译为Isabelle”结尾)。本数据集已移除此类提示词,并新增了一列文本,格式与Isa-AF数据集保持一致:“非形式化语句为:[非形式化语句内容] 形式化语句为:[形式化语句内容]”
## 数据字段
- `input`:内容为“自然语言语句:”后接对应的自然语言语句。
- `output`:为`input`列中语句对应的Isabelle形式化表述。
提供机构:
UDACA
原始信息汇总
Isa-MMA: Isabelle Components of the Multilingual Mathematical Autoformalization (MMA) Dataset
数据集描述
本数据集由Jiang等人发表的论文《Multilingual Mathematical Autoformalization》中包含的Isabelle测试和验证数据集合并而成。数据集中的正式和非正式声明已去除提示信息,并新增了一个文本列,格式与Isa-AF数据集相匹配,即“非正式声明是:[非正式声明值] 正式声明是:[正式声明值]”。
数据字段
input: 包含“自然语言中的声明:”后接自然语言声明。output: 来自input列的声明在Isabelle中的形式化表示。
搜集汇总
数据集介绍

背景与挑战
背景概述
Isa-MMA is a dataset for autoformalization, containing pairs of informal mathematical statements and their formal Isabelle counterparts, cleaned of training cues and enhanced with additional text columns for consistency with related datasets.
以上内容由遇见数据集搜集并总结生成



