shuyuej/CleanedMetaMathQA
MetaMath Dataset
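Dataset Overview
- Dataset name: MetaMath Dataset
- Data format: each record has three fields: the original question (original_question), the paraphrased question (paraphrased_question), and the detailed answer (answer_detail).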
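To make the three-field layout concrete, the sketch below checks that a record carries the expected keys as strings. The helper name `missing_fields` and the sample record are hypothetical, made up for illustration; only the three field names come from the dataset description.

```python
from typing import Any, List, Mapping

# Field names as listed in the dataset overview.
EXPECTED_FIELDS = ("original_question", "paraphrased_question", "answer_detail")


def missing_fields(record: Mapping[str, Any]) -> List[str]:
    """Return the expected field names that are absent or not strings."""
    return [
        name for name in EXPECTED_FIELDS
        if not isinstance(record.get(name), str)
    ]


# Hypothetical record, invented for illustration (not taken from the dataset).
sample = {
    "original_question": "What is 2 + 3?",
    "paraphrased_question": "Compute the sum of 2 and 3.",
    "answer_detail": "2 + 3 = 5\nThe answer is: 5",
}
print(missing_fields(sample))  # []
```

An empty list means the record matches the described layout; any names returned point at fields that need attention before further processing.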
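Dataset Usage
- Loading the data: use the following code to load the dataset with the Hugging Face datasets library.

```python
from datasets import load_dataset

dataset = load_dataset("shuyuej/CleanedMetaMathQA")
dataset = dataset["train"]  # select the training split
print(dataset)
```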
Dataset Modification
- Modification code: the following Python code loads the original dataset (meta-math/MetaMathQA), processes each record, and saves the result as a JSONL file.

```python
# coding=utf-8
import re

import jsonlines
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("meta-math/MetaMathQA")
dataset = dataset["train"]

data = []
# Define a regular expression pattern.
# The whitespace before "The answer is: " is matched loosely (\s*) because the
# exact separator in the original pattern was lost in extraction.
pattern = re.compile(r"####(.*?)\s*The answer is: ", re.DOTALL)

for example in dataset:
    original_question = example["original_question"]
    paraphrased_question = example["query"]
    answer_detail = example["response"]

    # Use the pattern to find the information
    match = re.search(pattern, answer_detail)
    if match:
        info = match.group(1).strip()
        # The removed prefix "\n#### " is a reconstruction (assumption):
        # the first argument of replace() was garbled in the source.
        answer_detail = answer_detail.replace("\n#### " + info, "")

    data.append({"original_question": original_question,
                 "paraphrased_question": paraphrased_question,
                 "answer_detail": answer_detail})

# Save the modified data to a jsonl file
output_file = "CleanedMetaMathQA.jsonl"
with jsonlines.open(output_file, "w") as writer:
    writer.write_all(data)

print(f"Modified data saved to {output_file}")
```
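The per-record cleaning step can be isolated into a small function and tried on a single string without downloading the dataset. This is a minimal sketch, assuming each response contains a line of the form `#### <answer>` followed by `The answer is: <answer>`; the function name `strip_answer_marker` and the sample string are hypothetical.

```python
import re

# Assumption: the marker line looks like "#### <answer>" and is followed by
# "The answer is: ". Whitespace between the two is matched loosely.
PATTERN = re.compile(r"####(.*?)\s*The answer is: ", re.DOTALL)


def strip_answer_marker(response: str) -> str:
    """Remove the '#### <answer>' line, keeping the 'The answer is:' line."""
    match = PATTERN.search(response)
    if not match:
        # No marker found: return the text unchanged.
        return response
    info = match.group(1).strip()
    return response.replace("\n#### " + info, "")


# Hypothetical response text, invented for illustration.
sample = "2 + 3 = 5\n#### 5\nThe answer is: 5"
print(repr(strip_answer_marker(sample)))  # '2 + 3 = 5\nThe answer is: 5'
```

Keeping the logic in a pure function makes it easy to spot-check a few responses before running the full conversion.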



