yanzihan1/math12k
Source: Hugging Face. Updated 2026-04-05; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/yanzihan1/math12k
Description:
---
dataset_info:
  features:
  - name: problem
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: train
    num_bytes: 2606447
    num_examples: 12000
  - name: test
    num_bytes: 104912
    num_examples: 500
  download_size: 1572140
  dataset_size: 2711359
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
license: mit
task_categories:
- question-answering
language:
- en
size_categories:
- 10K<n<100K
---
This dataset was converted from [https://github.com/openai/prm800k](https://github.com/openai/prm800k) using the following script.
```python
import json
import os

from datasets import Dataset, DatasetDict


def generate_data(data_path: str):
    # Read the JSONL file line by line and keep only the two exposed fields.
    with open(data_path, "r", encoding="utf-8") as f:
        for line in f:
            data = json.loads(line)
            yield {
                "problem": data["problem"],
                "answer": data["answer"],
            }


def main():
    # Input paths are relative to the current working directory.
    trainset = Dataset.from_generator(generate_data, gen_kwargs={"data_path": os.path.join("prm800k", "math_splits", "train.jsonl")})
    testset = Dataset.from_generator(generate_data, gen_kwargs={"data_path": os.path.join("prm800k", "math_splits", "test.jsonl")})
    dataset = DatasetDict({"train": trainset, "test": testset})
    # Target repo ID as given in the script; pushing requires write access to it.
    dataset.push_to_hub("hiyouga/math12k")


if __name__ == "__main__":
    main()
```
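To see what the conversion keeps and drops without downloading anything, the generator can be run on a small throwaway file. The following is a minimal sketch using only the standard library. The two sample records and the `solution` key are made up for illustration and are not taken from the source data; only `problem` and `answer` are confirmed by the script above.

```python
import json
import os
import tempfile


def generate_data(data_path: str):
    """Yield only the `problem` and `answer` fields from each JSONL line."""
    with open(data_path, "r", encoding="utf-8") as f:
        for line in f:
            data = json.loads(line)
            yield {"problem": data["problem"], "answer": data["answer"]}


# Hypothetical sample records (assumption): the extra "solution" key stands in
# for any other fields the source file may carry; it is dropped on conversion.
sample_records = [
    {"problem": "What is $1 + 1$?", "solution": "Adding gives $2$.", "answer": "2"},
    {"problem": "Solve $2x = 6$ for $x$.", "solution": "Divide by 2.", "answer": "3"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.jsonl")
    # Write one JSON object per line, matching the JSONL layout the script reads.
    with open(sample_path, "w", encoding="utf-8") as f:
        for record in sample_records:
            f.write(json.dumps(record) + "\n")
    # Materialize the generator while the temporary file still exists.
    rows = list(generate_data(sample_path))

print(rows)
```

Each resulting row holds exactly the two string features declared in the card metadata, `problem` and `answer`; every other key in the input line is discarded.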
Provided by:
yanzihan1



