
yanzihan1/math12k

Source: Hugging Face. Updated 2026-04-05; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/yanzihan1/math12k
Description:
---
dataset_info:
  features:
  - name: problem
    dtype: string
  - name: answer
    dtype: string
  splits:
  - name: train
    num_bytes: 2606447
    num_examples: 12000
  - name: test
    num_bytes: 104912
    num_examples: 500
  download_size: 1572140
  dataset_size: 2711359
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
license: mit
task_categories:
- question-answering
language:
- en
size_categories:
- 10K<n<100K
---

This dataset was converted from [https://github.com/openai/prm800k](https://github.com/openai/prm800k) using the following script.

```python
import json
import os
from datasets import Dataset, DatasetDict


def generate_data(data_path: str):
    with open(data_path, "r", encoding="utf-8") as f:
        for line in f:
            data = json.loads(line)
            yield {
                "problem": data["problem"],
                "answer": data["answer"],
            }


def main():
    trainset = Dataset.from_generator(generate_data, gen_kwargs={"data_path": os.path.join("prm800k", "math_splits", "train.jsonl")})
    testset = Dataset.from_generator(generate_data, gen_kwargs={"data_path": os.path.join("prm800k", "math_splits", "test.jsonl")})
    dataset = DatasetDict({"train": trainset, "test": testset})
    dataset.push_to_hub("hiyouga/math12k")


if __name__ == "__main__":
    main()
```
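The conversion script keeps only the `problem` and `answer` fields of each JSONL record. The following is a minimal, standard-library-only sketch of that projection step, so it can be tried without cloning the PRM800K repository or installing `datasets`. The helper name `project_records`, the blank-line check, and the two sample records (including their `solution` field) are assumptions made for illustration; the sample records are invented and are not taken from the dataset.

```python
import io
import json
from typing import Dict, Iterable, Iterator


def project_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield only the `problem` and `answer` fields of each JSONL line."""
    for line in lines:
        line = line.strip()
        if not line:
            # Skipping blank lines is an added safeguard, not part of the original script.
            continue
        record = json.loads(line)
        yield {"problem": record["problem"], "answer": record["answer"]}


# Invented records for illustration only; the extra "solution" key stands in
# for any field that the projection is expected to drop.
SAMPLE_JSONL = io.StringIO(
    '{"problem": "What is 1 + 1?", "solution": "1 + 1 = 2.", "answer": "2"}\n'
    '\n'
    '{"problem": "What is 3 * 4?", "solution": "3 * 4 = 12.", "answer": "12"}\n'
)

rows = list(project_records(SAMPLE_JSONL))
for row in rows:
    print(row)
```

Each resulting row has exactly the two string columns declared under `features` in the card metadata, which is the shape that `Dataset.from_generator` receives in the script above.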
Provider: yanzihan1