turtle170/Axiom-30

Name: turtle170/Axiom-30
Creator: turtle170
Published: 2026-01-18 08:22:00
License: No description provided

Hugging Face | Updated 2026-01-18 | Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/turtle170/Axiom-30

Resource description:

---
language:
- en
license: apache-2.0
task_categories:
- text-generation
- question-answering
tags:
- mathematics
- reasoning
- chain-of-thought
- instruction-tuning
- synthetic-data
- axiomatic-reasoning
- filtered
- extreme-quality
- ChatML
pretty_name: Axiom-30
size_categories:
- 10M<n<100M
---

**Dataset Details:**
Axiom-30 is a dataset of 30 million samples, each longer than 500 characters, adapted from the OpenMath, OpenThought, and UltraChat datasets. It is divided into three sections: the math section, the thought section, and the chat section.

**Important Notes:**
The .jsonl file extracted from the Gzip archive is around 125GB uncompressed. The dataset uses three distinct formats. The Python code for normalizing them is:

    def formatting_prompts_func(examples):
        # Pick whichever prompt column this batch provides.
        instructions = examples["problem"] if "problem" in examples else examples["instruction"]
        # If it's UltraChat, we might need to pull from 'messages'
        responses = examples["generated_solution"] if "generated_solution" in examples else examples["output"]
        texts = []
        for instr, resp in zip(instructions, responses):
            # This converts everything to ChatML on-the-fly!
            text = f"<|im_start|>user\n{instr}<|im_end|>\n<|im_start|>assistant\n{resp}<|im_end|>"
            texts.append(text)
        return {"text": texts}
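The function above covers the `problem`/`generated_solution` and `instruction`/`output` column pairs, but leaves the `messages` case as a comment. Below is a minimal sketch of one way to handle all three per record while streaming the Gzip file, so the roughly 125GB of JSONL never has to be held in memory. It assumes that chat records carry a `messages` list of `{"role": ..., "content": ...}` dictionaries, which the dataset card does not confirm. The names `to_chatml` and `iter_chatml` are hypothetical helpers, not part of the dataset.

```python
import gzip
import json
from typing import Iterator, Optional


def to_chatml(record: dict) -> Optional[str]:
    """Convert one record to a ChatML string; return None for unknown schemas."""
    # ASSUMPTION: chat records hold a "messages" list of role/content dicts.
    if "messages" in record:
        return "\n".join(
            f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
            for m in record["messages"]
        )
    # Column pairs named in the dataset card's own formatting function.
    if "problem" in record and "generated_solution" in record:
        user, assistant = record["problem"], record["generated_solution"]
    elif "instruction" in record and "output" in record:
        user, assistant = record["instruction"], record["output"]
    else:
        return None
    return (
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>"
    )


def iter_chatml(path: str) -> Iterator[str]:
    """Stream a gzipped JSONL file and yield ChatML text one record at a time."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            text = to_chatml(json.loads(line))
            if text is not None:
                yield text
```

Returning `None` for unrecognized records, rather than raising, lets a long streaming pass continue past unexpected rows; counting the skipped rows would be a reasonable addition if the actual schema turns out to differ from these assumptions.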

Provided by:

turtle170
