
turtle170/Axiom-30

Source: Hugging Face. Updated 2026-01-18. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/turtle170/Axiom-30
Resource description:
---
language:
- en
license: apache-2.0
task_categories:
- text-generation
- question-answering
tags:
- mathematics
- reasoning
- chain-of-thought
- instruction-tuning
- synthetic-data
- axiomatic-reasoning
- filtered
- extreme-quality
- ChatML
pretty_name: Axiom-30
size_categories:
- 10M<n<100M
---

**Dataset Details:**

Axiom-30 is a dataset of 30 million samples, each longer than 500 characters, adapted from the OpenMath, OpenThought, and UltraChat datasets. The dataset is divided into three sections: the math section, the thought section, and the chat section.

**Important Notes:**

The .jsonl file extracted from the Gzip archive is around 125 GB uncompressed. The dataset uses three distinct formats. The Python code for mapping them to a single ChatML format is:

```python
def formatting_prompts_func(examples):
    # Math-style batches use "problem"; the others use "instruction".
    instructions = examples["problem"] if "problem" in examples else examples["instruction"]
    # If it's UltraChat, we might need to pull from 'messages'
    responses = examples["generated_solution"] if "generated_solution" in examples else examples["output"]
    texts = []
    for instr, resp in zip(instructions, responses):
        # This converts everything to ChatML on-the-fly!
        text = f"<|im_start|>user\n{instr}<|im_end|>\n<|im_start|>assistant\n{resp}<|im_end|>"
        texts.append(text)
    return {"text": texts}
```
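Because the uncompressed file is around 125 GB, it may be more practical to read the Gzip archive line by line than to extract it first. The following is a minimal sketch of that approach using only the Python standard library. The helper names `iter_records` and `to_chatml` are hypothetical. The `messages` branch assumes chat-style records hold a list of `{"role": ..., "content": ...}` turns, which the dataset card does not confirm. The other field names are taken from the formatting function on the card.

```python
import gzip
import json
from typing import Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a gzip-compressed JSONL file."""
    # gzip.open in text mode decompresses incrementally, so memory use stays small.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def to_chatml(record: dict) -> str:
    """Render a single record as a ChatML string, whichever of the three layouts it uses."""
    if "messages" in record:
        # ASSUMPTION: chat records store turns as {"role": ..., "content": ...} dicts.
        turns = [(m["role"], m["content"]) for m in record["messages"]]
    elif "problem" in record:
        turns = [("user", record["problem"]),
                 ("assistant", record["generated_solution"])]
    else:
        turns = [("user", record["instruction"]),
                 ("assistant", record["output"])]
    return "\n".join(
        f"<|im_start|>{role}\n{content}<|im_end|>" for role, content in turns
    )
```

A typical loop would call `to_chatml(record)` for each `record` yielded by `iter_records(path)`, where `path` points at the downloaded archive. Processing one record at a time avoids holding the whole file in memory, at the cost of sequential access only.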
Provider:
turtle170