EssentialAI/eai-taxonomy-math-w-fm
Source: Hugging Face. Updated: 2025-06-22. Indexed: 2025-07-05.
Download link:
https://hf-mirror.com/datasets/EssentialAI/eai-taxonomy-math-w-fm
Description:
Taxonomy Math w/ FM is a high-quality mathematics dataset curated from web data using taxonomy-based filtering, containing 34 billion tokens of mathematical content. This dataset is part of the Essential-Web project, which introduces a new paradigm for dataset curation using expressive metadata and simple semantic filters. Unlike traditional math datasets that require complex domain-specific pipelines, our approach leverages a 12-category taxonomy to efficiently identify and extract high-quality mathematical content.
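To illustrate what a "simple semantic filter" over taxonomy metadata might look like, here is a minimal sketch in Python. It is not the Essential-Web pipeline: the record layout, the field names (`topic`, `doc_type`, `text`), and the label values are hypothetical and chosen only for illustration; the dataset's actual schema and the 12 taxonomy categories are not given in this listing.

```python
from typing import Dict, Iterable, Iterator, Set

# Hypothetical records. The field names and label values are assumptions
# made for this example, not the dataset's real schema.
docs = [
    {"text": "Proof that sqrt(2) is irrational ...", "topic": "mathematics", "doc_type": "tutorial"},
    {"text": "Top 10 travel destinations ...", "topic": "travel", "doc_type": "listicle"},
    {"text": "Buy cheap calculators now!", "topic": "mathematics", "doc_type": "spam"},
]


def taxonomy_filter(
    records: Iterable[Dict[str, str]],
    topics: Set[str],
    excluded_doc_types: Set[str],
) -> Iterator[Dict[str, str]]:
    """Yield records whose topic label is wanted and whose document type is not excluded."""
    for rec in records:
        if rec["topic"] in topics and rec["doc_type"] not in excluded_doc_types:
            yield rec


# Keep math-labelled documents, dropping those tagged as spam.
math_docs = list(taxonomy_filter(docs, {"mathematics"}, {"spam"}))
```

The point of the sketch is that once every document carries taxonomy labels, selecting a domain reduces to a predicate over metadata rather than a dedicated extraction pipeline.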
Provider:
EssentialAI



