deepcopy/UniMER
Source: Hugging Face | Updated: 2025-06-18 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/deepcopy/UniMER
Resource description:
UniMER is a dataset released for Universal Mathematical Expression Recognition (MER). It comprises UniMER-1M, a realistic and comprehensive training set of more than one million instances covering a broad and intricate range of mathematical expressions, and the UniMER Test Set, designed to benchmark MER models under real-world conditions. Details:
- UniMER-1M Training Set: 1,061,791 samples in total, a balanced mix of concise expressions and complex, extended formulas, intended for training robust, high-accuracy MER models with better recognition precision and generalization.
- UniMER Test Set: 23,757 samples in total, divided into four expression types: Simple Printed Expressions (SPE), Complex Printed Expressions (CPE), Screen Capture Expressions (SCE), and Handwritten Expressions (HWE), for a thorough evaluation of MER models across a range of real-world conditions.
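Because the test set is split into four categories, results are usually more informative when reported per category rather than as one pooled number. The sketch below illustrates that idea under loudly stated assumptions: the record layout `(category, ground_truth_latex, predicted_latex)`, the helper names `normalize` and `per_category_exact_match`, and the use of whitespace-insensitive exact match as the metric are all hypothetical choices for illustration, not the official UniMER evaluation protocol (a real evaluation would more likely use a string-similarity metric such as BLEU or edit distance).

```python
from collections import defaultdict

# Hypothetical record layout: (category, ground_truth_latex, predicted_latex).
# Category codes follow the four UniMER Test Set types.
CATEGORIES = ("SPE", "CPE", "SCE", "HWE")


def normalize(latex: str) -> str:
    """Remove all whitespace so spacing differences are not counted as errors.

    A deliberately crude normalization, chosen only for illustration.
    """
    return "".join(latex.split())


def per_category_exact_match(records):
    """Return {category: exact-match rate} for each category that has records."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for category, truth, pred in records:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        totals[category] += 1
        hits[category] += normalize(truth) == normalize(pred)
    # Keep the fixed SPE, CPE, SCE, HWE order; skip categories with no samples.
    return {c: hits[c] / totals[c] for c in CATEGORIES if totals[c]}


if __name__ == "__main__":
    records = [
        ("SPE", r"x^{2} + y^{2}", r"x^{2}+y^{2}"),
        ("HWE", r"\frac{a}{b}", r"\frac{a}{d}"),
        ("HWE", r"\sqrt{2}", r"\sqrt{2}"),
    ]
    print(per_category_exact_match(records))  # {'SPE': 1.0, 'HWE': 0.5}
```

Reporting each category separately keeps a weak result on one type (for example handwritten input) from being hidden by strong results on printed expressions.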
Provided by:
deepcopy



