
deepcopy/UniMER

Source: Hugging Face | Updated: 2025-06-18 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/deepcopy/UniMER
Dataset overview:
UniMER is a dataset released for general-purpose Mathematical Expression Recognition (MER). It comprises the UniMER-1M training set, with over one million instances covering a broad and complex range of mathematical expressions, and the UniMER test set, designed for evaluating MER models in real-world scenarios. Details:

- UniMER-1M training set: 1,061,791 samples, a balanced mix of concise expressions and complex, extended formulas, intended for training robust, high-accuracy MER models that generalize well.
- UniMER test set: 23,757 samples in four categories: Simple Printed Expressions (SPE), Complex Printed Expressions (CPE), Screen Capture Expressions (SCE), and Handwritten Expressions (HWE), for evaluating recognition across a range of real-world conditions.
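Because the test set is split into four categories, results are most informative when reported per category rather than as a single number. The following is a minimal sketch of that bookkeeping, not the benchmark's official scoring: the function names, the `(subset, reference, prediction)` record layout, and the whitespace-normalized exact-match criterion are all assumptions made for illustration.

```python
from collections import defaultdict

# The four test categories named in the dataset description.
SUBSETS = ("SPE", "CPE", "SCE", "HWE")


def normalize(latex: str) -> str:
    """Collapse runs of whitespace to single spaces (a crude, assumed normalization)."""
    return " ".join(latex.split())


def exact_match_by_subset(records):
    """Compute a per-category exact-match rate.

    records: iterable of (subset, reference_latex, predicted_latex) tuples.
    This record layout is hypothetical, not the dataset's actual schema.
    Returns {subset: rate} for every category that has at least one record.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for subset, reference, prediction in records:
        if subset not in SUBSETS:
            raise ValueError(f"unknown subset: {subset!r}")
        totals[subset] += 1
        if normalize(reference) == normalize(prediction):
            hits[subset] += 1
    return {s: hits[s] / totals[s] for s in SUBSETS if totals[s]}


if __name__ == "__main__":
    # Toy records invented for this example; they are not taken from UniMER.
    toy = [
        ("SPE", "x^{2}", "x^{2}"),
        ("SPE", "a + b", "a  +  b"),
        ("HWE", "\\frac{1}{2}", "\\frac{1}{3}"),
    ]
    print(exact_match_by_subset(toy))
```

Exact match is a strict criterion; edit-distance or token-level scores could be accumulated per category in the same way.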
Provided by:
deepcopy