CROHME+

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14968569
Description:
Overview

This dataset, introduced in the paper "The Return of Structural Approaches in Handwriting Recognition", builds upon the original CROHME dataset by automatically generating detailed structural annotations that map handwritten traces to their corresponding symbols. For files where trace group information was missing or incorrect, we have added or corrected these details to enable precise symbol classification and segmentation. In addition, for files lacking MathML annotations, we incorporated the necessary relational information to accurately represent the structure of the mathematical expressions. These enhancements improve error analysis and interpretability, and they also support more effective spatially aware applications.

Dataset Breakdown

This dataset extends the CROHME 2023 dataset by adding structural annotations to the syntactical and artificial training data. Most real data was already manually labeled, with only minor issues in a few real train, validation, and test files.

Syntactical Data: Generated by extracting and recombining valid sub-expressions from handwritten expressions to ensure syntactic correctness. Trace groups were already annotated, but the MathML annotation was missing.

Artificial Data: Computationally produced from LaTeX sequences. These samples often had wrong trace group annotations and lacked MathML annotations.

Dataset Type    Annotated Count    Total Count
Syntactical     ~69,000            ~76,000
Artificial      ~55,000            ~72,300

⚠️ Note: The provided dataset only includes the correctly annotated expressions. It is intended to complement the original CROHME dataset, which already contains manually labelled annotations.

Additional Resources

A similar structural annotation approach was applied to the original MathWriting dataset, resulting in the enhanced MathWriting+ dataset with annotated equations.

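The trace group annotations described in the overview link each handwritten stroke to the symbol it belongs to. The following is a minimal sketch of how such a mapping could be read, assuming the CROHME+ files follow the InkML layout used by the original CROHME releases (the description does not spell out the file format). The inline sample and the helper names `parse_points` and `symbols_from_inkml` are hypothetical and only serve as illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by InkML documents (W3C InkML).
INKML_NS = {"ink": "http://www.w3.org/2003/InkML"}

# Hypothetical, hand-written sample in the CROHME-style layout:
# one outer traceGroup holding one inner traceGroup per symbol.
SAMPLE_INKML = """<ink xmlns="http://www.w3.org/2003/InkML">
  <trace id="0">10 10, 20 20</trace>
  <trace id="1">20 10, 10 20</trace>
  <trace id="2">30 15, 40 15</trace>
  <trace id="3">35 10, 35 20</trace>
  <traceGroup xml:id="g0">
    <annotation type="truth">Segmentation</annotation>
    <traceGroup xml:id="g1">
      <annotation type="truth">x</annotation>
      <traceView traceDataRef="0"/>
      <traceView traceDataRef="1"/>
    </traceGroup>
    <traceGroup xml:id="g2">
      <annotation type="truth">+</annotation>
      <traceView traceDataRef="2"/>
      <traceView traceDataRef="3"/>
    </traceGroup>
  </traceGroup>
</ink>"""


def parse_points(text):
    """Turn 'x y, x y, ...' into a list of (x, y) float tuples."""
    points = []
    for pair in (text or "").split(","):
        coords = pair.split()
        if len(coords) < 2:
            continue  # skip empty or malformed entries
        # Some files carry extra channels (e.g. time); keep only x and y.
        points.append((float(coords[0]), float(coords[1])))
    return points


def symbols_from_inkml(xml_text):
    """Return one dict per symbol with its label, trace ids, and strokes."""
    root = ET.fromstring(xml_text)
    traces = {
        t.get("id"): parse_points(t.text)
        for t in root.findall("ink:trace", INKML_NS)
    }
    symbols = []
    # Inner trace groups (one level below the outer group) describe symbols.
    for group in root.findall("ink:traceGroup/ink:traceGroup", INKML_NS):
        label = group.find("ink:annotation", INKML_NS)
        trace_ids = [
            view.get("traceDataRef")
            for view in group.findall("ink:traceView", INKML_NS)
        ]
        symbols.append({
            "label": label.text if label is not None else None,
            "trace_ids": trace_ids,
            "strokes": [traces[i] for i in trace_ids if i in traces],
        })
    return symbols


if __name__ == "__main__":
    for symbol in symbols_from_inkml(SAMPLE_INKML):
        print(symbol["label"], symbol["trace_ids"])
```

For a file on disk, the same function can be fed the file contents read as text. Real files may nest trace groups differently or include a MathML block alongside the traces, so the XPath used here is a starting point and not a guaranteed match for every file.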
Usage and Citation

If you use this dataset for your research or development work, please cite the paper "The Return of Structural Approaches in Handwriting Recognition". Citing the work acknowledges the effort behind creating these annotations and supports further research into interpretable handwritten mathematical expression recognition (HMER) systems.

Disclaimer

While extensive validation and cross-checking have been performed, the dataset might contain a few minor mistakes. Nonetheless, the vast majority of the data is highly accurate and is expected to be a valuable resource for advancing recognition methods for handwritten mathematical expressions that depend on detailed annotations.

Created: 2025-03-20