CROHME+
NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/14968569
Resource summary:
Overview
This dataset, introduced in the paper "The Return of Structural Approaches in Handwriting Recognition", builds on the original CROHME dataset by automatically generating detailed structural annotations that map handwritten traces to their corresponding symbols. For files where trace group information was missing or incorrect, we added or corrected it to enable precise symbol segmentation and classification. In addition, for files lacking MathML annotations, we added the relational information needed to represent the structure of each mathematical expression accurately. These enhancements improve error analysis and interpretability, and they support applications that rely on the spatial layout of the handwritten symbols.
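To illustrate what a trace-to-symbol annotation looks like, the sketch below reads symbol-level trace groups from an InkML string. It assumes the usual CROHME InkML layout (an outer traceGroup labeled "Segmentation" containing one nested traceGroup per symbol, each pointing at its strokes via traceView/traceDataRef); the helper name symbol_trace_map and the toy "1+1" expression are hypothetical and not taken from the CROHME+ release.

```python
import xml.etree.ElementTree as ET

# InkML namespace; assumed to be the same one used by the original CROHME files.
NS = {"ink": "http://www.w3.org/2003/InkML"}

# Toy expression "1+1": trace 0 is the first "1", traces 1-2 form "+", trace 3 is the second "1".
SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <trace id="0">10 10, 10 30</trace>
  <trace id="1">20 20, 40 20</trace>
  <trace id="2">30 10, 30 30</trace>
  <trace id="3">50 10, 50 30</trace>
  <traceGroup xml:id="tg0">
    <annotation type="truth">Segmentation</annotation>
    <traceGroup xml:id="tg1">
      <annotation type="truth">1</annotation>
      <traceView traceDataRef="0"/>
    </traceGroup>
    <traceGroup xml:id="tg2">
      <annotation type="truth">+</annotation>
      <traceView traceDataRef="1"/>
      <traceView traceDataRef="2"/>
    </traceGroup>
    <traceGroup xml:id="tg3">
      <annotation type="truth">1</annotation>
      <traceView traceDataRef="3"/>
    </traceGroup>
  </traceGroup>
</ink>"""


def symbol_trace_map(inkml_text):
    """Return (symbol label, [trace ids]) pairs, one per symbol-level trace group."""
    root = ET.fromstring(inkml_text)
    symbols = []
    # The top-level traceGroup is only a container; each nested traceGroup is one symbol.
    for container in root.findall("ink:traceGroup", NS):
        for group in container.findall("ink:traceGroup", NS):
            label = group.findtext("ink:annotation", default="", namespaces=NS)
            refs = [tv.get("traceDataRef") for tv in group.findall("ink:traceView", NS)]
            symbols.append((label, refs))
    return symbols


for label, refs in symbol_trace_map(SAMPLE):
    print(label, refs)
# 1 ['0']
# + ['1', '2']
# 1 ['3']
```

A symbol may span several strokes (the "+" above uses two), which is why each label maps to a list of trace ids rather than a single one.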
Dataset Breakdown
This dataset extends the CROHME 2023 dataset by adding structural annotations to the syntactical and artificial training data. Most real data was already manually labeled, with only minor issues in a few real training, validation, and test files.
Syntactical Data: Generated by extracting valid sub-expressions from handwritten expressions and recombining them so that the results remain syntactically correct. Trace groups were already annotated, but MathML annotations were missing.
Artificial Data: Computationally produced from LaTeX sequences. These samples often had incorrect trace group annotations and lacked MathML annotations.
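The two kinds of gaps described for these subsets (missing MathML, and incomplete or wrong trace groups) can be checked per file. The following is a minimal sketch, assuming the usual CROHME InkML layout in which symbol traceGroups are nested one level inside an outer traceGroup and the MathML tree sits as a math element inside annotationXML; annotation_status is a hypothetical helper, not part of any released tooling.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespaces assumed from the usual CROHME InkML layout.
NS = {"ink": "http://www.w3.org/2003/InkML", "m": "http://www.w3.org/1998/Math/MathML"}


def annotation_status(inkml_text):
    """Report whether a file has a complete symbol segmentation and a MathML tree."""
    root = ET.fromstring(inkml_text)
    trace_ids = {t.get("id") for t in root.findall("ink:trace", NS)}

    # Trace ids referenced by symbol-level traceGroups (nested one level deep).
    referenced = [
        tv.get("traceDataRef")
        for group in root.findall("ink:traceGroup/ink:traceGroup", NS)
        for tv in group.findall("ink:traceView", NS)
    ]

    # Complete segmentation: every trace is assigned to exactly one symbol.
    segmentation_ok = bool(trace_ids) and Counter(referenced) == Counter(trace_ids)
    # MathML is expected as a <math> child of an <annotationXML> element.
    has_mathml = root.find("ink:annotationXML/m:math", NS) is not None
    return {"segmentation_ok": segmentation_ok, "has_mathml": has_mathml}


# A syntactical-style file: trace groups present, MathML missing.
SYNTACTICAL_LIKE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <trace id="0">10 10, 10 30</trace>
  <traceGroup>
    <annotation type="truth">Segmentation</annotation>
    <traceGroup>
      <annotation type="truth">1</annotation>
      <traceView traceDataRef="0"/>
    </traceGroup>
  </traceGroup>
</ink>"""

print(annotation_status(SYNTACTICAL_LIKE))
# {'segmentation_ok': True, 'has_mathml': False}
```

Comparing multisets (Counter) rather than sets means a trace assigned to two symbols, or a reference to a nonexistent trace, is also reported as an incomplete segmentation.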
Dataset Type | Annotated Count | Total Count
Syntactical | ~69,000 | ~76,000
Artificial | ~55,000 | ~72,300
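As a quick reading of the counts in the table above, the share of each subset that received structural annotations can be computed directly. This short sketch treats the approximate "~" figures as exact and reads "Annotated Count" as the number of expressions that carry the new annotations; both are assumptions, so the percentages are only indicative.

```python
# Approximate counts from the table (the "~" values, taken at face value).
counts = {
    "Syntactical": (69_000, 76_000),
    "Artificial": (55_000, 72_300),
}


def coverage(annotated, total):
    """Fraction of expressions in a subset that received structural annotations."""
    return annotated / total


for name, (annotated, total) in counts.items():
    print(f"{name}: {coverage(annotated, total):.1%}")

annotated_sum = sum(a for a, _ in counts.values())
total_sum = sum(t for _, t in counts.values())
print(f"Combined: {coverage(annotated_sum, total_sum):.1%}")
# Syntactical: 90.8%
# Artificial: 76.1%
# Combined: 83.6%
```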
⚠️ Note: The provided dataset includes only the expressions that were annotated correctly. It is intended to complement the original CROHME dataset, which already contains manually labeled annotations.
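Because this release complements rather than replaces the original CROHME data, a training file list has to be assembled from both. The sketch below is one hypothetical way to do it: files are keyed by file stem, and if both releases happen to contain the same stem, the CROHME+ copy is preferred. The helper name, the directory names, and the assumption that stems are unique and shared across releases are all illustrative, not taken from either release.

```python
from pathlib import PurePosixPath


def merge_file_lists(original_files, plus_files):
    """Map file stem -> path, preferring the CROHME+ copy when both releases have it.

    Hypothetical helper: assumes stems are unique and shared between the two releases.
    """
    merged = {PurePosixPath(p).stem: p for p in original_files}
    # Later updates win, so CROHME+ files replace same-named originals.
    merged.update({PurePosixPath(p).stem: p for p in plus_files})
    return dict(sorted(merged.items()))


# Hypothetical paths: real data from original CROHME, synthetic data from CROHME+.
original_real = ["crohme/train/real/001.inkml", "crohme/train/real/002.inkml"]
plus = ["crohme_plus/syntactic/s_01.inkml", "crohme_plus/artificial/a_07.inkml"]

for stem, path in merge_file_lists(original_real, plus).items():
    print(stem, path)
```

Only the original real files are passed in here; including the original syntactical and artificial files as well would bring back the expressions that did not receive structural annotations.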
Additional Resources
A similar structural annotation approach was applied to the original MathWriting dataset, resulting in the enhanced MathWriting+ dataset with annotated equations.
Usage and Citation
If you use this dataset in your research or development work, please cite the paper:
"The Return of Structural Approaches in Handwriting Recognition"
Citing the work acknowledges the effort behind these annotations and supports further research into interpretable handwritten mathematical expression recognition (HMER) systems.
Disclaimer
Although extensive validation and cross-checking have been performed, the dataset may still contain a few minor mistakes. Nonetheless, the vast majority of the data is highly accurate, and it is expected to be a valuable resource for advancing handwritten mathematical expression recognition methods that depend on detailed annotations.
Created: 2025-03-20



