Evanstarcraft2/latex-formulas
Source: Hugging Face | Updated: 2025-12-13 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/Evanstarcraft2/latex-formulas
Description:
This dataset contains two subsets: raw_formulas and cleaned_formulas. raw_formulas consists of approximately 1 million uncleaned, unsegmented LaTeX formula image-text pairs scraped from arXiv. cleaned_formulas is obtained by cleaning raw_formulas and merging it with the im2latex-100K dataset, and contains 550K formula-image pairs. The dataset is intended primarily for OCR, LaTeX-OCR, and image-to-LaTeX conversion tasks.
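As a rough illustration of how the two subsets might be selected, the sketch below wraps the Hugging Face `datasets` library's `load_dataset` call. It assumes the subset names are exposed as configuration names on the hub repository, which the description does not confirm; `load_subset`, `REPO_ID`, and `SUBSETS` are hypothetical names introduced only for this example, and the split and column names are not stated in the source, so they are left unspecified.

```python
# Minimal sketch, not the dataset author's documented loading procedure.
# Assumption: "raw_formulas" and "cleaned_formulas" are config names on the hub.
REPO_ID = "Evanstarcraft2/latex-formulas"
SUBSETS = ("raw_formulas", "cleaned_formulas")


def load_subset(name: str, streaming: bool = True):
    """Load one subset; streaming avoids downloading everything up front."""
    if name not in SUBSETS:
        raise ValueError(f"unknown subset {name!r}; expected one of {SUBSETS}")
    # Imported lazily so the name check works without the third-party package.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name, streaming=streaming)


if __name__ == "__main__":
    # Requires network access and `pip install datasets`.
    data = load_subset("cleaned_formulas")
    print(data)  # inspect the available splits and features before relying on them
```

Streaming is chosen as the default here because the raw subset is described as roughly 1 million pairs; inspecting the printed splits and features first is safer than assuming particular column names.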
Provider:
Evanstarcraft2



