
im2latex 230k

Indexed from NIAID Data Ecosystem on 2026-03-14
Download link:
https://zenodo.org/record/7480548
Description:
The dataset comprises over 230,000 LaTeX math formulas and their corresponding .png images. The images vary in size and have a resolution of 72 dpi. The formulas were parsed from LaTeX sources originally from arXiv, provided here: http://www.cs.cornell.edu/projects/kddcup/datasets.html. The dataset was generated with a tool built in JavaScript and Python, which is available on GitHub. For further details, see https://github.com/gmarus777/Printed-Latex-Data-Generation

Contents:
- folder `generated_png_images`: the PNG images
- `corresponding_png_images.txt`: one PNG filename per line, referring to the folder `generated_png_images`
- `final_png_formulas.txt`: one corresponding LaTeX formula per line
- `230k.json`: a vocabulary of 579 tokens

Version 3 updates:
- Dataset size increased to 230k (from 180k)
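The file descriptions suggest that `corresponding_png_images.txt` and `final_png_formulas.txt` are line-aligned, so line i of one names the image rendered from the formula on line i of the other. The sketch below pairs them up under that assumption, and also assumes UTF-8 text files; neither point is stated explicitly by the dataset authors. `load_pairs` and `_read_lines` are hypothetical helpers, not part of the dataset or of the generation tool.

```python
from pathlib import Path


def _read_lines(path):
    # Iterate the file so that only newlines separate records; str.splitlines()
    # would also split on characters such as form feed (\x0c).
    with open(path, encoding="utf-8") as f:  # assumption: files are UTF-8
        return [line.rstrip("\n") for line in f]


def load_pairs(root):
    """Return a list of (image_path, latex_formula) tuples.

    Hypothetical helper: assumes line i of corresponding_png_images.txt
    names the image for the formula on line i of final_png_formulas.txt.
    """
    root = Path(root)
    names = _read_lines(root / "corresponding_png_images.txt")
    formulas = _read_lines(root / "final_png_formulas.txt")
    if len(names) != len(formulas):
        raise ValueError(
            f"line count mismatch: {len(names)} images vs {len(formulas)} formulas"
        )
    image_dir = root / "generated_png_images"
    # Strip stray whitespace from filenames, but keep formulas exactly as written
    return [(image_dir / name.strip(), formula) for name, formula in zip(names, formulas)]
```

Raising on a line-count mismatch is deliberate: if the two files ever drift out of alignment, every later pair would be wrong, so it is safer to fail early than to silently truncate with `zip`.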
Created:
2023-03-17