
Limitless063/Lat

Source: Hugging Face. Updated 2025-12-24. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Limitless063/Lat
Resource description:
---
license: apache-2.0
dataset_info:
- config_name: benchmark_complex
  features:
  - name: image
    dtype: image
  - name: latex_formula
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: train
    num_bytes: 282311126.0
    num_examples: 5000
  download_size: 279828230
  dataset_size: 282311126.0
- config_name: benchmark_matrix
  features:
  - name: image
    dtype: image
  - name: latex_formula
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: train
    num_bytes: 170939510.5
    num_examples: 4188
  download_size: 169275462
  dataset_size: 170939510.5
- config_name: benchmark_ordinary
  features:
  - name: image
    dtype: image
  - name: latex_formula
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: train
    num_bytes: 58214022.0
    num_examples: 5000
  download_size: 57877559
  dataset_size: 58214022.0
- config_name: benchmark_sample
  features:
  - name: image
    dtype: image
  - name: latex_formula
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: train
    num_bytes: 34419755.0
    num_examples: 5000
  download_size: 34298656
  dataset_size: 34419755.0
- config_name: benchmark_symbol
  features:
  - name: image
    dtype: image
  - name: latex_formula
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: train
    num_bytes: 2607968.0
    num_examples: 5000
  download_size: 2603058
  dataset_size: 2607968.0
- config_name: benchmark_text_hybrid
  features:
  - name: image
    dtype: image
  - name: latex_formula
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: train
    num_bytes: 80552482.0
    num_examples: 5000
  download_size: 80349528
  dataset_size: 80552482.0
- config_name: en
  features:
  - name: image
    dtype: image
  - name: latex_formula
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: train
    num_bytes: 598399572.9993577
    num_examples: 72350
  download_size: 504986706
  dataset_size: 598399572.9993577
- config_name: handwritten_nature
  features:
  - name: image
    dtype: image
  - name: latex_formula
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: train
    num_bytes: 511641864.031384
    num_examples: 45956
  download_size: 519076937
  dataset_size: 511641864.031384
- config_name: handwritten_online
  features:
  - name: image
    dtype: image
  - name: latex_formula
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: train
    num_bytes: 462501035.7377783
    num_examples: 8349
  download_size: 414681282
  dataset_size: 462501035.7377783
- config_name: zh
  features:
  - name: image
    dtype: image
  - name: latex_formula
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: train
    num_bytes: 525770211.0879239
    num_examples: 45955
  download_size: 517162915
  dataset_size: 525770211.0879239
configs:
- config_name: benchmark_complex
  data_files:
  - split: train
    path: benchmark_complex/train-*
- config_name: benchmark_matrix
  data_files:
  - split: train
    path: benchmark_matrix/train-*
- config_name: benchmark_ordinary
  data_files:
  - split: train
    path: benchmark_ordinary/train-*
- config_name: benchmark_sample
  data_files:
  - split: train
    path: benchmark_sample/train-*
- config_name: benchmark_symbol
  data_files:
  - split: train
    path: benchmark_symbol/train-*
- config_name: benchmark_text_hybrid
  data_files:
  - split: train
    path: benchmark_text_hybrid/train-*
- config_name: en
  data_files:
  - split: train
    path: en/en_*/train-*
# - config_name: en
#   data_files:
#   - split: train
#     path: [
#       "en/en_11*/train-00000-of-00002.parquet",
#       "en/en_31*/train-00000-of-00002.parquet",
#       "en/en_51*/train-00000-of-00002.parquet",
#       "en/en_71*/train-00000-of-00002.parquet",
#       "en/en_91*/train-00000-of-00002.parquet"
#     ]
- config_name: handwritten_nature
  data_files:
  - split: train
    path: handwritten_nature_dataset/handwritten_nature_dataset_*/train-*
- config_name: handwritten_online
  data_files:
  - split: train
    path: handwritten_online/handwritten_online_*/train-*
- config_name: zh
  data_files:
  - split: train
    path: zh/zh_*/train-*
---

For more details, please refer to the [𝐓𝐞𝐱𝐓𝐞𝐥𝐥𝐞𝐫 GitHub repository](https://github.com/OleehyO/TexTeller?tab=readme-ov-file).
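
The metadata above declares ten configs, each with a single `train` split and the columns `image`, `latex_formula`, and `category`. As a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository ID `Limitless063/Lat` from this page resolves on the Hub, one config could be loaded as follows. The helper names `load_config` and `total_examples` are hypothetical and only serve the illustration; the row counts are copied from the metadata.

```python
# Row counts per config, copied from the dataset card metadata.
CONFIG_EXAMPLES = {
    "benchmark_complex": 5000,
    "benchmark_matrix": 4188,
    "benchmark_ordinary": 5000,
    "benchmark_sample": 5000,
    "benchmark_symbol": 5000,
    "benchmark_text_hybrid": 5000,
    "en": 72350,
    "handwritten_nature": 45956,
    "handwritten_online": 8349,
    "zh": 45955,
}

REPO_ID = "Limitless063/Lat"  # repository ID as shown on this page


def load_config(name: str, streaming: bool = True):
    """Load the train split of one config (hypothetical helper).

    Needs the `datasets` package and network access. Streaming is the
    default here because several configs are hundreds of megabytes.
    """
    if name not in CONFIG_EXAMPLES:
        raise ValueError(f"unknown config: {name!r}")
    # Imported lazily so the table above stays usable without the library.
    from datasets import load_dataset
    return load_dataset(REPO_ID, name, split="train", streaming=streaming)


def total_examples(prefix: str = "") -> int:
    """Sum the declared row counts of all configs whose name starts with prefix."""
    return sum(n for name, n in CONFIG_EXAMPLES.items() if name.startswith(prefix))
```

For example, `load_config("benchmark_matrix")` should yield rows whose `latex_formula` field holds the label string, and `total_examples("benchmark_")` sums the declared sizes of the six benchmark configs.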
- **IMPORTANT NOTE!!!** The handwritten subset of this dataset was collected entirely from existing open-source work, and it includes the test sets of those sources. If you want to use this subset for ablation experiments, filter it yourself based on the LaTeX labels of the test set you evaluate on.
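
The filtering described in the note can be sketched as below. This is one possible approach, not the dataset authors' procedure: it assumes you already hold the LaTeX labels of your test set as plain strings, and it treats two labels as equal once all whitespace is removed. The function names are hypothetical.

```python
def normalize_latex(formula: str) -> str:
    """Remove all whitespace so spacing differences do not hide a match.

    This is an assumed normalization. It errs on the side of dropping
    more rows, since formulas differing only in spacing are merged.
    """
    return "".join(formula.split())


def drop_test_overlap(records, test_labels):
    """Keep only records whose `latex_formula` is absent from test_labels."""
    blocked = {normalize_latex(label) for label in test_labels}
    return [
        record
        for record in records
        if normalize_latex(record["latex_formula"]) not in blocked
    ]


# Tiny illustrative records standing in for rows of a handwritten config.
sample_rows = [
    {"latex_formula": "x^2 + y^2"},
    {"latex_formula": "\\frac{a}{b}"},
]
kept_rows = drop_test_overlap(sample_rows, ["x^2+y^2"])
```

With the Hugging Face `datasets` library, the same membership check could be passed as the predicate to `Dataset.filter` instead of building a list.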
Provided by:
Limitless063