datalab-to/ocr_finetune_example
Source: Hugging Face | Updated: 2025-08-08 | Indexed: 2025-09-13
Download link:
https://hf-mirror.com/datasets/datalab-to/ocr_finetune_example
Description:
This dataset is for finetuning the Surya OCR model. It consists of images paired with their text transcriptions. An image can be a full document page, a text block, or a single line of text. Math content must be marked up with specific LaTeX tags. The dataset accommodates a range of aspect ratios, image types, and image qualities, and mixing different sample types can make the finetuned model more robust.
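The idea of combining sample types can be sketched as follows. This is a hypothetical illustration only, not the dataset's actual schema or Surya's training code: the `OcrSample` record, its `sample_type` labels ("page", "block", "line"), and the `mix_by_type` helper are assumptions introduced here to show one simple way of balancing full pages, text blocks, and single lines in a training set.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class OcrSample:
    # Hypothetical record layout: an image reference plus its transcription.
    image_path: str
    text: str
    sample_type: str  # illustrative labels: "page", "block", or "line"


def mix_by_type(samples, per_type, seed=0):
    """Draw up to `per_type` samples of each type, then shuffle them together."""
    groups = defaultdict(list)
    for s in samples:
        groups[s.sample_type].append(s)

    rng = random.Random(seed)  # fixed seed for a reproducible mix
    mixed = []
    for sample_type in sorted(groups):
        group = groups[sample_type]
        k = min(per_type, len(group))
        mixed.extend(rng.sample(group, k))  # sample without replacement
    rng.shuffle(mixed)
    return mixed


# Usage with made-up file names and transcriptions.
samples = [
    OcrSample("page_001.png", "Full page text ...", "page"),
    OcrSample("block_001.png", "A paragraph of text ...", "block"),
    OcrSample("line_001.png", "A single line", "line"),
    OcrSample("line_002.png", "Another line", "line"),
]
train = mix_by_type(samples, per_type=1)
print([s.sample_type for s in train])
```

Capping each type at `per_type` keeps an abundant type (here, single lines) from dominating the mix; a real pipeline might instead weight types by a target ratio.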
Provided by:
datalab-to



