ocr_finetune_example

Name: ocr_finetune_example
Creator: maas
Published: 2025-12-05 16:55:13
License: No description available

Source: ModelScope community. Updated: 2025-12-05. Indexed: 2025-12-06.

Download link:

https://modelscope.cn/datasets/datalab-to/ocr_finetune_example

Description:

# Example Dataset for Surya OCR Finetuning

This dataset is an example that lays out the expected format for finetuning Surya OCR.

## Data Requirements

- Image column: The input images (full pages, blocks, or single text lines; mix freely).
- Text column: The transcription corresponding to each image. For math content, wrap the LaTeX in `<math display="inline"></math>` or `<math display="block"></math>` tags.

## Surya OCR supports:

- Various aspect ratios
- Different image types and qualities
- Full-page documents
- Cropped blocks of text
- Single-line snippets

The base Surya model is trained on a wide range of samples from all these categories. You can combine any of these types in your training dataset for more robust performance, as this example dataset demonstrates.
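The data requirements above can be illustrated with a minimal sketch. It assumes the two columns are named `image` and `text` (the card does not give exact column names) and uses placeholder file names where a real dataset would hold image data. The helpers `wrap_math` and `math_tags_balanced` are hypothetical and are not part of Surya; only the `<math display="...">` tag format comes from the description.

```python
import re

# Hypothetical helpers for illustration; not part of Surya's API.
# Only the <math display="inline|block"> tag format comes from the dataset card.

def wrap_math(latex: str, display: str = "inline") -> str:
    """Wrap a LaTeX string in the math tag format the card describes."""
    if display not in ("inline", "block"):
        raise ValueError("display must be 'inline' or 'block'")
    return f'<math display="{display}">{latex}</math>'


_MATH_TAG = re.compile(r'<math display="(?:inline|block)">|</math>')


def math_tags_balanced(text: str) -> bool:
    """Return True if math tags open and close in order, without nesting."""
    depth = 0
    for match in _MATH_TAG.finditer(text):
        depth += -1 if match.group().startswith("</") else 1
        if depth < 0 or depth > 1:
            return False
    return depth == 0


# Assumed column names ("image", "text") and placeholder file names.
rows = [
    {
        "image": "page_001.png",  # a full page
        "text": "The relation " + wrap_math("E = mc^2") + " is well known.",
    },
    {
        "image": "line_042.png",  # a single text line
        "text": wrap_math(r"\int_0^1 x\,dx = \frac{1}{2}", "block"),
    },
]

# Sanity-check every transcription before using it for finetuning.
assert all(math_tags_balanced(row["text"]) for row in rows)
```

A check like this is one way to catch unclosed or nested math tags in transcriptions before training; whether Surya's own tooling performs a similar validation is not stated in the description.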

Provider:

maas

Created:

2025-10-22
