storytracer/dots-mocr-latin-test-output

Name: storytracer/dots-mocr-latin-test-output
Creator: storytracer
Published: 2026-04-27 11:22:34
License: No description available

Hugging Face | Updated 2026-04-27 | Indexed 2026-05-03

Download link:

https://hf-mirror.com/datasets/storytracer/dots-mocr-latin-test-output


Description:

This dataset contains OCR results for the images in [/home/seb/data/ocr/latin-test-input/](https://huggingface.co/datasets//home/seb/data/ocr/latin-test-input/), produced with dots.mocr, a 3B multilingual model described as offering state-of-the-art document parsing and SVG generation. The dataset keeps all original columns and adds two: markdown, which stores the extracted text in markdown format, and inference_info, a JSON list that records every OCR model applied to this dataset. The processing details cover the source dataset, model, number of samples, processing time, and date. The model's listed capabilities include document parsing in 100+ languages, table extraction, formula recognition, layout-aware parsing, web screen parsing, scene text spotting, and SVG code generation.
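As a rough illustration of how the two added columns might be read, the sketch below takes one row and returns the markdown text together with the decoded inference_info list. The helper name `summarize_row`, the sample row, and the `model_id` key inside each inference_info entry are assumptions made for illustration; the description above only states that inference_info is a JSON list.

```python
import json


def summarize_row(row):
    """Return (markdown_text, inference_entries) for one dataset row."""
    info = row.get("inference_info") or "[]"
    # inference_info is described as a JSON list; also accept an
    # already-decoded list in case the loader parsed it for us.
    entries = json.loads(info) if isinstance(info, str) else list(info)
    return row.get("markdown", ""), entries


# Hypothetical row: the text and the "model_id" key are placeholders,
# not values taken from the real dataset.
example_row = {
    "markdown": "# Sample heading\n\nSample extracted text.",
    "inference_info": json.dumps([{"model_id": "placeholder-model-id"}]),
}

text, entries = summarize_row(example_row)
print(len(entries), "OCR model entry recorded for this row")
```

Real rows could presumably be obtained with `datasets.load_dataset("storytracer/dots-mocr-latin-test-output")` from the Hugging Face `datasets` library, assuming the repository is publicly accessible; the split names are not stated here and would need to be checked.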

Provided by:

storytracer
