davanstrien/encyclopaedia_britannica_illustrated-dots-ocr
Source: Hugging Face. Updated 2025-10-22. Indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/davanstrien/encyclopaedia_britannica_illustrated-dots-ocr
Description:
This dataset contains OCR results extracted from the images in NationalLibraryOfScotland/encyclopaedia_britannica_illustrated (an illustrated edition of the Encyclopaedia Britannica) using the DoTS.ocr model. DoTS.ocr is a compact 1.7B-parameter multilingual document parsing model that supports over 100 languages, recognizes structured data and mathematical formulas, and preserves the reading order and structure of documents. The dataset consists of 100 samples, each processed to extract its text in Markdown format.
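The following is a minimal sketch of how the dataset might be loaded with the Hugging Face `datasets` library, optionally through the mirror given in the download link. The helper names `dataset_page_url` and `load_ocr_dataset` are hypothetical, and the split name `"train"` is an assumption not confirmed by this listing, so check the dataset page before relying on it.

```python
import os
from typing import Optional

REPO_ID = "davanstrien/encyclopaedia_britannica_illustrated-dots-ocr"
MIRROR = "https://hf-mirror.com"  # mirror host taken from the download link


def dataset_page_url(repo_id: str, endpoint: str = MIRROR) -> str:
    """Build the dataset page URL in the same form as the download link."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def load_ocr_dataset(
    repo_id: str = REPO_ID,
    split: str = "train",  # assumption: the split name is not stated in the listing
    endpoint: Optional[str] = None,
):
    """Load the OCR results (hypothetical helper; needs `pip install datasets`).

    If `endpoint` is given, it is exported as HF_ENDPOINT so that
    huggingface_hub sends requests to that host. This only takes effect
    if huggingface_hub has not been imported yet in the current process.
    """
    if endpoint is not None:
        os.environ["HF_ENDPOINT"] = endpoint
    from datasets import load_dataset  # imported lazily, after HF_ENDPOINT is set

    return load_dataset(repo_id, split=split)
```

Calling `ds = load_ocr_dataset(endpoint=MIRROR)` and then inspecting `len(ds)` and `ds.column_names` is a quick way to confirm the sample count and to find which column holds the Markdown text, since the column names are not given here.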
Provider:
davanstrien



