ShaitanRa/pascal-stahl-numarkdown
Source: Hugging Face | Updated: 2026-04-29 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/ShaitanRa/pascal-stahl-numarkdown
Description:
This dataset contains markdown-formatted OCR output generated from the images in [ShaitanRa/PascalStahl](https://huggingface.co/datasets/ShaitanRa/PascalStahl) using NuMarkdown-8B-Thinking. The recorded processing details include the source dataset, the model (numind/NuMarkdown-8B-Thinking), the number of samples (5), and the processing time (4.1 minutes). NuMarkdown-8B-Thinking is described as a state-of-the-art, reasoning-based document OCR model that handles complex tables, mathematical formulas, and document structure well and produces well-formatted markdown output. The dataset keeps all original columns and adds two: a markdown column (the extracted text in markdown format) and an inference_info column (a JSON list tracking all OCR models applied to this dataset).
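As a rough illustration of the column layout described above, the following minimal sketch reads the two added columns from a single row. The row here is hypothetical, as is the `model_id` key inside the inference_info entry; the listing only states that inference_info is a JSON list, so the sketch does not rely on any particular keys. The helper name `parse_inference_info` is also made up for this example.

```python
import json


def parse_inference_info(row):
    """Return the list of OCR-run records stored in a row's inference_info column."""
    raw = row["inference_info"]
    # Assumption: the column holds a JSON-encoded list (a string).
    # An already-decoded list is accepted as well.
    records = json.loads(raw) if isinstance(raw, str) else raw
    if not isinstance(records, list):
        raise ValueError("inference_info is expected to be a JSON list")
    return records


# Hypothetical row: real rows also carry the original PascalStahl columns.
example_row = {
    "markdown": "# Sample page\n\n| a | b |\n|---|---|\n| 1 | 2 |",
    # The "model_id" key is an assumption made for illustration only.
    "inference_info": json.dumps([{"model_id": "numind/NuMarkdown-8B-Thinking"}]),
}

records = parse_inference_info(example_row)
print(len(records), "OCR run(s) recorded")
print(example_row["markdown"].splitlines()[0])
```

With the Hugging Face `datasets` library, real rows could presumably be obtained through `load_dataset("ShaitanRa/pascal-stahl-numarkdown")`; the split name is not stated in this listing, so it would need to be checked on the dataset page.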
Provider:
ShaitanRa



