
davanstrien/falcon-ocr-layout-test

Source: Hugging Face. Updated: 2026-04-08. Indexed: 2026-04-12.
Download link:
https://hf-mirror.com/datasets/davanstrien/falcon-ocr-layout-test
Description:
---
tags:
- ocr
- document-processing
- falcon-ocr
- layout
- uv-script
- generated
---

# Document Processing using Falcon OCR (layout mode)

This dataset contains OCR results from images in [davanstrien/ufo-ColPali](https://huggingface.co/datasets/davanstrien/ufo-ColPali) using [Falcon OCR](https://huggingface.co/tiiuae/Falcon-OCR), a 0.3B early-fusion vision-language model.

## Processing Details

- **Source Dataset**: [davanstrien/ufo-ColPali](https://huggingface.co/datasets/davanstrien/ufo-ColPali)
- **Model**: [tiiuae/Falcon-OCR](https://huggingface.co/tiiuae/Falcon-OCR)
- **Task Mode**: `layout` - Layout-aware OCR (region detection + per-region extraction)
- **Number of Samples**: 5
- **Processing Time**: 0.9 min
- **Processing Date**: 2026-04-08 05:50 UTC
- **Backend**: falcon-perception (OCRInferenceEngine)

## Reproduction

```bash
uv run https://huggingface.co/datasets/uv-scripts/ocr/raw/main/falcon-ocr.py \
    davanstrien/ufo-ColPali \
    <output-dataset> \
    --task-mode layout \
    --image-column image
```

Generated with [UV Scripts](https://huggingface.co/uv-scripts)
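For scripted runs, the reproduction command from the card can be assembled as an argument list in Python. The following is a minimal sketch, not part of the original card: the helper name `build_falcon_ocr_command` and the output dataset name `your-username/falcon-ocr-output` are hypothetical placeholders, while the script URL, positional arguments, and the two flags are copied from the command in the card.

```python
import shlex

# Script URL taken verbatim from the reproduction command in the card.
SCRIPT_URL = "https://huggingface.co/datasets/uv-scripts/ocr/raw/main/falcon-ocr.py"


def build_falcon_ocr_command(source_dataset, output_dataset,
                             task_mode="layout", image_column="image"):
    """Return the `uv run` invocation as a list of arguments.

    Hypothetical helper: it only mirrors the flags shown in the card
    (--task-mode and --image-column) and assumes nothing else about the script.
    """
    return [
        "uv", "run", SCRIPT_URL,
        source_dataset,
        output_dataset,
        "--task-mode", task_mode,
        "--image-column", image_column,
    ]


# "your-username/falcon-ocr-output" is a placeholder for <output-dataset>.
cmd = build_falcon_ocr_command("davanstrien/ufo-ColPali",
                               "your-username/falcon-ocr-output")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where `uv` is installed. Building a list rather than a single string avoids shell quoting issues if a dataset name contains unusual characters.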
Provided by:
davanstrien