
NationalLibraryOfScotland/nls-index-cards-object-detection

Hugging Face · Updated 2026-03-25 · Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/NationalLibraryOfScotland/nls-index-cards-object-detection
Resource description:
---
task_categories:
- object-detection
language:
- en
tags:
- GLAM
- libraries
- cultural-heritage
- index-cards
- manuscripts
- YOLO
pretty_name: NLS Advocates Library Index Card Detection Dataset
size_categories:
- n<1K
dataset_info:
  features:
  - name: image
    dtype: image
  - name: filename
    dtype: string
  - name: objects
    struct:
    - name: bbox
      list:
        list: float32
        length: 4
    - name: category
      list:
        class_label:
          names:
            '0': index_card
  - name: has_card
    dtype: bool
  - name: width
    dtype: int32
  - name: height
    dtype: int32
  splits:
  - name: train
    num_bytes: 56614096
    num_examples: 100
  download_size: 56602234
  dataset_size: 56614096
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# NLS Advocates Library Index Card Detection Dataset

An object detection dataset for identifying index cards on scanned pages from the Advocates Library manuscript collection at the [National Library of Scotland](https://www.nls.uk/).

## Dataset Description

This dataset contains 100 annotated scanned page images from the Advocates Library manuscript index card collection. Each image is annotated with bounding boxes marking the location of index cards on the page, enabling training of object detection models to automatically crop cards from full page scans.

The dataset was created as part of an AI consultancy with NLS to build metadata extraction pipelines for library collections.

## Features

| Feature | Type | Description |
|---------|------|-------------|
| `image` | Image | Scanned page image |
| `filename` | string | Original filename |
| `objects.bbox` | list of [x, y, w, h] | COCO-format bounding boxes |
| `objects.category` | ClassLabel | Single class: `index_card` |
| `has_card` | bool | Whether the page contains an index card |
| `width` | int32 | Image width in pixels |
| `height` | int32 | Image height in pixels |

Pages without index cards (e.g. blank pages, dividers, covers) have `has_card: false` and empty bounding box lists. This allows the detector to learn what to skip.

## Use Cases

- Training object detection models (e.g. YOLO) to identify and crop index cards from scanned pages
- Preprocessing step for downstream metadata extraction pipelines
- Benchmarking detection approaches for similar archival card collections

## Trained Model

This dataset was used to train a YOLOv11n model that achieves **99.2% mAP@50-95**: [NationalLibraryOfScotland/archival-index-card-detector](https://huggingface.co/NationalLibraryOfScotland/archival-index-card-detector)

## Related Resources

- **Trained detector**: [NationalLibraryOfScotland/archival-index-card-detector](https://huggingface.co/NationalLibraryOfScotland/archival-index-card-detector)
- **Documentation**: [AI Design Patterns for Information Professionals](https://danielvanstrien.xyz/ai-patterns-for-glam/)

## Source

National Library of Scotland, Advocates Library manuscript collection. Bounding box annotations created using SAM3 with manual correction.

## License

To be determined by the National Library of Scotland.
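
## Example: Converting Annotations

The Features table describes `objects.bbox` as COCO-format `[x, y, w, h]`, while the stated use cases are YOLO training and card cropping. The sketch below shows one way to bridge the two. It assumes the boxes are absolute pixel values measured from the top-left corner (the usual COCO convention; the card does not spell out the units). The helper names `coco_to_yolo`, `coco_to_crop_box`, and `yolo_label_lines`, and the sample record, are hypothetical and not part of the dataset.

```python
from typing import Dict, List, Sequence, Tuple


def coco_to_yolo(bbox: Sequence[float], width: int, height: int) -> Tuple[float, float, float, float]:
    """Convert a COCO [x, y, w, h] box (assumed pixels, top-left origin)
    to YOLO (cx, cy, w, h), normalized by the image size."""
    x, y, w, h = bbox
    return ((x + w / 2) / width, (y + h / 2) / height, w / width, h / height)


def coco_to_crop_box(bbox: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a COCO box to an integer (left, upper, right, lower) box,
    clamped to the image bounds. This is the tuple layout PIL's Image.crop takes."""
    x, y, w, h = bbox
    left = max(0, int(round(x)))
    upper = max(0, int(round(y)))
    right = min(width, int(round(x + w)))
    lower = min(height, int(round(y + h)))
    return (left, upper, right, lower)


def yolo_label_lines(example: Dict) -> List[str]:
    """Build one YOLO label line per box. Pages with has_card == False
    carry empty lists, so they yield no lines (an empty label file)."""
    objects = example["objects"]
    lines = []
    for bbox, category in zip(objects["bbox"], objects["category"]):
        cx, cy, w, h = coco_to_yolo(bbox, example["width"], example["height"])
        lines.append(f"{category} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines


# Hypothetical record shaped like the dataset's features (values are made up).
sample = {
    "filename": "page_0001.jpg",
    "objects": {"bbox": [[100.0, 200.0, 400.0, 600.0]], "category": [0]},
    "has_card": True,
    "width": 1000,
    "height": 2000,
}

print(yolo_label_lines(sample))  # ['0 0.300000 0.250000 0.400000 0.300000']
print(coco_to_crop_box(sample["objects"]["bbox"][0], sample["width"], sample["height"]))  # (100, 200, 500, 800)
```

With real data, records of this shape would presumably come from `datasets.load_dataset("NationalLibraryOfScotland/nls-index-cards-object-detection", split="train")`, and the crop tuple could be passed to `example["image"].crop(...)`. Writing an empty label file for pages without cards keeps them in training as negative examples, which matches the card's note about teaching the detector what to skip.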
Provider:
NationalLibraryOfScotland