NationalLibraryOfScotland/nls-index-cards-object-detection
Hugging Face · updated 2026-03-25 · indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/NationalLibraryOfScotland/nls-index-cards-object-detection
Description:
---
task_categories:
- object-detection
language:
- en
tags:
- GLAM
- libraries
- cultural-heritage
- index-cards
- manuscripts
- YOLO
pretty_name: NLS Advocates Library Index Card Detection Dataset
size_categories:
- n<1K
dataset_info:
  features:
  - name: image
    dtype: image
  - name: filename
    dtype: string
  - name: objects
    struct:
    - name: bbox
      list:
        list: float32
        length: 4
    - name: category
      list:
        class_label:
          names:
            '0': index_card
  - name: has_card
    dtype: bool
  - name: width
    dtype: int32
  - name: height
    dtype: int32
  splits:
  - name: train
    num_bytes: 56614096
    num_examples: 100
  download_size: 56602234
  dataset_size: 56614096
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# NLS Advocates Library Index Card Detection Dataset
An object detection dataset for identifying index cards on scanned pages from the Advocates Library manuscript collection at the [National Library of Scotland](https://www.nls.uk/).
## Dataset Description
This dataset contains 100 annotated scanned page images from the Advocates Library manuscript index card collection. Each image is annotated with bounding boxes marking the location of index cards on the page, enabling training of object detection models to automatically crop cards from full page scans.
The dataset was created as part of an AI consultancy with NLS to build metadata extraction pipelines for library collections.
## Features
| Feature | Type | Description |
|---------|------|-------------|
| `image` | Image | Scanned page image |
| `filename` | string | Original filename |
| `objects.bbox` | list of [x, y, w, h] | COCO-format bounding boxes |
| `objects.category` | list of ClassLabel | One label per bounding box; single class: `index_card` |
| `has_card` | bool | Whether the page contains an index card |
| `width` | int32 | Image width in pixels |
| `height` | int32 | Image height in pixels |

Pages without index cards (e.g. blank pages, dividers, covers) have `has_card: false` and empty bounding box lists. This allows the detector to learn what to skip.
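
To illustrate how the features fit together, here is a minimal sketch that turns the `[x, y, w, h]` boxes of one record into crop rectangles. It assumes the boxes are in absolute pixel coordinates (the usual COCO convention; the card does not state this explicitly). The helper names `coco_to_corners` and `crop_boxes`, and the example record, are hypothetical.

```python
from typing import List, Sequence, Tuple


def coco_to_corners(bbox: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a COCO [x, y, w, h] box to (left, upper, right, lower), clamped to the image."""
    x, y, w, h = bbox
    left = max(0, int(round(x)))
    upper = max(0, int(round(y)))
    right = min(width, int(round(x + w)))
    lower = min(height, int(round(y + h)))
    return left, upper, right, lower


def crop_boxes(record: dict) -> List[Tuple[int, int, int, int]]:
    """Return one crop rectangle per annotated card; an empty list for pages without cards."""
    if not record["has_card"]:
        return []
    return [
        coco_to_corners(bbox, record["width"], record["height"])
        for bbox in record["objects"]["bbox"]
    ]


# Hypothetical record shaped like the feature table above (values are made up).
example = {
    "filename": "page_001.jpg",
    "objects": {"bbox": [[120.0, 80.0, 600.0, 400.0]], "category": [0]},
    "has_card": True,
    "width": 700,
    "height": 1000,
}
print(crop_boxes(example))  # the right edge is clamped to the image width
```

Each returned tuple uses the `(left, upper, right, lower)` order that Pillow's `Image.crop` expects, so it could be passed to `record["image"].crop(...)` when the record comes from the `datasets` library.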
## Use Cases
- Training object detection models (e.g. YOLO) to identify and crop index cards from scanned pages
- Preprocessing step for downstream metadata extraction pipelines
- Benchmarking detection approaches for similar archival card collections
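
For the YOLO training use case, the COCO-style boxes usually need converting to YOLO's normalized `class cx cy w h` label lines. The sketch below shows that conversion under the assumption that the stored boxes are in absolute pixels; the function names are hypothetical, and this is not necessarily how the published detector's labels were produced.

```python
from typing import Iterable, Sequence


def coco_to_yolo_line(bbox: Sequence[float], img_w: int, img_h: int, class_id: int = 0) -> str:
    """Format one COCO [x, y, w, h] box as a YOLO label line with normalized center and size."""
    x, y, w, h = bbox
    cx = (x + w / 2) / img_w  # box center, as a fraction of image width
    cy = (y + h / 2) / img_h  # box center, as a fraction of image height
    return f"{class_id} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"


def yolo_label_text(bboxes: Iterable[Sequence[float]], img_w: int, img_h: int) -> str:
    """Build the label file content for one page; pages without cards yield an empty string."""
    return "\n".join(coco_to_yolo_line(b, img_w, img_h) for b in bboxes)


# Made-up box covering the central half of a 400x200 image.
print(yolo_label_text([[100, 50, 200, 100]], 400, 200))
```

Pages with `has_card: false` produce an empty label text, which matches the common convention of treating images with no labels as background examples.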
## Trained Model
This dataset was used to train a YOLOv11n model that achieves **99.2% mAP@50-95**: [NationalLibraryOfScotland/archival-index-card-detector](https://huggingface.co/NationalLibraryOfScotland/archival-index-card-detector)
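
For readers unfamiliar with the metric: mAP@50-95 averages precision over intersection-over-union (IoU) thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below computes IoU for two `[x, y, w, h]` boxes and counts how many of those thresholds a single prediction clears. It only illustrates the matching criterion and is not a full mAP implementation; the function names are hypothetical.

```python
from typing import List, Sequence

# The ten IoU thresholds used by mAP@50-95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS: List[float] = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def iou_coco(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection-over-union of two boxes given as [x, y, w, h]."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def thresholds_passed(pred: Sequence[float], truth: Sequence[float]) -> int:
    """Count the mAP@50-95 IoU thresholds at which `pred` would match `truth`."""
    iou = iou_coco(pred, truth)
    return sum(iou >= t for t in IOU_THRESHOLDS)


# Made-up boxes: a prediction shifted by half its width against the ground truth.
print(iou_coco([0, 0, 10, 10], [5, 0, 10, 10]))
```

A score near 100% therefore means predicted boxes match the annotations closely even at the strictest thresholds, which matters when the boxes are used for cropping.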
## Related Resources
- **Trained detector**: [NationalLibraryOfScotland/archival-index-card-detector](https://huggingface.co/NationalLibraryOfScotland/archival-index-card-detector)
- **Documentation**: [AI Design Patterns for Information Professionals](https://danielvanstrien.xyz/ai-patterns-for-glam/)
## Source
National Library of Scotland, Advocates Library manuscript collection. Bounding box annotations created using SAM3 with manual correction.
## License
To be determined by the National Library of Scotland.
Provider:
NationalLibraryOfScotland



