Riksarkivet/diplomatics-lance
Source: Hugging Face. Updated 2026-04-01; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/Riksarkivet/diplomatics-lance
Resource description:
---
license: cc-by-4.0
language:
- la
- sv
- de
- da
- "no"
- fr
tags:
- medieval
- manuscripts
- diplomatics
- archives
- riksarkivet
- lance
- lancedb
- htr
- handwritten-text-recognition
- iiif
size_categories:
- 10K<n<100K
---
# Diplomatics Lance — Medieval Documents from the Swedish National Archives
Two [LanceDB](https://lancedb.com/) tables with metadata for medieval documents held by [Riksarkivet](https://riksarkivet.se/) (the Swedish National Archives). Queryable remotely — no download required.
## Quick start
```python
import lancedb
db = lancedb.connect("hf://datasets/Riksarkivet/diplomatics-lance")
# List tables
db.list_tables() # ['mpo', 'sdhk']
# Full-text search over the SDHK charter index
sdhk = db.open_table("sdhk")
sdhk_results = sdhk.search("kung Magnus", query_type="fts").limit(10).to_arrow()
# Full-text search over the MPO fragment catalogue
mpo = db.open_table("mpo")
mpo_results = mpo.search("Missale", query_type="fts").limit(10).to_arrow()
```
---
## Tables
### `mpo` — Medeltida PergamentOmslag (22,909 records)
A database of medieval book fragments (*Medeltida PergamentOmslag*, MPO).
The Swedish National Archives houses a collection of many thousands of medieval book fragments. The fragments were used as wrappers for the accounts of the Swedish administration from the 1530s until around 1630. The books came from medieval churches and monasteries and became useless after the Reformation. At the same time, the bailiffs of King Gustav Vasa needed strong wrappers for their tax records, and the leaves of the medieval books were well suited to that purpose. The fragments number c. 23,000, and most of them still serve as wrappers for the same accounts as in the 16th century.
The fragments are the remains of around 11,000 different books. The oldest date to the 11th century, the youngest to the 16th century. Many of the books came to Sweden from abroad; others were produced in Sweden. Most of them are service books, but there are also leaves from theological and law manuscripts.
The collection was catalogued in 1995–2004 in the MPO project. Half of the fragments had been catalogued earlier in a card catalogue called the *Catalogus Codicum Mutilorum* (CCM). The text of the database is mainly written in German because German guidelines for manuscript description were followed.
**Fields:** `id`, `bildvisning_url`, `institution`, `institution_detail`, `ra_number`, `ccm_signum`, `collection`, `volume_signature`, `decoration`, `material`, `leaf_count`, `column_count`, `line_count`, `format_size`, `writing_space`, `damage`, `quire_notes`, `script`, `rubrication`, `notation`, `staff_lines`, `notes`, `manuscript_type`, `category`, `title`, `author`, `origin_place`, `use_place`, `dating`, `incunabulum`, `codex`, `literature`, `content`, `iiif_manifest`, `searchable_text`, `manifest_url`
**Literature:**
- Jan Brunius, *From Manuscripts to Wrappers. Medieval Book Fragments in the Swedish National Archives*, 2013.
- Kerstin Abukhanfusa, Jan Brunius & Solbritt Benneth (eds.), *Helgerånet. Från mässböcker till munkepärmar*, 1993.
- Kerstin Abukhanfusa, *Stympade böcker. Märkvärdiga blad ur svensk bokhistoria*, 2004 (also available in English: *Mutilated books. Wondrous leaves from Swedish bibliographical history*).
---
### `sdhk` — Svenskt Diplomatariums Huvudkartotek (44,264 records)
The main index of medieval Swedish charters (*Svenskt Diplomatariums Huvudkartotek*, SDHK). The register contains information about over 44,000 letters concerning Sweden up to 1540, with searchable fields including persons, places, dates, and document summaries (*regests*).
For many entries it is possible to search the full text of the charters as printed in the main series of *Svenskt Diplomatarium* (up to 1381), the supplementary series for 1401–1420, and large parts of the appendix series *Acta Pontificum Svecica*.
SDHK always contains detailed information about the sources and transmission of the letters — how and in what form they have been preserved — and in many cases references to scholarly literature.
Of the 44,264 records:
- **15,034** have digitised images accessible via IIIF (`has_manifest = true`)
- **6,131** have full scholarly transcriptions (`has_transcription = true`)
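As a quick sanity check, the shares implied by these counts can be worked out directly. The sketch below is plain arithmetic on the numbers stated in this card; it does not query the tables, and the helper name `share` is only illustrative.

```python
# Counts as stated in this card (not fetched from the tables).
TOTAL_SDHK = 44_264
WITH_MANIFEST = 15_034
WITH_TRANSCRIPTION = 6_131


def share(part: int, total: int) -> float:
    """Return part/total as a percentage."""
    return 100.0 * part / total


manifest_pct = share(WITH_MANIFEST, TOTAL_SDHK)
transcription_pct = share(WITH_TRANSCRIPTION, TOTAL_SDHK)

print(f"with IIIF images:    {manifest_pct:.1f}%")
print(f"with transcriptions: {transcription_pct:.1f}%")
```

Roughly a third of the SDHK records have images, and about one in seven has a transcription.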
**Fields:** `id`, `title`, `author`, `date`, `place`, `language`, `summary`, `comments`, `edition`, `seals`, `original`, `medieval_copy`, `postmedieval_copy`, `medieval_reg`, `postmedieval_reg`, `photocopy`, `printed`, `print_reg`, `facsimile`, `translation`, `additional`, `has_manifest`, `has_transcription`, `searchable_text`, `manifest_url`
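The two boolean fields can be combined with a full-text query to narrow results. The following is a minimal sketch, assuming `has_manifest` and `has_transcription` are stored as boolean columns (as the `has_manifest = true` notation above suggests) and that the table object comes from `db.open_table("sdhk")`. The helper names `build_sdhk_filter` and `search_sdhk` are hypothetical and not part of the dataset or of LanceDB.

```python
from typing import Optional


def build_sdhk_filter(has_manifest: Optional[bool] = None,
                      has_transcription: Optional[bool] = None) -> Optional[str]:
    """Build a SQL-style filter string over the two boolean SDHK columns.

    A value of None means "do not filter on this column".
    """
    clauses = []
    if has_manifest is not None:
        clauses.append(f"has_manifest = {'true' if has_manifest else 'false'}")
    if has_transcription is not None:
        clauses.append(
            f"has_transcription = {'true' if has_transcription else 'false'}"
        )
    return " AND ".join(clauses) if clauses else None


def search_sdhk(table, text: str, flt: Optional[str] = None, limit: int = 10):
    """Full-text search on an opened table, optionally narrowed by a filter."""
    query = table.search(text, query_type="fts")
    if flt is not None:
        query = query.where(flt)
    return query.limit(limit).to_arrow()
```

For example, `search_sdhk(sdhk, "kung Magnus", build_sdhk_filter(has_manifest=True))` would restrict hits to records that have IIIF images, provided the column types match the assumption above.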
**Languages:** Swedish (~47%), Latin (~33%), German (~6%), Danish (~2%), Norwegian (~2%), and others.
---
## IIIF Access
Documents with images can be viewed via Riksarkivet's IIIF viewer:
- **SDHK:** `https://sok.riksarkivet.se/bildvisning/SDHK_{id}` or via `manifest_url`
- **MPO:** via `bildvisning_url` or `manifest_url`
IIIF manifests follow the pattern:
- SDHK: `https://lbiiif.riksarkivet.se/sdhk!{id}/manifest`
- MPO: `https://lbiiif.riksarkivet.se/arkis!{bildvisning_id}/manifest`
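The URL patterns above can be filled in with small helpers. This is a sketch only: the function names are illustrative, the ID `1234` is a made-up placeholder, and where a row already carries `manifest_url` or `bildvisning_url`, using that value directly is likely the safer choice.

```python
VIEWER_BASE = "https://sok.riksarkivet.se/bildvisning"
IIIF_BASE = "https://lbiiif.riksarkivet.se"


def sdhk_viewer_url(sdhk_id) -> str:
    """Viewer page for an SDHK record, following the pattern in this card."""
    return f"{VIEWER_BASE}/SDHK_{sdhk_id}"


def sdhk_manifest_url(sdhk_id) -> str:
    """IIIF manifest URL for an SDHK record."""
    return f"{IIIF_BASE}/sdhk!{sdhk_id}/manifest"


def mpo_manifest_url(bildvisning_id) -> str:
    """IIIF manifest URL for an MPO fragment, keyed by its bildvisning ID."""
    return f"{IIIF_BASE}/arkis!{bildvisning_id}/manifest"


# 1234 is a hypothetical ID used purely for illustration.
print(sdhk_viewer_url(1234))
print(sdhk_manifest_url(1234))
```

Only records with `has_manifest = true` are expected to resolve to a manifest.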
---
## License
The data is provided under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/). Source: Riksarkivet (Swedish National Archives).
Provided by:
Riksarkivet
Collected summary
Dataset introduction

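Construction
The Diplomatics Lance dataset grows out of the Swedish National Archives' systematic digitisation and cataloguing of medieval documents. It combines two core tables. MPO (Medeltida PergamentOmslag) holds 22,909 records of medieval book fragments that survive as wrappers for 16th-century tax accounts; the cataloguing followed German guidelines for manuscript description, so the metadata is mainly in German. SDHK (Svenskt Diplomatariums Huvudkartotek) is the main index of 44,264 medieval Swedish charters, covering documents up to 1540, some of which come with full-text transcriptions and digitised images. The data is stored in LanceDB format and can be queried remotely without a local download, reflecting a close integration of archival science and digital humanities.
Features
The dataset's defining features are its cross-disciplinary integration of resources and its multiple routes of access. The MPO table focuses on medieval book fragments and records their physical characteristics, place of origin, dating, and content classification, offering a rare material-culture perspective for manuscript studies. The SDHK table centres on charters, with fields for persons, places, dates, and summaries, and supports full-text search across Latin, Swedish, and other languages. Image resources are interoperable through the IIIF standard: about 34% of SDHK entries have digitised images and about 14% have scholarly transcriptions, which gives historical linguistics, diplomatics, and digital archive analysis a structured and extensible data foundation.
Usage
Researchers can explore the dataset remotely through LanceDB's Python interface. After connecting, they can list the two tables, MPO and SDHK, and use full-text search to look up keywords such as personal names in charters or the manuscript type of fragments. For image analysis, the IIIF links in the dataset give access to high-resolution manuscript images, which can be compared across documents using the dating, language, and state-of-preservation fields in the metadata. The data is released under CC BY 4.0, which permits scholarly citation and derivative use, and it is particularly suited to reconstructing medieval history, training handwritten text recognition models, and developing digital collection systems.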
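Background and challenges
Background overview
The Diplomatics-Lance dataset was built and published by the Swedish National Archives (Riksarkivet) around 2024 to consolidate digitised metadata for medieval documents. It contains two core tables: MPO (Medeltida PergamentOmslag) records 22,909 medieval book fragments catalogued in the project of 1995–2004, and SDHK (Svenskt Diplomatariums Huvudkartotek) is the main index of 44,264 medieval Swedish charters, covering documents up to 1540. The material spans Latin, Swedish, German, and other languages and can be queried remotely through LanceDB. It provides a structured data foundation for diplomatics, digital humanities, and handwritten text recognition research, and supports cross-disciplinary analysis of, and access to, medieval archives.
Current challenges
The dataset addresses core challenges in the digitisation of medieval documents. At the domain level, recognising and parsing multilingual handwritten text is difficult, especially the handling of variants of medieval Latin and the Nordic vernaculars. At the construction level, physical damage to the original archives, dispersed preservation, and inconsistent historical cataloguing standards (for example, the German-language descriptions in MPO) make data integration complex. In addition, some documents lack complete transcriptions or high-resolution images, which limits the accuracy of machine-learning model training and full-text retrieval and remains a continuing challenge for digital preservation and knowledge extraction.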
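Common scenarios
Typical use cases
In medieval philology and digital humanities research, the Diplomatics Lance dataset gives scholars convenient remote querying: without downloading anything, they can run full-text searches through LanceDB over metadata for manuscripts held by the Swedish National Archives. The dataset combines the two core tables MPO and SDHK; the former covers about 23,000 medieval book fragments and the latter indexes more than 44,000 medieval Swedish charters. It supports fine-grained search over fields in Latin, Swedish, and other languages, which facilitates typical research tasks such as classifying manuscript fragments and dating charters.
Academic problems addressed
The dataset helps with problems in medieval history such as difficult access to sources and the complexity of cross-language text analysis. With structured metadata and searchable text fields, scholars can systematically trace the origin, use, and circulation of book fragments, or analyse in depth the persons, places, and legal relationships recorded in charters. Its IIIF integration also allows high-resolution digital images to be consulted remotely, which provides a reliable data basis for handwritten text recognition and for assessing the state of preservation of documents, and advances cross-disciplinary work across archival science, palaeography, and historical semantics.
Related derivative work
Work associated with the dataset includes systematic cataloguing research on the book fragments of the MPO project, such as Jan Brunius's *From Manuscripts to Wrappers*, which examines the provenance and historical background of the fragments. In the digital humanities, scholars use the SDHK transcriptions for social network analysis to reconstruct power structures and geographic connections in medieval Sweden. In addition, training handwriting recognition models with HTR technology and developing IIIF-based platforms for cross-institutional document comparison both rely on the dataset's standardised metadata and open-access architecture, and continue to extend the digital frontier of medieval document research.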
The content above was collected and summarised by 遇见数据集.



