dh-unibe/image-text_koenigsfelden-charters-part-2
Source: Hugging Face. Updated 2026-03-16; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/dh-unibe/image-text_koenigsfelden-charters-part-2
Resource description:
---
dataset_info:
  config_name: default
  features:
  - name: image
    dtype:
      image:
        decode: false
  - name: xml_content
    dtype: string
  - name: filename
    dtype: string
  - name: project_name
    dtype: string
  splits:
  - name: train
    num_examples: 68
    num_bytes: 2155475424
  download_size: 2155475424
  dataset_size: 2155475424
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train/**/*.parquet
tags:
- image-to-text
- htr
- trocr
- transcription
- pagexml
license: mit
language:
- de
- la
pretty_name: Early Modern German
---
# Dataset Card for transkribus-exports-6259-raw-xml
This dataset was created with the pagexml-hf converter from Transkribus PageXML data.
## Dataset Summary
This dataset contains 68 samples in a single split (`train`).
It is based on the Königsfelden Data Set (https://zenodo.org/records/5179361).
Each sample holds the PageXML representation of the text on a charter. For the transcription guidelines, see https://koenigsfelden.sources-online.org/intro.html.
- Geographical scope: Switzerland
- Period: 1300-1350
- Languages: German
- Type of document: Protocols
- Provenance: State Archives of Zurich
### Projects Included
- u-17_0208
- u-17_0223
- u-17_0246
- u-17_0249
- u-17_0251
- u-17_0252
- u-17_0254a
- u-17_0255
- u-17_0266a
- u-17_0267
- u-17_0268
- u-17_0273
- u-17_0276a
- u-17_0277_01
- u-17_0277_02
- u-17_0281
- u-17_0288
- u-17_0289
- u-17_0301
- u-17_0303
- u-17_0306a
- u-17_0307
- u-17_0307a
- u-17_0309
- u-17_0314
- u-17_0315
- u-17_0316
- u-17_0317
- u-17_0318
- u-17_0319
- u-17_0320
- u-17_0322
- u-17_0323
- u-17_0335
## Dataset Structure
### Data Splits
- **train**: 68 samples
### Dataset Size
- Approximate total size: 2055.62 MiB (2,155,475,424 bytes)
- Total samples: 68
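The approximate size above matches the byte count in the card metadata when a megabyte is taken as 1024 × 1024 bytes. A quick check using only the figures stated in this card (the helper name `bytes_to_mib` is made up for illustration):

```python
# Figures taken from the dataset card metadata.
DATASET_SIZE_BYTES = 2_155_475_424
NUM_SAMPLES = 68


def bytes_to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


total_mib = bytes_to_mib(DATASET_SIZE_BYTES)
mean_mib_per_sample = total_mib / NUM_SAMPLES

print(f"total: {total_mib:.2f} MiB")
print(f"mean per sample: {mean_mib_per_sample:.2f} MiB")
```

The mean per sample is only an average; individual charter images may differ considerably in size.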
### Features
- **image**: `Image(mode=None, decode=False)`
- **xml_content**: `Value('string')`
- **filename**: `Value('string')`
- **project_name**: `Value('string')`
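Since `xml_content` is stored as a plain string, the transcription has to be pulled out of the PageXML by the reader. The following is a minimal sketch, assuming the usual PageXML layout in which each `TextLine` element carries its text in a `TextEquiv/Unicode` child. The function names and the sample document are made up for illustration; real exports may contain further elements (for example word-level or region-level `TextEquiv`), which this sketch ignores.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_text_lines(xml_content: str) -> list[str]:
    """Return the line-level transcriptions of a PageXML document, in document order."""
    root = ET.fromstring(xml_content)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Look only at direct children, so nested Word-level text is skipped.
        for child in elem:
            if local_name(child.tag) == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        lines.append(sub.text or "")
                        break
                break
    return lines


# Synthetic sample for illustration only; not taken from the dataset.
SAMPLE_PAGEXML = """<?xml version="1.0" encoding="UTF-8"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="100" imageHeight="100">
    <TextRegion id="r1">
      <TextLine id="l1"><TextEquiv><Unicode>first line</Unicode></TextEquiv></TextLine>
      <TextLine id="l2"><TextEquiv><Unicode>second line</Unicode></TextEquiv></TextLine>
      <TextEquiv><Unicode>first line second line</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""

print(extract_text_lines(SAMPLE_PAGEXML))
```

Matching on local tag names rather than a fixed namespace keeps the sketch independent of the PageXML schema version used in the export.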
## Data Organization
Data is organized as parquet shards by split and project:
```
data/
├── <split>/
│ └── <project_name>/
│ └── <timestamp>-<shard>.parquet
```
When the dataset is loaded from the Hugging Face Hub, all parquet shards belonging to a split are merged automatically.
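Because shards are grouped by project, the directory layout can also be used to address a single project. The sketch below builds a glob pattern for one project and groups a list of shard paths by split and project. The helper names and the example shard file names are hypothetical; the actual `<timestamp>-<shard>` values are not given in this card.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def project_glob(split: str, project_name: str) -> str:
    """Glob pattern matching all parquet shards of one project in one split."""
    return f"data/{split}/{project_name}/*.parquet"


def group_shards_by_project(paths):
    """Group repository paths of the form data/<split>/<project_name>/<file>.parquet."""
    grouped = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path).parts
        # Expected shape: ("data", <split>, <project_name>, <file>.parquet)
        if len(parts) == 4 and parts[0] == "data" and parts[3].endswith(".parquet"):
            grouped[(parts[1], parts[2])].append(path)
    return dict(grouped)


# Hypothetical shard names, for illustration only.
EXAMPLE_PATHS = [
    "data/train/u-17_0208/20260101T000000-0000.parquet",
    "data/train/u-17_0208/20260101T000000-0001.parquet",
    "data/train/u-17_0223/20260101T000000-0000.parquet",
    "README.md",
]

print(project_glob("train", "u-17_0208"))
print(sorted(group_shards_by_project(EXAMPLE_PATHS)))
```

A pattern such as the one returned by `project_glob` can be passed as the `data_files` argument of `load_dataset` to load one project only, assuming the repository follows the layout shown above.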
## Usage
```python
from datasets import load_dataset

# Repository ID as listed at the top of this page
REPO_ID = "dh-unibe/image-text_koenigsfelden-charters-part-2"

# Load the entire dataset (returns a DatasetDict keyed by split name)
dataset = load_dataset(REPO_ID)

# Load only the train split (returns a Dataset)
train_dataset = load_dataset(REPO_ID, split="train")
```
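Because the `image` feature is declared with `decode=False`, a loaded record does not contain a decoded image object. In the `datasets` library such a field is returned as a dictionary with `bytes` and `path` keys. The following is a minimal sketch for writing the raw image of one record to disk. The helper name `save_image` and the example record are made up, and the assumption that `path` or `filename` yields a usable file name has not been checked against the actual data.

```python
from pathlib import Path


def save_image(record: dict, out_dir: str) -> Path:
    """Write the undecoded image bytes of one record to out_dir and return the file path."""
    image = record["image"]  # with decode=False: {"bytes": ..., "path": ...}
    data = image.get("bytes")
    if data is None:
        raise ValueError("record carries no embedded image bytes")
    # Prefer the stored image path; fall back to the record's filename field (assumption).
    name = Path(image.get("path") or record["filename"]).name
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    target.write_bytes(data)
    return target
```

With a loaded split, this could be called as `save_image(train_dataset[0], "images")`. To work with decoded images instead, the raw bytes can be opened with an imaging library of choice.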
Provider:
dh-unibe



