dh-unibe/image-text_koenigsfelden-adhr-colmar
Source: Hugging Face. Updated 2026-03-16; indexed 2026-04-05.
Download link: https://hf-mirror.com/datasets/dh-unibe/image-text_koenigsfelden-adhr-colmar
---
dataset_info:
  config_name: default
  features:
  - name: image
    dtype:
      image:
        decode: false
  - name: xml_content
    dtype: string
  - name: filename
    dtype: string
  - name: project_name
    dtype: string
  splits:
  - name: train
    num_examples: 223
    num_bytes: 6811999732
  download_size: 6811999732
  dataset_size: 6811999732
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train/**/*.parquet
tags:
- image-to-text
- htr
- trocr
- transcription
- pagexml
license: mit
---
# Dataset Card for image-text_koenigsfelden-adhr-colmar
This dataset was created from Transkribus PageXML data with the pagexml-hf converter.
## Dataset Summary
This dataset contains 223 samples in a single split (`train`).
- Geographical scope: Switzerland
- Period: 1300-1500
- Languages: Middle High German, Latin
- Type of document: Documents and files
- Provenance: State Archives of Aargau
### Projects Included
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_01
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_02
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_03
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_04
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_05
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_06
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_07
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_08
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_09
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_10
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_11
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_12
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_13
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_14
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_15
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_16
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_17
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_18
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_19
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_20
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_21
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_22
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_23
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_24
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_25
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_26
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_27
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_28
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_29
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_30
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_31
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_32
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_33
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_34
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_35
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_36
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_37
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_38
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_39
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_40
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_41
## Dataset Structure
### Data Splits
- **train**: 223 samples
### Dataset Size
- Approximate total size: 6496.43 MiB (6,811,999,732 bytes)
- Total samples: 223
### Features
- **image**: `Image(mode=None, decode=False)`
- **xml_content**: `Value('string')`
- **filename**: `Value('string')`
- **project_name**: `Value('string')`
## Data Organization
Data is organized as parquet shards by split and project:
```
data/
├── <split>/
│   └── <project_name>/
│       └── <timestamp>-<shard>.parquet
```
When the dataset is loaded, all Parquet shards matching `data/train/**/*.parquet` are combined into the single `train` split.
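
The per-project layout also makes it possible to load only part of the collection. The sketch below builds a glob for one project and passes it to `load_dataset` through its `data_files` argument. It assumes that the directory names on the Hub match the `project_name` values listed above, which this card does not state explicitly. `project_glob` and `load_project` are hypothetical helper names.

```python
def project_glob(project_name, split="train"):
    """Glob pattern for every shard of one project, following the layout above."""
    return f"data/{split}/{project_name}/*.parquet"


def load_project(project_name, split="train"):
    """Load only the shards of one project (needs `datasets` and network access)."""
    # Imported lazily so that project_glob stays usable without the package installed.
    from datasets import load_dataset

    return load_dataset(
        "dh-unibe/image-text_koenigsfelden-adhr-colmar",
        data_files={split: project_glob(project_name, split)},
        split=split,
    )


pattern = project_glob("FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_01")
print(pattern)
```

Calling `load_project("FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_01")` would then download that project's shards rather than the full dataset.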
## Usage
```python
from datasets import load_dataset
# Load entire dataset
dataset = load_dataset("dh-unibe/image-text_koenigsfelden-adhr-colmar")
# Load specific split
train_dataset = load_dataset("dh-unibe/image-text_koenigsfelden-adhr-colmar", split="train")
```
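
Since the `image` feature is stored with `decode=False`, each record exposes the image as a dictionary with `bytes` and `path` keys rather than as a decoded image object. The following sketch writes one record back to disk as an image file plus its PageXML file. It assumes that `bytes` is populated and that `filename` has a usable stem. `export_sample` and the `fake` record are illustrative only and contain no real data.

```python
import tempfile
from pathlib import Path


def export_sample(sample, out_dir):
    """Write the raw image bytes and the PageXML of one record into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    stem = Path(sample["filename"]).stem
    image = sample["image"]  # dict with "bytes" and "path" because decode=False
    # Reuse the original image extension if a path is present; ".img" is a fallback.
    suffix = Path(image.get("path") or "").suffix or ".img"

    image_path = out_dir / f"{stem}{suffix}"
    xml_path = out_dir / f"{stem}.xml"
    image_path.write_bytes(image["bytes"])
    xml_path.write_text(sample["xml_content"], encoding="utf-8")
    return image_path, xml_path


# Hand-made record that mimics the dataset schema.
fake = {
    "image": {"bytes": b"\xff\xd8not-a-real-jpeg", "path": "page_0001.jpg"},
    "xml_content": "<PcGts/>",
    "filename": "page_0001.jpg",
    "project_name": "demo",
}

with tempfile.TemporaryDirectory() as tmp:
    image_path, xml_path = export_sample(fake, tmp)
    written_names = sorted(p.name for p in Path(tmp).iterdir())
    roundtrip = image_path.read_bytes()

print(written_names)
```

With real data, `export_sample(train_dataset[0], "export/")` would be the equivalent call. Passing `streaming=True` to `load_dataset` lets records be iterated without first downloading every shard.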
Provider: dh-unibe



