
dh-unibe/image-text_koenigsfelden-adhr-colmar

Hugging Face | Updated 2026-03-16 | Indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/dh-unibe/image-text_koenigsfelden-adhr-colmar
Resource description:
---
dataset_info:
  config_name: default
  features:
  - name: image
    dtype:
      image:
        decode: false
  - name: xml_content
    dtype: string
  - name: filename
    dtype: string
  - name: project_name
    dtype: string
  splits:
  - name: train
    num_examples: 223
    num_bytes: 6811999732
  download_size: 6811999732
  dataset_size: 6811999732
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train/**/*.parquet
tags:
- image-to-text
- htr
- trocr
- transcription
- pagexml
license: mit
---

# Dataset Card for image-text_koenigsfelden-adhr-colmar

This dataset was created using pagexml-hf converter from Transkribus PageXML data.

## Dataset Summary

This dataset contains 223 samples across 1 split(s).

Geographical scope: Switzerland<br>
Period: 1300-1500<br>
Languages: Middle High German, Latin<br>
Type of document: Documents and files<br>
Provenance: State Archives of Aargau<br>

### Projects Included

- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_01
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_02
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_03
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_04
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_05
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_06
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_07
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_08
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_09
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_10
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_11
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_12
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_13
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_14
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_15
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_16
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_17
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_18
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_19
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_20
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_21
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_22
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_23
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_24
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_25
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_26
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_27
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_28
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_29
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_30
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_31
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_32
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_33
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_34
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_35
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_36
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_37
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_38
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_39
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_40
- FRAD068_03G_SAINT_PIERRE_SAINT_GILLES_032_41

## Dataset Structure

### Data Splits

- **train**: 223 samples

### Dataset Size

- Approximate total size: 6496.43 MB
- Total samples: 223

### Features

- **image**: `Image(mode=None, decode=False)`
- **xml_content**: `Value('string')`
- **filename**: `Value('string')`
- **project_name**: `Value('string')`

## Data Organization

Data is organized as parquet shards by split and project:

```
data/
├── <split>/
│   └── <project_name>/
│       └── <timestamp>-<shard>.parquet
```

The HuggingFace Hub automatically merges all parquet files when loading the dataset.

## Usage

```python
from datasets import load_dataset

# Load entire dataset
dataset = load_dataset("dh-unibe/image-text_koenigsfelden-adhr-colmar")

# Load specific split
train_dataset = load_dataset("dh-unibe/image-text_koenigsfelden-adhr-colmar", split="train")
```
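
### Reading transcriptions from `xml_content`

Each sample stores its PageXML as a plain string in `xml_content`, so the transcription has to be parsed out before it can be paired with the image. The following is a minimal sketch, not part of the original card. It assumes the string is standard PAGE XML in which each `TextLine` carries its transcription in a direct `TextEquiv/Unicode` child. The helper names `local_name` and `extract_text_lines` and the `SAMPLE` document are hypothetical and exist only for illustration. Tag names are matched without their namespace because the PAGE schema version used in this dataset is not stated.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_text_lines(xml_content: str) -> list[str]:
    """Return the line-level transcriptions of a PageXML string, in document order.

    Assumption: each TextLine holds its text in a direct TextEquiv/Unicode child.
    Word-level and region-level TextEquiv elements are deliberately skipped.
    """
    root = ET.fromstring(xml_content)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Only look at direct children so nested Word transcriptions are ignored.
        for child in elem:
            if local_name(child.tag) != "TextEquiv":
                continue
            for uni in child:
                if local_name(uni.tag) == "Unicode":
                    lines.append(uni.text or "")
    return lines


# Hypothetical, hand-written PageXML fragment (not taken from the dataset).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="100" imageHeight="50">
    <TextRegion id="r1">
      <TextLine id="l1">
        <TextEquiv><Unicode>first line</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="l2">
        <Word id="w1"><TextEquiv><Unicode>second</Unicode></TextEquiv></Word>
        <TextEquiv><Unicode>second line</Unicode></TextEquiv>
      </TextLine>
      <TextEquiv><Unicode>region-level text</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    for line in extract_text_lines(SAMPLE):
        print(line)
```

With a loaded split, the same function would be called as `extract_text_lines(train_dataset[0]["xml_content"])`. Because the `image` feature is declared with `decode=False`, the image column is expected to come back undecoded (typically a dict with `bytes` and `path` keys) rather than as a PIL image, so decoding it is left to the caller.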
Provider:
dh-unibe