dh-unibe/image-text_koenigsfelden-charters-part-2

Name: dh-unibe/image-text_koenigsfelden-charters-part-2
Creator: dh-unibe
Published: 2026-03-16 18:46:24
License: No description available

Hugging Face · Updated 2026-03-16 · Indexed 2026-04-05

Download link:

https://hf-mirror.com/datasets/dh-unibe/image-text_koenigsfelden-charters-part-2

Description:

---
dataset_info:
  config_name: default
  features:
  - name: image
    dtype:
      image:
        decode: false
  - name: xml_content
    dtype: string
  - name: filename
    dtype: string
  - name: project_name
    dtype: string
  splits:
  - name: train
    num_examples: 68
    num_bytes: 2155475424
  download_size: 2155475424
  dataset_size: 2155475424
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train/**/*.parquet
tags:
- image-to-text
- htr
- trocr
- transcription
- pagexml
license: mit
language:
- de
- la
pretty_name: Early Modern German
---

# Dataset Card for transkribus-exports-6259-raw-xml

This dataset was created using pagexml-hf converter from Transkribus PageXML data.

## Dataset Summary

This dataset contains 68 samples across 1 split(s).

Based on the Königsfelden Data Set (https://zenodo.org/records/5179361). PageXML representation of the text on the charters. For the transcription guidelines, see: (https://koenigsfelden.sources-online.org/intro.html).

- Geographical scope: Switzerland
- Period: 1300-1350
- Languages: German
- Type of document: Protocols
- Provenance: State Archives of Zurich

### Projects Included

- u-17_0208
- u-17_0223
- u-17_0246
- u-17_0249
- u-17_0251
- u-17_0252
- u-17_0254a
- u-17_0255
- u-17_0266a
- u-17_0267
- u-17_0268
- u-17_0273
- u-17_0276a
- u-17_0277_01
- u-17_0277_02
- u-17_0281
- u-17_0288
- u-17_0289
- u-17_0301
- u-17_0303
- u-17_0306a
- u-17_0307
- u-17_0307a
- u-17_0309
- u-17_0314
- u-17_0315
- u-17_0316
- u-17_0317
- u-17_0318
- u-17_0319
- u-17_0320
- u-17_0322
- u-17_0323
- u-17_0335

## Dataset Structure

### Data Splits

- **train**: 68 samples

### Dataset Size

- Approximate total size: 2055.62 MB
- Total samples: 68

### Features

- **image**: `Image(mode=None, decode=False)`
- **xml_content**: `Value('string')`
- **filename**: `Value('string')`
- **project_name**: `Value('string')`

## Data Organization

Data is organized as parquet shards by split and project:

```
data/
├── <split>/
│   └── <project_name>/
│       └── <timestamp>-<shard>.parquet
```

The HuggingFace Hub automatically merges all parquet files when loading the dataset.

## Usage

```python
from datasets import load_dataset

# Load entire dataset
dataset = load_dataset("dh-unibe/transkribus-exports-6259-raw-xml")

# Load specific split
train_dataset = load_dataset("dh-unibe/transkribus-exports-6259-raw-xml", split="train")
```
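Since `xml_content` holds raw PageXML rather than plain text, a typical next step is to pull the line-level transcriptions out of each record. The following is a minimal sketch using only the Python standard library. It assumes the usual PageXML layout, in which each `TextLine` element carries its transcription in a direct `TextEquiv/Unicode` child; the helper names `local_name` and `extract_lines` and the `SAMPLE` document are hypothetical illustrations, not part of the dataset or of pagexml-hf.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_lines(xml_content: str) -> list[str]:
    """Return the line-level transcriptions of one PageXML document, in document order."""
    # Parse from bytes so an XML declaration with an encoding attribute is accepted.
    root = ET.fromstring(xml_content.encode("utf-8"))
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Use only the line's own TextEquiv; Word-level and region-level
        # TextEquiv elements are not direct children of TextLine and are skipped.
        for child in element:
            if local_name(child.tag) != "TextEquiv":
                continue
            for node in child:
                if local_name(node.tag) == "Unicode":
                    lines.append((node.text or "").strip())
    return lines


# Made-up placeholder document for illustration only; it is not taken from the dataset.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="example.jpg" imageWidth="100" imageHeight="50">
    <TextRegion id="r1">
      <TextLine id="r1l1">
        <Word id="w1"><TextEquiv><Unicode>first</Unicode></TextEquiv></Word>
        <TextEquiv><Unicode>first line</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <TextEquiv><Unicode>second line</Unicode></TextEquiv>
      </TextLine>
      <TextEquiv><Unicode>region text</Unicode></TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""

if __name__ == "__main__":
    for line in extract_lines(SAMPLE):
        print(line)
```

Applied to the loaded data, calling `extract_lines(row["xml_content"])` for each `row` in `train_dataset` should give the transcription lines for that image. Matching on local tag names avoids hard-coding a PageXML namespace version, which may differ between exports. Because the `image` feature is declared with `decode=False`, each row's `image` value is expected to be a dictionary of raw bytes and/or a path rather than a decoded image object.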

Provider:

dh-unibe
