CATMuS/medieval-segmentation
Source: Hugging Face | Updated: 2024-07-22 | Indexed: 2024-07-22
Download link:
https://hf-mirror.com/datasets/CATMuS/medieval-segmentation
Resource summary:
The CATMuS Medieval Segmentation dataset is a specialized dataset for layout analysis of medieval manuscripts. It uses the SegmOnto vocabulary for region and line classification. It addresses the challenge of establishing consistent ground truth in layout analysis, particularly for the complex and heterogeneous historical sources represented by medieval manuscripts in Latin scripts from the 8th to the 15th century CE. It is a subset of the manuscripts in the CATMuS Medieval dataset, which focuses on HTR (handwritten text recognition) only. The CATMuS dataset for layout analysis provides:
- a uniform framework for annotating the layout of medieval manuscripts;
- a benchmarking environment for evaluating automatic layout analysis models along multiple dimensions, thanks to metadata (for now, century of production);
- a benchmarking environment for other tasks, such as dating approaches;
- a platform for exploratory work in computer vision and digital paleography focused on layout-based tasks, such as layout generation.
Developed through collaboration among various institutions and projects, CATMuS Medieval offers an inter-compatible dataset spanning over 200 manuscripts and incunabula in 10 languages, with rich structural annotations using the SegmOnto vocabulary. By ensuring consistency in layout analysis approaches, CATMuS aims to mitigate the problems caused by the diversity of standards in medieval manuscript analysis. It provides a comprehensive benchmark for evaluating layout analysis models on historical sources, supporting advances in the digital humanities.
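Benchmarking "along multiple dimensions" amounts to reporting results per metadata bucket rather than as one global number. The sketch below is a minimal illustration, not part of the dataset's tooling. It assumes each row is a dict carrying the dataset's century field plus a hypothetical score key that holds a per-page metric computed elsewhere (for example a segmentation IoU). The function name per_century_scores is likewise hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Mapping


def per_century_scores(rows: Iterable[Mapping], score_key: str = "score") -> Dict[int, float]:
    """Average a per-page metric within each century of production.

    `score_key` names a hypothetical column with a per-page result computed
    elsewhere (e.g. an IoU); it is not a field of the dataset itself.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for row in rows:
        buckets[row["century"]].append(row[score_key])
    # Sort by century so the report reads chronologically.
    return {century: mean(scores) for century, scores in sorted(buckets.items())}


if __name__ == "__main__":
    # Synthetic rows for illustration only; not real CATMuS data.
    demo = [
        {"century": 12, "score": 0.5},
        {"century": 9, "score": 0.25},
        {"century": 12, "score": 1.0},
    ]
    print(per_century_scores(demo))  # {9: 0.25, 12: 0.75}
```

In practice, the rows might come from the Hugging Face datasets library, e.g. load_dataset("CATMuS/medieval-segmentation", split="test"), once predictions have been scored. Averaging per century rather than over all pages prevents well-represented centuries from masking weak results on rare ones.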
Provider:
CATMuS
Original Metadata Summary
Dataset Overview
Basic Information
- Name: medieval-segmentation
- Aliases: CATMuS/medieval-segmentation, CATMuS Medieval Segmentation
- Creator: CATMuS: Consistent Approach to Transcribing ManuScripts
- Description: CATMuS Medieval Segmentation (Consistent Approaches to Transcribing Manuscripts) is a dataset designed specifically for layout analysis of medieval manuscripts. It uses the SegmOnto vocabulary for region and line classification and addresses the challenge of establishing consistent ground truth in layout analysis tasks.
- Keywords: image-segmentation, object-detection, mask-generation, cc-by-4.0, 1K - 10K, imagefolder, Image, Text, Datasets, Croissant, 🇺🇸 Region: US, layout-analysis, humanities, historical-documents
- License: CC BY 4.0
- URL: https://hf-mirror.com/datasets/CATMuS/medieval-segmentation
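SegmOnto labels are structured strings, so a small parser helps when grouping regions or lines by type. This is a sketch under an assumption. It follows the Type:subtype#number pattern described in the SegmOnto guidelines (e.g. MainZone:column#1, DefaultLine). The listing does not say how labels are stored in this dataset's skipped objects column. The names parse_label and SegmOntoLabel are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed label shape, following the SegmOnto guidelines: Type[:subtype][#number]
LABEL_RE = re.compile(
    r"^(?P<type>[A-Za-z]+)(?::(?P<subtype>[^#]+))?(?:#(?P<number>\d+))?$"
)


class SegmOntoLabel(NamedTuple):
    type: str
    subtype: Optional[str]
    number: Optional[int]

    @property
    def is_line(self) -> bool:
        # Line types end in "Line" (e.g. DefaultLine); zone types end in "Zone".
        return self.type.endswith("Line")


def parse_label(label: str) -> SegmOntoLabel:
    """Split a label such as 'MainZone:column#1' into type, subtype and number."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a SegmOnto-style label: {label!r}")
    number = match.group("number")
    return SegmOntoLabel(
        type=match.group("type"),
        subtype=match.group("subtype"),
        number=int(number) if number is not None else None,
    )
```

Keeping the subtype and number separate from the base type lets evaluation code collapse "MainZone:column#1" and "MainZone:column#2" into a single MainZone class when only the coarse region type matters.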
Dataset Structure
- Distribution:
  - Type: cr:FileObject
    - Name: repo
    - Description: HF Mirror git repository.
    - Content URL: https://hf-mirror.com/datasets/CATMuS/medieval-segmentation/tree/refs%2Fconvert%2Fparquet
    - Encoding format: git+https
  - Type: cr:FileSet
    - Name: parquet-files-for-config-default
    - Description: Underlying Parquet files converted by HF Mirror (see: https://hf-mirror.com/docs/datasets-server/parquet).
    - Contained in: repo
    - Encoding format: application/x-parquet
    - Includes: default/*/*.parquet
  - Type: cr:FileObject
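The Parquet file set is selected by a path pattern inside the refs/convert/parquet branch. As a sketch, assume the converted files follow a default/&lt;split&gt;/&lt;shard&gt;.parquet layout. The shard names in the test are made up, and split_of and files_by_split are hypothetical helpers. Under that assumption, repository paths can be routed to the three splits as follows:

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Optional

# Assumed layout of the converted files: default/<split>/<shard>.parquet
CONFIG = "default"
SPLITS = ("train", "validation", "test")


def split_of(path: str) -> Optional[str]:
    """Return the split a converted Parquet path belongs to, or None."""
    parts = PurePosixPath(path).parts
    if (
        len(parts) == 3
        and parts[0] == CONFIG
        and parts[1] in SPLITS
        and parts[2].endswith(".parquet")
    ):
        return parts[1]
    return None


def files_by_split(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group matching paths by split, in train/validation/test order."""
    grouped: Dict[str, List[str]] = {split: [] for split in SPLITS}
    for path in paths:
        split = split_of(path)
        if split is not None:
            grouped[split].append(path)
    # Keep shards in a stable order and drop splits with no files.
    return {split: sorted(files) for split, files in grouped.items() if files}
```

Checking the path segments explicitly, instead of using fnmatch, is deliberate: fnmatch's "*" also matches "/", so a nested path such as default/train/sub/0000.parquet would slip through a plain glob match.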
Record Set
- Type: cr:RecordSet
- Name: default
- Description: CATMuS/medieval-segmentation - default subset
- 3 splits: train, validation, test
- 1 skipped column: objects
- Fields:
  - Type: cr:Field
    - Name: default/image
    - Description: the image column from the HF Mirror Parquet files.
    - Data type: sc:ImageObject
    - Source: file set parquet-files-for-config-default; extract column: image; transform: jsonPath bytes
  - Type: cr:Field
    - Name: default/width
    - Description: the width column from the HF Mirror Parquet files.
    - Data type: sc:Integer
    - Source: file set parquet-files-for-config-default; extract column: width
  - Type: cr:Field
    - Name: default/height
    - Description: the height column from the HF Mirror Parquet files.
    - Data type: sc:Integer
    - Source: file set parquet-files-for-config-default; extract column: height
  - Type: cr:Field
    - Name: default/shelfmark
    - Description: the shelfmark column from the HF Mirror Parquet files.
    - Data type: sc:Text
    - Source: file set parquet-files-for-config-default; extract column: shelfmark
  - Type: cr:Field
    - Name: default/century
    - Description: the century column from the HF Mirror Parquet files.
    - Data type: sc:Integer
    - Source: file set parquet-files-for-config-default; extract column: century
  - Type: cr:Field
    - Name: default/project
    - Description: the project column from the HF Mirror Parquet files.
    - Data type: sc:Text
    - Source: file set parquet-files-for-config-default; extract column: project
  - Type: cr:Field
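A typed view of the scalar fields can make downstream code explicit about what each column holds. The sketch below mirrors the listed width, height, shelfmark, century and project fields in a dataclass, with types taken from their sc: data types. The image field is left out because its decoded form depends on the loader. The skipped objects column is not described here, so it is not modelled either. PageMeta and page_meta_from_row are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class PageMeta:
    """Scalar fields of one record; image and objects are deliberately omitted."""

    width: int      # sc:Integer
    height: int     # sc:Integer
    shelfmark: str  # sc:Text
    century: int    # sc:Integer
    project: str    # sc:Text


def page_meta_from_row(row: Mapping[str, Any]) -> PageMeta:
    """Build a PageMeta from a row dict, ignoring any other keys it carries."""
    meta = PageMeta(
        width=int(row["width"]),
        height=int(row["height"]),
        shelfmark=str(row["shelfmark"]),
        century=int(row["century"]),
        project=str(row["project"]),
    )
    # Page dimensions must be positive for any geometry computed downstream.
    if meta.width <= 0 or meta.height <= 0:
        raise ValueError(f"invalid page size: {meta.width}x{meta.height}")
    return meta
```

A missing field surfaces as a KeyError at construction time, which is usually easier to debug than a None propagating into later layout computations.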
Compiled Summary
Dataset Introduction

Background and Challenges
Background Overview
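CATMuS/medieval-segmentation is a dataset for layout analysis of medieval manuscripts. It uses the SegmOnto vocabulary for region and line classification and supports tasks such as image segmentation and object detection. It contains about 1,684 samples covering more than 200 manuscripts and incunabula from the 8th to the 16th century in 10 languages. It includes structural annotations and metadata, and aims to provide a consistent benchmark for digital humanities research.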
The above content was collected and summarized by 遇见数据集.



