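2022 Guangdong-Hong Kong-Macao Greater Bay Area (Huangpu) International Algorithm Case Competition: Ancient Document Image Analysis and Recognition, Preliminary Round Training Set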
ModelScope community, updated 2026-01-04, indexed 2025-09-13
Download link:
https://modelscope.cn/datasets/llmlearningX/GBA2022_AncientDoc_ImageAnalysis_TrainingSet
Resource description:
2022 Guangdong-Hong Kong-Macao Greater Bay Area (Huangpu) International Algorithm Case Competition, Ancient Document Image Analysis and Recognition track, preliminary-round dataset: the training, validation, and test splits each contain 1000 ancient document images (3000 images in total). The images are drawn from several ancient book collections, including the Siku Quanshu (Complete Library of the Four Treasuries), rare editions of ancient books from successive dynasties, and the Qianlong Edition of the Chinese Buddhist Canon, among others. The task considers only the main body text of each page and ignores content outside the frame border, such as the banxin (the central column of the folio) and volume numbers.
Community project: https://aistudio.baidu.com/projectdetail/4525530
Official contest page: https://iacc.pazhoulab-huangpu.com/contestdetail?id=6497f74cd97a2dae9dcaeff8&award=1,000,000
Provider:
maas
Created:
2025-09-07
Dataset Introduction

Background and Challenges
Background Overview
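This dataset is the training data for the Ancient Document Image Analysis and Recognition task of the 2022 Guangdong-Hong Kong-Macao Greater Bay Area (Huangpu) International Algorithm Case Competition, intended to advance the digitization and intelligent analysis of ancient books. It contains 3000 images (1000 each in the training, validation, and test splits) drawn from diverse sources such as the Siku Quanshu, with complex page layouts and a wide variety of Chinese character styles. The dataset provides line-level annotations (bounding box, text, and reading order), making it suitable for applications such as ancient-book OCR, layout analysis, and reading-order modeling.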
The above content was collected and summarized by 遇见数据集.
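As a rough illustration of how line-level annotations (bounding box, text, reading order) might be consumed, the Python sketch below models one annotated line and rebuilds page text in the annotated reading order. The record layout and the names `TextLine`, `page_text`, and `heuristic_order` are hypothetical; the dataset's actual annotation files may use a different schema. `heuristic_order` is a naive geometric baseline that is not part of the dataset. It assumes traditional vertical layout (columns read right to left, each column top to bottom) and that each box covers a single column segment.

```python
from dataclasses import dataclass


# Hypothetical record for one annotated text line; the field names are
# illustrative assumptions, not the dataset's actual annotation schema.
@dataclass
class TextLine:
    bbox: tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels
    text: str                        # transcription of the line
    order: int                       # index of the line in the page's reading order


def page_text(lines: list[TextLine]) -> str:
    """Join line transcriptions following the annotated reading order."""
    return "\n".join(line.text for line in sorted(lines, key=lambda line: line.order))


def heuristic_order(lines: list[TextLine]) -> list[TextLine]:
    """Naive baseline for vertical text: columns right to left, each top to bottom."""
    # Larger x_max means further right, so negate it to put the rightmost column first;
    # ties (e.g. segments of one column with equal x_max) are broken by the top edge y_min.
    return sorted(lines, key=lambda line: (-line.bbox[2], line.bbox[1]))


if __name__ == "__main__":
    lines = [
        TextLine((10, 20, 50, 400), "left", order=2),
        TextLine((60, 20, 100, 400), "right", order=0),
        TextLine((60, 410, 100, 600), "right lower", order=1),
    ]
    print(page_text(lines).splitlines())                   # ['right', 'right lower', 'left']
    print([line.text for line in heuristic_order(lines)])  # ['right', 'right lower', 'left']
```

Comparing the annotated order with a geometric baseline like this is one way to see how much of a page's reading order simple geometry already explains; on pages with interlinear notes or mixed layouts, a heuristic this simple can be expected to fail.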



