Dataset for Single Character Detection in Dongba Manuscripts
Source: DataCite Commons. Updated: 2025-07-02. Indexed: 2025-09-08.
Download link:
https://springernature.figshare.com/articles/dataset/Dataset_for_Single_Character_Detection_in_Dongba_Manuscripts/26969755/1
Description:
Dataset for Single Character Detection in Dongba Manuscripts. It includes 1,800 curated JPEG image files and 1,800 annotation files in TXT format. Files are named consistently so that each image can be matched to its annotation: JPEG images are named 'image_' followed by a number (e.g., 'image_1.jpg'), and the corresponding TXT files add the prefix 'gt_' (e.g., 'gt_image_1.txt'). The annotation files contain a verified total of 111,702 Dongba characters. Each character's position is given as a series of coordinate pairs that define the polygonal boundary of its text box. For example, the sequence "161, 59, 202, 57, 256, 85, 239, 154, 182, 147, 163, 107" lists the six vertices of a polygon, where each pair such as "161, 59" gives the x and y coordinates of one vertex. Vertices are typically listed in clockwise order so that they trace the full contour of the polygon. The string "###" is used as a delimiter that marks the end of each record. All data are stored in standard formats (JPEG and plain text), so researchers can use them directly for training and testing machine learning models. With these records and formatting specifications, the Dongba1800 dataset supports the preservation and study of Dongba script and related cultural heritage, and it offers a resource for technological development in related fields.
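The annotation format described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: it assumes that coordinates are integers and that every record ends with "###", and it makes no assumption about how records are split across lines. The function names (annotation_name_for, parse_annotations, bounding_box) are hypothetical helpers introduced here for illustration.

```python
import re
from pathlib import Path
from typing import List, Tuple

Polygon = List[Tuple[int, int]]


def annotation_name_for(image_name: str) -> str:
    """Map an image file name to its annotation file name,
    e.g. 'image_1.jpg' -> 'gt_image_1.txt'."""
    return "gt_" + Path(image_name).stem + ".txt"


def parse_annotations(text: str) -> List[Polygon]:
    """Split the text on the '###' delimiter and turn each record
    into a list of (x, y) vertices. Assumes integer coordinates."""
    polygons: List[Polygon] = []
    for record in text.split("###"):
        numbers = [int(n) for n in re.findall(r"-?\d+", record)]
        if not numbers:
            continue  # empty chunk, e.g. after the final delimiter
        if len(numbers) % 2 != 0:
            raise ValueError(f"odd number of coordinates in record: {record!r}")
        # Pair consecutive values as (x, y).
        polygons.append(list(zip(numbers[0::2], numbers[1::2])))
    return polygons


def bounding_box(polygon: Polygon) -> Tuple[int, int, int, int]:
    """Axis-aligned box (x_min, y_min, x_max, y_max) around a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


# The example sequence from the description, followed by the delimiter.
sample = "161, 59, 202, 57, 256, 85, 239, 154, 182, 147, 163, 107 ###"
polygons = parse_annotations(sample)
print(annotation_name_for("image_1.jpg"))  # gt_image_1.txt
print(polygons[0])
print(bounding_box(polygons[0]))  # (161, 57, 256, 154)
```

The bounding box is included because many detection pipelines expect axis-aligned boxes; models that accept polygons can use the vertex list directly.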
Provider:
figshare
Created:
2025-07-02



