KALIMA-Dataset
arXiv | Updated 2023-12-10 | Indexed 2024-06-21
Download link:
https://github.com/bouchalhakim/KALIMA_Dataset.git
Resource description:
KALIMA-Dataset was created by the LIMED Laboratory at the University of Tamanghasset for word-position annotation in historical Arabic manuscripts. It contains 497 text-line images with 5,062 bounding-box annotations in total: 4,943 Arabic words and 119 other annotations. The images come from 26 document pages in the RASM2019 dataset, drawn from two different books. Each word's bounding box was delineated manually and linked to its transcription, a manual process intended to ensure annotation quality. The dataset is meant mainly for developing and testing systems that extract and recognize historical Arabic words, and it targets the problem of word segmentation in Arabic manuscripts.
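As an illustration of the annotation structure described above (one bounding box plus one transcription per word, grouped by text-line image), here is a minimal Python sketch. The on-disk format of the repository is not described on this page, so `WordBox`, its field names, the `is_word` flag, and both helper functions are hypothetical and would need to be adapted to the actual files.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class WordBox:
    """One annotation on a text-line image (hypothetical layout, not the repo's format)."""
    x: int        # left edge, in pixels
    y: int        # top edge, in pixels
    width: int
    height: int
    text: str     # transcription linked to the box
    is_word: bool = True  # False for the "other" (non-word) annotations


def reading_order(boxes: List[WordBox]) -> List[WordBox]:
    """Sort boxes right to left by their right edge, since Arabic reads right to left."""
    return sorted(boxes, key=lambda b: b.x + b.width, reverse=True)


def summarize(boxes: List[WordBox]) -> Dict[str, int]:
    """Count all boxes, word boxes, and other annotations."""
    words = sum(1 for b in boxes if b.is_word)
    return {"boxes": len(boxes), "words": words, "other": len(boxes) - words}
```

Applied to the full dataset, a summary of this kind should reproduce the totals stated above: 4,943 words plus 119 other annotations, giving 5,062 boxes.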
Provided by:
University of Tamanghasset
Created:
2023-12-10



