caveman273/aida-typewritten
Hugging Face | Updated 2026-04-27 | Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/caveman273/aida-typewritten
Description:
This dataset contains typewritten text-line images and their transcriptions from the AIDA project. It is a subset of the full AIDA dataset, containing only the best-quality typewritten annotations: lines where the annotator was confident about every character. Most lines are in Finnish, with some in Swedish, English, French, and German. The dataset was created for optical character recognition (OCR). The data was collected from the Central Archives for Finnish Business (ELKA) and covers various document types, including letters, ship records, and business publications. The dataset is not anonymized, so it may contain sensitive information such as individuals' names.
Provider:
caveman273



