caveman273/aida-ship-info

Source: Hugging Face. Updated 2026-04-27. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/caveman273/aida-ship-info
Description:
This dataset contains handwritten text-line images and their transcriptions from the AIDA project. It is a subset of the full AIDA dataset and contains only the best-quality handwritten annotations from ship registry records, that is, lines where the annotator was confident about every character. Most lines are in Finnish, with some in Swedish, English, French, and German. The dataset was created for handwritten text recognition (HTR) and includes splits for training (6943 lines), validation (1151 lines), and testing (1270 lines). The data was collected from the Central Archives for Finnish Business (ELKA) and annotated by employees of the National Archives of Finland and ELKA. Synthetic data was also generated to augment the training data. The dataset is not anonymized, so individuals' names can be found in it.
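As a rough illustration of working with the splits described above, the sketch below computes each split's share of the stated total and shows one way the data might be loaded with the Hugging Face `datasets` library. The split names (`train`, `validation`, `test`) and the helper names `split_fractions` and `load_split_sizes` are assumptions made for this example and are not taken from the dataset card. The description does not say whether the stated training count includes the synthetic lines, so the sizes reported on the hub may differ.

```python
# Split sizes exactly as stated in the description above.
STATED_SPLIT_SIZES = {"train": 6943, "validation": 1151, "test": 1270}


def split_fractions(sizes):
    """Return each split's share of the total number of lines."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def load_split_sizes(repo_id="caveman273/aida-ship-info"):
    """Download the dataset and report the size of every split found.

    Requires the `datasets` package and network access. The import is
    local so that the arithmetic above works without the library.
    """
    from datasets import load_dataset

    # With no `split` argument, load_dataset returns a DatasetDict
    # keyed by split name. The actual split names on the hub are not
    # stated in the description, so they are read from the result.
    dataset = load_dataset(repo_id)
    return {name: len(split) for name, split in dataset.items()}


if __name__ == "__main__":
    print("total lines:", sum(STATED_SPLIT_SIZES.values()))
    for name, fraction in split_fractions(STATED_SPLIT_SIZES).items():
        print(f"{name}: {fraction:.1%}")
```

Comparing the output of `load_split_sizes()` against `STATED_SPLIT_SIZES` is a quick sanity check that the download matches the description before any training run.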
Provider:
caveman273