The ICDAR 2003 Informal Competition for the Recognition of On-line Words: The Unipen-ICROW-03 benchmark set - Version 0.0

Indexed by NIAID Data Ecosystem, 2026-05-02

Download link:

https://zenodo.org/record/7631141

Description:

Proposal for an informal benchmark on word recognition. See for the related ImUnipen collection of word images from on-line vectorial handwriting data: https://zenodo.org/record/1195059

At the time (ICDAR 2003) there was not a lot of interest so the project was not pursued.

Lambert Schomaker - February 2023
_______________________________________________________________________________

The ICDAR 2003 Informal Competition for the Recognition of On-line Words:
The Unipen-ICROW-03 benchmark set
Version 0.0

Lambert Schomaker / International Unipen Foundation

The ICROW suite of test files for the recognition of isolated on-line free-style (handprint, mixed and cursive) words has been composed. Different tablets, nationalities and languages are involved. Only the ASCII set is used within word labels.

The set contains:

    13119 written words
    884 unique lexical word entries
    72 writers

Language: Dutch, English, Italian.
Nationalities: Dutch, Irish, Italian, + mixed

The benchmark test is a good estimator for "walk-up" recognition performance.

[Note: some of the writers (NIC-Pc95*.dat set) are present in the UNIPEN R01/V07 distribution, but the actual words are unseen outside of the Int. Unipen Foundation.]

Please note the Copyright notice in the accompanying file 'Copyright'

Wed Jul 16 21:20:10 CEST 2003
Lambert Schomaker
---------------------------------------------------------------------------

Instructions for the ICDAR 2003 informal competition for the recognition of on-line words.

1 - unpack the .tgz file
2 - use the UNIPEN files as input for your recognizer.
3 - report, for each writer, a file .res

    Example:

    do-my-recognizer < NIC-Hi93b-marc.dat > NIC-Hi93b-marc.res

    Format of the .res file. No XML for this moment: simplicity does it. We assume that the recognizer is able to produce a top-10 list of likely words, sorted from most likely to least likely. The output for each word is on a single line. The correct target word is in the first column.

    <2nd-best word hyp.> ... <10th-best word hyp>
    <2nd-best word hyp.> ... <10th-best word hyp>

    Example with two words:

    summertime slumbertime slipknot summertime somatome spumante simulative semitone schoolmate sermonette semimature
    Aberdeen Adamson Aberdeen Addison Armageddon Abyssinian Araban Albanian Alabamian Abraham Adelaide

4 - pack the *.res files in a .tgz or .zip file and send them to schomaker@ai.rug.nl

All *.dat files need to be processed.

LS.
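The .res layout described above (target word in the first column, then the ranked hypotheses on the same line) lends itself to a simple top-1 / top-N count. The source does not say how submissions were scored, so the following is only a minimal sketch under assumptions: fields are taken to be whitespace-separated, matching is taken to be case-sensitive, and the function name `score_res_lines` is hypothetical.

```python
from typing import Iterable, Tuple


def score_res_lines(lines: Iterable[str], top_n: int = 10) -> Tuple[int, int, int]:
    """Return (words, top1_hits, topn_hits) for the lines of one .res file.

    Assumed layout (not confirmed beyond the README example): the first
    whitespace-separated field is the target word, the remaining fields are
    hypotheses ordered from most to least likely.
    """
    words = top1 = topn = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        target, hyps = fields[0], fields[1:1 + top_n]
        words += 1
        if hyps and hyps[0] == target:
            top1 += 1  # best hypothesis equals the target
        if target in hyps:
            topn += 1  # target appears somewhere in the first top_n hypotheses
    return words, top1, topn


# The two example lines from the README.
example = [
    "summertime slumbertime slipknot summertime somatome spumante "
    "simulative semitone schoolmate sermonette semimature",
    "Aberdeen Adamson Aberdeen Addison Armageddon Abyssinian Araban "
    "Albanian Alabamian Abraham Adelaide",
]
print(score_res_lines(example))  # (2, 0, 2)
```

On the README example, neither word is ranked first, but both targets appear within the ten hypotheses. To score a file on disk, the same function can be applied to an open file object, for instance `score_res_lines(open("NIC-Hi93b-marc.res"))`.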


Created:

2024-07-12