The ICDAR 2003 Informal Competition for the Recognition of On-line Words: The Unipen-ICROW-03 benchmark set - Version 0.0
Indexed in the NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/7631141
Resource description:
Proposal for an informal benchmark on word recognition. See https://zenodo.org/record/1195059
for the related ImUnipen collection of word images from on-line vectorial handwriting data.
At the time (ICDAR 2003) there was little interest, so the project was not pursued.
Lambert Schomaker - February 2023
_______________________________________________________________________________
The ICDAR 2003 Informal Competition for the Recognition of On-line Words:
The Unipen-ICROW-03 benchmark set
Version 0.0
Lambert Schomaker / International Unipen Foundation
The ICROW suite of test files has been composed for the recognition
of isolated on-line free-style (handprint, mixed and cursive) words.
It involves different tablets, nationalities and languages.
Only the ASCII character set is used within word labels.
The set contains:
13119 written words
884 unique lexical word entries
72 writers
Languages: Dutch, English, Italian
Nationalities: Dutch, Irish, Italian, and mixed
The benchmark test is a good estimator of "walk-up" recognition
performance, that is, performance for new writers whose handwriting
the recognizer has not been trained on.
[Note: some of the writers (NIC-Pc95*.dat set) are present in the
UNIPEN R01/V07 distribution, but the actual words have not been seen
outside the International Unipen Foundation.]
Please note the copyright notice in the
accompanying file 'Copyright'.
Wed Jul 16 21:20:10 CEST 2003
Lambert Schomaker
---------------------------------------------------------------------------
Instructions for the ICDAR 2003 informal competition for
the recognition of on-line words.
1 - unpack the .tgz file
2 - use the UNIPEN files as input for your recognizer.
3 - for each writer, report the results in a .res file
Example: do-my-recognizer < NIC-Hi93b-marc.dat > NIC-Hi93b-marc.res
Format of the .res file.
No XML for the moment: simplicity first.
We assume that the recognizer is able to produce a top-10 list
of likely words, sorted from most likely to least likely.
The output for each word is on a single line. The correct
target word is in the first column, followed by the ten hypotheses:
<target word> <best word hyp.> <2nd-best word hyp.> ... <10th-best word hyp.>
<target word> <best word hyp.> <2nd-best word hyp.> ... <10th-best word hyp.>
Example with two words:
summertime slumbertime slipknot summertime somatome spumante simulative semitone schoolmate sermonette semimature
Aberdeen Adamson Aberdeen Addison Armageddon Abyssinian Araban Albanian Alabamian Abraham Adelaide
4 - pack the *.res files into a .tgz or .zip file and send them
to schomaker@ai.rug.nl.
All *.dat files need to be processed.
LS.
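The submission steps above (one .res file per writer, the target word first, then ten ranked hypotheses, all results packed into one archive) can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not part of the original benchmark. `recognize` is a hypothetical callback that stands in for your recognizer and returns (target label, top-10 hypotheses) for every word in a UNIPEN .dat file. The .dat files are assumed to sit in a single directory. The README does not say how submissions were scored, so top-k accuracy is only an assumed scoring rule.

```python
import zipfile
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

TOP_N = 10  # the README asks for a top-10 hypothesis list per word


def format_res_line(target: str, hypotheses: List[str]) -> str:
    """Build one .res line: target word first, then the ranked hypotheses."""
    if len(hypotheses) != TOP_N:
        raise ValueError(f"expected {TOP_N} hypotheses, got {len(hypotheses)}")
    words = [target, *hypotheses]
    # Columns are whitespace-separated, so no word may be empty or contain spaces.
    if any(not w or w.split() != [w] for w in words):
        raise ValueError("words must be non-empty and contain no whitespace")
    return " ".join(words)


def parse_res_line(line: str) -> Tuple[str, List[str]]:
    """Split a .res line back into (target, ranked hypotheses)."""
    target, *hypotheses = line.split()
    return target, hypotheses


def top_k_accuracy(lines: Iterable[str], k: int) -> float:
    """Fraction of words whose target occurs among the first k hypotheses.

    Assumption: top-k accuracy is a hypothetical scoring rule, not one
    defined by the benchmark README.
    """
    total = hits = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        target, hypotheses = parse_res_line(line)
        total += 1
        if target in hypotheses[:k]:
            hits += 1
    return hits / total if total else 0.0


def write_all_results(
    dat_dir: Path,
    recognize: Callable[[Path], List[Tuple[str, List[str]]]],
) -> List[Path]:
    """Write one .res file next to every .dat file (one writer per file).

    `recognize` is a hypothetical callback: given a .dat path, it returns
    (target label, top-10 hypotheses) for each word in that file.
    """
    res_paths = []
    for dat in sorted(dat_dir.glob("*.dat")):
        lines = [format_res_line(t, h) for t, h in recognize(dat)]
        res = dat.with_suffix(".res")  # NIC-Hi93b-marc.dat -> NIC-Hi93b-marc.res
        res.write_text("\n".join(lines) + "\n", encoding="ascii")
        res_paths.append(res)
    return res_paths


def pack_results(res_paths: List[Path], archive: Path) -> None:
    """Pack the .res files into a .zip archive for submission."""
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for res in res_paths:
            zf.write(res, arcname=res.name)
```

Applied to the two example lines, `top_k_accuracy(example, 1)` gives 0.0, because neither target is ranked first. `top_k_accuracy(example, 10)` gives 1.0, because "summertime" is ranked third and "Aberdeen" second.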
Created: 2024-07-12



