
USPTO Algorithm Challenge, run by NASA-Harvard Tournament Lab and TopCoder Problem: Pat Data Set

Indexed by 帕依提提 on 2024-03-04
Download link:
https://www.payititi.com/opendatasets/show-26309.html
Resource description:
-- Creator: TopCoder, Inc.
-- Released under Apache License, Version 2.0 http://www.apache.org/licenses/LICENSE-2.0.html

Data Set Information:
USPTO Algorithm Challenge, run by NASA-Harvard Tournament Lab and TopCoder
Problem: Patent Labeling

Attribute Information:
Dataset Information:
-- This folder contains 4 groups of USPTO patent images, including ground truth information.
-- The 4 groups are 'train1', 'train2', 'test', and 'evaluation'.
-- 'train1', 'test', and 'evaluation' contain the data from the original 'USPTO Algorithm Challenge' for training, testing, and final evaluation, respectively.
-- 'train2' contains additional data that was used in the 'USPTO Algorithm Followup Challenge'. Note that 'train2' includes some cover page images of patent documents, which are not included in the other groups.
-- In each group, there are two folders containing the original images and the corresponding ground truth information.
-- The original images are in 'jpeg' format.
-- There are two types of ground truth: figure label ground truth and part label ground truth.
-- The ground truth files are text files with the '.ans' extension.
-- The structure of the ground truth files is described below:
-- The first line is a single number indicating how many instances exist in the corresponding image.
-- Each following line gives the polygon coordinates and label content of one figure label or part label, in the form 'N x1 y1 x2 y2 ... xN yN x1 y1 content'.
-- In each of those lines, the first number N indicates how many polygon vertices are recorded for the current instance.
-- The following numbers are the x, y coordinates of those vertices.
-- The final word in each line is the content of the figure label or part label.

Relevant Papers:
Christoph Riedl, Richard Zanibbi, Marti A. Hearst, Siyu Zhu, Michael Minetti, Jason Crusan, Ivan Metelsky, and Karim R. Lakhani, 'Detecting Figures and Part Labels in Patents: A Competition-based Development of Image Processing Algorithms', working paper, [Web link].

Citation Request:
Christoph Riedl, Richard Zanibbi, Marti A. Hearst, Siyu Zhu, Michael Minetti, Jason Crusan, Ivan Metelsky, and Karim R. Lakhani, 'Detecting Figures and Part Labels in Patents: A Competition-based Development of Image Processing Algorithms,' International Journal on Document Analysis and Recognition, 1-18, DOI 10.1007/s10032-016-0260-8
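
The '.ans' ground truth layout described above (an instance count on the first line, then one 'N x1 y1 ... content' line per label) could be read with a small parser along the following lines. This is only a sketch under stated assumptions, not code shipped with the dataset: the function name `parse_ans` is hypothetical, coordinates are read as floats because the description does not say whether they are integers, and the trailing repeat of the first vertex ('x1 y1') is dropped when it is present.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Instance = Tuple[List[Point], str]


def parse_ans(text: str) -> List[Instance]:
    """Parse the text of one '.ans' file into (polygon, label content) pairs.

    Hypothetical helper: the layout is taken from the dataset description,
    and details it leaves open are handled as labeled assumptions below.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []

    expected = int(lines[0])  # first line: number of instances in the image
    instances: List[Instance] = []

    for line in lines[1:]:
        tokens = line.split()
        n = int(tokens[0])      # N: number of polygon vertices
        content = tokens[-1]    # final word: figure or part label content
        numbers = [float(t) for t in tokens[1:-1]]
        if len(numbers) % 2 != 0:
            raise ValueError(f"odd number of coordinates in line: {line!r}")

        # Pair the flat list up as (x, y) vertices.
        points = list(zip(numbers[0::2], numbers[1::2]))

        # Assumption: the polygon may be closed by repeating the first
        # vertex after the N vertices; drop that repeat if it is there.
        if len(points) == n + 1 and points[0] == points[-1]:
            points = points[:-1]
        if len(points) != n:
            raise ValueError(f"expected {n} vertices in line: {line!r}")

        instances.append((points, content))

    if len(instances) != expected:
        raise ValueError(
            f"header says {expected} instances, found {len(instances)}"
        )
    return instances


if __name__ == "__main__":
    # Made-up sample in the described layout, not taken from the dataset.
    sample = "1\n4 10 20 30 20 30 40 10 40 10 20 Fig.1\n"
    for polygon, label in parse_ans(sample):
        print(label, polygon)
```

To use it on a real file, the text can be passed in from something like `pathlib.Path("some_image.ans").read_text()`, where the file name is a placeholder. Taking the label content from the last token follows the description's statement that the content is the final word of each line, which also keeps purely numeric part labels from being mistaken for coordinates.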
Provider:
帕依提提