
PatFigCLS Dataset - Patent Figure Classification Dataset

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14905550
Description:
Dataset Summary

The PatFigCLS dataset was introduced in the paper "Patent Figure Classification using Large Vision-language Models", accepted at ECIR 2025. It is designed specifically for patent figure classification and for evaluation across multiple aspects, including type, projection, objects, and USPC class. PatFigCLS is used alongside a companion dataset, PatFigVQA, which is intended for fine-tuning and evaluating Large Vision-language Models (LVLMs) on patent figure visual question answering in a few-shot learning setting. The dataset is sourced from two existing datasets: Extended CLEF-IP 2011 and DeepPatent2.

Data Format

The dataset is stored as sharded .tar files for fast, efficient read access.

Data Fields

- `__key__`: unique sample ID
- `image.png`: patent figure image
- `label.txt`: classification label

Data Splits

For each classification aspect, there are three data splits: `train_150`, `val`, and `test`.

How to Use

The recommended approach is to use the Python library `webdataset`. Example code:

```python
import io

import webdataset as wds
from braceexpand import braceexpand
from PIL import Image
from torchvision.transforms import Compose, ToTensor

# Build the transform once: PIL image -> float tensor in [0, 1], shape (C, H, W).
# Append further transforms (resizing, normalization, ...) to the list as needed.
transform = Compose([ToTensor()])

dataset = (
    wds.WebDataset(
        # Expand the brace pattern into the list of shard paths for one aspect/split.
        braceexpand('PatFigCLS/object/train_150/shard-{000000..000042}.tar'),
        shardshuffle=1000,  # shuffle the order in which shards are read
    )
    .shuffle(1000)  # shuffle samples within a 1000-sample buffer
    .to_tuple('__key__', 'image.png', 'label.txt')
    .map_tuple(
        lambda key: key,                                         # sample ID
        lambda image: transform(Image.open(io.BytesIO(image))),  # PNG bytes -> tensor
        lambda label: label.decode('utf-8'),                     # label bytes -> str
    )
)
dataloader = wds.WebLoader(dataset)
```

Source Code

The source code used to produce this dataset is available at https://github.com/TIBHannover/patent-figure-classification

Licensing Information

The PatFigCLS dataset is released under the GNU General Public License v3.0.
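Because each shard is an ordinary tar archive, it can also be inspected without `webdataset` or PyTorch, for example to check the label distribution of a split. The following is a minimal, standard-library-only sketch, assuming the usual WebDataset layout in which every file in a shard is named `<key>.<field>` (e.g. `<key>.image.png`, `<key>.label.txt`) and the files of one sample are stored next to each other. The helpers `split_member_name`, `iter_samples`, `count_labels`, and `write_shard` are hypothetical names introduced here for illustration; they are not part of the dataset's released code or of the `webdataset` API.

```python
import io
import tarfile
from collections import Counter
from glob import glob
from typing import BinaryIO, Dict, Iterable, Iterator, Tuple, Union

# One sample: '__key__' -> str, 'label.txt' -> str, other fields -> raw bytes.
Sample = Dict[str, Union[str, bytes]]


def split_member_name(name: str) -> Tuple[str, str]:
    """Split 'dir/abc.image.png' into ('dir/abc', 'image.png').

    Assumption: as in WebDataset's default grouping, the sample key ends at the
    first dot of the file's basename and everything after it is the field name.
    """
    slash = name.rfind("/")
    dot = name.find(".", slash + 1)
    if dot == -1:
        return name, ""
    return name[:dot], name[dot + 1:]


def iter_samples(fileobj: BinaryIO) -> Iterator[Sample]:
    """Stream a WebDataset-style tar and yield one dict per sample.

    Assumes all files belonging to one sample are stored contiguously.
    """
    current: Sample = {}
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, etc.
            key, field = split_member_name(member.name)
            if current and current["__key__"] != key:
                yield current  # a new key means the previous sample is complete
                current = {}
            current["__key__"] = key
            data = tar.extractfile(member).read()
            # Decode the label to text; keep image bytes undecoded.
            current[field] = data.decode("utf-8") if field == "label.txt" else data
    if current:
        yield current


def count_labels(samples: Iterable[Sample]) -> Counter:
    """Tally label strings, e.g. to check class balance within one split."""
    return Counter(s["label.txt"] for s in samples if "label.txt" in s)


def write_shard(samples: Iterable[Sample]) -> bytes:
    """Hypothetical test helper: pack samples into an in-memory tar,
    naming each member '<key>.<field>'."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for sample in samples:
            key = sample["__key__"]
            for field, value in sample.items():
                if field == "__key__":
                    continue
                data = value.encode("utf-8") if isinstance(value, str) else value
                info = tarfile.TarInfo(name=f"{key}.{field}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    # Path pattern taken from the webdataset example; adjust aspect/split as needed.
    totals: Counter = Counter()
    for path in sorted(glob("PatFigCLS/object/train_150/shard-*.tar")):
        with open(path, "rb") as f:
            totals.update(count_labels(iter_samples(f)))
    print(totals.most_common())
```

`write_shard` exists only to build a small shard in memory for exercising the reader. The key-splitting rule mirrors WebDataset's default behavior, so sample keys that themselves contain dots would be grouped differently; verify against the actual shard contents before relying on this sketch.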
Created:
2025-02-21