CLEAR: Continual LEArning on Real-World Imagery
Indexed by NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/5528797
Description:
Continual learning (CL) is considered one of the next big challenges in AI. However, existing CL benchmarks, e.g. Permuted-MNIST and Split-CIFAR, are artificially constructed to be continual and do not align with or generalize to the real world. In this paper, we introduce CLEAR, the first continual image recognition benchmark dataset with a natural temporal evolution of visual concepts in the real world that spans a decade (2004-2014). We build CLEAR from an existing image collection (YFCC100M) by proposing a novel low-cost visio-linguistic dataset curation approach, which uses pretrained vision-language models (e.g. CLIP) to quickly build high-quality labeled datasets on a tight budget. Finally, we post-process CLEAR via crowd-sourcing to remove errors and even inappropriate images hidden in the original YFCC100M. The major strengths of CLEAR over prior CL benchmarks include (1) smooth and realistic temporal evolution of visual concepts with real-world imagery, enabling a more practical "online" evaluation protocol (i.e., train on the past, test on the future), and (2) high-quality labeled data along with abundant unlabeled samples per time period for continual semi-supervised and unsupervised learning. Our extensive experiments reveal that mainstream "offline" evaluation protocols, which train and test on iid data, artificially inflate the performance of CL systems, stressing the need for our "online" protocol, since the models we train today will always be tested in the future. Moreover, we find that state-of-the-art CL algorithms that only utilize fully-supervised data fall short, whereas unsupervised pretraining provides a significant boost. Lastly, we introduce a biased reservoir-sampling algorithm that dynamically caches more recent training data, achieving new state-of-the-art results while still leaving large room for improvement.
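The curation approach relies on a pretrained vision-language model scoring how well each image matches a text prompt. The following is a minimal sketch of that retrieval step, not the authors' pipeline: it assumes image and text embeddings have already been computed by a model such as CLIP, and the helper names `cosine` and `top_k_candidates` as well as the toy 3-d vectors are hypothetical. Images are ranked by cosine similarity to a class prompt, and the top candidates would then go to crowd-sourced verification.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def top_k_candidates(
    prompt_emb: Sequence[float],
    image_embs: Dict[str, Sequence[float]],
    k: int,
) -> List[str]:
    """Return the ids of the k images most similar to the text prompt embedding."""
    ranked = sorted(
        image_embs,
        key=lambda img_id: cosine(prompt_emb, image_embs[img_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings standing in for real vision-language model outputs (hypothetical).
prompt = [1.0, 0.0, 0.0]  # e.g. the embedding of a class prompt such as "a photo of a camera"
images = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.7, 0.0, 0.7],
}
print(top_k_candidates(prompt, images, k=2))  # ['img_a', 'img_c']
```

Keeping only the highest-scoring images is what makes this kind of labeling cheap: annotators confirm or reject a short, pre-filtered list instead of searching the whole collection.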
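To make the "offline" vs "online" distinction concrete, here is a minimal sketch assuming a hypothetical accuracy matrix `acc`, where `acc[i][j]` is the accuracy on the test split of time bucket `j` after training through bucket `i`. The "offline" score averages the diagonal (test data drawn from the same period as the latest training data, i.e. iid), while the "online" score averages `acc[i][i + 1]` (test on the next, not-yet-seen period). The function names and the numbers in the example are invented for illustration and are not results reported for CLEAR.

```python
from typing import Sequence


def offline_accuracy(acc: Sequence[Sequence[float]]) -> float:
    """Mean of acc[i][i]: train through bucket i, test on held-out data from bucket i."""
    n = len(acc)
    return sum(acc[i][i] for i in range(n)) / n


def online_accuracy(acc: Sequence[Sequence[float]]) -> float:
    """Mean of acc[i][i + 1]: train through bucket i, test on the following bucket."""
    n = len(acc)
    if n < 2:
        raise ValueError("need at least two time buckets")
    return sum(acc[i][i + 1] for i in range(n - 1)) / (n - 1)


# Hypothetical 3-bucket accuracy matrix (rows: trained through bucket i; cols: tested on bucket j).
acc = [
    [0.90, 0.80, 0.70],
    [0.85, 0.92, 0.82],
    [0.80, 0.88, 0.93],
]
print(round(offline_accuracy(acc), 3))  # 0.917
print(round(online_accuracy(acc), 3))   # 0.81
```

In this made-up example the iid diagonal looks better than the next-period score, which mirrors the kind of gap the abstract attributes to offline evaluation.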
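The abstract does not spell out the biased reservoir-sampling rule, so the sketch below is only one simple way to bias a replay buffer toward recent data, not necessarily the authors' algorithm. Classic reservoir sampling (Algorithm R) accepts the t-th item with probability k/t and overwrites a random slot, which keeps every item seen so far equally likely to be cached. Here the acceptance probability is multiplied by a hypothetical bias factor `alpha` (capped at 1); `alpha = 1` recovers uniform reservoir sampling, and larger values let newer items displace older ones more often. The class name `BiasedReservoir` and its parameters are illustrative assumptions.

```python
import random
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class BiasedReservoir(Generic[T]):
    """Fixed-capacity replay buffer biased toward recent items (illustrative sketch).

    alpha is a hypothetical bias factor: alpha = 1 is uniform reservoir sampling
    (Algorithm R); alpha > 1 over-represents recently seen items.
    """

    def __init__(self, capacity: int, alpha: float = 1.0, seed: Optional[int] = None) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        if alpha < 1.0:
            raise ValueError("alpha must be >= 1")
        self.capacity = capacity
        self.alpha = alpha
        self.buffer: List[T] = []
        self.seen = 0  # number of items offered so far
        self._rng = random.Random(seed)

    def add(self, item: T) -> None:
        self.seen += 1
        if len(self.buffer) < self.capacity:
            # Fill phase: keep every item until the buffer is full.
            self.buffer.append(item)
            return
        # Uniform reservoir sampling would use capacity / seen; alpha inflates it.
        p_accept = min(1.0, self.alpha * self.capacity / self.seen)
        if self._rng.random() < p_accept:
            # Evict a uniformly random slot to make room for the new item.
            self.buffer[self._rng.randrange(self.capacity)] = item


# Stream item indices 0..9999 (a stand-in for time-ordered training samples).
uniform = BiasedReservoir(capacity=100, alpha=1.0, seed=0)
recent = BiasedReservoir(capacity=100, alpha=5.0, seed=0)
for t in range(10_000):
    uniform.add(t)
    recent.add(t)

print(sum(uniform.buffer) / len(uniform.buffer))  # mean index of cached items, roughly mid-stream
print(sum(recent.buffer) / len(recent.buffer))    # noticeably later when alpha > 1
```

A multiplicative bias keeps the update O(1) per item and needs no timestamps; the trade-off is that very old data are retained only rarely once `alpha` is large.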
Created:
2021-09-27



