Caltech101
Caltech101 Dataset Overview
Basic information
- Language: English (eng)
- Multilinguality: monolingual
- License: unknown
- Annotation creators: derived
- Task category: image classification (image-classification)
- Tags:
  - mteb
  - image
Dataset structure
- Configuration name: default
- Data files:
  - Train split: data/train-*
  - Test split: data/test-*
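The configuration above could be loaded roughly as follows. This is a minimal sketch, assuming the dataset is hosted on the Hugging Face Hub and read with the `datasets` library. The card does not state a repository identifier, so `repo_id` is left for the caller to supply, and `DATA_FILES` and `load_split` are hypothetical names used only for illustration.

```python
# Split-to-file patterns as listed for the "default" configuration.
DATA_FILES = {
    "train": "data/train-*",
    "test": "data/test-*",
}


def load_split(repo_id: str, split: str = "test"):
    """Load one split of the dataset.

    repo_id is an assumption supplied by the caller; it is not given in the card.
    """
    if split not in DATA_FILES:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(DATA_FILES)}"
        )
    # Imported lazily so the split check works even without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id, name="default", split=split)
```

Each returned example should then expose the `image` and `label` fields described in the card.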
Dataset features
- Features:
  - image: image type
  - label: class label (102 classes, indexed from 0 to 101)
Dataset statistics
- Train split:
  - Number of samples: 3060
  - Size: 44260545.38 bytes
- Test split:
  - Number of samples: 6084
  - Size: 74371922.14 bytes
- Download size: 137964637 bytes
- Total dataset size: 118632467.52 bytes
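As a quick consistency check on these figures, the arithmetic can be reproduced in a few lines. This is only a sketch using the numbers listed above; the variable names are illustrative. The train split divides evenly by the 102 classes (30 per class), which would be consistent with a fixed number of training images per class, although the card does not say so.

```python
# Figures copied from the statistics listed in the card.
train_samples, test_samples = 3060, 6084
train_bytes, test_bytes = 44260545.38, 74371922.14
num_classes = 102

total_samples = train_samples + test_samples      # 9144
total_bytes = round(train_bytes + test_bytes, 2)  # 118632467.52

# Share of all samples that fall in the test split.
test_share = test_samples / total_samples

# Average stored size per image in each split, in bytes.
avg_train_bytes = train_bytes / train_samples
avg_test_bytes = test_bytes / test_samples

# Train samples per class if they were spread evenly (an assumption).
train_per_class = train_samples / num_classes     # 30.0

print(f"total samples: {total_samples}, test share: {test_share:.1%}")
print(f"avg bytes/image: train {avg_train_bytes:.0f}, test {avg_test_bytes:.0f}")
```

Rounding the byte total to two decimals avoids the floating-point tail that appears when the two split sizes are added directly.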
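Evaluation
Use the following code to evaluate an embedding model on this dataset:

```python
import mteb

# Select the task by name; the keyword form does not depend on argument order.
task = mteb.get_tasks(tasks=["Caltech101"])
evaluator = mteb.MTEB(task)

# YOUR_MODEL is a placeholder: replace it with the name of the model to evaluate.
model = mteb.get_model(YOUR_MODEL)
evaluator.run(model)
```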
Citation
- Original paper:

```bibtex
@inproceedings{1384978,
  author = {Li Fei-Fei and Fergus, R. and Perona, P.},
  booktitle = {2004 Conference on Computer Vision and Pattern Recognition Workshop},
  doi = {10.1109/CVPR.2004.383},
  keywords = {Bayesian methods;Testing;Humans;Maximum likelihood estimation;Assembly;Shape;Machine vision;Image recognition;Parameter estimation;Image databases},
  number = {},
  pages = {178-178},
  title = {Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories},
  volume = {},
  year = {2004},
}
```

- MTEB paper:

```bibtex
@article{enevoldsen2025mmtebmassivemultilingualtext,
  title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
  author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and Márton Kardos and Ashwin Mathur and David Stap and Jay Gala and Wissam Siblini and Dominik Krzemiński and Genta Indra Winata and Saba Sturua and Saiteja Utpala and Mathieu Ciancone and Marion Schaeffer and Gabriel Sequeira and Diganta Misra and Shreeya Dhakal and Jonathan Rystrøm and Roman Solomatin and Ömer Çağatan and Akash Kundu and Martin Bernstorff and Shitao Xiao and Akshita Sukhlecha and Bhavish Pahwa and Rafał Poświata and Kranthi Kiran GV and Shawon Ashraf and Daniel Auras and Björn Plüster and Jan Philipp Harries and Loïc Magne and Isabelle Mohr and Mariya Hendriksen and Dawei Zhu and Hippolyte Gisserot-Boukhlef and Tom Aarsen and Jan Kostkan and Konrad Wojtasik and Taemin Lee and Marek Šuppa and Crystina Zhang and Roberta Rocca and Mohammed Hamdy and Andrianos Michail and John Yang and Manuel Faysse and Aleksei Vatolin and Nandan Thakur and Manan Dey and Dipam Vasani and Pranjal Chitale and Simone Tedeschi and Nguyen Tai and Artem Snegirev and Michael Günther and Mengzhou Xia and Weijia Shi and Xing Han Lù and Jordan Clive and Gayatri Krishnakumar and Anna Maksimova and Silvan Wehrli and Maria Tikhonova and Henil Panchal and Aleksandr Abramov and Malte Ostendorff and Zheng Liu and Simon Clematide and Lester James Miranda and Alena Fenogenova and Guangyu Song and Ruqiya Bin Safi and Wen-Ding Li and Alessia Borghini and Federico Cassano and Hongjin Su and Jimmy Lin and Howard Yen and Lasse Hansen and Sara Hooker and Chenghao Xiao and Vaibhav Adlakha and Orion Weller and Siva Reddy and Niklas Muennighoff},
  publisher = {arXiv},
  journal = {arXiv preprint arXiv:2502.13595},
  year = {2025},
  url = {https://arxiv.org/abs/2502.13595},
  doi = {10.48550/arXiv.2502.13595},
}
```
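Test split statistics in detail
- Number of samples: 6084
- Number of unique labels: 102
- Image width (pixels):
  - Minimum: 80
  - Mean: 311.7217291255753
  - Maximum: 3481
- Image height (pixels):
  - Minimum: 101
  - Mean: 241.84418145956607
  - Maximum: 3999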
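Width and height summaries of this kind could be computed along the following lines. This is a sketch rather than the procedure actually used for the card: it assumes each image's dimensions are available as a `(width, height)` pair, as the `size` attribute of a PIL image provides, and `summarize_sizes` is a hypothetical helper name. The input below is synthetic, not taken from the dataset.

```python
from statistics import mean


def summarize_sizes(sizes):
    """Return min/mean/max of width and height for (width, height) pairs."""
    sizes = list(sizes)
    if not sizes:
        raise ValueError("no image sizes given")
    widths = [w for w, _ in sizes]
    heights = [h for _, h in sizes]
    return {
        "width": {"min": min(widths), "mean": mean(widths), "max": max(widths)},
        "height": {"min": min(heights), "mean": mean(heights), "max": max(heights)},
    }


# Synthetic example sizes, for illustration only.
summary = summarize_sizes([(80, 101), (300, 200), (520, 299)])
print(summary)
```

With a loaded split, the pairs could be gathered with something like `(example["image"].size for example in split)`, assuming the `image` feature decodes to PIL images.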




