
flwrlabs/caltech101

Source: Hugging Face. Updated: 2024-08-29. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/flwrlabs/caltech101
Resource description:
---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: label
    dtype:
      class_label:
        names:
          '0': accordion
          '1': airplanes
          '2': anchor
          '3': ant
          '4': barrel
          '5': bass
          '6': beaver
          '7': binocular
          '8': bonsai
          '9': brain
          '10': brontosaurus
          '11': buddha
          '12': butterfly
          '13': camera
          '14': cannon
          '15': car_side
          '16': ceiling_fan
          '17': cellphone
          '18': chair
          '19': chandelier
          '20': cougar_body
          '21': cougar_face
          '22': crab
          '23': crayfish
          '24': crocodile
          '25': crocodile_head
          '26': cup
          '27': dalmatian
          '28': dollar_bill
          '29': dolphin
          '30': dragonfly
          '31': electric_guitar
          '32': elephant
          '33': emu
          '34': euphonium
          '35': ewer
          '36': faces
          '37': faces_easy
          '38': ferry
          '39': flamingo
          '40': flamingo_head
          '41': garfield
          '42': gerenuk
          '43': gramophone
          '44': grand_piano
          '45': hawksbill
          '46': headphone
          '47': hedgehog
          '48': helicopter
          '49': ibis
          '50': inline_skate
          '51': joshua_tree
          '52': kangaroo
          '53': ketch
          '54': lamp
          '55': laptop
          '56': leopards
          '57': llama
          '58': lobster
          '59': lotus
          '60': mandolin
          '61': mayfly
          '62': menorah
          '63': metronome
          '64': minaret
          '65': motorbikes
          '66': nautilus
          '67': octopus
          '68': okapi
          '69': pagoda
          '70': panda
          '71': pigeon
          '72': pizza
          '73': platypus
          '74': pyramid
          '75': revolver
          '76': rhino
          '77': rooster
          '78': saxophone
          '79': schooner
          '80': scissors
          '81': scorpion
          '82': sea_horse
          '83': snoopy
          '84': soccer_ball
          '85': stapler
          '86': starfish
          '87': stegosaurus
          '88': stop_sign
          '89': strawberry
          '90': sunflower
          '91': tick
          '92': trilobite
          '93': umbrella
          '94': watch
          '95': water_lilly
          '96': wheelchair
          '97': wild_cat
          '98': windsor_chair
          '99': wrench
          '100': yin_yang
  splits:
  - name: train
    num_bytes: 121007587.037
    num_examples: 8677
  download_size: 121217709
  dataset_size: 121007587.037
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: unknown
task_categories:
- image-classification
size_categories:
- 1K<n<10K
---

# Dataset Card for Caltech 101

This dataset contains images of objects from 101 distinct categories. Each category has roughly 40 to 800 images, and most categories have about 50. The images were collected in September 2003 by Fei-Fei Li, Marco Andreetto, and Marc’Aurelio Ranzato. Each image is roughly 300 x 200 pixels.

### Dataset Sources

- **Website:** https://data.caltech.edu/records/mzrjq-6wc02

## Use in FL

To prepare the dataset for federated learning (FL) settings, we recommend using [Flower Datasets](https://flower.ai/docs/datasets/) (flwr-datasets) to download and partition the dataset, and [Flower](https://flower.ai/docs/framework/) (flwr) to run FL experiments.

To partition the dataset, do the following.

1. Install the package.

```bash
pip install flwr-datasets[vision]
```

2. Load the dataset through Flower Datasets, which uses the Hugging Face dataset under the hood.

```python
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import IidPartitioner

# Split the "train" split into 10 IID partitions.
fds = FederatedDataset(
    dataset="flwrlabs/caltech101",
    partitioners={"train": IidPartitioner(num_partitions=10)}
)
# Load the first partition (partition IDs start at 0).
partition = fds.load_partition(partition_id=0)
```

## Dataset Structure

### Data Instances

The first instance of the train split is presented below:

```
{
  'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=397x150>,
  'label': 1
}
```

### Data Split

```
DatasetDict({
    train: Dataset({
        features: ['image', 'label'],
        num_rows: 8677
    })
})
```

## Implementation details

In this implementation, the string labels are first converted to lowercase and then sorted alphabetically; each label's position in the sorted list is its integer ID. Other implementations of Caltech 101 may use a different mapping, so integer labels are not necessarily comparable across implementations.

## Citation

When working with the Caltech-101 dataset, please cite the original paper. If you're using this dataset with Flower Datasets and Flower, cite Flower.

**BibTeX:**

Dataset Bibtex:

```
@misc{li2022caltech,
  title = {Caltech 101},
  author = {Li, Fei-Fei and Andreetto, Marco and Ranzato, Marc'Aurelio and Perona, Pietro},
  year = {2022},
  month = {Apr},
  publisher = {CaltechDATA},
  doi = {10.22002/D1.20086},
  abstract = {Pictures of objects belonging to 101 categories. About 40 to 800 images per category. Most categories have about 50 images. Collected in September 2003 by Fei-Fei Li, Marco Andreetto, and Marc'Aurelio Ranzato. The size of each image is roughly 300 x 200 pixels. We have carefully clicked outlines of each object in these pictures, these are included under the 'Annotations.tar'. There is also a MATLAB script to view the annotations, 'show_annotations.m'.}
}
```

Flower:

```
@article{DBLP:journals/corr/abs-2007-14390,
  author = {Daniel J. Beutel and Taner Topal and Akhil Mathur and Xinchi Qiu and Titouan Parcollet and Nicholas D. Lane},
  title = {Flower: {A} Friendly Federated Learning Research Framework},
  journal = {CoRR},
  volume = {abs/2007.14390},
  year = {2020},
  url = {https://arxiv.org/abs/2007.14390},
  eprinttype = {arXiv},
  eprint = {2007.14390},
  timestamp = {Mon, 03 Aug 2020 14:32:13 +0200},
  biburl = {https://dblp.org/rec/journals/corr/abs-2007-14390.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```

## Dataset Card Contact

If you have any questions about the dataset preprocessing and preparation, please contact [Flower Labs](https://flower.ai/).
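
## Example: label mapping sketch

The label mapping described under "Implementation details" (lowercase first, then sort alphabetically, then number from 0) can be illustrated with a few lines of plain Python. This is a minimal sketch, not the preprocessing code used to build the dataset: the helper `build_label_mapping` is a hypothetical name, and the capitalized raw names in `raw_names` are an assumed, illustrative subset of category names rather than something stated in this card.

```python
def build_label_mapping(raw_names):
    """Lowercase the names, sort them alphabetically, and number them from 0."""
    normalized = sorted(name.lower() for name in raw_names)
    return {name: index for index, name in enumerate(normalized)}


# Assumed raw category names (illustrative subset, hypothetical capitalization).
raw_names = ["Motorbikes", "accordion", "Faces_easy", "airplanes", "Leopards", "Faces"]

# Mapping in the style this card describes: lowercase, then sort.
mapping = build_label_mapping(raw_names)

# For contrast: sorting without lowercasing places capitalized names first,
# which is one way another implementation could end up with different IDs.
case_sensitive = {name: index for index, name in enumerate(sorted(raw_names))}

print(mapping)
print(case_sensitive)
```

In this subset, `accordion` and `airplanes` receive IDs 0 and 1, as they do in the full label list above, whereas the case-sensitive sort pushes them behind the capitalized names. When comparing results across Caltech 101 implementations, it is safer to compare label names than integer IDs.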

Provider:
flwrlabs