Insect Detect - insect classification dataset v2
Mendeley Data. Updated 2024-05-10; indexed 2024-06-29.
Download link:
https://zenodo.org/records/8325384
Description:
The Insect Detect - insect classification dataset v2 contains mainly images of various insects sitting on or flying above an artificial flower platform. All images were automatically recorded with the Insect Detect DIY camera trap, a hardware combination of the Luxonis OAK-1, Raspberry Pi Zero 2 W and PiJuice Zero pHAT for automated insect monitoring (bioRxiv preprint).

Most of the images were captured by camera traps deployed at different sites in 2023. For some classes (e.g. ant, bee_bombus, beetle_cocci, bug, bug_grapho, hfly_eristal, hfly_myathr, hfly_syrphus) additional images were captured with a lab setup of the camera trap. For some classes (e.g. bee_apis, fly, hfly_episyr, wasp) images from the first dataset version were transferred to this dataset.

This dataset is also available on Roboflow Universe. The images in the Roboflow version are automatically compressed, which decreases model accuracy when they are used for training. It is therefore recommended to use this uncompressed Zenodo version and to split the dataset into train/val/test subsets in the provided training notebook.

Classes

This dataset contains the following 27 classes:

1. ant (Formicidae)
2. bee (Anthophila excluding Apis mellifera and Bombus sp.)
3. bee_apis (Apis mellifera)
4. bee_bombus (Bombus sp.)
5. beetle (Coleoptera excluding Coccinellidae and some Oedemeridae)
6. beetle_cocci (Coccinellidae)
7. beetle_oedem (visually distinct Oedemeridae)
8. bug (Heteroptera excluding Graphosoma italicum)
9. bug_grapho (Graphosoma italicum)
10. fly (Brachycera excluding Empididae, Sarcophagidae, Syrphidae and small Brachycera)
11. fly_empi (Empididae)
12. fly_sarco (visually distinct Sarcophagidae)
13. fly_small (small Brachycera)
14. hfly_episyr (hoverfly Episyrphus balteatus)
15. hfly_eristal (hoverfly Eristalis sp., mainly Eristalis tenax)
16. hfly_eupeo (mainly hoverfly Eupeodes corollae and Scaeva pyrastri)
17. hfly_myathr (hoverfly Myathropa florea)
18. hfly_sphaero (hoverfly Sphaerophoria sp., mainly Sphaerophoria scripta)
19. hfly_syrphus (mainly hoverfly Syrphus sp.)
20. lepi (Lepidoptera)
21. none_bg (images with no insect - background (platform))
22. none_bird (images with no insect - bird sitting on platform)
23. none_dirt (images with no insect - leaves and other plant material, bird droppings)
24. none_shadow (images with no insect - shadows of insects or surrounding plants)
25. other (other Arthropods, including various Hymenoptera and Symphyta, Diptera, Orthoptera, Auchenorrhyncha, Neuroptera, Araneae)
26. scorpionfly (Panorpa sp.)
27. wasp (mainly Vespula sp. and Polistes dominula)

For the classes hfly_eupeo and hfly_syrphus a precise taxonomic distinction is not possible from images alone, because the appearance of the respective species can vary considerably. While most specimens will show the visual features that are important for classification into one of these classes, some specimens of Syrphus sp. might look more like Eupeodes sp. and vice versa.

The images were sorted into the respective classes by considering taxonomic and visual distinctions. However, this dataset is still rather small relative to the extreme visual diversity of Insecta. Insects that are not included in this dataset may therefore be assigned to the wrong class. All results should always be manually validated, and false classifications can be used to extend this basic dataset and retrain your custom classification model.

Deployment

You can use this dataset as a starting point to train your own insect classification models with the provided Google Colab training notebook. Read the model training instructions for more information.

An insect classification model trained on this dataset is available in the insect-detect-ml GitHub repo. To deploy the model on your PC (ONNX format for fast CPU inference), follow the provided step-by-step instructions.

License

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
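
The description recommends splitting the uncompressed Zenodo images into train/val/test subsets. The following is a minimal sketch of such a split, not the code of the provided training notebook. It assumes that the downloaded images are stored in one folder per class (e.g. a folder named ant, another named wasp); that layout, the 70/20/10 ratios, the fixed seed and the function names split_items and split_dataset are all assumptions made for illustration.

```python
import random
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image file types


def split_items(items, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle items reproducibly and cut them into train/val/test lists."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = sorted(items)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # remainder goes to test
    }


def split_dataset(src_dir, dst_dir, ratios=(0.7, 0.2, 0.1), seed=42):
    """Copy <src>/<class>/<image> to <dst>/<subset>/<class>/<image>, class by class."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    counts = {}
    for class_dir in sorted(p for p in src_dir.iterdir() if p.is_dir()):
        images = [p for p in class_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES]
        subsets = split_items(images, ratios, seed)
        for subset, paths in subsets.items():
            target = dst_dir / subset / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for path in paths:
                shutil.copy2(path, target / path.name)
        counts[class_dir.name] = {name: len(paths) for name, paths in subsets.items()}
    return counts
```

Splitting each class separately keeps the class proportions roughly equal across the three subsets, which matters here because some classes are likely much smaller than others. The returned per-class counts make it easy to spot classes that end up with very few validation or test images.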
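
Because the description asks that all classification results be manually validated, it can help to review the least certain predictions first. The sketch below maps raw model scores to the 27 class names and orders the results by confidence. It is an illustration under stated assumptions, not the output handling of the published model: it assumes one raw score (logit) per class in the alphabetical order of the class list, and the names softmax, top1 and review_queue are hypothetical helpers. If a model already outputs probabilities, the softmax step should be skipped.

```python
import math

# Class names as listed in the dataset description (alphabetical order).
# Assumption: a trained model emits one score per class in this same order.
CLASS_NAMES = [
    "ant", "bee", "bee_apis", "bee_bombus", "beetle", "beetle_cocci",
    "beetle_oedem", "bug", "bug_grapho", "fly", "fly_empi", "fly_sarco",
    "fly_small", "hfly_episyr", "hfly_eristal", "hfly_eupeo", "hfly_myathr",
    "hfly_sphaero", "hfly_syrphus", "lepi", "none_bg", "none_bird",
    "none_dirt", "none_shadow", "other", "scorpionfly", "wasp",
]


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top1(scores):
    """Return (class_name, probability) for the highest-scoring class."""
    if len(scores) != len(CLASS_NAMES):
        raise ValueError(f"expected {len(CLASS_NAMES)} scores, got {len(scores)}")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]


def review_queue(predictions):
    """Order (image_id, scores) pairs so the least confident predictions come first."""
    rows = [(image_id, *top1(scores)) for image_id, scores in predictions]
    return sorted(rows, key=lambda row: row[2])
```

Each row of the returned queue is a tuple of image identifier, predicted class name and probability. Images that turn out to be misclassified during review are the ones the description suggests adding to the dataset before retraining.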
Created:
2023-09-12
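Dataset introduction

Background overview
The dataset contains images in 27 classes (insect groups plus classes for images without insects) and is intended mainly for insect classification and monitoring research. The images were recorded automatically with a DIY camera trap, and some classes were supplemented with images from a lab setup. The dataset is suitable for training custom insect classification models. It is licensed under CC BY-NC-SA 4.0, and the uncompressed Zenodo version is recommended because image compression reduces the accuracy of trained models.
The summary above was compiled and generated by 遇见数据集.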



