Image-based taxonomic classification of bulk biodiversity samples using deep learning and domain adaptation
Source: DataCite Commons | Updated: 2026-03-05 | Indexed: 2026-04-25
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.05qfttf4f
Description:
Complex bulk samples of insects from biodiversity surveys present a
challenge for taxonomic identification, which could be overcome by
high-throughput imaging combined with machine learning for rapid
classification of specimens. These procedures require that taxonomic
labels from an existing source data set are used for model training and
prediction of an unknown target sample. However, such transfer learning
may be problematic for the study of new samples not previously encountered
in an image set, e.g. from unexplored ecosystems, and requires methods of
domain adaptation that reduce the differences in the feature distribution
of the source and target domains (training and test sets). We assessed the
efficiency of domain adaptation for family-level classification of bulk
samples of Coleoptera, as a critical first step in the characterisation of
biodiversity samples. Neural network models trained with images from a
global database of Coleoptera were applied to a biodiversity sample from
understudied forests in Cyprus as the target. Within-dataset
classification accuracy reached 98% and depended on the number and quality
of training images and on dataset complexity. The accuracy of
between-datasets predictions (across disparate source-target pairs that do
not share any species or genera) was at most 82% and depended greatly on
the standardisation of the imaging procedure. Algorithms for domain
adaptation significantly improved the prediction performance of models
trained on non-standardised, low-quality images. Our findings demonstrate
that existing databases can be used to train models and successfully
classify images from unexplored biota, but the imaging conditions and
classification algorithms need careful consideration.
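
The idea of reducing the difference in feature distribution between source and target domains can be illustrated with a minimal sketch. This is not the method used for this dataset: it assumes the simplest possible alignment, standardising each feature within its own domain, and the feature values and the helper names `standardise` and `domain_gap` are hypothetical.

```python
from statistics import mean, pstdev

def standardise(features):
    """Z-score each feature column using the statistics of this domain only."""
    columns = list(zip(*features))
    mus = [mean(c) for c in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    sigmas = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)]
            for row in features]

def domain_gap(a, b):
    """Largest absolute difference between per-feature means of two domains."""
    mean_a = [mean(c) for c in zip(*a)]
    mean_b = [mean(c) for c in zip(*b)]
    return max(abs(x - y) for x, y in zip(mean_a, mean_b))

# Hypothetical 2-D image features (e.g. brightness, contrast); values are made up.
source = [[0.2, 1.0], [0.4, 1.2], [0.6, 1.4]]   # standardised imaging
target = [[5.2, 3.0], [5.6, 3.4], [6.0, 3.8]]   # different imaging conditions

gap_before = domain_gap(source, target)
gap_after = domain_gap(standardise(source), standardise(target))
print(f"gap before: {gap_before:.2f}, gap after: {gap_after:.2e}")
```

After per-domain standardisation the feature means of both domains coincide, so a classifier trained on the source features sees target features on the same scale. Domain adaptation algorithms for deep networks typically align learned features in more elaborate ways, but the goal is the same.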
Provider:
Dryad
Created:
2022-01-06



