Machine learning analysis of wing venation patterns accurately identifies Sarcophagidae, Calliphoridae and Muscidae fly species

NIAID Data Ecosystem, indexed 2026-05-01

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.vdncjsxzh

Description:

In medical, veterinary, and forensic entomology, the ease and affordability of image data acquisition have made whole-image analysis an invaluable approach for species identification. Krawtchouk moment invariants are a classical mathematical transformation that can extract local features from an image, thus allowing subtle species-specific biological variations to be accentuated for subsequent analyses. We extracted Krawtchouk moment invariant features from binarised wing images of 759 male fly specimens from the Calliphoridae, Sarcophagidae, and Muscidae families (13 species and a species variant). Subsequently, we trained the Generalized, Unbiased, Interaction Detection and Estimation (GUIDE) random forests classifier using linear discriminants derived from these features and inferred the species identity of specimens from the test samples. Five-fold cross-validation results show a 98.56 ± 0.38% (standard error) mean identification accuracy at the family level, and a 91.04 ± 1.33% mean identification accuracy at the species level. The mean F1-score of 0.89 ± 0.02 reflects a good balance between the model's precision and recall. The present study consolidates findings from previous small pilot studies on the usefulness of wing venation patterns for inferring species identities. Thus, the stage is set for the development of a mature data analytic ecosystem for routine computer image-based identification of fly species that are of medical, veterinary, and forensic importance.

Methods

The specimens used in this study came from three separate collections. Collection 1 consists of specimens collected in Malaysia. It includes three Calliphoridae species (Ch. megacephala, Ch. nigripes, and Ch. rufifacies) and all five Sarcophagidae species. The specimens were collected from various geographical localities and habitats (e.g., primary forests, farms, mangrove swamps, beaches, and national parks) in Malaysia.
Flies were collected by sweeping with a handheld insect net, with decomposed beef used as bait.

Collection 2 consists of specimens collected in the province of Alicante, Spain. It includes three Calliphoridae species (C. vicina, Ch. albiceps (normal and wing mutant variant), and L. sericata) and one Muscidae species (Sy. nudiseta). In Collection 2, C. vicina and L. sericata specimens were captured using pork liver baits. Ch. albiceps and Sy. nudiseta specimens were reared from larvae obtained during a human autopsy at the Institute of Legal Medicine of Alicante (IMLA, Spain).

Collection 3 consists of specimens collected mostly from islands of Indonesia. It includes three Calliphoridae species: Ch. bezziana (collected from Java, Sulawesi, Sumatra, and Sumba islands and Malaysia; 4 specimens from Africa, 1 from India), Ch. megacephala (collected from Java, Kalimantan, Lombok, Sumatra, Sulawesi, West Papua, and West Timor islands), and Ch. rufifacies (collected from Sumatra and Sumba islands). Ch. bezziana samples were grown in the laboratory from larvae found at a myiasis-infected wound. The Ch. megacephala and Ch. rufifacies samples were captured using a Lucitrap Modification (LTM) or a sticky trap with Bezzilure as bait.

We binarised all raw image data to focus on the wing venation patterns and remove unnecessary features such as background noise and wing membrane details. ImageJ version 1.53k was used to binarise the images, and PixlrE (https://pixlr.com/e/) was used for manual denoising of the binarised images. Different configurations were applied to different sets of images to accentuate the wing venation patterns. Raw images that could not be properly binarised or that contained broken venation patterns were removed. The images were centered and then oriented with the wing costa parallel to the horizontal axis. Subsequently, they were cropped to 724 x 254 pixels and saved in PNG file format.
We further resized the images to 256 x 90 pixels to prevent the machine learning model from learning features unnecessary for identification and to improve model training speed. Pre-processing took 3 to 8 minutes per image, with noisier images requiring more time.
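
The binarise-then-resize step described above can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' pipeline, which used ImageJ and PixlrE with per-set configurations. The threshold value, the nearest-neighbour sampling, the synthetic test image, and the function names `binarise` and `resize_nearest` are all assumptions made for illustration.

```python
"""Sketch: threshold a grayscale image, then shrink it by nearest-neighbour sampling."""

from typing import List

GrayImage = List[List[int]]  # rows of pixel values in the range 0-255


def binarise(gray: GrayImage, threshold: int = 128) -> GrayImage:
    """Map each pixel to 0 (dark, e.g. a vein) or 255 (light, e.g. background).

    The default threshold of 128 is a placeholder, not a value from the study.
    """
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


def resize_nearest(img: GrayImage, new_width: int, new_height: int) -> GrayImage:
    """Resize by copying, for each output pixel, the source pixel it maps onto."""
    old_height, old_width = len(img), len(img[0])
    return [
        [
            img[(y * old_height) // new_height][(x * old_width) // new_width]
            for x in range(new_width)
        ]
        for y in range(new_height)
    ]


if __name__ == "__main__":
    # Synthetic 724 x 254 image: light background with one dark horizontal band
    # standing in for a vein (hypothetical data, not from the dataset).
    width, height = 724, 254
    gray = [
        [40 if 120 <= y < 130 else 220 for _ in range(width)]
        for y in range(height)
    ]

    small = resize_nearest(binarise(gray), 256, 90)
    print(len(small[0]), len(small))  # width and height of the resized image
```

Nearest-neighbour sampling is used here because it keeps the output strictly two-valued; an interpolating resize would introduce intermediate gray levels that would need re-thresholding. In practice an image library would normally handle both steps.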

Created:

2023-07-18
