Data fusion for integrative species identification using deep learning
Source: DataONE; updated 2025-11-12; indexed 2025-11-22
Download link:
https://search.dataone.org/view/sha256:d65f92e99018b97c2d28027abb84f15ac946363035b788d13cdaf3314ef463e0
Resource summary:
DNA analyses have revolutionized species identification and taxonomic work. Yet persistent challenges arise from little differentiation between species and considerable variation within species, particularly in closely related groups. While images are commonly used as an alternative modality for automated identification, their usability is limited by the same issues. An integrative strategy that fuses molecular and image data through machine learning holds significant promise for fine-grained species identification. However, a systematic overview and rigorous statistical testing of molecular and image preprocessing and fusion techniques, including practical advice for biologists, have so far been missing. We introduce a machine learning scheme that integrates both molecular and morphological data for species identification. First, we systematically assess and compare three DNA arrangement methods and two encoding methods. Then, artificial neural networks are used to ext...

# Data from: Data fusion for integrative species identification using deep learning
[https://doi.org/10.5061/dryad.4qrfj6qjk](https://doi.org/10.5061/dryad.4qrfj6qjk)
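The summary above mentions comparing two DNA encoding methods, but this page does not name them. As an illustration only, the Python sketch below shows one-hot encoding, a common way to turn an aligned sequence into a numeric matrix that a neural network can consume; it is not necessarily one of the methods evaluated in the study. The function name `one_hot_encode` is hypothetical, and mapping gaps ('-') and ambiguity codes such as 'N' to all-zero rows is an assumption of this sketch.

```python
from typing import List

# Column order of the one-hot vectors (an assumption; any fixed order works).
NUCLEOTIDES = "ACGT"


def one_hot_encode(seq: str) -> List[List[int]]:
    """Encode an aligned DNA sequence as a len(seq) x 4 one-hot matrix.

    Gaps ('-') and any non-ACGT symbol (e.g. 'N' or other IUPAC ambiguity
    codes) become all-zero rows, so every sequence of the same alignment
    length yields a matrix of the same shape.
    """
    matrix = []
    for base in seq.upper():  # accept lowercase input as well
        row = [0] * len(NUCLEOTIDES)
        idx = NUCLEOTIDES.find(base)  # -1 for gaps and ambiguity codes
        if idx >= 0:
            row[idx] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    print(one_hot_encode("ACG-N"))
    # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```

Because all sequences in an alignment share the same length, each encoded matrix has the same shape, which is what a fixed-input network layer expects.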
## Description of the data and file structure
### Data
The data folder contains the records and alignment files for each of the four datasets used in this study (i.e., Asteraceae, Poaceae, Coccinellidae, Lycaenidae). The records text file contains the following information:

- 'record_id': a unique custom ID assigned to the record;
- 'species_name': the name of the species;
- 'taxonomy': the taxonomic information for the species, as provided by NCBI;
- 'genbank_accession': the GenBank accession provided by NCBI, included for completeness;
- 'image_url': the original URL of the record's image;
- 'image_rights_holder': the rights holder of the image, if provided by GBIF.

Sequences in the respective FASTA files can be linked to their records via their unique record ID (e.g., 'BOLD642' in column 'record_...
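The record-to-sequence linkage described above can be sketched in Python as follows. This is a minimal sketch under assumptions this page does not confirm: the records text file is taken to be tab-separated with a header row containing 'record_id', and each FASTA header is taken to start with the record ID (e.g. '>BOLD642'). The functions `parse_fasta` and `link_records` and the toy input values are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {record_id: sequence} from FASTA-formatted lines.

    Assumption: the record ID is the first whitespace-separated token of each
    '>' header line. Sequences wrapped over several lines are concatenated.
    """
    sequences: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                sequences[current_id] = "".join(chunks)
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            chunks = []
        else:
            chunks.append(line)
    if current_id is not None:
        sequences[current_id] = "".join(chunks)
    return sequences


def link_records(records: Iterable[str], fasta_lines: Iterable[str]) -> List[dict]:
    """Attach each record's sequence via 'record_id' (None if it has none).

    Assumption: the records file is tab-separated with a header row that
    includes a 'record_id' column.
    """
    sequences = parse_fasta(fasta_lines)
    linked = []
    for row in csv.DictReader(records, delimiter="\t"):
        entry = dict(row)
        entry["sequence"] = sequences.get(entry["record_id"])
        linked.append(entry)
    return linked


if __name__ == "__main__":
    # Toy in-memory input standing in for one dataset's files (hypothetical values).
    records = io.StringIO("record_id\tspecies_name\nBOLD642\tGenus species\n")
    fasta = io.StringIO(">BOLD642\nACGT\nAC-T\n")
    for entry in link_records(records, fasta):
        print(entry["record_id"], entry["sequence"])  # BOLD642 ACGTAC-T
```

With the real files, pass open file handles (for example, `open(path, encoding="utf-8")`) instead of the in-memory `io.StringIO` objects, and adjust the delimiter if the records file turns out not to be tab-separated.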
Created: 2025-11-13



