
Integrating deep learning derived morphological traits and molecular data for total-evidence phylogenetics: lessons from digitized collections

Source: DataONE. Updated 2024-12-18; indexed 2025-04-26.
Download link:
https://search.dataone.org/view/sha256:173760dbe3ce4ef9e92b05d2e7a333be22573b9dbb53bbc5436406f648ea036d
Resource description:
Deep learning has previously shown success in automatically generating morphological traits that carry a phylogenetic signal. In this paper, we explore combining molecular data with deep-learning-derived morphological traits from images of pinned insects to generate total-evidence phylogenies, and we reveal challenges. Deep-learning-derived morphological traits, while informative, underperform molecular analyses when used in isolation. However, they can improve molecular results in total-evidence settings. We use a dataset of rove beetle images to compare the effect of different dataset splits and deep metric loss functions on morphological and total-evidence results. We find a slight preference for the cladistic dataset split and the contrastive loss function. Additionally, we explore the effect of varying the number of genes used in inference and find that different gene combinations provide the best results when used on their own vs. in total-evidence analysis. Despite the pro...

Images of specimens associated with this dataset can be found in the Rove-Tree-11 dataset (https://doi.org/10.17894/ucph.39619bba-4569-4415-9f25-d6a0ff64f0e3). Molecular data were gathered from GenBank and aligned using MAFFT 7. Original GenBank accession numbers are provided. Alignments were concatenated with FASconCAT-G. The partition scheme and model selection were obtained using PartitionFinder 2.1.1. Trees were obtained via Maximum Parsimony using TNT. Example TNT scripts are provided.
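The concatenation step described above can be illustrated with a minimal sketch. This is not FASconCAT-G and not the authors' pipeline; it only shows the general idea of joining per-gene alignments into one supermatrix while padding taxa that lack a gene. The function name `concatenate_alignments`, the toy taxon and gene names, and the choice of `?` as the missing-data symbol are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# One gene's alignment: taxon name -> aligned sequence (all the same length).
Alignment = Dict[str, str]


def concatenate_alignments(
    genes: Dict[str, Alignment], missing: str = "?"
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Join per-gene alignments into one supermatrix (illustrative only).

    Returns the concatenated sequences and a list of (gene, start, end)
    partitions in 1-based inclusive coordinates.
    """
    # Every taxon that appears in at least one gene gets a row.
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    matrix: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    offset = 0
    for gene, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Taxa without this gene are padded with the missing-data symbol.
            matrix[taxon].append(aln.get(taxon, missing * width))
        partitions.append((gene, offset + 1, offset + width))
        offset += width
    return {t: "".join(parts) for t, parts in matrix.items()}, partitions


# Toy input with hypothetical taxa and gene names; sp_B has no 28S sequence.
toy = {
    "COI": {"sp_A": "ACGT", "sp_B": "AC-T"},
    "28S": {"sp_A": "GGC"},
}
supermatrix, partitions = concatenate_alignments(toy)
```

The returned partition boundaries are the kind of information a partitioning tool would need as input; how the actual dataset encodes missing genes should be checked against the provided files.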
# Integrating Deep Learning Derived Morphological Traits and Molecular Data for Total-Evidence Phylogenetics: Lessons from Digitized Collections

[https://doi.org/10.5061/dryad.9cnp5hqqq](https://doi.org/10.5061/dryad.9cnp5hqqq)

This dataset includes all supplemental material for the paper 'Integrating Deep Learning Derived Morphological Traits and Molecular Data for Total-Evidence Phylogenetics: Lessons from Digitized Collections'.

## Description of the data and file structure

The following describes the files included in this dataset:

* GenBank accession numbers of all sequences.xlsx: This Excel file contains all the GenBank accession numbers for the molecular data used in this paper. Column A lists the species; the other columns represent the 7 different genes included. Note that not all genes were available for all species. This is discussed in the paper.
* rove_stratified_split.csv: This CSV file contains the dataset split for the 'Stratified' dataset co...
Created: 2024-12-19