Resolving ambiguity of concatenation in multi-locus sequence data for the construction of phylogenetic supermatrices
Source: DataONE. Updated: 2020-06-24. Indexed: 2025-07-19.
Download link:
https://search.dataone.org/view/sha256:9bf0ece855c3942c763d5d55de10ace43a485a2134c7fa3a2a54fd11a8f370c7
Description:
The construction of supermatrices from mining of DNA metadata is problematic due to incomplete species identification and incongruence of gene trees that hamper sequence concatenation based on Linnaean binomials. We applied methods from graph theory to minimize ambiguity of concatenation globally over a large data set. An initial step establishes sequence clusters for each locus that broadly correspond to Linnaean species. These clusters frequently are not consistent with binomials and specimen identifiers, which greatly complicates the concatenation of clusters across multiple loci. A multipartite heuristic algorithm is used to match clusters across loci and to generate a global set of concatenates that minimizes conflict of taxonomic names. The procedure was applied to all available data on GenBank for the Coleoptera (beetles) including >10500 taxon labels for >23500 sequences of four loci. The BlastClust algorithm was used in the initial clustering step, resulting in 11241 clus...
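The cross-locus matching step described above can be illustrated with a minimal sketch. This is not the authors' multipartite heuristic, which the description does not specify in detail. It is a simple greedy stand-in under stated assumptions: each cluster is represented only by its set of taxon labels, agreement between two clusters is scored by the number of shared labels, and a concatenate may hold at most one cluster per locus. The function name `match_clusters`, the input layout, and the example labels are all hypothetical.

```python
from itertools import combinations


def match_clusters(clusters_by_locus):
    """Greedily group per-locus clusters into concatenates.

    clusters_by_locus: dict mapping locus name -> dict mapping
    cluster id -> set of taxon labels (hypothetical layout).
    Returns a list of dicts, each mapping locus -> cluster id.
    """
    # Score every pair of clusters from two different loci by the
    # number of taxon labels they share (an assumed agreement measure).
    edges = []
    loci = sorted(clusters_by_locus.items())
    for (locus_a, cl_a), (locus_b, cl_b) in combinations(loci, 2):
        for id_a, labels_a in cl_a.items():
            for id_b, labels_b in cl_b.items():
                shared = len(labels_a & labels_b)
                if shared:
                    edges.append((shared, (locus_a, id_a), (locus_b, id_b)))

    # Strongest agreement first; ties are broken by node name so the
    # result is deterministic.
    edges.sort(key=lambda e: (-e[0], e[1], e[2]))

    # Every cluster starts as its own single-locus group.
    group_of = {}
    groups = {}
    for locus, cl in clusters_by_locus.items():
        for cid in cl:
            node = (locus, cid)
            group_of[node] = node
            groups[node] = {locus: cid}

    for _, a, b in edges:
        ga, gb = group_of[a], group_of[b]
        if ga == gb:
            continue
        # Refuse merges that would put two clusters of the same locus
        # into one concatenate.
        if groups[ga].keys() & groups[gb].keys():
            continue
        for locus, cid in groups[gb].items():
            group_of[(locus, cid)] = ga
        groups[ga].update(groups.pop(gb))

    return list(groups.values())


if __name__ == "__main__":
    # Toy input: the label "Carabus sp." appears in inconsistent clusters.
    example = {
        "cox1": {
            "c1": {"Carabus granulatus", "Carabus sp."},
            "c2": {"Agabus bipustulatus"},
        },
        "16S": {
            "r1": {"Carabus granulatus"},
            "r2": {"Agabus bipustulatus", "Carabus sp."},
        },
    }
    for concatenate in match_clusters(example):
        print(concatenate)
```

A greedy pass like this resolves conflicts one edge at a time, so it does not guarantee a globally minimal conflict of names; it only shows the shape of the problem, in which clusters are nodes, shared labels are weighted edges, and each concatenate takes at most one node per locus.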
Created:
2025-06-25



