Resolving the early divergence pattern of teleost fish using genome-scale data

Mendeley Data. Updated 2024-04-12; indexed 2024-06-27.
Download link:
https://datadryad.org/stash/dataset/doi:10.5061/dryad.v9s4mw6rm
Description:
Sequence Data Used

Amino acid sequence data from three previous studies and nucleotide sequence data from one study were analyzed (Tables 1 and S1).

The data from Bian et al. (2016) were provided by the authors. They comprised 418 genes for 12 species: coelacanth (Latimeria chalumnae) and 11 ray-finned fish, namely one non-teleost fish [gar (Lepisosteus oculatus)] and 10 teleost fishes [three Osteoglossomorpha: arawana or Asian bonytongue (Scleropages formosus), butterflyfish (Pantodon buchholzi), and knifefish (Papyrocranus afer); two Elopomorpha: European eel (Anguilla anguilla) and tarpon (Megalops cyprinoides); and five Clupeocephala: zebrafish (Danio rerio), electric eel (Electrophorus electricus), medaka (Oryzias latipes), fugu (Takifugu rubripes), and stickleback (Gasterosteus aculeatus)]. Six genes with fewer than 50 shared amino acid sites were excluded, leaving a set of 412 genes from the 12 species for the analyses (Tables 1 and S2).

Data from Chen et al. (2015), Hughes et al. (2018), and Faircloth et al. (2013) were downloaded from the Dryad Digital Repository. The data from Chen et al. (2015) contained amino acid sequences of 14 ray-finned fish: 11 teleost fish, comprising one Elopomorpha [Japanese eel (Anguilla japonica)], one Osteoglossomorpha [silver arawana (Osteoglossum bicirrhosum)], and nine Clupeocephala [zebrafish (D. rerio), catfish (Ictalurus punctatus), tetra (Astyanax mexicanus), cod (Gadus morhua), tilapia (Oreochromis niloticus), platyfish (Xiphophorus maculatus), medaka (O. latipes), stickleback (G. aculeatus), and fugu (T. rubripes)], and three non-teleost fish [gar (L. oculatus), sturgeon (Acipenser transmontanus), and bichir (Polypterus senegalus)] (Table S2). The genes that included all 14 ray-finned fish species and the coelacanth (L. chalumnae) were extracted from the total gene set (4,682 genes), and those with fewer than 50 shared amino acid sites were excluded (Total set, 772 genes).
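The shared-site filter applied to the Bian and Chen data can be illustrated with a minimal sketch. It assumes that a "shared" site is an alignment column in which every species has a resolved residue; the symbols treated as missing and the function names are assumptions for illustration only and are not taken from the original study.

```python
# Minimal sketch of a shared-site filter for one gene alignment.
# Assumption: a site is "shared" when no sequence has a gap or an
# unresolved residue in that column. MISSING and the function names
# are hypothetical and not taken from the original study.
MISSING = set("-?X*")


def shared_site_count(alignment):
    """Count columns in which every sequence has a resolved residue.

    `alignment` maps a species name to its aligned amino acid sequence.
    """
    seqs = list(alignment.values())
    if not seqs:
        return 0
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return sum(
        all(s[i].upper() not in MISSING for s in seqs)
        for i in range(length)
    )


def keep_gene(alignment, min_shared=50):
    """Keep a gene only if it has at least `min_shared` shared sites."""
    return shared_site_count(alignment) >= min_shared


# Toy alignment: columns 0, 1, 2 and 5 are resolved in all three species.
toy = {"gar": "MKT-LV", "zebrafish": "MKTALV", "medaka": "MRTAXV"}
print(shared_site_count(toy))  # 4
print(keep_gene(toy))          # False, because 4 < 50
```

With the threshold of 50 used in the description, a gene is retained when the count is 50 or more, which matches excluding genes with fewer than 50 shared sites.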
Within the Total set, the genes that were also included in three gene sets of Chen et al. (2015) were extracted: the set in which teleost species formed a monophyletic cluster (Teleost set, 542 genes) and the top-1000 and top-500 slowly evolving gene sets (Slow1000 set, 190 genes; Slow500 set, 96 genes). In a preliminary analysis, sets of the top-200 and top-100 slowly evolving genes were also created by choosing the genes with the shortest total branch lengths estimated for the trees of the 15 species. Because the results were essentially the same as those from the Slow1000 and Slow500 sets, only the Slow1000 and Slow500 sets were used.

The Hughes et al. (2018) data contained 1,105 individual genes covering 305 species in total: frog (Xenopus tropicalis), coelacanth (L. chalumnae), lungfish (Protopterus aethiopicus), 10 non-teleost ray-finned fishes [three Polypteriformes, four Acipenseriformes, four Holostei (one Amiiformes, three Lepisosteiformes)], and 292 teleost fishes [seven Elopomorpha, six Osteoglossomorpha, and 279 Clupeocephala species] (Tables S1, S3, and S4). Of the 1,105 genes, six that contained no Osteoglossomorpha sequences were excluded (1,099-gene set) (Tables S3 and S4). Because the focus of this study is to resolve the relationships among Elopomorpha, Osteoglossomorpha, and Clupeocephala, nine Clupeocephala species [Atlantic herring (Clupea harengus), golden-line barbel (Sinocyclocheilus grahami), red-bellied piranha (Pygocentrus nattereri), northern pike (Esox lucius), grayling (Thymallus thymallus), silver eye (Polymixia japonica), blackbar soldierfish (Myripristis jacobus), yellowfin tuna (Thunnus albacares), and northern snakehead (Channa argus)] that have a low proportion of missing data and relatively low divergence were selected.
Three Elopomorpha species (Gymnothorax reevesii, Conger cinereus, Kaupichthys hyoproroides) and one outgroup (Acipenser naccarii), which appeared in a small number of loci (≤ 171), were excluded (30 species in total and 25.5 ± 4.0 per locus, Table S2). From the 1,099-gene set, loci in which some species had an unusually long branch from the common ancestral node of teleost fish (>3 substitutions per site) and loci with fewer than 50 sites were excluded, leaving 1,062 loci (Tables 1 and S1) (Hughes data).

Although nucleotide sequence data were available for the Bian data and Hughes data, this study analyzed amino acid sequence data, because synonymous nucleotide sites were likely to be saturated owing to the long time that separates Elopomorpha, Osteoglossomorpha, and Clupeocephala (more than 250 million years, e.g., Near et al. 2012; Hughes et al. 2018). Multiple substitutions that are not correctly identified can generate spurious phylogenetic signals (e.g., Philippe et al. 2005a; Philippe et al. 2011). Using concatenated nucleotide sequences of the Bian data and Hughes data, branch lengths (the number of substitutions per site) were estimated separately at the third codon positions, where most substitutions are synonymous, and at the first and second codon positions, where most substitutions are nonsynonymous, assuming the tree topology corresponding to Tree 1. Indeed, synonymous substitutions were likely saturated, because the numbers of substitutions per site between Elopomorpha, Osteoglossomorpha, and Clupeocephala and the outgroup at the third codon positions were close to two for both datasets (average 1.75, min. 1.24, and max. 2.47 for the Bian data and average 1.70, min. 1.19, and max. 2.61 for the Hughes data).
In contrast, nonsynonymous substitutions were not likely saturated, because the numbers of substitutions per site at the first and second codon positions were much smaller than one (average 0.26 for the Bian data and 0.32 for the Hughes data). However, because there are more possible states in an amino acid sequence (20 states) than in a nucleotide sequence at the first and second codon positions (16 states), the resolving power of amino acid sequences could be higher than that of nucleotide sequences. Therefore, amino acid sequence data were used in this study.

The UCE data from Faircloth et al. (2013) contained four outgroups [bichir, lake sturgeon (Acipenser fulvescens), bowfin (Amia calva), and gar], two Elopomorpha [Megalops sp. and slender giant moray (Strophidon sathete)], two Osteoglossomorpha (silver arawana and butterflyfish), and 19 Clupeocephala species (Table S2). Of the 491 UCE loci in the downloaded data, the 278 loci that contained at least one species in each of the four groups (outgroup, Elopomorpha, Osteoglossomorpha, and Clupeocephala) (Table S1) were used for the gene-tree-based approach.
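The group-coverage criterion used to select UCE loci can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the function names, the locus names, and the placeholder taxon `clupeocephalan_A` are hypothetical, and the taxon-to-group mapping would in practice come from Table S2.

```python
# Minimal sketch: keep loci that contain at least one species from each
# of the four groups. All names below are hypothetical illustrations.
GROUPS = {"outgroup", "Elopomorpha", "Osteoglossomorpha", "Clupeocephala"}


def covers_all_groups(taxa_in_locus, group_of):
    """Return True if the locus has at least one taxon from every group."""
    present = {group_of[t] for t in taxa_in_locus if t in group_of}
    return GROUPS <= present


def filter_loci(loci, group_of):
    """Return only the loci that cover all four groups."""
    return {
        name: taxa
        for name, taxa in loci.items()
        if covers_all_groups(taxa, group_of)
    }


# Toy input; "clupeocephalan_A" is a placeholder, not a taxon from the data.
group_of = {
    "bichir": "outgroup",
    "gar": "outgroup",
    "Megalops sp.": "Elopomorpha",
    "silver arawana": "Osteoglossomorpha",
    "butterflyfish": "Osteoglossomorpha",
    "clupeocephalan_A": "Clupeocephala",
}
loci = {
    "uce-1": ["gar", "Megalops sp.", "butterflyfish", "clupeocephalan_A"],
    "uce-2": ["bichir", "gar", "silver arawana", "clupeocephalan_A"],
}
print(sorted(filter_loci(loci, group_of)))  # ['uce-1']; uce-2 lacks Elopomorpha
```

Using set inclusion keeps the check independent of how many species each group contributes, which mirrors the "at least one species in each of the four groups" wording.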
Created: 2023-06-28