Fern Tree of Life (FTOL) input data
Source: DataCite Commons. Updated 2022-06-23; indexed 2024-07-29.
Download link:
https://figshare.com/articles/dataset/Fern_Tree_of_Life_FTOL_input_data/19474316/1
Description:
The data included here are used in a pipeline that (mostly) automatically generates a maximally sampled fern phylogenetic tree (fern tree of life; "FTOL") based on plastid sequences in GenBank (https://github.com/fernphy/ftol).

The first step is to generate a set of reference FASTA files for 79 target loci (one per locus; ref_aln.tar.gz). These comprise 77 protein-coding genes, drawn from a list of 83 genes (Wei et al. 2017) that was filtered to retain only genes showing no evidence of duplication (target_coding_genes.txt), plus two spacer regions (trnL-trnF and rps4-trnS). Each FASTA file in ref_aln.tar.gz includes one representative (longest) sequence per available fern genus. This step is performed with custom R scripts contained in https://github.com/fernphy/ftol, in particular prep_ref_seqs_plan.R (https://github.com/fernphy/ftol/blob/main/prep_ref_seqs_plan.R).

Next, all available fern accessions for seven target “Sanger loci” (plastid regions typically sequenced using Sanger technology) and all available fern plastomes (accessions >7000 bp) are downloaded from GenBank. Non-fern accessions listed in plastome_outgroups.csv are downloaded as well. Sequences matching the target loci are then extracted from each accession with the “Reference_Blast_Extract.py” script of superCRUNCH (Portik and Wiens 2020), using the FASTA files contained in ref_aln.tar.gz as references. Any accessions matching those listed in accs_exclude.csv are excluded as putative rogues (i.e., misidentifications or contaminations).

The extracted sequences are aligned with MAFFT (Katoh et al. 2002), phylogenetic analysis is done using IQ-TREE (Nguyen et al. 2015), and divergence times are estimated with treePL (Smith and O’Meara 2012). During molecular dating, equisetum_subgenera.csv is used to specify some clades within Equisetum whose ages are constrained by fossils, and ppgi_taxonomy_mod.csv is used to map higher-level clade names (e.g., family, order) to species (tips of the phylogeny).

For additional methodological details and references, see the README file included in this dataset and this paper:

Nitta JH, Schuettpelz E, Ramírez-Barahona S, Iwasaki W. 2022. An open and continuously updated fern tree of life (FTOL).
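
The description states that each reference FASTA file keeps one representative (longest) sequence per fern genus. The following is a minimal Python sketch of that selection rule only; it is not the pipeline's own code, which is written in R (prep_ref_seqs_plan.R). The header layout `ACCESSION Genus species`, the function names, and the example accessions are all assumptions made for illustration, and gap characters are ignored when measuring length.

```python
from typing import Dict, Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def ungapped_length(seq: str) -> int:
    """Length of a sequence, not counting alignment gap characters."""
    return len(seq.replace("-", ""))


def longest_per_genus(text: str) -> Dict[str, Tuple[str, str]]:
    """Return {genus: (header, sequence)} keeping the longest sequence.

    Assumption (hypothetical): headers look like 'ACCESSION Genus species'.
    Ties are resolved in favour of the record seen first.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for header, seq in parse_fasta(text):
        fields = header.split()
        if len(fields) < 2:
            continue  # header does not match the assumed layout
        genus = fields[1]
        if genus not in best or ungapped_length(seq) > ungapped_length(best[genus][1]):
            best[genus] = (header, seq)
    return best


# Made-up records for illustration; these are not real GenBank accessions.
example = """>ACC0001 Equisetum arvense
ATGCATGC
>ACC0002 Equisetum hyemale
ATGCATGCATGC
>ACC0003 Pteris vittata
ATG
CAT
"""

representatives = longest_per_genus(example)
for genus, (header, seq) in sorted(representatives.items()):
    print(genus, header.split()[0], ungapped_length(seq))
```

In practice the genus would more reliably come from a taxonomy table than from parsing headers, so the header-splitting step is the part most likely to need replacing.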
Provider:
figshare
Created:
2022-03-31



