Fern Tree of Life (FTOL) input data

Name: Fern Tree of Life (FTOL) input data
Creator: figshare.com
Published: 2024-10-30 00:00:00
License: No description available

figshare.com | Updated 2024-10-30 | Indexed 2025-03-26

Download link:

https://figshare.com/articles/dataset/Fern_Tree_of_Life_FTOL_input_data/19474316/9

Description:

The data included here are used in a pipeline (https://github.com/fernphy/ftol) that (mostly) automatically generates a maximally sampled fern phylogenetic tree based on plastid sequences in GenBank. The first step is to download the latest release of GenBank data from the NCBI GenBank FTP site (https://ftp.ncbi.nlm.nih.gov/genbank/) and use it to create a local database of fern sequences. This is done with custom R scripts contained in https://github.com/fernphy/ftol, in particular setup_gb.R (https://github.com/fernphy/ftol/blob/main/R/setup_gb.R). Next, a set of reference FASTA files for 79 target loci (one file per locus; ref_aln.tar.gz) is generated. The loci comprise 77 protein-coding genes, taken from a list of 83 genes (Wei et al. 2017) that was filtered to retain only genes showing no evidence of duplication, plus two spacer regions (trnL-trnF and rps4-trnS). Each FASTA file in ref_aln.tar.gz includes one representative (longest) sequence per available fern genus. This is done with prep_ref_seqs_plan.R (https://github.com/fernphy/ftol/blob/main/prep_ref_seqs_plan.R). Sequences matching the target loci are then extracted from each accession in the local database with the “Reference_Blast_Extract.py” script of superCRUNCH (Portik and Wiens 2020), using the FASTA files in ref_aln.tar.gz as references. The extracted sequences are aligned with MAFFT (Katoh et al. 2002), phylogenetic analysis is done with IQ-TREE (Nguyen et al. 2015), and divergence times are estimated with treePL (Smith and O’Meara 2012). For additional methodological details, see: Nitta JH, Schuettpelz E, Ramírez-Barahona S, Iwasaki W. 2022. An open and continuously updated fern tree of life. Frontiers in Plant Science 13. https://doi.org/10.3389/fpls.2022.909768.
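The "one representative (longest) sequence per genus" rule described above can be illustrated with a short Python sketch. This is not the code used by the FTOL pipeline, which implements this step in R (prep_ref_seqs_plan.R). The header layout (an accession followed by genus and species), the function names, and the example records are all assumptions made for illustration, and the actual FASTA files in ref_aln.tar.gz may be labelled differently.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def ungapped_length(seq):
    """Length of a sequence, ignoring alignment gap characters."""
    return len(seq.replace("-", ""))


def longest_per_genus(records):
    """Keep the longest sequence for each genus.

    ASSUMPTION: headers look like "<accession> <Genus> <species>", so the
    genus is the second whitespace-separated field. Ties keep the record
    that was seen first.
    """
    best = {}
    for header, seq in records:
        fields = header.split()
        if len(fields) < 2:
            continue  # header does not match the assumed layout
        genus = fields[1]
        if genus not in best or ungapped_length(seq) > ungapped_length(best[genus][1]):
            best[genus] = (header, seq)
    return best


# Made-up records for illustration only (not real GenBank accessions).
EXAMPLE_FASTA = """\
>ACC001 Azolla filiculoides
ATGCATGC
>ACC002 Azolla caroliniana
ATGCATGCATGC
>ACC003 Salvinia natans
ATG
CAT
"""

if __name__ == "__main__":
    for genus, (header, seq) in longest_per_genus(parse_fasta(EXAMPLE_FASTA)).items():
        print(genus, header.split()[0], ungapped_length(seq))
```

Gap characters are ignored when comparing lengths so that an aligned but mostly empty sequence is not preferred over a shorter, more complete one. Whether the pipeline itself counts gaps this way is not stated in the description.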

Provider:

figshare.com
