
Datasets_S1_to_S15_PhyloToL

Source: DataCite Commons | Updated: 2025-01-17 | Indexed: 2024-08-19
Download link:
https://figshare.com/articles/dataset/Datasets_S1_to_S15_PhyloToL/26540599/1
Description:
Datasets S1 to S15 included within the PhyloToL analysis (see the paper for more details, including full legends).

Dataset S1: A record of every taxon and the corresponding sequence data used in the study.
Dataset S2: A summary of taxon code prefixes corresponding to “major” (first two characters) and “minor” (first five characters) clades, along with the number of species (out of 1000 total) in the study falling in each minor clade.
Dataset S3: A summary of the number of species included in the study per “major” clade, and the number of whole-genome assemblies versus whole-transcriptome assemblies available for each major clade.
Dataset S4: The file input to the ‘contamination loop’ of PhyloToL part 2 that defines rules for removing putative contaminant sequences based on sister relationships.
Dataset S5: The file input to the ‘contamination loop’ of PhyloToL part 2 that defines rules for removing putative contaminant sequences based on ‘subsister’ relationships, where sequence A’s subsister is defined as the sister of A’s parent node.
Dataset S6: The rules for clade-based contamination removal of ciliate clades, primarily to mitigate contamination by parabasalids.
Dataset S7: The rules for general clade-based contamination removal.
Dataset S8: A rules file for an alternative round of clade-grabbing designed specifically to account for clades of photosynthetic taxa containing species not otherwise expected to appear monophyletically.
Dataset S9: A description of all “utility” scripts supplied on GitHub (https://github.com/Katzlab/PhyloToL-6).
Dataset S10: Descriptive statistics of the orthologous groups (OGs) in the Hook Database, used as a reference for OG assignment in PhyloToL 6 part 1.
Dataset S11: A summary of the Gene Ontology (GO) terms identified for each OG using EggNOG. See Methods.
Dataset S12: A summary of the performance of a set of exemplar runs of PhyloToL part 1. See Results.
Dataset S13: A description of the taxa containing each of the 500 OGs used in this study at each stage of curation.
Dataset S14: A description of the ‘missing data’ at each stage in the contamination removal process for each taxon.
Dataset S15: A summary of all of the taxa included in the Hook Database, as seeded by data from OrthoMCL version 6.13.
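
The description above defines two conventions that are easy to misread: the “major” and “minor” clade prefixes of a taxon code (Dataset S2) and the ‘subsister’ relationship (Dataset S5). The following is a minimal illustrative sketch of both, not code taken from PhyloToL. The `Node` class, the function names, the toy tree, and the taxon code `Xx_yy_Zzzz` are all hypothetical, and the tree is assumed to be strictly bifurcating.

```python
class Node:
    """Toy tree node (hypothetical; not a PhyloToL data structure)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def sister(node):
    """Return the other child of node's parent, or None if there is no unique sister."""
    if node.parent is None:
        return None
    others = [c for c in node.parent.children if c is not node]
    return others[0] if len(others) == 1 else None


def subsister(node):
    """Return the sister of node's parent node, following the Dataset S5 definition."""
    if node.parent is None:
        return None
    return sister(node.parent)


def clade_prefixes(taxon_code):
    """Split a taxon code into (major, minor) prefixes: first two and first five characters."""
    return taxon_code[:2], taxon_code[:5]


# Toy tree: root -> (x, p2); p2 -> (c, p1); p1 -> (a, b)
root = Node("root")
x = Node("X", root)
p2 = Node("P2", root)
c = Node("C", p2)
p1 = Node("P1", p2)
a = Node("A", p1)
b = Node("B", p1)

print(sister(a).name)                # sister of A is the other child of P1
print(subsister(a).name)             # subsister of A is the sister of P1
print(clade_prefixes("Xx_yy_Zzzz"))  # hypothetical taxon code
```

In this toy tree, the sister of A is B and its subsister is C, so a subsister-based rule looks one node further up the tree than a sister-based rule.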

Provider:
figshare
Created:
2024-08-13