Datasets for Lupo et al. (2024) Identification and Characterization of Archaeal Pseudomurein Biosynthesis Genes through Pangenomics
Source: DataCite Commons. Updated 2024-11-02; indexed 2024-07-29.
Download link:
https://figshare.com/articles/dataset/Datasets_for_Lupo_et_al_2022_Origin_and_Evolution_of_Pseudomurein_Biosynthetic_Gene_Clusters/21641612
Resource description:
Lupo et al. 2022 Pseudomurein: Archive content for v2 (Nov 2024)
Overview
This archive contains 59 directories and 777 files in total.
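After downloading and unpacking the archive, you can compare these counts against your local copy. The sketch below roughly reproduces the summary line printed by the Unix `tree` command. By default, `tree` does not count the top-level directory and skips hidden entries, and the sketch does the same. It is an illustration and not part of the archive. The function name is hypothetical, and the root path is wherever you extracted the archive. Nested .tar.gz archives count as single files.

```python
import os


def count_tree(root: str) -> tuple[int, int]:
    """Count sub-directories and files below `root`, roughly like `tree`'s summary.

    The root itself is not counted, and hidden entries (names starting
    with '.') are skipped, mirroring `tree`'s default behaviour.
    """
    n_dirs = n_files = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        n_dirs += len(dirnames)
        n_files += sum(1 for f in filenames if not f.startswith("."))
    return n_dirs, n_files
```

You can compare the result of `count_tree("path/to/extracted/archive")` with the (59, 777) stated above. The path is a placeholder.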
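README.md: This file.
command-line.sh: Example Bash commands for using or generating the files stored in this archive.
10-archaeal-proteomes: This directory contains a .tar.gz archive of FASTA (.faa) files for the 10 organisms used by OrthoFinder.
archaeal-genomes: This directory contains a .tar.gz archive of 819 FASTA (.faa) files corresponding to the archaeal database.
bacterial-genomes: This directory contains a .tar.gz archive of 598 FASTA (.faa) files corresponding to the bacterial database.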
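You can inspect the protein archives without unpacking them to disk. The following sketch uses only Python's standard `tarfile` module. It counts FASTA records (header lines starting with `>`) in each `.faa` member of a `.tar.gz` archive. This description does not list the archive file names, so any path you pass is a placeholder. The function name is hypothetical, and the sketch assumes UTF-8 text.

```python
import io
import tarfile
from collections import Counter


def count_fasta_records(archive_path: str, suffix: str = ".faa") -> Counter:
    """Count FASTA records in every `suffix` member of a .tar.gz archive.

    Members are read straight from the compressed archive; nothing is
    extracted to disk.
    """
    counts: Counter = Counter()
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(suffix)):
                continue
            handle = tar.extractfile(member)
            if handle is None:  # defensive: extractfile returns None for non-regular members
                continue
            with io.TextIOWrapper(handle, encoding="utf-8") as text:
                # One record per header line.
                counts[member.name] = sum(1 for line in text if line.startswith(">"))
    return counts
```

For the archaeal database archive, the returned Counter would be expected to have 819 keys, one per `.faa` file.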
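config: This directory contains two configuration files (.yaml) used by the classify-ali.pl Perl script from the Bio::MUST modules:
classifier.yaml: configuration file used to filter the 6,321 orthologous groups (OGs).
five_org_classifier.yaml: configuration file used to filter the OGs after the Forty-Two TBLASTN round.
BLASTP: List of configuration files used to run Forty-Two BLASTP (FILTERING OF CANDIDATE PROTEINS; see Figure S1 in the main manuscript).
TBLASTN: List of configuration files used to run Forty-Two TBLASTN (FILTERING OF CANDIDATE PROTEINS; see Figure S1 in the main manuscript).
NCBI_CCD: List of HMM profile files (.hmm) downloaded from the NCBI Conserved Domain Database (CDD), with the corresponding HMM search files (.hmms).
ompapa-results: Raw Ompa-Pa results associated with the NCBI CDD HMM profiles.
ompapa: This directory contains a list of the OGs that passed the taxonomic filter (see the config section).
alignments: FASTA-format alignments of the OGs listed in the retained_OGs.lis file (see the ompapa section).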
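Working with the retained OGs usually means reading the identifier list and then loading the matching alignments. A minimal sketch follows. It assumes that retained_OGs.lis is a plain-text file with one OG identifier per line, and it skips blank lines and `#` comments as a precaution. It also assumes that each alignment is aligned FASTA, with all sequences the same length. This description does not state how identifiers map to file names in the alignments directory, so the helper names below and any path pattern you build from them are hypothetical.

```python
from pathlib import Path


def read_id_list(path: str) -> list[str]:
    """Read a .lis file, assumed to hold one OG identifier per line."""
    ids = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # skip blanks and assumed comments
            ids.append(line)
    return ids


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines.

    Duplicate headers overwrite earlier ones.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def alignment_shape(text: str) -> tuple[int, int]:
    """Return (number of sequences, number of columns) of an aligned FASTA text."""
    seqs = parse_fasta(text)
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) > 1:
        raise ValueError(f"not an alignment: sequence lengths {sorted(lengths)}")
    return len(seqs), lengths.pop() if lengths else 0
```

Raising an error on unequal lengths catches unaligned FASTA files early, before they reach downstream tools such as HMM profile building.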
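bacterial_db: Results of the ompapa analyses against the bacterial database. This directory contains two sub-directories:
hmms: a list of HMM search files (.hmms).
ompapa-results: raw ompapa results.
hmm_profiles: HMM profiles (.hmm) of the OGs listed in the retained_OGs.lis file (see the ompapa section).
prokaryotic_db: Results of the ompapa analyses against the prokaryotic database. This directory contains two sub-directories:
hmms: a list of HMM search files (.hmms).
ompapa-results: raw ompapa results.
orthologous-groups: This directory contains a .tar.gz archive of 6,321 FASTA (.fasta) files corresponding to the OGs generated by OrthoFinder.
predictions: This directory contains three sub-directories:
interproscan: raw InterProScan results.
signalp5: raw SignalP5 results.
tmhmm: raw TMHMM results.
For the consolidated results, see Table S1 in the main manuscript.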
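The raw TMHMM files may use the TMHMM 2.0 "short" output format: one tab-separated line per protein, with `key=value` fields such as `len=`, `ExpAA=`, `PredHel=` and `Topology=`. If so, you can extract the number of predicted transmembrane helices per protein as sketched below. This is an assumption about the files. The long output format is laid out differently and would need another parser. The function name is hypothetical.

```python
def parse_tmhmm_short(text: str) -> dict[str, int]:
    """Map protein ID -> predicted TM helix count (PredHel), assuming TMHMM 2.0 short output."""
    helices: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.strip().split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # First field is the sequence ID; the remaining fields are key=value pairs.
        pairs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        if "PredHel" in pairs:
            helices[fields[0]] = int(pairs["PredHel"])
    return helices
```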
prokaryotic-genomes: This directory contains a list of the assembly accessions of the 80,490 organisms in the prokaryotic database.
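It is worth sanity-checking the accession list before using it to fetch assemblies. The sketch below assumes one accession per line and the standard NCBI assembly accession layout: `GCA_` (GenBank) or `GCF_` (RefSeq), a nine-digit number, a dot and a version. It returns non-matching lines separately rather than silently dropping them. The names `parse_accessions` and `Accession` are hypothetical.

```python
import re
from typing import Iterable, NamedTuple

ACCESSION_RE = re.compile(r"^(GC[AF])_(\d{9})\.(\d+)$")


class Accession(NamedTuple):
    prefix: str   # "GCA" (GenBank) or "GCF" (RefSeq)
    number: str   # nine-digit numeric part
    version: int


def parse_accessions(lines: Iterable[str]) -> tuple[list[Accession], list[str]]:
    """Split accession lines into parsed records and rejected (non-matching) strings."""
    valid: list[Accession] = []
    rejected: list[str] = []
    for raw in lines:
        acc = raw.strip()
        if not acc:
            continue  # ignore blank lines
        m = ACCESSION_RE.match(acc)
        if m:
            valid.append(Accession(m.group(1), m.group(2), int(m.group(3))))
        else:
            rejected.append(acc)
    return valid, rejected
```

Paired GenBank and RefSeq records of the same assembly share the numeric part. Grouping by `Accession.number` is therefore one way to spot such pairs.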
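regulon: This directory contains all the data needed to reproduce the "regulon pipeline" (see the supplementary data of the main manuscript). All command lines are included in the command_line_regulon.sh file.
scripts: This directory contains various Perl scripts used to generate some of the files stored in this archive (see the command-line.sh file).
taxdump: Mirror of the NCBI Taxonomy used in this study (downloaded on 4 May 2020).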
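You can query the taxdump mirror directly with the standard library. In NCBI taxdump files, fields are separated by `\t|\t` and each row ends with `\t|`. In nodes.dmp, the first three fields are the taxon ID, the parent taxon ID and the rank. In names.dmp, the fourth field is the name class (e.g. `scientific name`). The sketch below builds those maps and walks a taxon up to the root, which is taxon 1 and is its own parent. It ignores merged.dmp and delnodes.dmp, so it does not resolve merged or deleted taxon IDs, and it raises `KeyError` for IDs absent from nodes.dmp. The function names are hypothetical.

```python
def read_dmp(path):
    """Yield the list of fields for each row of an NCBI taxdump .dmp file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.endswith("\t|"):
                line = line[:-2]  # drop the row terminator
            yield line.split("\t|\t")


def load_taxonomy(nodes_path, names_path):
    """Return (parent, rank, scientific name) dictionaries keyed by taxon ID."""
    parent, rank, name = {}, {}, {}
    for f in read_dmp(nodes_path):
        parent[int(f[0])] = int(f[1])
        rank[int(f[0])] = f[2]
    for f in read_dmp(names_path):
        if len(f) > 3 and f[3] == "scientific name":
            name[int(f[0])] = f[1]
    return parent, rank, name


def lineage(tax_id, parent, rank, name):
    """Return [(rank, scientific name), ...] from the root down to `tax_id`."""
    path = []
    while True:
        path.append((rank.get(tax_id, ""), name.get(tax_id, str(tax_id))))
        up = parent[tax_id]
        if up == tax_id:  # the root (taxon 1) is its own parent
            break
        tax_id = up
    return path[::-1]
```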
trees: This directory contains three sub-directories:
atp-grasp
mray-family
mur-family
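All sub-directories contain the tree files (.tre) presented in the main manuscript and the corresponding raw IQ-TREE results (.ckp.gz), as well as various files needed to reproduce the phylogenetic analyses (see the command-line.sh file).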
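Assuming the .tre files are plain Newick, as IQ-TREE writes its trees, you can list their leaf labels with a small parser like the sketch below. It is deliberately minimal. It assumes unquoted labels without embedded brackets, commas, colons or comments. It treats labels that follow a closing parenthesis, such as IQ-TREE support values, as internal-node labels and skips them. The function name is hypothetical. For anything beyond a quick check, a full Newick parser such as Biopython's `Bio.Phylo` is a safer choice.

```python
def newick_leaves(newick: str) -> list[str]:
    """Return the leaf labels of a Newick tree, in order of appearance.

    A leaf label is text that directly follows '(' or ','; text after ')'
    is an internal-node label and is skipped. Branch lengths are ignored.
    """
    leaves: list[str] = []
    i, n = 0, len(newick)
    prev = None  # last structural character seen
    while i < n:
        c = newick[i]
        if c in "(),;":
            prev = c
            i += 1
        elif c == ":":
            # Skip the branch length up to the next structural character.
            i += 1
            while i < n and newick[i] not in "(),;":
                i += 1
        elif c.isspace():
            i += 1
        else:
            j = i
            while j < n and newick[j] not in "(),;:":
                j += 1
            if prev in (None, "(", ","):
                leaves.append(newick[i:j].strip())
            i = j
    return leaves
```

For example, given `((A:0.1,B:0.2)95/100:0.3,(C,D)80:0.05,E);`, the function returns the five leaves A to E and ignores the support labels `95/100` and `80`.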
Provider:
figshare
Created:
2022-11-30



