Datasets for Léonard et al. Was the last bacterial common ancestor a monoderm after all?

Name: Datasets for Léonard et al. Was the last bacterial common ancestor a monoderm after all?
Creator: figshare
Published: 2022-01-04 11:47:28
License: No description available

Source: DataCite Commons. Updated: 2022-01-04. Indexed: 2024-07-28.

Download link:

https://figshare.com/articles/dataset/Datasets_for_L_onard_et_al_Was_the_last_bacterial_common_ancestor_a_monoderm_after_all_/14932386

Resource description:

Léonard et al. 2021: Archive content for v3 (= v2 public)

Overview

<pre><code>... 27 directories, 320 files </code></pre>

Alignments

DCW_Single_Genes

For each of the 17 genes of the dcw gene cluster, the alignment in <code>.ali</code> format and the alignment in <code>.phy</code> (PHYLIP) format are available. The difference between the two formats is that the <code>.phy</code> files are cleaned and that their sequence names have been shortened. Cleaning statistics can be found in the corresponding <code>a2p-stat</code> files, whereas <code>.idm</code> files remap the original names to the new shorter names. <code>.idm</code> files in the <code>modified_idm</code> directory can tag sequences with a letter indicating whether each is encoded in the main cluster (M), in a sub-cluster (S) or by a singleton gene (A).

MCL groups corresponding to the dcw gene cluster genes are as follows:

MCLdcw110095 MurA
MCLdcw110144 FtsZ
MCLdcw110164 DdlB
MCLdcw110196 FtsI
MCLdcw110216 MraW
MCLdcw110253 MurE
MCLdcw110276 FtsW
MCLdcw110295 MraY
MCLdcw110307 MurC
MCLdcw110309 MurG
MCLdcw110351 MurD
MCLdcw110389 MurF
MCLdcw110652 FtsA
MCLdcw110718 MurB
MCLdcw110780 MraZ
MCLdcw113075 FtsQ
MCLdcw113678 FtsL

DCW_Supermatrix

This directory contains the supermatrix based on 15 genes of the dcw cluster. Because FtsQ and FtsL are difficult to identify with certainty, they were excluded from the supermatrix. The supermatrix is provided in <code>.phy</code> (PHYLIP) format before and after cleaning (the cleaned version is in the <code>cleaned</code> sub-directory). Cleaning statistics can be found in the <code>a2p-stat</code> file, whereas <code>.idm</code> files remap the original names to the new shorter names.

OGs_117_main_tree

This directory contains the supermatrices based on the 117 single-copy orthologous groups of genes that are the most common in our selection of genomes:

<code>misgen_14.ali</code> is the supermatrix for 101 species
<code>misgen_14-fltrd.ali</code> is the supermatrix for 85 species
<code>scafos.fasta</code> is the supermatrix <code>misgen_14-fltrd.ali</code> after further cleaning based on the results of Phylo-MCOA

OM_single_genes

For each of the 16 genes related to the outer membrane, the alignment in <code>.fasta</code> format and the alignment in <code>.phy</code> (PHYLIP) format are available. Again, the difference between the two formats is that the <code>.phy</code> files are cleaned and that their sequence names have been shortened. Cleaning statistics can be found in the corresponding <code>a2p-stat</code> files, whereas <code>.idm</code> files remap the original names to the new shorter names.

BayesTraits (BT)

For each cell-wall character, both raw results and summarized results are available in <code>csv</code> format:

membrane: <code>BayesTraits_results_monoderm_membrane.csv</code> / <code>BayesTraits_resume_monoderm_membrane.csv</code>
peptidoglycan: <code>BayesTraits_results_monoderm_peptidoglycan.csv</code> / <code>BayesTraits_resume_monoderm_peptidoglycan.csv</code>

Moreover, raw results of our attempt to force the LBCA to be a diderm are also provided in <code>BayesTraits_results_monoderm_membrane_weighted_rates.csv</code>.

Outer_membrane (OM)

The detailed HMM search results used to create Figure 3b are in <code>OM_genes_presence-hmms.csv</code>. Raw files can be found in the directories <code>profiles</code>, <code>hmmer</code> and <code>ompapa</code>. The <code>.fasta</code> files of the <code>ompapa</code> directory are the unaligned versions of those of the <code>Alignments/OM_single_genes</code> directory (see above). The 4 <code>.pdf</code> files in the <code>synteny_output</code> directory are the figures produced by our tool for visualising the synteny of OM-related genes in our selection of 85 bacteria:

<code>LptABC_85_sorted.pdf</code>
<code>LptFG_a_85_sorted.pdf</code>
<code>LptFG_b_85_sorted.pdf</code>
<code>Tol_Pal_system_85_sorted.pdf</code>

ProCARs

The <code>raph-cluster_all-ftsQL.xlsx</code> file is a summary of the input given to ProCARs and its output, which was then used to produce Figure 3a. The <code>misgen_14-fltrd-CATGTRG-1000-1-5000-CF_root-mono_-nums.png</code> file is our reference tree. Numbered nodes correspond to column 1 of the <code>.xlsx</code> file. The <code>synteny_85_dcw.pdf</code> file is the figure produced by our tool for visualising the synteny of the dcw gene cluster in our selection of 85 bacteria.

Scripts

The <code>LBCA_pipeline.md</code> file contains the command lines used to launch the different scripts and files stored in the sub-directories (see details below). The <code>R_session_info.md</code> file contains the result of the <code>sessionInfo()</code> command in R for the main laptop used for light computations and the main HPC system used for heavy computations.

bayestraits

The five <code>.sh</code> files are bash scripts used to launch BayesTraits using the main tree with the Terrabacteria rooting. There is one bash script for each of the five models used with BayesTraits. The <code>setup-bayestraits.pl</code> Perl script is used to convert a tree file from the <code>.tre</code> (Newick) format to the <code>.nex</code> (NEXUS) format.

procars

This directory contains a number of data files and bash/Perl scripts:

<code>bacteria.cls</code> determines the color used for each phylum
<code>block_mcl.txt</code> maps the MCL id to the block id used by ProCARs
<code>mcl_protname.txt</code> maps the MCL id to the protein name
<code>setup-procars.pl</code> prepares the main tree to be used by ProCARs
<code>procars-filter.pl</code> and <code>procars_help.pl</code> help the user to create the input file for ProCARs with gene position data
<code>procars_jobs.sh</code> launches ProCARs for every node of the tree
<code>procars-postpro_v4.pl</code> uses the output of ProCARs to create a <code>.xlsx</code> file summarizing the input and the output results in human-friendly form

synteny

The two <code>.R</code> scripts are the synteny program and its configuration file. The <code>synteny_GUI.R</code> script produced the <code>.pdf</code> files in the <code>Outer_membrane</code> directory.

tqmd

The <code>.R</code> and <code>.pl</code> files are the early version of ToRQuEMaDA used in this study, which at that time was split into several independent scripts. Briefly, <code>tabulate_kmer.pl</code> converts the <code>compseq</code> output into an easier-to-handle format for the clustering script, <code>tqmd_v1.R</code>. <code>stat_names.pl</code> prepares the result file with the quality score of each proteome for use by the <code>best_choice_v2.pl</code> script, which selects the best representative for each group produced by the clustering script.

Trees

The directories in <code>Trees</code> follow the same structure as in <code>Alignments</code>. The <code>DCW_17_SG.pdf</code> and <code>LBCA_OM_16_SG.pdf</code> files are the concatenation of the different single-gene trees for the dcw gene cluster and for the OM-related genes, respectively. The raw <code>.tre</code> files are available in the corresponding directories (<code>DCW_Single_Genes</code> and <code>OM_Single_Genes</code>). For the dcw genes, trees were computed both under the PROTGAMMALGF and C60 models, whereas the OM gene trees were only inferred using PROTGAMMALGF.

The other files are also in <code>.tre</code> (Newick) format and correspond to supermatrices:

<code>misgen_14-CATG-500-45_root.tre</code> is the preliminary tree using 101 species
<code>misgen_14-fltrd-CATGTRG-1000-1-5000-all_root.tre</code> is the tree using 85 species and 6 MCMC chains
<code>misgen_14-fltrd-CATGTRG-1000-1-5000-CF_root.tre</code> is the tree using 85 species and the 2 best MCMC chains
<code>scafos_supermatrix_CATG-AB-1000-025.tre</code> is the tree built from the dcw genes only
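
When working with the archive, it can help to keep the MCL group identifiers of the dcw gene cluster in a small lookup table. The Python sketch below simply transcribes the 17 MCL-to-gene pairs listed in the description and derives the 15 genes kept for the dcw supermatrix by dropping FtsQ and FtsL. The names <code>MCL_TO_GENE</code>, <code>EXCLUDED_FROM_SUPERMATRIX</code>, <code>gene_for_group</code> and <code>supermatrix_genes</code> are hypothetical helpers introduced here for illustration; they are not part of the archive. The sketch does not read any file from the archive, since the description does not state how the files are named.

```python
# Lookup table transcribed from the MCL group list in the dataset description.
MCL_TO_GENE = {
    "MCLdcw110095": "MurA",
    "MCLdcw110144": "FtsZ",
    "MCLdcw110164": "DdlB",
    "MCLdcw110196": "FtsI",
    "MCLdcw110216": "MraW",
    "MCLdcw110253": "MurE",
    "MCLdcw110276": "FtsW",
    "MCLdcw110295": "MraY",
    "MCLdcw110307": "MurC",
    "MCLdcw110309": "MurG",
    "MCLdcw110351": "MurD",
    "MCLdcw110389": "MurF",
    "MCLdcw110652": "FtsA",
    "MCLdcw110718": "MurB",
    "MCLdcw110780": "MraZ",
    "MCLdcw113075": "FtsQ",
    "MCLdcw113678": "FtsL",
}

# The description states that FtsQ and FtsL were left out of the supermatrix.
EXCLUDED_FROM_SUPERMATRIX = {"FtsQ", "FtsL"}


def gene_for_group(mcl_id):
    """Return the gene name for an MCL group id (hypothetical helper)."""
    return MCL_TO_GENE[mcl_id]


def supermatrix_genes():
    """Return the gene names kept for the dcw supermatrix, sorted by name."""
    return sorted(
        gene for gene in MCL_TO_GENE.values()
        if gene not in EXCLUDED_FROM_SUPERMATRIX
    )


if __name__ == "__main__":
    print(gene_for_group("MCLdcw110144"))
    print(len(MCL_TO_GENE), "single-gene groups;",
          len(supermatrix_genes()), "genes in the supermatrix")
```

A dictionary is used because the identifiers are unique and lookups by MCL id are the expected access pattern. The archive also ships its own mapping in <code>mcl_protname.txt</code>, which should be preferred over this transcription when it is available.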

Provider:

figshare

Created:

2021-11-30