
Chlamydia trachomatis cgMLST schema and necessary resources for its implementation

Zenodo, updated 2026-03-21, indexed 2026-05-26
Download link:
https://zenodo.org/doi/10.5281/zenodo.19120159
Resource description:
This repository contains the novel cgMLST schema of Chlamydia trachomatis, as well as all the resources necessary for its implementation in routine surveillance scenarios or research frameworks. All data available in this repository were generated in the frame of the study "Advancing Chlamydia trachomatis genomic surveillance and research with a novel core-genome MLST (cgMLST) approach" by Lodhia et al. 2025. Specifically, in this repository you will find:

The cgMLST schema (00_cgMLST_schema.zip)
This folder contains the cgMLST schema created and prepared to be run with the chewBBACA allele caller, as well as the respective training file. This schema is already populated with a large and global dataset of C. trachomatis (n = 1230). An adapted version of this schema is available in chewie-NS. This adapted version uses an alternative loci nomenclature, which is distinct from the historical "CT" numbering adopted in this Zenodo repository. Therefore, inside this folder you can also find the file chewie-NS_loci_correspondence.xlsx, which provides the correspondence between the loci nomenclature used in the schema version available in this Zenodo repository and the one available in chewie-NS.

The allele matrix (01_chewBBACA_1230_genomes.zip)
This folder contains the allele matrix obtained with chewBBACA for the dataset of C. trachomatis that was used to populate the schema (hereinafter referred to as the background dataset). The file Ct_alleles_missing_code_original.tsv corresponds to the original output of chewBBACA and is a useful resource to evaluate why a given locus was not called in a certain sample or group of samples. The file Ct_alleles_missing_code_0.tsv corresponds to the same matrix but with missing data indicated by "0", thus representing the necessary input for clustering analysis (e.g. with ReporTree).
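The relationship between the two allele matrices described above can be illustrated with a minimal Python sketch. It assumes that non-called loci appear in the original chewBBACA matrix as classification codes such as LNF or NIPH and that newly inferred alleles carry an "INF-" prefix; the exact set of codes may differ between chewBBACA versions. The sample names and the loci CT_001 to CT_003 are hypothetical. In practice, chewie ExtractCgMLST performs this masking for you.

```python
import csv
import io

# Assumption: non-allele classification codes that chewBBACA may write to
# results_alleles.tsv. Adjust this set to match your chewBBACA version.
MISSING_CODES = {"LNF", "ASM", "ALM", "NIPH", "NIPHEM",
                 "PLOT3", "PLOT5", "LOTSC", "PAMA"}


def mask_cell(value):
    """Return the allele number as text, or "0" when the locus was not called."""
    value = value.strip()
    if value.startswith("INF-"):
        # Newly inferred allele: keep only the allele number.
        return value[len("INF-"):]
    if value == "" or value in MISSING_CODES:
        return "0"
    return value


def mask_matrix(tsv_text):
    """Mask every allele cell of a tab-separated matrix (first column = sample)."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    return [header] + [[row[0]] + [mask_cell(v) for v in row[1:]] for row in body]


# Hypothetical three-locus example, not taken from the real schema.
example = (
    "FILE\tCT_001\tCT_002\tCT_003\n"
    "sampleA\t1\tINF-7\tLNF\n"
    "sampleB\t2\tNIPH\t3\n"
)
masked = mask_matrix(example)
for row in masked:
    print("\t".join(row))
```

Keeping the original matrix alongside the masked one preserves the reason for each missing call, which the "0" coding discards.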
The background to set up cgMLST (02_input_to_setup_cgMLST_1230_background.zip)
This folder contains all the files necessary to set up a routine surveillance scenario or a research framework in which new C. trachomatis isolates are integrated in the diverse genetic background of the C. trachomatis dataset used in this study. Specifically, it contains:
i) the file Ct_alleles_missing_code_0.tsv, which corresponds to the allele matrix of the background dataset (i.e. the 1230 genomes that originated the schema) already prepared for clustering, i.e. with missing data indicated by "0";
ii) the file Ct_metadata.tsv, which compiles the clinical and epidemiological information available for the background dataset, as well as genomically determined information about lineage, ompA genotype and ST, among others. This information is also provided outside this folder in .xlsx format (Ct_metadata.xlsx) for easier consultation of the data;
iii) the files Ct_partitions_to_setup_HC_nomenclature.tsv and Ct_partitions_to_setup_GT_nomenclature.tsv, which comprise the clustering information obtained at all possible threshold levels for the background dataset using the single-linkage hierarchical clustering (HC) or the GrapeTree MSTreeV2 (GT) algorithms, respectively. These files are a useful resource for routine genomics-based surveillance, as they provide the information necessary to implement a nomenclature system with ReporTree, including a three-level hierarchical nomenclature code as used for genogroup definition in this study.

Example of a ReporTree output (03_example_ReporTree_output_1230_genomes.zip)
This folder contains the outputs obtained from a ReporTree run for the background dataset using the single-linkage hierarchical clustering (HC) or the GrapeTree MSTreeV2 (GT) algorithms. Therefore, it not only provides an example of how to run ReporTree and what to expect from this tool, but also presents comprehensive information about the C. trachomatis dataset, integrating clinical, epidemiological and genomic data.

Tree visualization (04_visualization.zip)
This folder contains the hierarchical clustering tree and metadata files of the background dataset for interactive visualization and exploration of the data in Microreact and in Auspice.

ompA_allele_vs_genotype.xlsx
This file contains the correlation between the alleles of the cgMLST "CT_681_ompA" locus and traditional ompA genotypes.

How can I use this cgMLST schema to integrate new isolates in the diverse C. trachomatis dataset?
1. Download the Zenodo repository and unzip the folders
2. Go to 00_cgMLST_schema/ and unzip C_trachomatis_cgMLST_schema.zip
3. Run the chewBBACA allele caller on your assemblies using the downloaded schema
Command line example:
chewie AlleleCall -i list_your_assemblies.txt --schema-directory C_trachomatis_cgMLST_schema/ --output-directory chewie
4. Replace missing data by "0"
Command line example:
chewie ExtractCgMLST -i chewie/results_alleles.tsv -o chewie --t 0
5. Merge the resulting clean allelic matrix of your assemblies into the global allelic matrix: 02_input_to_setup_cgMLST_1230_background/Ct_alleles_missing_code_0.tsv
6. Add clinical and/or epidemiological information for each of your assemblies to 02_input_to_setup_cgMLST_1230_background/Ct_metadata.tsv
7. Run ReporTree providing Ct_alleles_missing_code_0.tsv (-a) and Ct_metadata.tsv (-m) as input. ReporTree is a surveillance-oriented tool with many functionalities, including the smooth integration of genetic and clinical/epidemiological data.
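Step 5 above (merging your allele matrix into the background matrix) can be sketched in Python as follows. This is an illustrative sketch rather than part of the published workflow: it assumes both matrices are tab-separated with the sample name in the first column and locus names in the header, aligns the new rows to the background column order, and fills any locus absent from the new matrix with "0". The function names and the toy loci are hypothetical.

```python
import csv
import io


def read_matrix(tsv_text):
    """Split a tab-separated allele matrix into its header and data rows."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    return rows[0], rows[1:]


def merge_matrices(background_text, new_text):
    """Append new samples to the background matrix, aligned by locus name."""
    bg_header, bg_rows = read_matrix(background_text)
    new_header, new_rows = read_matrix(new_text)
    loci = bg_header[1:]
    seen = {row[0] for row in bg_rows}
    merged = [bg_header] + bg_rows
    for row in new_rows:
        sample = row[0]
        if sample in seen:
            # Duplicate sample names would be ambiguous downstream.
            raise ValueError(f"duplicate sample name: {sample}")
        seen.add(sample)
        calls = dict(zip(new_header[1:], row[1:]))
        # Loci missing from the new matrix are coded as "0" (missing data).
        merged.append([sample] + [calls.get(locus, "0") for locus in loci])
    return merged


# Hypothetical toy matrices; real inputs would be read from the .tsv files.
background = "FILE\tCT_001\tCT_002\tCT_003\nbg1\t1\t2\t3\nbg2\t1\t5\t3\n"
new = "FILE\tCT_003\tCT_001\nnew1\t9\t4\n"
merged = merge_matrices(background, new)
for row in merged:
    print("\t".join(row))
```

Aligning by locus name rather than by column position guards against silently shifted alleles when the two matrices list loci in different orders. The metadata rows of step 6 need the same sample names as the merged matrix.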
Among ReporTree's functionalities, we highlight:
 - the possibility of indicating the threshold levels at which you intend to determine the genetic clusters, with the "--HC-threshold" or "-thr" arguments for HC and GT clustering, respectively
 - the possibility of indicating the metadata columns for which you wish to obtain summary reports with cluster characterization, with the "--columns_summary_report" argument
 - the possibility of indicating two metadata columns for which you intend to also generate summary reports, with the "--metadata2report" argument

If you intend to use the nomenclature system provided in this Zenodo repository, you can indicate the file Ct_partitions_to_setup_HC_nomenclature.tsv or Ct_partitions_to_setup_GT_nomenclature.tsv as "--nomenclature-file", depending on whether you want a hierarchical clustering (HC) or a GrapeTree MSTreeV2 (GT) analysis (--analysis), respectively. Additionally, you can request a hierarchical nomenclature code to be attributed to each isolate, reflecting the genogroups determined in this study (--nomenclature-code-levels).

Note: when using this schema, we strongly advise you to perform clustering only with isolates that have at least 95% of loci called (--loci-called).

Command line example:
python reportree.py -a Ct_alleles_missing_code_0.tsv -m Ct_metadata.tsv -out ReporTree --loci-called 0.95 --analysis HC --HC-threshold single-475,single-75,single-25,single-10 --columns_summary_report "lineage,country,anatomical_site" --metadata2report lineage --nomenclature-file Ct_partitions_to_setup_HC_nomenclature.tsv --nomenclature-code-levels single-475,single-75,single-25

Citation
If you use this schema or the provided datasets in your work, please cite:
 - Zohra Lodhia, Verónica Mixão, Joana Isidro, Rita Ferreira, Dora Cordeiro, Cristina Correia, Inês João, João Paulo Gomes, Maria José Borrego and Vítor Borges (2025) Advancing Chlamydia trachomatis genomic surveillance and research with a novel core-genome MLST (cgMLST) approach. Research Square. doi: 10.21203/rs.3.rs-7743240/v1
 - This repository

Funding
This work was funded by national funds through FCT - Foundation for Science and Technology, I.P., in the frame of Individual CEEC 2022.00851.CEECIND/CP1748/CT0001 (doi: 10.54499/2022.00851.CEECIND/CP1748/CT0001) and of the doctoral fellowship SFRH/BD/147446/2019 (doi: 10.54499/SFRH/BD/147446/2019), and by the European Union project “Sustainable use and integration of enhanced infrastructure into routine genome-based surveillance and outbreak investigation activities in Portugal” - GENEO [101113460] on behalf of the EU4H programme [EU4H-2022-DGA-MS-IBA-01-02].
Provider: Zenodo
Created: 2026-03-21