Genome Sizes of Bacterial Species Detected in Cell-Free DNA of Patients with Acute Leukemia and Sepsis, Including Those Undergoing Bone Marrow Transplantation
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13356510
Description:
Next-generation sequencing (NGS) analysis of cell-free DNA provides valuable insight into the spectrum of pathogenic species (particularly bacteria) present in blood. Patients with sepsis often face delays in starting a treatment regimen (a combination or cocktail of antibiotics) because of the long turnaround time (TAT) of classical, standard blood culture procedures. NGS gives results with a shorter TAT along with high-depth coverage. NGS may therefore help clinicians decide on treatment regimens faster and more accurately, possibly saving lives.
Our curated dataset lists the bacterial species or strains detected, along with their genome sizes, in 107 AML patients clinically diagnosed with sepsis. Cell-free DNA profiles of the patients were built, and sequencing was performed on Illumina platforms (NovaSeq and NextSeq). Bioinformatic analysis was performed using two classification algorithms, kraken2 and kaiju. For kraken2-based classification, the reference bacterial index developed by Carlo Ferravante et al. (Zenodo 2020; link: https://zenodo.org/records/4055180) was used, while for kaiju-based classification the reference database named "nr_euk" dated "2023-05-10" (link: https://bioinformatics-centre.github.io/kaiju/downloads.html) was used.
Genome size annotation is important in metagenomics because computing depth of coverage (abundance) requires the genome size. Metagenomic classification algorithms such as kraken/kraken2 and kaiju report only the number of reads assigned to each taxon, not abundance. With kaiju, the problem is more complicated because the reference database does not include a FASTA file, only an index file against which alignment is done.
To address these challenges and compute "depth of coverage", or simply abundance, we built a Genome Size Annotator tool (https://github.com/patkarlab/Genome-Size-Annotation) that provides the genome size of each detected species, provided its taxid is available. The tool uses the NCBI Datasets tool, the NCBI Genome API check tool, and data mining from AI search engines such as perplexity.ai.
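As a rough illustration of why genome size is needed, the sketch below estimates mean depth of coverage as total assigned bases divided by genome size. This is a common approximation and not necessarily the exact calculation used by the authors; the function name, the parameter names, and the example numbers are all hypothetical.

```python
def depth_of_coverage(reads_assigned: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Approximate mean depth of coverage for one species.

    Assumes every assigned read contributes `read_length_bp` bases,
    so depth = (reads * read length) / genome size.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return reads_assigned * read_length_bp / genome_size_bp


# Illustrative numbers only; they are not taken from the dataset.
depth = depth_of_coverage(reads_assigned=20_000, read_length_bp=150, genome_size_bp=5_000_000)
print(f"{depth:.2f}x")
```

Because the read count is normalized by genome size, two species with the same number of assigned reads but different genome sizes will have different estimated depths.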
We have curated two datasets:
Kraken2 dataset named "FINAL METAGENOMIC DATA MASTERSHEET - kraken_genome_annotation"
Kaiju dataset named "FINAL METAGENOMIC DATA MASTERSHEET - kaiju_genome_annotation"
*Please note that for the kraken2 curated dataset we used data mining from the AI search engine perplexity.ai, while for the kaiju dataset we did not use perplexity.ai, and any species whose genome size was not found was labeled "NA".
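Since some genome sizes are recorded as "NA", anyone loading the sheets will likely need to treat that label as a missing value before computing abundance. The following is a minimal sketch under assumptions: the column names (`taxid`, `species`, `genome_size_bp`) and the sample rows are placeholders and may not match the actual mastersheet layout.

```python
import csv
import io


def parse_genome_size(value: str):
    """Return the genome size in bp as an int, or None for 'NA' or empty cells."""
    value = value.strip()
    if value == "" or value.upper() == "NA":
        return None
    return int(float(value))


# Placeholder rows standing in for an exported sheet; column names are hypothetical.
sample = io.StringIO(
    "taxid,species,genome_size_bp\n"
    "1,species_A,5000000\n"
    "2,species_B,NA\n"
)

rows = [
    {**row, "genome_size_bp": parse_genome_size(row["genome_size_bp"])}
    for row in csv.DictReader(sample)
]

# Keep only species with a known genome size for downstream abundance estimates.
usable = [row for row in rows if row["genome_size_bp"] is not None]
print(len(usable), "of", len(rows), "species have a genome size")
```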
Created:
2024-08-24



