
Infant nasal gene catalogue

Source: Figshare. Updated 2025-10-03. Indexed 2026-04-08.
Download link:
https://figshare.com/articles/dataset/Infant_nasal_gene_catalogue/30227902
Description:
Functional studies of how early-life interventions shape the airway microbiome remain scarce. Here, we performed metagenomic sequencing of 706 longitudinal nasal swabs from infants with and without cystic fibrosis (CF) to construct and characterize the largest non-redundant gene atlas of the infant nasal microbiome. This study includes data from infants enrolled in two prospective birth cohort studies: the Swiss Cystic Fibrosis Infant Lung Development (SCILD) cohort and the Basel Bern Infant Lung Development (BILD) cohort.

After detailed instruction and demonstration by a study nurse, parents of study participants collected biweekly anterior nasal swabs from the fifth week of life until the end of the first year of life. Swabs (FLOQSwabs® 516CS01; Copan, Brescia, Italy) were placed in medium (UTM-RT™ in Screw-Cap Tube; Copan, Brescia, Italy), on average within 3 days, then pipetted into micro-screw tubes (Sarstedt, Nümbrecht, Germany) and stored in a −80 °C freezer until further processing.

Metagenomic library preparation and sequencing were performed by Novogene. The genomic DNA was randomly sheared into short fragments. The obtained fragments were end repaired, A-tailed, and ligated with Illumina adapters. The fragments with adapters were PCR amplified, size selected, and purified. Libraries were quantified with Qubit and real-time PCR, and their size distribution was checked on a bioanalyzer. Quantified libraries were pooled and sequenced on the Illumina NovaSeq 6000 platform. Raw reads were quality controlled: reads with low quality scores, adapter sequences, or more than 10% unidentified nucleotides were removed. We obtained approximately 40,000,000 raw reads per sample (mean 5.8 G raw data per sample).

Generation and annotation of microbial gene catalog

Human DNA contamination was removed by mapping metagenomic (MGX) reads to the GRCh38 human reference genome with BBMap (v38.84) (sourceforge.net/projects/bbmap/) in two rounds. The first round removed the majority of human reads (parameters: fast = true, minratio = 0.9, maxindel = 3, minhits = 2, kmer length = 14). The second round used default parameters to increase sensitivity for residual host contamination.

De novo assembly was then performed for each sample individually, as previously described. In brief, for every sample, reads were assembled into contigs with MEGAHIT (v1.2.9, presets meta-sensitive) and full-length coding sequences (CDS) were obtained with Prodigal (default, -p meta). Coding sequences were merged across samples and clustered at 95% sequence identity and 80% target coverage with MMseqs2 to generate a non-redundant gene atlas.

CDS within the atlas were taxonomically annotated using three complementary strategies:
(i) MMseqs2 (version 2fad714b525f1975b62c2d2b5aff28274ad57466) easy-taxonomy against the Genome Taxonomy Database (--lca-mode 4, e-value < 0.00001, --tax-lineage 1);
(ii) sequence alignment against the human airway reference genome database (blastn; e-value < 0.001, sequence identity >= 0.95, max_target_seqs 5);
(iii) alignment of CDS against a fungal database (FunOMIC.T.v1) with diamond blastx (--query-cover 95 --id 99).

For functional annotation we ran eggNOG-mapper (emapper-2.1.12).
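The host-removal, assembly, gene-calling and clustering steps described above can be sketched as a set of command builders. This is an illustrative sketch only, not the authors' scripts: all file names and function names are hypothetical, a paired-end read layout is assumed for MEGAHIT, and the choice of the MMseqs2 `easy-cluster` workflow with `--cov-mode 1` (coverage of the target sequence) is an assumption, since the description states only the 95% identity and 80% target coverage thresholds.

```python
from typing import List


def host_removal_cmd(reads_in: str, reads_out: str, ref: str,
                     fast_round: bool) -> List[str]:
    """Build a BBMap call that writes reads NOT mapping to the host reference."""
    # outu= collects unmapped reads, i.e. the putative non-human fraction.
    cmd = ["bbmap.sh", f"ref={ref}", f"in={reads_in}", f"outu={reads_out}"]
    if fast_round:
        # Round 1: the parameters listed in the description.
        cmd += ["fast=t", "minratio=0.9", "maxindel=3", "minhits=2", "k=14"]
    # Round 2 relies on BBMap defaults, so nothing is appended.
    return cmd


def assembly_cmd(r1: str, r2: str, out_dir: str) -> List[str]:
    """Per-sample MEGAHIT assembly (paired-end input is an assumption)."""
    return ["megahit", "-1", r1, "-2", r2,
            "--presets", "meta-sensitive", "-o", out_dir]


def gene_calling_cmd(contigs: str, genes_fna: str) -> List[str]:
    """Prodigal in metagenome mode; -d writes nucleotide coding sequences."""
    return ["prodigal", "-i", contigs, "-p", "meta", "-d", genes_fna]


def clustering_cmd(all_genes: str, out_prefix: str, tmp_dir: str) -> List[str]:
    """Cluster pooled CDS at 95% identity and 80% target coverage.

    Assumption: the easy-cluster workflow; the source does not name the
    MMseqs2 workflow that was used.
    """
    return ["mmseqs", "easy-cluster", all_genes, out_prefix, tmp_dir,
            "--min-seq-id", "0.95", "-c", "0.8", "--cov-mode", "1"]


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    steps = [
        host_removal_cmd("raw.fq.gz", "round1.fq.gz", "GRCh38.fa", True),
        host_removal_cmd("round1.fq.gz", "clean.fq.gz", "GRCh38.fa", False),
        assembly_cmd("clean_1.fq.gz", "clean_2.fq.gz", "assembly"),
        gene_calling_cmd("assembly/final.contigs.fa", "genes.fna"),
        clustering_cmd("all_genes.fna", "atlas", "tmp"),
    ]
    for step in steps:
        print(" ".join(step))
```

Returning argument lists rather than shell strings lets each step be passed directly to `subprocess.run` without quoting issues, and keeps the two BBMap rounds visibly different only in the round-one parameters.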
Provided by:
Steinberg, Ruth; Pust, Marie-Madlen
Created:
2025-10-03