Infant nasal gene catalogue

NIAID Data Ecosystem, indexed 2026-05-10

Download link:

https://figshare.com/articles/dataset/Infant_nasal_gene_catalogue/30227902

Description:

Functional studies of how early-life interventions shape the airway microbiome remain scarce. Here, we performed metagenomic sequencing of 706 longitudinal nasal swabs from infants with and without cystic fibrosis (CF) to construct and characterize the largest non-redundant gene atlas of the infant nasal microbiome.

This study includes data from infants in two prospective birth cohort studies: the Swiss Cystic Fibrosis Infant Lung Development (SCILD) cohort and the Basel Bern Infant Lung Development (BILD) cohort. After detailed instruction and demonstration by a study nurse, parents of study participants collected biweekly anterior nasal swabs from the fifth week of life until completion of the first year of life. Swabs (FLOQSwabs® 516CS01; Copan, Brescia, Italy) were placed in medium (UTM-RT™ in Screw-Cap Tube; Copan, Brescia, Italy), on average within 3 days, then pipetted into micro-screw tubes (Sarstedt, Nümbrecht, Germany) and stored in a −80 °C freezer until further processing.

Metagenomic library preparation and sequencing were performed by Novogene. The genomic DNA was randomly sheared into short fragments. The obtained fragments were end repaired, A-tailed, and ligated with Illumina adapters. The fragments with adapters were PCR amplified, size selected, and purified. The library was quantified with Qubit and real-time PCR, and its size distribution was checked with a bioanalyzer. Quantified libraries were pooled and sequenced on the Illumina NovaSeq 6000 platform. Raw reads were quality controlled, and reads with low quality scores, adapter sequences, or more than 10% unidentified nucleotides were removed. We obtained approximately 40,000,000 raw reads per sample (mean 5.8 G raw data per sample).

Generation and annotation of microbial gene catalog

Human DNA contamination was removed by mapping metagenomic (MGX) reads to the GRCh38 human reference genome with BBmap (v38.84) (sourceforge.net/projects/bbmap/) in two rounds. The first round removed the majority of human reads (parameters: mode fast = true, minratio = 0.9; maxindel = 3, minhits = 2, kmer length = 14). The second round used default parameters to increase sensitivity for residual host contamination. De novo individual assembly was then performed as previously described. In brief, for every sample, reads were assembled into contigs with MegaHIT (v1.2.9, presets meta-sensitive) and full-length coding sequences (CDS) were obtained with Prodigal (default, -p meta). Coding sequences were merged across samples and clustered at 95% sequence identity and 80% target coverage with MMseqs2 to generate a non-redundant gene atlas. CDS within the atlas were taxonomically annotated using three complementary strategies: (i) MMseqs2 (version 2fad714b525f1975b62c2d2b5aff28274ad57466) easy-taxonomy against the Genome Taxonomy Database (--lca-mode 4 (e-value < 0.00001); --tax-lineage 1), (ii) sequence alignment against the human airway reference genome database (blastn; e-value < 0.001, sequence identity >= 0.95, max_target_seqs 5), and (iii) searching CDS against a fungal database (FunOMIC.T.v1) with diamond blastx (--query-cover 95 --id 99). For functional annotation we ran eggNOG-mapper (emapper-2.1.12).
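One of the read-level quality criteria stated above (discarding reads with more than 10% unidentified nucleotides) can be illustrated with a minimal Python sketch. This is not the pipeline that was run for the dataset: the function names `n_fraction` and `filter_reads` are hypothetical, reads are assumed to be simple (id, sequence) pairs, and the quality-score and adapter filters are not covered.

```python
from typing import Iterable, Iterator, Tuple

# Assumed read representation for this sketch: (read_id, sequence).
Read = Tuple[str, str]


def n_fraction(seq: str) -> float:
    """Return the fraction of bases in `seq` that are unidentified (N)."""
    if not seq:
        # An empty read carries no usable bases; treat it as fully unidentified.
        return 1.0
    return seq.upper().count("N") / len(seq)


def filter_reads(reads: Iterable[Read], max_n_fraction: float = 0.10) -> Iterator[Read]:
    """Yield reads whose N fraction does not exceed `max_n_fraction`.

    The 0.10 default mirrors the "more than 10% unidentified nucleotides"
    removal criterion in the description; everything else is illustrative.
    """
    for read_id, seq in reads:
        if n_fraction(seq) <= max_n_fraction:
            yield read_id, seq


if __name__ == "__main__":
    example = [
        ("r1", "ACGTACGTAN"),  # 1 of 10 bases is N: kept (not more than 10%)
        ("r2", "ACGTACGTNN"),  # 2 of 10 bases are N: removed
    ]
    for read_id, seq in filter_reads(example):
        print(read_id, seq)
```

A read with exactly 10% N is kept here because the stated criterion removes only reads with more than 10%; a stricter cutoff can be set through `max_n_fraction`.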

Created:

2025-10-03
